ANTLR grammar for XKB, and Relax NG schema (draft)

I completed the ANTLRv3 grammar for the symbols/ configuration files of XKB. The grammar can parse all keyboard layouts in xkeyboard-config and create the abstract syntax tree (AST) for each of them.

ANTLRv3 helps you create parsers for domain-specific languages (DSLs); the configuration files in XKB are one example.
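To give an idea of what this DSL looks like, here is a small symbols-style fragment together with a toy extractor. This is only a sketch: it uses a regular expression rather than the ANTLR grammar, it handles only the simplest form of a key statement, and the names SAMPLE, KEY_RE and extract_keys are made up for the illustration.

```python
import re

# A tiny symbols-style fragment; real files under symbols/ use many more constructs.
SAMPLE = '''
partial alphanumeric_keys
xkb_symbols "basic" {
    name[Group1] = "Example";
    key <AD01> { [ q, Q ] };
    key <AD02> { [ w, W ] };
};
'''

# Matches only the simplest form:  key <NAME> { [ sym, sym, ... ] };
KEY_RE = re.compile(r'key\s*<(\w+)>\s*\{\s*\[([^\]]*)\]\s*\}\s*;')


def extract_keys(text):
    """Return a list of (keycode name, [keysyms]) pairs found in text."""
    keys = []
    for name, symbols in KEY_RE.findall(text):
        # Split the bracketed list on commas and drop surrounding whitespace.
        keys.append((name, [s.strip() for s in symbols.split(',') if s.strip()]))
    return keys


print(extract_keys(SAMPLE))
```

A regular expression like this breaks down quickly on includes, multiple groups, or key types, which is where a proper grammar earns its keep.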

Having the ANTLRv3 grammar for a configuration file lets you generate code in any of the supported target languages (C, C++, Java, Python, C#, etc.), so that you can easily include a parser that reads those files. Essentially, you avoid custom hand-written parsers, which can be difficult to maintain, as well as parsers generated with flex/bison.

On a similar note, here is the grammar to parse Compose files (such as en_US.UTF-8/Compose.pre). I am not going to use it in the project for now, but it was fun writing it. The Python target takes 18s to create the AST for the more than 5,500 lines of the en_US.UTF-8 Compose file on a typical modern laptop.

I am also working on a Relax NG schema for the XKB configuration files (those under symbols/). There is a draft available, which needs much more work. The Relax NG book by Eric van der Vlist is very useful here.

The immediate goal is to use the code generated by ANTLR to parse the XKB files and create XML files based on the Relax NG schema. I am using Python, and there are a few options: the libxml2 bindings for Python, and PyXML. The latter has more visible documentation, but I think I would be better off using the former.

Update: lxml appears to be the nice way to use libxml2 (instead of using the libxml2 bindings directly).
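As a rough sketch of the "parsed layout to XML" step, the following builds an XML document from a list of keys. Everything specific here is an assumption: the list of (keycode, keysyms) pairs stands in for whatever the ANTLR-generated parser produces, and the element and attribute names (layout, key, symbol, code, level) are invented for the example and are not taken from the draft schema. It uses the standard library's xml.etree.ElementTree so that it runs without extra packages; lxml.etree offers the same Element, SubElement and tostring calls.

```python
import xml.etree.ElementTree as ET


def layout_to_xml(layout_name, keys):
    """Build an XML tree from (keycode, [keysyms]) pairs.

    Element and attribute names are hypothetical, not the draft schema's.
    """
    root = ET.Element("layout", name=layout_name)
    for keycode, symbols in keys:
        key_el = ET.SubElement(root, "key", code=keycode)
        # Number the shift levels from 1, in the order they were listed.
        for level, sym in enumerate(symbols, start=1):
            sym_el = ET.SubElement(key_el, "symbol", level=str(level))
            sym_el.text = sym
    return root


# Stand-in for data taken from a parsed symbols file.
keys = [("AD01", ["q", "Q"]), ("AD02", ["w", "W"])]
xml_text = ET.tostring(layout_to_xml("basic", keys), encoding="unicode")
print(xml_text)
```

With lxml, the resulting tree could then be checked against the schema using its RelaxNG validator class, which is one reason to prefer it once the schema settles.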
