Parsing XKB files with antlr

antlr (well, antlr3) is an amazing tool that replaces lex/flex and yacc/bison.

One would use antlr3 to deal with Domain-Specific Languages (DSLs); text configuration files are one example.

In our case, we use antlr3 to parse some of the XKB configuration files, those found in /etc/X11/xkb/symbols/??.

Our aim is to be able to read and write those configuration files easily. Of course, once we have read them, we can do all sorts of processing.

The stable version of antlr3 is 3.0.1, which gave me lots of internal errors and was not very useful. I therefore tried the latest beta version, 3.1b, a few times and eventually managed to get it to work. If I am not mistaken, 3.1 stable should be announced in a few days.

When using antlr, you have a choice of several target languages, such as Java, C, C++ and Python. I am using the Python target, with the latest version available from the antlr3 repository.

Here is the tree of the gb layout file:

tree =
(SECTION
  (MAPTYPE (MAPOPTIONS partial default alphanumeric_keys xkb_symbols) (MAPNAME "basic"))
  (MAPMATERIAL
    (TOKEN_INCLUDE "latin")
    (TOKEN_NAME Group1 (VALUE "United Kingdom"))
    (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior oneeighth))
    (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior sterling))
    (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter))
    (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS apostrophe at dead_circumflex dead_caron))
    (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS grave notsign bar bar))
    (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde dead_grave dead_breve))
    (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar brokenbar))
    (TOKEN_INCLUDE "level3(ralt_switch_multikey)")))
(SECTION
  (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME "intl"))
  (MAPMATERIAL
    (TOKEN_INCLUDE "latin")
    (TOKEN_NAME Group1 (VALUE "United Kingdom - International (with dead keys)"))
    (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 dead_diaeresis twosuperior onehalf))
    (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior onethird))
    (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter))
    (TOKEN_KEY (KEYCODEX AE06) (KEYSYMS 6 dead_circumflex NoSymbol onesixth))
    (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS dead_acute at apostrophe bar))
    (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS dead_grave notsign bar bar))
    (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign dead_tilde bar bar))
    (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar bar))
    (TOKEN_INCLUDE "level3(ralt_switch)")))
(SECTION
  (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME "dvorak"))
  (MAPMATERIAL
    (TOKEN_INCLUDE "us(dvorak)")
    (TOKEN_NAME Group1 (VALUE "United Kingdom - Dvorak"))
    (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde))
    (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior NoSymbol))
    (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior NoSymbol))
    (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign NoSymbol))
    (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar))
    (TOKEN_KEY (KEYCODEX AD01) (KEYSYMS apostrophe at))))
(SECTION
  (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME "mac"))
  (MAPMATERIAL
    (TOKEN_INCLUDE "latin")
    (TOKEN_NAME Group1 (VALUE "United Kingdom - Macintosh"))
    (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 at EuroSign))
    (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling numbersign))
    (TOKEN_INCLUDE "level3(ralt_switch)")))
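
To poke at a dump like this outside of antlr, one option is to read the parenthesized text back into nested lists. The snippet below is only a minimal sketch in plain Python 3: parse_tree_dump is a hypothetical helper written for this illustration, not part of antlr or of the layout editor code, and it assumes that quoted strings never contain a double quote.

```python
import re

# A token is a parenthesis, a double-quoted string, or a bare word.
# Quoted strings are matched whole, so parentheses inside them
# (as in "level3(ralt_switch)") do not open or close a node.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_tree_dump(text):
    """Parse a parenthesized tree dump into nested lists (hypothetical helper).

    Each "(HEAD child ...)" group becomes ["HEAD", child, ...].
    The return value is the list of top-level groups.
    """
    stack = [[]]  # stack[0] collects the top-level nodes
    for tok in TOKEN_RE.findall(text):
        if tok == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


# A shortened version of the "mac" section from the dump above.
dump = (
    '(SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) '
    '(MAPNAME "mac")) (MAPMATERIAL (TOKEN_INCLUDE "latin") '
    '(TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 at EuroSign)) '
    '(TOKEN_INCLUDE "level3(ralt_switch)")))'
)

sections = parse_tree_dump(dump)
section = sections[0]
print(section[0])      # node type of the first top-level group
print(section[2][-1])  # last child of MAPMATERIAL
```

Each node is a list whose first element is the node type, so section[1] is the MAPTYPE node and section[2] is the MAPMATERIAL node holding the include, name and key statements.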

When traversing the tree, we can then pretty-print the layout at will:

partial default alphanumeric_keys xkb_symbols "basic" {
name[Group1] = "United Kingdom";
include "latin"
include "level3(ralt_switch_multikey)"
key <AE02> = { [ 2 , quotedbl , twosuperior , oneeighth ] };
key <AE03> = { [ 3 , sterling , threesuperior , sterling ] };
key <AE04> = { [ 4 , dollar , EuroSign , onequarter ] };
key <AC11> = { [ apostrophe , at , dead_circumflex , dead_caron ] };
key <TLDE> = { [ grave , notsign , bar , bar ] };
key <BKSL> = { [ numbersign , asciitilde , dead_grave , dead_breve ] };
key <LSGT> = { [ backslash , bar , bar , brokenbar ] };
};
… snip …
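
The traversal behind this output can be sketched in a few lines of Python. This is a hedged illustration rather than the actual code of the layout editor: it assumes the tree has already been converted to nested lists (the real antlr3 tree objects have their own API), and pretty_print_section and mac_section are names made up for the example. The grouping of name, include and key statements simply mirrors the order seen in the output above.

```python
def pretty_print_section(section):
    """Render one SECTION node, given as nested lists, as XKB text.

    Hypothetical helper: expects
    ["SECTION", ["MAPTYPE", [...], [...]], ["MAPMATERIAL", ...]].
    """
    _, maptype, material = section
    options = maptype[1][1:]  # children of MAPOPTIONS
    name = maptype[2][1]      # child of MAPNAME, quotes included

    names, includes, keys = [], [], []
    for node in material[1:]:
        kind = node[0]
        if kind == "TOKEN_NAME":
            # ["TOKEN_NAME", "Group1", ["VALUE", '"..."']]
            names.append("name[%s] = %s;" % (node[1], node[2][1]))
        elif kind == "TOKEN_INCLUDE":
            includes.append("include %s" % node[1])
        elif kind == "TOKEN_KEY":
            # ["TOKEN_KEY", ["KEYCODEX", code], ["KEYSYMS", sym, ...]]
            keycode = node[1][1]
            keysyms = " , ".join(node[2][1:])
            keys.append("key <%s> = { [ %s ] };" % (keycode, keysyms))

    lines = ["%s %s {" % (" ".join(options), name)]
    lines += names + includes + keys
    lines.append("};")
    return "\n".join(lines)


# The "mac" section of the gb layout, written out by hand as nested lists.
mac_section = [
    "SECTION",
    ["MAPTYPE",
     ["MAPOPTIONS", "partial", "alphanumeric_keys", "xkb_symbols"],
     ["MAPNAME", '"mac"']],
    ["MAPMATERIAL",
     ["TOKEN_INCLUDE", '"latin"'],
     ["TOKEN_NAME", "Group1", ["VALUE", '"United Kingdom - Macintosh"']],
     ["TOKEN_KEY", ["KEYCODEX", "AE02"], ["KEYSYMS", "2", "at", "EuroSign"]],
     ["TOKEN_KEY", ["KEYCODEX", "AE03"], ["KEYSYMS", "3", "sterling", "numbersign"]],
     ["TOKEN_INCLUDE", '"level3(ralt_switch)"']],
]

text = pretty_print_section(mac_section)
print(text)
```

Collecting the statements into three lists before printing is a design choice made only to reproduce the name, include, key ordering; a printer that keeps the order of the source file would emit each node as it is visited instead.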

The code is currently hosted at code.google.com (keyboardlayouteditor), and I intend to move it to freedesktop.org (FDO) shortly.
