Parsing XKB files with antlr

antlr (well, antlr3) is an amazing tool that replaces lex/flex, yacc/bison.

One would use antlr3 if they want to deal with Domain-Specific Languages (DSL), an example of which are the text configuration files.

In our case, we use antlr3 to parse some of the XKB configuration files, those found in /etc/X11/xkb/symbols/??.

Our aim is to be able to easily read and write those configuration files. Of course, once we have them read, we do all sorts of processing.

The stable version of antlr3 is 3.0.1, which happened to give lots of internal errors. It has not been very useful, so I tried a few times the latest beta version 3.1b, and eventually managed to get it to work. If I am not mistaken, 3.1 stable should be announced in a few days.

When using antlr, you have the choice of several target languages, such as Java, C, C++ and Python. I am using the Python target, and the latest version that is available from the antlr3 repository.

Here is the tree of the gb layout file,

tree = (SECTION (MAPTYPE (MAPOPTIONS partial default alphanumeric_keys xkb_symbols) (MAPNAME “basic”)) (MAPMATERIAL (TOKEN_INCLUDE “latin”) (TOKEN_NAME Group1 (VALUE “United Kingdom”)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior oneeighth)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior sterling)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter)) (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS apostrophe at dead_circumflex dead_caron)) (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS grave notsign bar bar)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde dead_grave dead_breve)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar brokenbar)) (TOKEN_INCLUDE “level3(ralt_switch_multikey)”))) (SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME “intl”)) (MAPMATERIAL (TOKEN_INCLUDE “latin”) (TOKEN_NAME Group1 (VALUE “United Kingdom – International (with dead keys)”)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 dead_diaeresis twosuperior onehalf)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior onethird)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter)) (TOKEN_KEY (KEYCODEX AE06) (KEYSYMS 6 dead_circumflex NoSymbol onesixth)) (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS dead_acute at apostrophe bar)) (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS dead_grave notsign bar bar)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign dead_tilde bar bar)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar bar)) (TOKEN_INCLUDE “level3(ralt_switch)”))) (SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME “dvorak”)) (MAPMATERIAL (TOKEN_INCLUDE “us(dvorak)”) (TOKEN_NAME Group1 (VALUE “United Kingdom – Dvorak”)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior NoSymbol)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior NoSymbol)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign NoSymbol)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar)) (TOKEN_KEY (KEYCODEX AD01) (KEYSYMS apostrophe at)))) (SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME “mac”)) (MAPMATERIAL (TOKEN_INCLUDE “latin”) (TOKEN_NAME Group1 (VALUE “United Kingdom – Macintosh”)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 at EuroSign)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling numbersign)) (TOKEN_INCLUDE “level3(ralt_switch)”)))

When traversing the tree, we can then pretty-print the layout at wish:

partial default alphanumeric_keys xkb_symbols “basic” {
name[Group1] = “United Kingdom”;
include “latin”
include “level3(ralt_switch_multikey)”
key <AE02> = { [ 2 , quotedbl , twosuperior , oneeighth ] };
key <AE03> = { [ 3 , sterling , threesuperior , sterling ] };
key <AE04> = { [ 4 , dollar , EuroSign , onequarter ] };
key <AC11> = { [ apostrophe , at , dead_circumflex , dead_caron ] };
key <TLDE> = { [ grave , notsign , bar , bar ] };
key <BKSL> = { [ numbersign , asciitilde , dead_grave , dead_breve ] };
key <LSGT> = { [ backslash , bar , bar , brokenbar ] };
};
… snip …

The code is currently hosted at code.google.com (keyboardlayouteditor) and I intend to move it shortly to FDO.

Permanent link to this article: https://blog.simos.info/parsing-xkb-files-with-antlr/

1 ping

  1. […] In the previous post, we talked about the ANTLR grammar that parses the XKB layout files. […]

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.