Converting between XKB and XML

I have completed the stage that converts XKB (X.Org) keyboard layout files to XML documents conforming to a Relax NG schema for keyboard layouts. These XML documents can then be converted back to keyboard layout files.

Here is an imaginary example of a keyboard layout file.

// Keyboard layout for the Zzurope country (code: zz).
// Yeah.

partial alphanumeric_keys alternate_group hidden
xkb_symbols "bare" {
   key <AE01> { [        1, exclam,      onesuperior,  exclamdown      ] };
};

partial alphanumeric_keys alternate_group
xkb_symbols "basic" {
   name[Group1] = "ZZurope";

   include "zz(bare)"

   key <AD04> { [        r, R,           ediaeresis,   Ediaeresis      ] };
   key <AC07> { [        j, J,           idiaeresis,   Idiaeresis      ] };
   key <AB02> { [        x, X,           oe,           OE              ] };
   key <AB04> { [        v, V,           registered,   registered      ] };
};

partial alphanumeric_keys alternate_group
xkb_symbols "extended" {
    include "zz(basic)"
    name[Group1] = "ZZurope Extended";
    key.type = "THREE_LEVEL"; // We use three levels.
    override key <AD01> {   type[Group1] = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC",
[ U1C9, U1C8], [  any,   U1C7 ]   }; // q
    override key <AD02> {   [ U1CC, U1CB, any,U1CA ],
type[Group1] = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC" }; // w
    key <BKSP> {
        type[Group1]="CTRL+ALT",
        symbols[Group1]= [ BackSpace,   Terminate_Server ]
    };
    key <BKSR> { virtualMods = AltGr, [ 1, 2 ] };
    modifier_map Control { Control_L };
    modifier_map Mod5   { <LVL3>, <MDSW> };
    key <BKST> { [1, 2,3, 4] };
};

When converted to an XML document, it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<layout layoutname="zz">
  <symbols>
    <mapoption>hidden</mapoption>
    <mapoption>xkb_symbols</mapoption>
    <mapname>bare</mapname>
    <mapmaterial>
      <tokenkey override="False">
        <keycodename>AE01</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>1</symbol>
            <symbol>exclam</symbol>
            <symbol>onesuperior</symbol>
            <symbol>exclamdown</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
    </mapmaterial>
  </symbols>
  <symbols>
    <mapoption>xkb_symbols</mapoption>
    <mapname>basic</mapname>
    <mapmaterial>
      <tokenname name="ZZurope"/>
      <tokeninclude>zz(bare)</tokeninclude>
      <tokenkey override="False">
        <keycodename>AD04</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>r</symbol>
            <symbol>R</symbol>
            <symbol>ediaeresis</symbol>
            <symbol>Ediaeresis</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>AC07</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>j</symbol>
            <symbol>J</symbol>
            <symbol>idiaeresis</symbol>
            <symbol>Idiaeresis</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>AB02</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>x</symbol>
            <symbol>X</symbol>
            <symbol>oe</symbol>
            <symbol>OE</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>AB04</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>v</symbol>
            <symbol>V</symbol>
            <symbol>registered</symbol>
            <symbol>registered</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
    </mapmaterial>
  </symbols>
  <symbols>
    <mapoption>xkb_symbols</mapoption>
    <mapname>extended</mapname>
    <mapmaterial>
      <tokenname name="ZZurope Extended"/>
      <tokeninclude>zz(basic)</tokeninclude>
      <tokentype>THREE_LEVEL</tokentype>
      <tokenmodifiermap state="Control">
        <keycode value="Control_L"/>
      </tokenmodifiermap>
      <tokenmodifiermap state="Mod5">
        <keycodex value="LVL3"/>
        <keycodex value="MDSW"/>
      </tokenmodifiermap>
      <tokenkey override="True">
        <keycodename>AD01</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>U1C9</symbol>
            <symbol>U1C8</symbol>
          </symbolsgroup>
          <symbolsgroup>
            <symbol>any</symbol>
            <symbol>U1C7</symbol>
          </symbolsgroup>
          <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="True">
        <keycodename>AD02</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>U1CC</symbol>
            <symbol>U1CB</symbol>
            <symbol>any</symbol>
            <symbol>U1CA</symbol>
          </symbolsgroup>
          <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>BKSP</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>BackSpace</symbol>
            <symbol>Terminate_Server</symbol>
          </symbolsgroup>
          <typegroup value="CTRL+ALT"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>BKSR</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>1</symbol>
            <symbol>2</symbol>
          </symbolsgroup>
          <tokenvirtualmodifiers value="AltGr"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>BKST</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>1</symbol>
            <symbol>2</symbol>
            <symbol>3</symbol>
            <symbol>4</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
    </mapmaterial>
  </symbols>
</layout>
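
To make the mapping from a key statement to a tokenkey element concrete, here is a minimal Python sketch. It is not the converter's actual code: the function name key_to_xml and the regular expressions are assumptions made for illustration, and it only handles key statements that consist of plain symbol lists (no type or virtualMods attributes).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for this sketch. They match statements such as
#   override key <AD01> { [ a, b ], [ c, d ] };
KEY_RE = re.compile(r'^\s*(override\s+)?key\s*<(\w+)>\s*\{(.*)\}\s*;')
GROUP_RE = re.compile(r'\[([^\]]*)\]')


def key_to_xml(line):
    """Convert one simple XKB key statement to a <tokenkey> element."""
    m = KEY_RE.match(line)
    if m is None:
        raise ValueError("not a simple key statement: %r" % line)
    override, keycode, body = m.groups()
    if "=" in body:
        # type=..., virtualMods=... and symbols[...]= are out of scope here.
        raise ValueError("key attributes are not handled in this sketch")
    # The override attribute is written as "True" or "False", as in the XML above.
    tokenkey = ET.Element("tokenkey", override=str(bool(override)))
    ET.SubElement(tokenkey, "keycodename").text = keycode
    keysymgroup = ET.SubElement(tokenkey, "keysymgroup")
    # Each bracketed list becomes one <symbolsgroup> of <symbol> children.
    for group in GROUP_RE.findall(body):
        symbolsgroup = ET.SubElement(keysymgroup, "symbolsgroup")
        for name in group.split(","):
            ET.SubElement(symbolsgroup, "symbol").text = name.strip()
    return tokenkey


if __name__ == "__main__":
    element = key_to_xml("key <AE01> { [ 1, exclam, onesuperior, exclamdown ] };")
    print(ET.tostring(element, encoding="unicode"))
```

A regular expression is enough for one-line statements like these, but a full layout file (multi-line keys, comments, includes, modifier maps) calls for a proper grammar.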

When we convert the XML document back to the XKB format, it looks like this:

hidden xkb_symbols "bare"
{
	key <AE01> { [ 1, exclam, onesuperior, exclamdown ] };
};

xkb_symbols "basic"
{
	name = "ZZurope";
	include "zz(bare)"
	key <AD04> { [ r, R, ediaeresis, Ediaeresis ] };
	key <AC07> { [ j, J, idiaeresis, Idiaeresis ] };
	key <AB02> { [ x, X, oe, OE ] };
	key <AB04> { [ v, V, registered, registered ] };
};

xkb_symbols "extended"
{
	name = "ZZurope Extended";
	include "zz(basic)"
	key.type = "THREE_LEVEL";
	modifier_map Control { Control_L };
	modifier_map Mod5 { <LVL3>, <MDSW> };
	override key <AD01> { [ U1C9, U1C8 ], [ any, U1C7 ], type = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"  };
	override key <AD02> { [ U1CC, U1CB, any, U1CA ], type = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"  };
	key <BKSP> { [ BackSpace, Terminate_Server ], type = "CTRL+ALT"  };
	key <BKSR> { [ 1, 2 ], virtualMods = AltGr  };
	key <BKST> { [ 1, 2, 3, 4 ] };
};
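
The reverse direction can be sketched in the same spirit. The following Python function is an illustration only (the name tokenkey_to_xkb is an assumption, not the tool's API). It serialises one tokenkey element, as shown in the XML above, back into a key statement, putting symbol groups first, then the type, then the virtual modifiers, as in the generated file. Whitespace may differ slightly from the tool's real output.

```python
import xml.etree.ElementTree as ET


def tokenkey_to_xkb(tokenkey):
    """Serialise one <tokenkey> element as an XKB key statement."""
    keysymgroup = tokenkey.find("keysymgroup")
    parts = []
    # One bracketed list per <symbolsgroup>.
    for group in keysymgroup.findall("symbolsgroup"):
        symbols = ", ".join(s.text for s in group.findall("symbol"))
        parts.append("[ %s ]" % symbols)
    # Group1 is implied, so the type is written without a group index.
    for typegroup in keysymgroup.findall("typegroup"):
        parts.append('type = "%s"' % typegroup.get("value"))
    for vmods in keysymgroup.findall("tokenvirtualmodifiers"):
        parts.append("virtualMods = %s" % vmods.get("value"))
    prefix = "override " if tokenkey.get("override") == "True" else ""
    keycode = tokenkey.find("keycodename").text
    return "%skey <%s> { %s };" % (prefix, keycode, ", ".join(parts))


if __name__ == "__main__":
    sample = ET.fromstring(
        '<tokenkey override="False"><keycodename>BKSP</keycodename>'
        '<keysymgroup><symbolsgroup><symbol>BackSpace</symbol>'
        '<symbol>Terminate_Server</symbol></symbolsgroup>'
        '<typegroup value="CTRL+ALT"/></keysymgroup></tokenkey>'
    )
    print(tokenkey_to_xkb(sample))
```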

Some things are missing, such as partial, alphanumeric_keys and alternate_group. I discussed these with Sergey, and he said it should be OK to drop them.

In addition, we simplify by keeping just Group1 (we do not specify it, as it is implied).

I performed the round-trip with all layout files, and all of them parsed and validated OK (some extra work remains on the level3 file, though).

Some remaining issues include:

  • Figuring out how to use XLink to link to documents in the same folder (plus providing a parameter, the name of the variant), and how to represent that in the Relax NG schema.
  • Sorting the layout entries by keycode value.
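
For the sorting item, one possible approach is sketched below in Python. It is only an assumption about how this could be done: the function name sort_keys_by_keycode is made up, and "keycode value" is taken to mean plain alphabetical order of the keycode name. The tokenkey children of a mapmaterial element are sorted among themselves, while the other children (name, include, modifier maps) keep their positions, since their order can matter.

```python
import xml.etree.ElementTree as ET


def sort_keys_by_keycode(mapmaterial):
    """Sort <tokenkey> children in place by keycode name.

    Only the slots already occupied by <tokenkey> elements are rewritten,
    so every other child stays where it was. sorted() is stable, so two
    entries for the same keycode keep their relative order.
    """
    children = list(mapmaterial)
    slots = [i for i, child in enumerate(children) if child.tag == "tokenkey"]
    keys = sorted((children[i] for i in slots),
                  key=lambda k: k.findtext("keycodename"))
    for i, key in zip(slots, keys):
        mapmaterial[i] = key


if __name__ == "__main__":
    material = ET.fromstring(
        "<mapmaterial>"
        "<tokeninclude>zz(bare)</tokeninclude>"
        "<tokenkey><keycodename>AD04</keycodename></tokenkey>"
        "<tokenkey><keycodename>AB02</keycodename></tokenkey>"
        "</mapmaterial>"
    )
    sort_keys_by_keycode(material)
    print([k.findtext("keycodename") for k in material.findall("tokenkey")])
```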

4 Responses to “Converting between XKB and XML”

  1. That is a wonderful testament to how shit XML is.

  2. @Anonymous: you’ve obviously never tried to edit files in the old compact syntax, which is so brittle and error-inducing that you always spend at least 30 min figuring out where a typo introduced breakage (that is, if you do not give up as the average human does). It’s a pretty minefield.

    @Simos

    Nice to see this kind of progress!

    Now, I think you still need to work a lot on your XML grammar. You’re falling into the trap of every machine-generated file (XML or not), which is excessive verbosity. To be successful and get adoption, you need to work harder at having concise and readable files.

    Some remarks :
    - you have a symbols element with mapoption, mapname, mapmaterial… inside. That sort of screams that your symbols element should be named map (same for tokentype… inside mapmaterial. Take care to have consistent naming, please)

    - XML is a structured language. You should not need to name the child of foo foooption. You can infer an option is a foooption by the fact its parent is foo (for example, mapoption inside symbols, keycodename inside tokenkey)

    - as a rule, when you can have only one bar child of foo, it’s more compact and human-friendly to have it as an attribute () than as a child element. Opinions on attributes vary in the XML world, though, and some people recommend just supporting both and letting users choose what is most appropriate for them. But anyway, smart use of attributes should kill some of your XML complexity

    - a nice property of your XML layout is that each symbol is its own element. That means that unlike the legacy syntax, you can allow several symbol syntaxes. For example :
    й
    0439
    Cyrillic_shorti
    01000439
    This alone would make the files editable by normal beings

    - you don’t really need to use this syntax SEPARATE_CAPS_AND_SHIFT_ALPHABETIC when the start and end of the name are nicely delimited with “”

    - zz(basic) is really a two-level element. Do you really want to keep another kind of tokenization inside your XML syntax?

    - it would probably simplify your files to have an override value at the mapmaterial level and only specify it in tokenkey when it’s different from the global one

    - you should ask for syntax advice on the xml-dev mailing list if you’ve not already done so

  3. I meant
    - a nice property of your XML layout is that each symbol is its own element. That means that unlike the legacy syntax, you can allow several symbol syntaxes. For example :
    [unicodevalue]й[unicodevalue]
    [unicodepoint]0439[unicodepoint]
    [magicnamenooneknows]Cyrillic_shorti[magicnamenooneknows]
    [magicnonstandardvalue]01000439[magicnonstandardvalue]
    This alone would make the files editable by normal beings

  4. Many thanks for the comments, Nicolas!
    I am looking into these.
