guadec, gsoc l10n-el, ellak-conf
guadec
I am attending GUADEC this year, thanks to the sponsorship by the GNOME Foundation!

I am organising the GNOME Localisation BoF, which takes place on Friday, 10th July, 2009, at 17:00. I am also giving a session on the GNOME translator command line tool gnome-i18n-manage-vcs on the same day at 15:00.
gsoc
A few months ago, there was a program in Greece, along the lines of the Google Summer of Code, to help Greek developers in FLOSS projects. The program was organised by EELLAK, a Greek non-profit made up of 25 tertiary education institutions and research centres. As it took place during the spring, it was nicknamed Greek Spring of Code (gsoc).
Apart from developing software, the program had a localisation angle, and we applied for the localisation of GNOME 2.26 to the Greek language. In practice, this meant that we had to lift the documentation translations from 32% to 100%, and complete the remaining UI translations.

Many contributors helped in this effort: Jennie Petoumenou (also co-organiser of the effort), Marios Zindilis, Fotis Tsamis, Kostas Papadimas, Nikos Charonitakis, Stergios Prosiniklis, Giannis Katsampiris, Michalis Kotsarinis, Vasilis Kontogiannis and Socratis Vavilis. The overall task was difficult, and our team did an amazing job to complete the translations on time. Thank you all, and especially Jennie and Marios for undertaking huge chunks of the translation effort for this release.
Here are the GNOME EL 2.26 deliverables in HTML, PDF.
ellak-conf
The fourth Greek FOSS (ELLAK) conference took place in Athens on the 19-20th June 2009.

We had our annual localisation meetup!
I organised a workshop on git, with a focus on how to use it when starting out in software development. There was emphasis on using github.com to host and manage the development. In addition, services such as github.com allow people to cooperate during development, making programming a more social and interesting task.
Finally, there was a presentation of the Greek GNOME team efforts for the last year.
ELLAK Conference: Greek localisation of GNOME 2.26
On 19 June 2009 there was a presentation of the GNOME Greek localisation project at the EL/LAK (FOSS) developers' conference.
We had the opportunity to talk about the outcome of the latest GNOME Greek localisation project, in which we completed the translation of GNOME 2.26 into Greek, for both the user interface and the documentation.
Before we started in early spring, we already had 32% of the documentation and 87% of the user interface translated. By the end of the project (full translation), we had translated about 343,000 words for the documentation and 190,000 words for the user interface.
The translators who helped with this release are
- Marios Zindilis
- Jennie Petoumenou
- Stergios Prosiniklis
- Fotis Tsamis
- Giannis Katsampiris
- Michalis Kotsarinis
- Vasilis Kontogiannis
- Socratis Vavilis
- Kostas Papadimas (pkst)
- Nikos Charonitakis (frolix68)
- Simos Xenitellis (simosx)
- (some members did not give their full name; please get in touch)
Among the large documentation packages, we have the
- Administration guide (Jennie)
- Accessibility guide (Jennie)
- Evolution Mail documentation (Marios)
- Aisleriot documentation (Jennie)
- gedit documentation (Michalis)
- gdm documentation (Stergios)
Most of the screenshots were taken care of by Fotis Tsamis and Marios Zindilis.
The deliverable is available at http://www.gnome.gr/files/gnome226/ and the project coordinators were Jennie Petoumenou and Simos Xenitellis. The committers were Nikos Charonitakis, Kostas Papadimas and Simos Xenitellis.
The presentation file is ELLAK_Conf2009-GNOME-L10n (.odp, for OpenOffice.org Impress).
Towards a GNOME CLI translation management tool
Update 7/June/2009: The repository was moved to https://github.com/simos/gnome-i18n-manage-vcs/. There was some confusion between this script and intltool, which is now a general localisation tool, not tied to GNOME.
In Designing a command-line translation tool for GNOME, I described how a CLI translation management tool would be used to ease the work of a translator with commit access. The discussion was continued with Leonardo’s post Parsing damned-lies’ releases.xml.in in the command line.
Where we stand now is that we have a tool (not an official GNOME tool, but rather in beta testing!) that can manage the repositories for us, so that checking out and committing can be fairly automated. The source is available at https://github.com/simos/gnome-i18n-manage-vcs/.
We show two working examples.
Let’s say we want to update the documentation for gcalctool. We run
$ ./intltool-manage-vcs --language el --release gnome-2-26 \
    --username simos --module gcalctool --transtype doc --init
Release : gnome-2-26
Language : el
Category: admin-tools
Category: dev-tools
Category: dev-platform
Category: desktop
Module: gcalctool, Branch: gnome-2-26
Download completed successfully.
$ _
In the PO/ subdirectory there is a PO file for gcalctool. We update it using our favourite translation tool, and then
$ ./intltool-manage-vcs --language Greek --commit
Sending el/el.po
Transmitting file data .
Committed revision 2475.
$ _
Let’s see another example. We want to update the gnome-games documentation. These are several individual PO files, for each of the games.
$ ./intltool-manage-vcs --language el --release gnome-2-26 \
    --username simos --module gnome-games --transtype doc --init
Release : gnome-2-26
Language : el
Category: admin-tools
Category: dev-tools
Category: dev-platform
Category: desktop
Module: gnome-games, Branch: gnome-2-26
Download completed successfully.
$ _
There are several files,
$ ls PO
aisleriot.gnome-2-26.el.po  gnibbles.gnome-2-26.el.po      gnotravex.gnome-2-26.el.po  README
blackjack.gnome-2-26.el.po  gnobots2.gnome-2-26.el.po      gnotski.gnome-2-26.el.po    same-gnome.gnome-2-26.el.po
glchess.gnome-2-26.el.po    gnome-sudoku.gnome-2-26.el.po  gtali.gnome-2-26.el.po      START
glines.gnome-2-26.el.po     gnometris.gnome-2-26.el.po     iagno.gnome-2-26.el.po
gnect.gnome-2-26.el.po      gnomine.gnome-2-26.el.po       mahjongg.gnome-2-26.el.po
$ _
We enter the PO/ subdirectory and we update those files we wish. We can also run scripts on the PO files. For example, all these documentation files contain the same fragment of the FDL license, so we can translate the license once, and then merge automatically to all translations.
Finally,
$ ./intltool-manage-vcs --language Greek --commit
Sending el/el.po
Transmitting file data .
Committed revision 9014.
Sending el/el.po
Transmitting file data .
Committed revision 9015.
Sending el/el.po
Transmitting file data .
Committed revision 9016.
$ _
In the above example, we updated the documentation of three of the games.
Here are some tips for using this tool
- There is a --dry-run option that is useful when experimenting or trying the tool for the first time.
- You can filter what to download from a release: by category (the existing categories are desktop, admin-tools, dev-tools and dev-platform), by translation type (either documentation or UI; if you do not specify one, you get both), or by module, by providing the module name.
And the current limitations
- We currently only support SVN. This will change once the repositories move to git.gnome.org, in about two weeks' time.
- You need to have at least an initial translation (currently, the script does not svn add files). To be fixed once we move to git.
- We do not currently update ChangeLog files. That’s why gnome-games is so cool for these experiments. Due to the git move, we would not need to mess with ChangeLog files.
- We are dependent on the http://l10n.gnome.org/languages/el/gnome-2-26/xml URLs (replace el with your language). These URLs expose the release modules information in a nice XML file. Previously, the information used to exist in an XML file in the repository of damned-lies. Now, the information lies in the mysql database of damned-lies+vertimus, and is exposed through the above type of URL.
- Due to the previous point, we commit to branch or trunk, depending on what is available in the latest release (gnome-2-26). That means, my translation fixes in gnome-games have not made it to trunk (HEAD). This is something that can be fixed with a workaround. It would be actually cool to use this tool to commit to both gnome-2-xx and master at the same time.
- We currently do not deal with figures.
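The URL dependency mentioned in the list above is easy to picture in code. This is only a sketch in Python 3 of how the per-language URL could be put together from the pattern quoted above; the function name release_xml_url is made up for illustration and is not taken from the script. It builds the string and nothing else (no download, no parsing).

```python
def release_xml_url(language, release):
    """Build the per-language release URL following the pattern quoted above.

    Hypothetical helper for illustration; not part of the actual script.
    """
    return "http://l10n.gnome.org/languages/%s/%s/xml" % (language, release)


# Greek, GNOME 2.26, as in the examples of this post.
print(release_xml_url("el", "gnome-2-26"))
```

Replacing "el" with another language code gives the URL for that team.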
Considering that damned-lies+vertimus will be having commit functionality soon, I think that having more than one option for easily committing translations is good.
Should UI strings in source code have non-ASCII characters?
There is a discussion going on at desktop-devel about whether the UI strings in the source code should also have non-ASCII characters. For example, should typical strings with double-quotes have those fancy Unicode double quotes?
printf(_("Could not find file “%s”\n"), filename);
instead of
printf(_("Could not find file \"%s\"\n"), filename);
The general view from the replies is to go ahead and add those nice Unicode characters.
Actually, there are UI messages already with non-ASCII characters (the ellipsis character, …) in GNOME 2.22:
- glade3
- epiphany
In GNOME 2.24, there are even more (with ellipsis):
- gucharmap
- epiphany
- gnome-terminal
- gedit
- glade3
Regarding the fancy Unicode double quotes, there are UI strings in GNOME 2.22 (same list for 2.24) in the following packages:
- evince
- cheese
- epiphany
- eog
- gnome-doc-utils
What are the arguments against having non-ASCII characters in UI strings?
- There might be systems that still use 8-bit legacy encodings. In this case, the UTF-8 encoded strings may not be displayed properly. However, when I tried to demonstrate this on my system (Ubuntu 8.04), I failed miserably. I downloaded a small GTK2 text editor (called tea), I changed a source UI string to include “” and ellipsis, compiled and installed. I then opened a shell, set LANG to POSIX (or C), and ran the text editor. The UI message was proper Unicode and I could even type non-ASCII in the text editor. I resorted to changing a system locale (I picked en_IN) to ISO-8859-1, then logged out. The login screen did not show the 8-bit encoding. If someone has a proper legacy 8-bit encoding system with GNOME (OpenBSD, FreeBSD, etc), could you please try it out?
- As Alan Cox mentioned in the thread, the canonical way to deal with UI strings in the source code should be to keep as ASCII, and put any fancy Unicode characters in the translation files (even for en_US, get an en_US translation file).
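To make the first argument a bit more concrete, here is a small Python 3 sketch that checks whether a UI string with fancy quotes and an ellipsis can be represented at all in a given encoding. It says nothing about what GTK+ actually does when drawing the string; it only shows where a legacy 8-bit encoding would run out of characters. The helper name encodable is my own.

```python
# A UI string with Unicode double quotes and an ellipsis, written with
# escapes so that this snippet itself stays plain ASCII.
message = "Could not find file \u201c%s\u201d\u2026"


def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for encoding in ("ascii", "iso-8859-1", "utf-8"):
    print(encoding, encodable(message, encoding))
```

Only UTF-8 can hold the string; ASCII and ISO-8859-1 have no code for the curly quotes or the ellipsis, which is exactly the case where a truly legacy environment would have to substitute or drop characters.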
Is GNOME (or components) used in a legacy 7-bit/8-bit environment?
If there is any reason to keep UI strings in the source code as plain ASCII, speak now, or the Unicode flood gates are about to open.
Update 16 May 2008: There is a document at the ISO/IEC 9899 website (C programming language) that mentions the issue of character sets in C. It is http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf.
On page 26, section 5.2.1, it says
The C89 Committee ultimately came to remarkable unanimity on the subject of character set requirements. There was strong sentiment that C should not be tied to ASCII, despite its heritage and despite the precedent of Ada being defined in terms of ASCII. Rather, an implementation is required to provide a unique character code for each of the printable graphics used by C, and for each of the control codes representable by an escape sequence. (No particular graphic representation for any character is prescribed; thus the common Japanese practice of using the glyph “¥” for the C character “\” is perfectly legitimate.) Translation and execution environments may have different character sets, but each must meet this requirement in its own way. The goal is to ensure that a conforming implementation can translate a C translator written in C.
For this reason, and for economy of description, source code is described as if it undergoes the same translation as text that is input by the standard library I/O routines: each line is terminated by some newline character regardless of its external representation.
With the concept of multibyte characters, “native” characters could be used in string literals and character constants, but this use was very dependent on the implementation and did not usually work in heterogenous environments. Also, this did not encompass identifiers.
It then goes on with an addition to C99:
A new feature of C99: C99 adds the concept of universal character name (UCN) (see §6.4.3) in order to allow the use of any character in a C source, not just English characters. The primary goal of the Committee was to enable the use of any “native” character in identifiers, string literals and character constants, while retaining the portability objective of C.
Both the C and C++ committees studied this situation, and the adopted solution was to introduce a new notation for UCNs. Its general forms are \unnnn and \Unnnnnnnn, to designate a given character according to its short name as described by ISO/IEC 10646. Thus, \unnnn can be used to designate a Unicode character. This way, programs that must be fully portable may use virtually any character from any script used in the world and still be portable, provided of course that if it prints the character, the execution character set has representation for it.
Of course the notation \unnnn, like trigraphs, is not very easy to use in everyday programming; so there is a mapping that links UCN and multibyte characters to enable source programs to stay readable by users while maintaining portability. Given the current state of multibyte encodings, this mapping is specified to be implementation-defined; but an implementation can provide the users with utility programs that do the conversion from UCNs to “native” multibytes or vice versa, thus providing a way to exchange source files between implementations using the UCN notation.
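Python string literals happen to use the same \unnnn and \Unnnnnnnn notation, so the idea can be tried out quickly. The sketch below (Python 3) is only an analogy to what the C99 rationale describes, not C code: it shows that an escaped, ASCII-only spelling of a string is the same string as the "native" spelling, and that converting between the two forms is mechanical.

```python
# The same UI string, once with "native" characters and once with
# universal-character-style escapes (ASCII-only source text).
native = "Could not find file “%s”…"
escaped = "Could not find file \u201c%s\u201d\u2026"
assert native == escaped

# The 8-digit form names the same character as the 4-digit form.
assert "\U00002026" == "\u2026"

# Converting "native" text to an ASCII-only spelling and back, similar in
# spirit to the utility programs mentioned in the rationale.
ascii_only = native.encode("unicode_escape").decode("ascii")
print(ascii_only)
round_trip = ascii_only.encode("ascii").decode("unicode_escape")
assert round_trip == native
```

The ASCII-only form is uglier to read, which is the trade-off the rationale points out for \unnnn in everyday programming.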
Update 7 Aug 2008: PEP 8, Style Guide for Python Code, says under Encodings
For Python 3.0 and beyond, the following policy is prescribed for
the standard library (see PEP 3131): All identifiers in the Python
standard library MUST use ASCII-only identifiers, and SHOULD use
English words wherever feasible (in many cases, abbreviations and
technical terms are used which aren't English). In addition,
string literals and comments must also be in ASCII. The only
exceptions are (a) test cases testing the non-ASCII features, and
(b) names of authors. Authors whose names are not based on the
latin alphabet MUST provide a latin transliteration of their
names.
Open source projects with a global audience are encouraged to
adopt a similar policy.
(Emphasis mine)
Using Anjuta in Ubuntu 8.04 to develop a GNOME C++ application (gtkmm)
You can install Anjuta 2.4.1 from the Synaptic package manager. You also need to install a few development packages. I do not know if there is a nice meta-package such as build-essential (used to install compilers et al), so I’ll just ask you to install the packages by hand. If there is a more elegant way, I would very much appreciate hearing about it in the comments.
$ sudo apt-get install build-essential libgtkmm-2.4-dev autogen automake libtool intltool libglademm-2.4-dev
That is the order of installation when you go by trial and error inside Anjuta to compile a project. Each package draws in several other packages. Also, if you have the Ubuntu 8.04 DVD in your drive, most of these packages will be installed in a jiffy. We have the Greek localisation enabled, so bear with us. Thanks to Giannis Katsampiris for completing the recent update of the Anjuta 2.4 localisation.

Once Anjuta is installed, you are presented with the Anjuta main window.
We then click on File/New/Project (Αρχείο/Νέο/1. Έργο),
We click on Forward here.
There are many many project types. We wade through and we pick to use C++ and GTKMM (C++ bindings for GTK+). We could pick any other variation; GTKMM was a request from the Ubuntu-gr mailing list.
We then fill in some contact details.
There is an option to specify at this stage external packages. We opt not to specify them now.
Once you click Apply (Εφαρμογή) – the button with the green tick, Anjuta will create an initial dummy package (actually a hello world application), and will run automatically the equivalent of ./configure for you.
Now, this is the final screen, when you start working. Here you would click on Κατασκευή/Κατασκευή έργου (Build/Build Project), so that the project gets compiled.
Then, you would click on Κατασκευή/Εκτέλεση προγράμματος… (Build/Run program…) to run the program!
Here it shows that we have located the source file (main.cc), and we see main().
It takes about 3 seconds to compile a program with g++ (at least on my system). Therefore, the dead time between (a) Let’s compile it and (b) Oh, I am running my program!, is under 5 seconds, which is good.
Timezones, clock applet and marketing dangers
It is great to receive feedback from users that try out the development versions of distributions (such as Ubuntu and Fedora). Usually, these are small bugs that can easily get fixed. However, there is one bug that has the potential to lead to political dissatisfaction and bad publicity for GNOME.
The clock applet (gnome-panel) now shows the timezones of cities that one selects. You click on the Edit button, you select the city (it comes from Locations.xml – libgweather, which has the coordinates of each city entry), and the applet makes a guess at what your timezone is (each timezone comes with longitude information).
So, if a city is far away from the capital city of your country (and closer to the capital city of a neighboring country), then the applet often proposes the wrong timezone. Considering that in some (=many) cases there is some animosity between neighboring countries, this makes users unhappy.
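To see how a purely geographic guess can go wrong, here is a toy sketch in Python 3. I have not copied the applet's actual algorithm; this simply assumes that each timezone is represented by one reference point (here, a capital city) and that the guess is the nearest reference point. The coordinates are approximate and only for illustration, and the names REFERENCE_POINTS, distance_km and guess_timezone are made up.

```python
import math

# Assumed reference points: approximate (latitude, longitude) of capitals.
REFERENCE_POINTS = {
    "Europe/Athens": (37.98, 23.73),
    "Europe/Sofia": (42.70, 23.32),
    "Europe/Skopje": (42.00, 21.43),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def guess_timezone(city):
    """Naive guess: the timezone whose reference point is nearest to the city."""
    return min(REFERENCE_POINTS,
               key=lambda tz: distance_km(city, REFERENCE_POINTS[tz]))


thessaloniki = (40.64, 22.94)  # approximate coordinates
print(guess_timezone(thessaloniki))
```

With these numbers the naive guess for Thessaloniki is not Europe/Athens, because two neighbouring capitals are geographically closer. A lookup keyed on the country (or region) of the city would avoid this kind of mistake.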
Launchpad bug report: Bug #185190, Clock applet chooses wrong timezone for many cities (eg Pittsburgh, Beijing)
GNOME Bugzilla bug report: Bug 519823 – Cities associated with wrong timezone
Updated (8Apr2008): The bug has been fixed upstream (thanks Dan!) and most likely makes it in GNOME 2.22.1, which means Ubuntu 8.04 and other distributions will get the update as well. Some countries with regions that have more than one timezone may want to check that the correct timezone is selected for each region.
How to easily modify a program in Ubuntu (updated)?
Some time ago we talked about how to modify easily a program in Ubuntu. We gave as an example the modification of gucharmap; we got the deb source package, made the change, compiled, created new .deb files and installed them.
We go the same (well, similar) route here, by modifying the gtk+ library (!!!). The purpose of the modification is to allow us to type, by default, all sorts of interesting Unicode characters, including ⓣⓗⓘⓢ , ᾅᾷ, ṩ, and many more.
The result of this exercise is to create replacement .deb packages for the gtk+ library that we are going to install in place of the system libraries. Because these new libraries will not be original Ubuntu packages, the update manager will be pestering us to rollback to the official gtk+ packages. This is actually good in case you want to switch back; you will have the enhanced functionality for as long as you postpone that update.
There is a chance we might screw up our system, so please make backups, or have a few drinks first and come back. I take no responsibility if something bad happens on your system. If you are having any second thoughts, do not follow the next steps; use the safer alternative procedure. You may, however, try this guide just for kicks; up to the dpkg command below, no changes are being made to your system.
We use Ubuntu 7.10 here. This should work in other versions, though your mileage may vary.
The compilation procedure takes time (about 30 minutes) and space. Make sure you use a partition with >2GB of free space. We are not going to use up 2GB (a bit less than 1GB), but it’s nice not to fill up partitions.
We are going to use the generic instructions on how to recompile a debian package by ducea.
First of all, install the development packages,
sudo apt-get install devscripts build-essential
Next, we use the apt-get source command to get the source code of the GTK+ 2 library,
cd /home/ubuntu/bigpartition_over2GB/
apt-get source libgtk2.0-0
We then pull in any dependencies that GTK+ may require. They are normally about a dozen packages, but we do not have to worry about the details.
sudo apt-get build-dep libgtk2.0-0
At this stage we need to touch up the source code of GTK+ before we go into the compilation phase. Visit the bug report #321896 – Synch gdkkeysyms.h/gtkimcontextsimple.c with X.org 6.9/7.0 and download the patch (look under the Attachment section). You should get a file named gtk-compose-update.patch. If you have a look at the patch, you will notice that it expects to find the source of gtk+ in a directory called gtk+. Making a link solves the problem,
ln -s libgtk2.0-0 gtk+
We then attempt to apply the patch (perform a dry run), just in case.
patch -p0 --dry-run < /tmp/gtk-compose-update.patch
If this does not show an error message, you can run the command again without the --dry-run option.
patch -p0 < /tmp/gtk-compose-update.patch
Finally, we are ready to build our fresh GTK+ library.
cd libgtk2.0-0
debuild -us -uc
This will take time to complete, so go and do some healthy cooking.
At the end of the compilation, if all went OK, you should have about a dozen .deb files created. These are one directory higher (do a “cd ..“). To install, use dpkg,
sudo dpkg -i *.deb
If you have any other deb files in this directory, it’s good to move them away before running the command. If all went ok, the .deb files should install without a hitch.
The final step is to restart your system. To test the new support, see the last section at this post. Use Firefox and OpenOffice.org to type those Unicode characters.
If you managed to wade through all these steps, I would appreciate it if you could post a comment.
Good luck!
Designing a command-line translation tool for GNOME
One messy task with GNOME translations is the whole workflow of getting the PO files, translating/updating/fixing them, and then uploading them back. One would need to use the command line, and several different commands, to accomplish this.
KDE with KBabel has a nice feature that allows you to easily grab all translation files, work on them, then commit through SVN. All through the GUI! It helps a bit here that the translation files for a specific language are located under a single directory.
The current workflow in GNOME translations typically consists of
- Getting the PO file from the L10n server (for example, GNOME 2.22 Greek) (also possible to use intltool-update within po/)
- Translate using KBabel, POEdit, GTranslator, vim, emacs, etc.
- svn co the package making sure you have the correct branch. One may limit to the po/ directory.
- Put the updated file in po/
- Update the ChangeLog (either with emacs, or with that Perl script)
- Commit the translation.
- (If you committed on a branch, also commit on HEAD)
Tools such as Transifex (used currently in Fedora) take away altogether the use of command line tools, and one works here through a web-based interface. Apparently, Transifex has a command-line tool on its TODO list.
What I would like to see in GNOME translations, is a tool that one can use to
- Grab all or a section of the PO files from GNOME 2.22. Put them in a local folder.
- Use the tools of my preference (translation tools, scripts, etc) to update those translations I need to update.
- Commit those translation files that changed (using my SVN account), automatically add ChangeLog entries, also commit to HEAD if required.
I would prefer to have a command-line tool for this, for now, though it would be great if GUI tools would get the same functionality at some point. For a command-line tool, the workflow would look something like
$ ssh-add
Enter passphrase for /home/simos/.ssh/id_rsa:
Identity added: /home/simos/.ssh/id_dsa (/home/simos/.ssh/id_dsa)
$ tsfx --project=gnome-2.22 --language=el --collection=gnome-desktop --user=simos --action=checkout
Reading from http://svn.gnome.org/svn/damned-lies/trunk/releases.xml.in... done.
Getting alacarte (HEAD)... done.
Getting bug-buddy (branch: xyz)... done.
...
Completed in 4:11s.
$ _
Now we translate any of the files we downloaded, and we push back upstream (of course, only those files that were changed).
$ tsfx --action=commit
Found local repository, Project: gnome-2.22, Language: el, Collection: gnome-desktop, User: simos
Reading local files...
Found 6 changed files.
Uploading alacarte (HEAD)... done.
Uploading bug-buddy (branch:xyz, HEAD)... done.
...
Completed uploading translation files to gnome-2.22, language el.
$ _
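The "only those files that were changed" part could be done without asking the version control system at all. Here is a rough Python 3 sketch of one way to do it, under my own assumptions: take a checksum of every PO file at checkout time, take it again at commit time, and compare. The function names snapshot and changed_files, and the tiny demo files, are made up for illustration; tsfx itself is just a proposal.

```python
import hashlib
import os
import tempfile


def snapshot(directory):
    """Map each .po file name in directory to a SHA-1 digest of its content."""
    digests = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(".po"):
            with open(os.path.join(directory, name), "rb") as handle:
                digests[name] = hashlib.sha1(handle.read()).hexdigest()
    return digests


def changed_files(before, after):
    """Return the names whose digest differs from (or is missing in) before."""
    return sorted(name for name, digest in after.items()
                  if before.get(name) != digest)


# Small demonstration in a throw-away directory.
with tempfile.TemporaryDirectory() as po_dir:
    for name in ("alacarte.po", "bug-buddy.po"):
        with open(os.path.join(po_dir, name), "w", encoding="utf-8") as handle:
            handle.write('msgid "Open"\nmsgstr ""\n')
    at_checkout = snapshot(po_dir)

    # The translator updates one of the files.
    with open(os.path.join(po_dir, "alacarte.po"), "w", encoding="utf-8") as handle:
        handle.write('msgid "Open"\nmsgstr "Άνοιγμα"\n')

    demo_changed = changed_files(at_checkout, snapshot(po_dir))
    print(demo_changed)
```

A real tool would store the checkout snapshot on disk next to the PO files, so that the commit step can find it later.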
Typing squiggles and dots in GNOME and GTK+ applications
Garrett asks how to type squiggles and dots in GNOME; that is, how to type characters such as á à ä ã â ą ȩ ę ő ǰ ǩ ǒ ġ ṅ ȯ ṁ ė.
There are several ways, and one can choose depending on how frequently they need to type them or how much time they need to invest learning.
① One option is to start the Character Map (Applications/Accessories/Character Map), pick the character, copy and paste it. This is good for rare characters and weird situations such as
┏━━━━━━━━━━━━━━━━━━━━━━━┓
⟁⟁⟁⟁♥♀★★▶◀☆♀░░░▒▒▒▓▓▓▙▚▛▙▙▙▞
Apart from defining characters for languages, the Unicode standard also defines symbols, dingbats and all sorts of things. If your distribution is based on the DejaVu fonts (such as Ubuntu), then you are probably covered for many of these symbols. If you do not have a suitable font, or you use Windows, you will be wondering what the hell I am talking about.
② Another option is to use the Character Palette applet which shows an applet on the panel with a configurable small repertoire of characters such as áàéíñó½©ث€. You select one of the characters with the mouse, and wherever you middle-click, this character is typed. This is an improvement over ①, and good when you want to type often rare characters. It is not convenient to type characters found normally on a keyboard layout.
③ To type characters normally found in a specific language(s), it is good to setup a suitable keyboard layout. For this, it is good to add the Keyboard Indicator applet; right click on the panel, click Add to panel… and choose the Keyboard Indicator from the Utilities section. The US English keyboard layout (Default variant) does not provide any interesting characters apart from those shown printed on the keys of a US Keyboard.

The US English International (with dead keys) variant might be a better option,
Or the United Kingdom layout.
You can get a similar image for your layout when you right-click on the Keyboard Indicator applet, then click Show Current Layout.
Each key in the images contains up to four characters. Starting from bottom-left and going clockwise, these are the characters produced when
ⓐ you press the key
ⓑ you press the key with Shift (or Caps Lock)
ⓒ you press the key with AltGr and Shift (or Caps Lock)
ⓓ you press the key with AltGr
For example, with the UK keyboard layout, the key G produces g, G, Ŋ, ŋ.
If AltGr + Shift + letter does not work for you, see the FDO Bug #2871 Different results for shift-altgr and altgr-shift.
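If it helps, the four levels of a key can be pictured as a small lookup table indexed by the state of AltGr and Shift. The Python 3 sketch below models just the G key of the UK layout from the example above; the names UK_KEY_G and typed are made up, and real keyboard layouts are of course defined in the xkeyboard-config files, not like this.

```python
# The four levels of the G key in the UK layout, keyed by (altgr, shift).
UK_KEY_G = {
    (False, False): "g",   # level 1: the key alone
    (False, True): "G",    # level 2: with Shift
    (True, False): "ŋ",    # level 3: with AltGr
    (True, True): "Ŋ",     # level 4: with AltGr and Shift
}


def typed(key_levels, altgr=False, shift=False):
    """Return the character a key produces for the given modifier state."""
    return key_levels[(altgr, shift)]


print(typed(UK_KEY_G), typed(UK_KEY_G, shift=True),
      typed(UK_KEY_G, altgr=True, shift=True), typed(UK_KEY_G, altgr=True))
```

The print line walks the key in the same clockwise order as the images: bottom-left, top-left, top-right, bottom-right.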
Using the appropriate keyboard layout is the way to go when writing text that requires squiggles. You can either choose a layout with dead keys (meaning that some keys lose their normal functionality), or you can pick a layout that still gives you dead keys, but only when you press AltGr + key. For example, in the UK Keyboard layout – Default variant, AltGr + ; + a produces á, or AltGr+Shift+]+e produces ē.
Photo by titanas. The OLPC uses those four levels for its keyboard layout. You can see all the variations printed on the keyboard. Click on the image, choose Large size for the details.
④ Another option to produce more characters on the keyboard is to enable the compose key, and use compose sequences. A compose sequence looks similar to what we described above (i.e. AltGr+Shift+]+e to ē) but the idea is that we use it for characters we want to be available across different keyboard layouts that you may have enabled.

The compose key is very powerful functionality, thus it is not enabled by default, and lies hidden in the Layout Options tab. I prefer to set it to Menu, but every person has their own preference.
For example,
- Compose key + - + a produces ã,
- Compose key + < + c produces č,
- Compose key + 1 + s produces ¹ (superscript 1; try replacing 1 with 2),
- Compose key + + + - produces ±.
Currently, GTK+ provides 640 such compose sequences involving the Compose key, and hopefully soon it will increase to over 3000.
The Compose key is known as Multi_key in the source code (Xorg, GTK+, etc).
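Conceptually, a compose table is nothing more than a mapping from key sequences to characters. Here is a minimal Python 3 sketch of that idea, using only the four example sequences listed above and X-style key names (minus, less, plus). It is not how GTK+ stores or searches its table; the names SEQUENCES and compose are made up for illustration.

```python
COMPOSE_KEY = "Multi_key"

# The example sequences from the list above, without the leading Compose key.
SEQUENCES = {
    ("minus", "a"): "ã",
    ("less", "c"): "č",
    ("1", "s"): "¹",
    ("plus", "minus"): "±",
}


def compose(keys):
    """Return the composed character, or None if keys is not a known sequence.

    keys is a list of key names that must start with the Compose key.
    """
    if not keys or keys[0] != COMPOSE_KEY:
        return None
    return SEQUENCES.get(tuple(keys[1:]))


print(compose(["Multi_key", "plus", "minus"]))
print(compose(["Multi_key", "less", "c"]))
```

An unknown sequence simply gives None, which is the point where a real input method would beep or pass the keys through.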
The Compose key sequences offer the ability to define smart mnemonics for producing characters. It is much easier to type ComposeKey + 1 + s than to remember the codepoint value of ¹ (superscript 1). As with many things open-source, there are too many options, and with the Compose key there is the issue of which one to pick as a sensible default, and how to make it prominent for those who might want to use it.
It appears to me that there should be more effort to promote the functionality that is provided with the standard keyboard layouts (choose a better keyboard layout, produce characters provided in the third and fourth levels, etc). In this respect, Compose key sequences should come as a complement, after the main discussion on keyboard layouts has taken place.
⑤ There is a last issue on switching keyboard layouts to cover in a separate post.
Improving input method support in GTK+-based apps
When a bug report gets long with many comments, it gets more difficult for someone to get the full picture of what is going on. I’ll attempt to summarise here what’s being said in Bug 321896, Synch gdkkeysyms.h / gtkimcontextsimple.c with X.org 6.9/7.0.
GTK+-based applications use by default the GTK+ Input Method in order to let users type in different languages. Some scripts are very complex (such as SE Asian scripts) and in this case SCIM is used, replacing the GTK+ Input Method. One can even disable GTK+ IM altogether and use the basic X Input Method (XIM) which is provided by the Xorg server, by setting GTK_IM_MODULE to xim. However, the majority of the users have GTK+ IM enabled.
Between GTK+ IM and XIM, the keyboard layouts are being managed by the xkeyboard-config project and Sergey Udaltsov. A keyboard layout is simply a mapping of keyboard keys to Unicode characters, but you can also have compose sequences for some characters using what we call dead keys. When you press a dead key nothing appears on screen but when you press a letter immediately afterwards, you can get an á. This functionality is common to add accents, and there is a big table for these compose sequences (1.3MB) and what Unicode characters they produce.
If you change your keyboard layout (System/Preferences/Keyboard/Layout) to something like U.S. English International (with dead keys), then the ' key on your keyboard becomes dead_acute, and the compose sequence
<dead_acute> <a> : "á" U00E1 # LATIN SMALL LETTER A WITH ACUTE
works when you press ' and then a.
There is an issue with compose sequences and input methods; XIM maintains the official upstream version of the compose sequences, and projects such as GTK+ and SCIM carry their own copies of that table.
The issue with GTK+ regarding the compose sequences is that it has a very old version compared to what is available upstream. This is what Bug 321896 is about.
The bug would have been resolved much much earlier if it wasn’t for the insistence of the GTK+ maintainers to cut the fat and reduce the size of the table (~6000 entries) with clever optimisations.
Tor suggested a clever optimisation: a good number of compose sequences (which look like <dead_acute> <a> : “á”) resemble the decomposed form (à la Unicode) of those characters. Thus, we can let the user type what she wants, and then try Unicode normalisation to see if the sequence composes to a single Unicode character. Let’s demonstrate in Python,
$ python
>>> import unicodedata
>>> sequence=[65, 0x301] # That's 'A' and the combining acute accent
>>> result = unicodedata.normalize('NFC',"".join(map(unichr, sequence)))
>>> result
u'\xc1'
>>> print len(result)
1
>>> print result
Á
That long line above takes the array, applies the unichr() function on each member so that they become Unicode characters and then joins them in a single string. Finally, it normalises the (decomposed) string to a single character. The fact that the resulting string has length 1 (single character) is key to this optimisation. Over 1000 compose sequences can be removed from the compose table through this optimisation. This includes a big chunk of the Latin Unicode blocks, a few dozen Cyrillic characters, all of modern Greek and Greek polytonic, some Indic languages (are they actually used?) and other misc sequences.
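The same check can be wrapped in a small function, which makes it easy to try on several sequences. This is a Python 3 sketch (chr() instead of unichr()); the function name compose_by_normalisation is mine, and the real GTK+ code is in C and also has to map dead keys to combining characters first, which is left out here.

```python
import unicodedata


def compose_by_normalisation(codepoints):
    """Return the single composed character for the sequence, or None.

    The sequence is a base character followed by combining characters;
    it "composes" if NFC normalisation yields exactly one character.
    """
    text = "".join(chr(cp) for cp in codepoints)
    composed = unicodedata.normalize("NFC", text)
    return composed if len(composed) == 1 else None


print(compose_by_normalisation([0x41, 0x301]))    # A + combining acute
print(compose_by_normalisation([0x3B1, 0x301]))   # Greek alpha + combining acute
print(compose_by_normalisation([0x3C, 0x63]))     # '<' + 'c' does not compose
```

Sequences of the first kind no longer need a row in the table; sequences of the last kind (such as the Compose key + < + c mnemonic for č) still do.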
Matthias laid out the requirements for the optimisation of the remaining compose sequences; ① it has to be static const so a single copy is shared all over the place, ② the first column (out of six) is repeated too often, thus use subtables, and ③ each row ends with a varying number of zeroes, so cut on those zeroes as well. This also required the automatic generation of the optimised table using a script.
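Requirements ② and ③ are easier to follow with a toy example. The Python 3 sketch below takes a few fixed-width rows (six key slots padded with zeroes, plus the resulting character), groups them into subtables by their first key, and drops the trailing zeroes. The row layout, the names ROWS, build_subtables and lookup, and the choice of sample rows are all my own assumptions for illustration; the real table is a static const C array generated by a script, which Python cannot model (so requirement ① is not shown).

```python
from collections import defaultdict

# Toy rows: six key slots (X keysym values) padded with zeroes, and the result.
ROWS = [
    ((0xFF20, 0x2B, 0x2D, 0, 0, 0), "±"),   # Multi_key plus minus
    ((0xFF20, 0x3C, 0x63, 0, 0, 0), "č"),   # Multi_key less c
    ((0xFF20, 0x31, 0x73, 0, 0, 0), "¹"),   # Multi_key 1 s
    ((0xFE51, 0x61, 0, 0, 0, 0), "á"),      # dead_acute a
]


def build_subtables(rows):
    """Group rows by first key and strip the trailing zero padding."""
    subtables = defaultdict(dict)
    for keys, char in rows:
        first, rest = keys[0], list(keys[1:])
        while rest and rest[-1] == 0:
            rest.pop()
        subtables[first][tuple(rest)] = char
    return dict(subtables)


def lookup(subtables, keys):
    """Find the character for a typed key sequence, or None."""
    return subtables.get(keys[0], {}).get(tuple(keys[1:]))


SUBTABLES = build_subtables(ROWS)
print(sorted(hex(first) for first in SUBTABLES))
print(lookup(SUBTABLES, [0xFF20, 0x2B, 0x2D]))
```

The first key is now stored once per subtable instead of once per row, and no row carries padding, which is where the size reduction comes from.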
The work has not finished yet, and requires testing of the patch. The high priority testing is that keyboard layouts do not get any regressions (that is, compose sequences with dead keys must continue to work along with any new sequences).
With an updated compose table in GTK+, one can write things like ⒼⓃⓄⓂⒺ and all variations of accents on characters, in an easier way.
I’d like to thank Matthias and Tor for their support in this work. And Jeff for adding this blog to Planet GNOME!








