Google News στα ελληνικά!
To Google News έχει τώρα έκδοση ειδικά για την Ελλάδα.
Η υπηρεσία είναι σε δοκιμαστική κατάσταση (Beta).
Περιλαμβάνει ειδήσεις από τις ηλεκτρονικές εκδόσεις εφημερίδων, ειδησεογραφικούς δικτυακούς τόπους και άλλες πηγές.
Αν κάποιος δικτυακός τόπος θέλει να προσφέρει τις ειδήσεις του, μπορεί να δει τη σχετική σελίδα για του εκδότες.
Important MO file optimisation for en_* locales, and partly others
During GUADEC, Tomas Frydrych gave a talk on exmap-console, a cut-down version of exmap that can work well on mobile devices.
During the presentation, Tomas showed how to use the tool to find the culprits in memory (ab)use on the GNOME desktop. One issue that came up was that the MO files taking up space though the desktop showed English. Why would the MO translation files loaded in memory be so big in size?
gtk20.mo : VM 61440 B, M 61440 B, S 61440 B atk10.mo : VM 8192 B, M 8192 B, S 8192 B libgnome-2.0.mo : VM 28672 B, M 24576 B, S 24576 B glib20.mo : VM 20480 B, M 16384 B, S 16384 B gtk20-properties.mo : VM 128 KB, M 116 KB, S 116 KB launchpad-integration.mo : VM 4096 B, M 4096 B, S 4096 B
A translation file looks like
msgid "File"
msgstr ""
When translated to Greek it is
msgid "File"
msgstr "Αρχείο"
In the English UK translation it would be
msgid "File"
msgstr "File"
This actually is not necessary because if you leave those messags untranslated, the system will use the original messages that are embedded in the executable file.
However, for the purposes of the English UK, English Canadian, etc teams, it makes sense to copy the same messages in the translated field because it would be an indication that the message was examined by the translation. Any new messages would appear as untranslated and the same process would continue.
Now, the problem is that the gettext tools are not smart enough when they compile such translation files; they replicate without need those messages occupying space in the generated MO file.
Apart from the English variants, this issue is also present in other languages when the message looks like
msgid "GConf"
msgstr "GConf"
Here, it does not make much sense to translate the message in the locale language. However, the generated MO file contains now more than 10 bytes (5+5) , plus some space for the index.
Therefore, what's the solution for this issue?
One solution is to add to msgattrib the option to preprocess a PO file and remove those unneeded copies. Here is a patch,
--- src.ORIGINAL/msgattrib.c 2007-07-18 17:17:08.000000000 +0100
+++ src/msgattrib.c 2007-07-23 01:20:35.000000000 +0100
@@ -61,7 +61,8 @@
REMOVE_FUZZY = 1 << 2,
REMOVE_NONFUZZY = 1 << 3,
REMOVE_OBSOLETE = 1 << 4,
- REMOVE_NONOBSOLETE = 1 << 5
+ REMOVE_NONOBSOLETE = 1 << 5,
+ REMOVE_COPIED = 1 << 6
};
static int to_remove;
@@ -90,6 +91,7 @@
{ "help", no_argument, NULL, 'h' },
{ "ignore-file", required_argument, NULL, CHAR_MAX + 15 },
{ "indent", no_argument, NULL, 'i' },
+ { "no-copied", no_argument, NULL, CHAR_MAX + 19 },
{ "no-escape", no_argument, NULL, 'e' },
{ "no-fuzzy", no_argument, NULL, CHAR_MAX + 3 },
{ "no-location", no_argument, &line_comment, 0 },
@@ -314,6 +316,10 @@
to_change |= REMOVE_PREV;
break;
+ case CHAR_MAX + 19: /* --no-copied */
+ to_remove |= REMOVE_COPIED;
+ break;
+
default:
usage (EXIT_FAILURE);
/* NOTREACHED */
@@ -436,6 +442,8 @@
--no-obsolete remove obsolete #~ messages\n"));
printf (_("\
--only-obsolete keep obsolete #~ messages\n"));
+ printf (_("\
+ --no-copied remove copied messages\n"));
printf ("\n");
printf (_("\
Attribute manipulation:\n"));
@@ -536,6 +544,21 @@
: to_remove & REMOVE_NONOBSOLETE))
return false;
+ if (to_remove & REMOVE_COPIED)
+ {
+ if (!strcmp(mp->msgid, mp->msgstr) && strlen(mp->msgstr)+1 >= mp->msgstr_len)
+ {
+ return false;
+ }
+ else if ( strlen(mp->msgstr)+1 < mp->msgstr_len )
+ {
+ if ( !strcmp(mp->msgstr + strlen(mp->msgstr)+1, mp->msgid_plural) )
+ {
+ return false;
+ }
+ }
+ }
+
return true;
}
However, if we only change msgattrib, we would need to adapt the build system for all packages.
Apparently, it would make sense to change the default behaviour of msgfmt, the program that compiles PO files into MO files.
An e-mail was sent to the email address for the development team of gettext regarding the issue. The development team does not appear to have a Bugzilla to record these issues. If you know of an alternative contact point, please notify me.
Update #1 (23Jul07): As an indication of the file size savings, the en_GB locale on Ubuntu in the installation CD occupies about 424KB where in practice it should have been 48KB.
A full installation of Ubuntu with some basic KDE packages (only for the basic libraries, i.e. KBabel - (ls k* | wc -l = 499)) occupies about 26MB of space just for the translation files. When optimising in the MO files, the translation files occupy only 7MB. This is quite important because when someone installs for example the en_CA locale, all en_?? locales are added.
The reason why the reduction is more has to do with the message types that KDE uses. For example,
msgid ""
"_: Unknown State\n"
"Unknown"
msgstr "Unknown"
I cannot see a portable way to code the gettext-tools so that they understand that the above message can be easily omitted. For the above reduction to 7MB, KDE applications (k*) occupy 3.6MB. The non-KDE applications include GNOME, XFCE and GNU traditional tools. The biggest culprits in KDE are kstars (386KB) and kgeography (345KB).
Update #2 (23Jul07): (Thanks Deniz for the comment below on gweather!) The po-locations translations (gnome-applets/gweather) of all languages are combined together to generate a big XML file that can be found at usr/share/gnome-applets/gweather/Locations.xml (~15MB).
This file is not kept in memory while the gweather applet is running.
However, the file is parsed when the user opens the properties dialog to change the location.
I would say that the main problem here is the file size (15.8MB) that can be easily reduced when stripping copied messages. This file is included in any Linux distribution, whatever the locale.
The po-locations directory currently occupies 107MB and when copied messages are eliminated it occupies 78MB (a difference of 30MB). The generated XML file is in any case smaller (15.8MB without optimisation) because it does not include repeatedly the msgid lines for each language.
I regenerated the Locations.xml file with the optimised PO files and the resulting file is 7.6MB. This is a good reduction in file space and also in packaging size.
Update #3 (25Jul07): Posted a patch for gettext-tools/msgattrib.c. Sent an e-mail to the kde-i18n-doc mailing list and got good response and a valid argument for the proposed changes. Specifically, there is a case when one gives custom values to the LANGUAGE variable. This happens when someone uses the LANGUAGE variable with a value such as "es:fr" which means show me messages in Spanish and if something is untranslated show me in French. If a message has msgid==msgstr for Spanish but not for French, then it would show in French if we go along with the proposed optimisation.
Google Groups: Member Invite Request Approved
When creating a Google Group, you have the option of auto-subscribing a list of e-mails. That is, the owner of the email address does not have perform the subscription task. To avoid the apparent spamming opportunity, Google Groups puts a human to review those requests. After you pasted the e-mail addresses, you press Submit and then get a text box where you can write a message to help this person decide.
While filling such a request, I made a gross mistake and I added 140 more email addresses than I should. In the text box I write with capitals, PLEASE CANCEL THIS REQUEST, MISTAKE.
Just now I got a reply, and that requst got approved. On the positive side, the auto-subscription request was thankfully converted to a notification request, so all these people received a request to join the group.
Thank you all for not complaining!
p.s.
My regular blog is offline for a few days so I am using this one for now.
