Mi blog lah! Το ιστολόγιό μου

22Nov/082

Help make «DocBook XML to PDF» work for Greek

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article lang="en">
<section><title>Title</title>
<para>ëãáâṩẫëĝéõōåőȩçą</para>
<para>ЁЂыњѨѬѺѸѶѦщЖЊЌЍШЩзф</para>
<para>ᾶᾳὰέᾁᾂδϕϟϸϡϸϸαϷϕϲδϕϛ€ϕ©ϖϐͻ©ϖϐ</para>
</section>
</article>

This is an issue that I would appreciate if someone could help in solving.

The above document (mytestfile.xml) is a DocBook XML document with text in many scripts (latin, cyrillic and greek). Normally it was difficult to convert to PDF, until recently.

Now, one can run

dblatex --backend=xetex --verbose mytestfile.xml

(requires to install the dblatex package and any dependencies) and it creates mytestfile.pdf. If you have a fresh installation of Ubuntu 8.10 and you go through the process of installing these packages, please make a list of them, to use as advice for new users.

Generated PDF document with lang=en

Generated PDF document with lang=en

Since we use XeTeX as a backend, we can work with Unicode text directly, which is the proper thing to do. Above you can see that all characters are shown (except a few obscure ones that are not found in DejaVu Sans and are shown as boxes). You can see Latin (+Extended), Cyrillic (+Extended), Greek (+Extended) in the same document.

Generated PDF document with lang=el

Generated PDF document with lang=el

The issue arises when we change the lang modifier in the document above, from en to el. Here you see Τιτλε, which in fact is Title but with the characters replaced with their Greek equivalent. This is a sign for non-Unicode, 8-bit encoding conversion issue. In addition, some of the rest of the characters are shown, and apparently a strange conversion took place.

What we need to do is figure out is how to fix xetex when 'lang=el'. There is some work to get Greek XeTeX support upstream, and there are instructions on how to add local Greek XeTeX support in your distribution.

What we need is instructions on how to fix the Greek XeTeX support in Ubuntu 8.10, and test that dblatex can generate documents correctly when lang=el.

For your testing, here are the files mytestfile-en.pdf, mytestfile-el.pdf, mytestfile-en.xml, mytestfile-el.xml.

Comments (2) Trackbacks (2)
  1. testing this.

    Like or Dislike: Thumb up 0 Thumb down 0

  2. Eγω Oasis εψαχνα βρε μαγκες. You know… http://www.discoogle.com/wiki/Oasis
    Very good

    Like or Dislike: Thumb up 0 Thumb down 0


Leave a comment

Comments are filtered through Akismet for spam detection. Use of a non-personal web site or blog URL in the field below and/or comments that are off-topic or personal attacks will likely be removed at my discretion. Your opinions are welcome but please keep them polite and constructive.