Encodings, fonts, PDF and pain
Submitted by gwolf on Sun, 07/19/2009 - 10:37
I volunteered to work on producing the DebConf nametags, and worked on them closely with César (cek) for most of the afternoon. The process clearly shows that no database is comprehensive enough to base DebConf on it. Still, all in all, we made very good progress, integrating the data on who sleeps where, how each person eats, and so on. And slightly before lunchtime, we had the final listing. Joy!
…Until I asked César what script we would use for turning the data into nice, printable nametags. He plainly replied, «none that I know of». OK, so on to producing the layout. The first idea was, as has been done at other DebConfs, to make a LaTeX layout, but the LaTeX-fu of both of us is quite limited. Well, to hell with it, that's why I recently packaged and uploaded Prawn (Fast, Nimble PDF Generation For Ruby), which is currently sitting in the NEW queue. Prawn has two main characteristics which are making me migrate some systems away from PDF::Writer into it:
So, yes, populating each page with the ten nametags it will hold is quite simple:
Yay, nice, isn't it? Of course, inside generate_nametag_for() we have all the needed magic to position the text, resize the images and so on. All in all, a cute and nice library, even with Ruby's often strangely idiosyncratic culture.
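To give an idea of the arithmetic behind putting ten nametags on a page, here is a minimal sketch in plain Ruby. It is not the script we used: the A4 page size, the margin, the 2x5 grid and the helper names `nametag_origins` and `paginate` are all assumptions made for illustration, and the drawing itself is left out.

```ruby
# Sketch only: page size, margin and the 2x5 grid are assumed values,
# and both helper names are hypothetical.
PAGE_WIDTH  = 595.28 # A4 width in PDF points
PAGE_HEIGHT = 841.89 # A4 height in PDF points
MARGIN      = 36.0   # half an inch on every side
COLUMNS     = 2
ROWS        = 5

TAG_WIDTH  = (PAGE_WIDTH  - 2 * MARGIN) / COLUMNS
TAG_HEIGHT = (PAGE_HEIGHT - 2 * MARGIN) / ROWS

# Top-left corner [x, y] of every nametag slot on a page, row by row,
# measured from the bottom-left corner of the page.
def nametag_origins
  (0...ROWS).flat_map do |row|
    (0...COLUMNS).map do |col|
      [MARGIN + col * TAG_WIDTH, PAGE_HEIGHT - MARGIN - row * TAG_HEIGHT]
    end
  end
end

# Splits the attendee list into pages of ten and pairs each attendee
# with the slot where the nametag would be drawn.
def paginate(people)
  people.each_slice(ROWS * COLUMNS).map { |page| page.zip(nametag_origins) }
end
```

Each pair returned by `paginate` holds one attendee and the corner of the slot for that nametag; a last, partially filled page simply gets fewer pairs.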
Until we started checking for correctness. First, we hit Eddy Petrişor: the ş was showing as an unknown character. Of course, even though Prawn correctly understands UTF-8, the built-in font does not handle Eastern European alphabets. No worries, pdf.font "/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf" got us out of the predicament. But then, Andrew Lee (李健秋) appeared as a second case of undisplayable characters. And yes, Andrew has every right in the world to expect his name, a properly UTF-8 encoded string, to appear on the nametag!
…The problem is that all the fonts we could find that work for CJK fail for non-US-ASCII Latin characters. Isn't Unicode supposed to solve this? Yes, but fonts need to actually cover the characters in question… Jonas explained that Prawn (as well as any library dealing with fonts) should really use Fontconfig, so that multiple fonts can be specified, falling back to the next one whenever a font lacks some code points. But Prawn does not support Fontconfig.
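The fallback idea Jonas described can be sketched in plain Ruby. This is neither Fontconfig nor anything Prawn offers here: the font names are placeholders, and the regular expressions are crude stand-ins for real font coverage tables, all of them assumptions made for illustration.

```ruby
# Sketch only: font names and coverage patterns are made up; a real
# implementation would ask each font which code points it contains.
FALLBACK_CHAIN = [
  { font: "LatinFont", covers: /[\p{Latin}\p{Digit}\p{Punct}\p{Space}]/ },
  { font: "CJKFont",   covers: /\p{Han}/ }
].freeze

# First font in the chain that claims to cover the character,
# or nil when no font does.
def font_for(char)
  entry = FALLBACK_CHAIN.find { |candidate| char.match?(candidate[:covers]) }
  entry && entry[:font]
end

# Splits a string into consecutive runs that can each be drawn with
# a single font, as [font, text] pairs.
def font_runs(text)
  text.each_char
      .chunk_while { |a, b| font_for(a) == font_for(b) }
      .map { |chars| [font_for(chars.first), chars.join] }
end
```

With such runs at hand, a name mixing alphabets would be drawn piece by piece, switching fonts between runs, instead of relying on a single font covering everything.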
To make matters worse, most Asian fonts (the Arphic family) are now shipped as TrueType Collections (TTC) instead of TrueType Fonts (TTF), to save space given how much the fonts in a family have in common. And, you guessed it, Prawn does not yet understand TTCs (or I couldn't find out how to make it).
All sorts of ideas were brought up. After playing a bit trying to change the font being used when detecting Asian encoding (and failing at it), I threw up my hands and decided I'd just change the font whenever the name contained the "Andrew Lee" string. A dirty and ugly idea, yes, but it would work.
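For what it is worth, since every name is UTF-8, the thing to detect is not an encoding but the script of the characters. A minimal sketch, assuming Ruby's `\p{Han}` regular expression property; the CJK font path is a hypothetical placeholder and `font_path_for` is an invented helper name:

```ruby
# Sketch only: CJK_FONT is a hypothetical path and font_path_for is an
# invented helper, not part of Prawn.
LATIN_FONT = "/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf"
CJK_FONT   = "/path/to/some-cjk-font.ttf"

# Picks the CJK font as soon as the name contains a Han character,
# and the Latin font otherwise.
def font_path_for(name)
  name.match?(/\p{Han}/) ? CJK_FONT : LATIN_FONT
end
```

This only helps as long as the CJK font also draws whatever Latin characters share the name with the Han ones, which is exactly where the fonts we found fell short.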
Just as I was about to do it, Andrew came back jumping with joy: he gave me an Arphic font that ships as TTF files. And, lo, it properly renders Eastern European characters. All set, yay!
Honestly. What a pain. I hope humanity loses alphabets forever and goes back to the stone age. That's the only sane way to leave all the multialphabet, multiencoding, multifont, multipain behind.