Encodings, fonts, PDF and pain

Submitted by gwolf on Sun, 07/19/2009 - 10:37

I volunteered to produce the DebConf nametags, and worked closely on it with César (cek) for most of the afternoon. The process clearly showed that no database is comprehensive enough to base DebConf on. All in all, we made very good progress integrating the data on who sleeps where, how each person eats, and so on. And slightly before lunchtime, we had the final listing. Joy!

…Until I asked César what script we would use for turning the data into nice, printable nametags. He plainly replied, «none that I know of». OK, so on to producing the layout. The first idea was, as has been done at other DebConfs, to make a LaTeX layout, but the LaTeX-fu of both of us is heavily limited. Well, to hell with it: that's why I recently packaged and uploaded Prawn (Fast, Nimble PDF Generation For Ruby), which is currently sitting in the NEW queue. Prawn has two main characteristics which are making me migrate some systems away from PDF::Writer and into it:

  • Proper UTF8 support
  • Cool, easy relative positioning in bounding boxes

So, yes, populating each page with its ten nametags is quite simple:

    pdf = Prawn::Document.new(:page_size => 'A4')
    pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
    while !people.empty?
      5.times do |row|
        2.times do |col|
          person = people.delete(people.keys.sort.first)
          next if person.nil?
          pdf.bounding_box([pdf.bounds.left + (col) * 8.6.cm,
                            pdf.bounds.top - (row) * 5.4.cm],
                           :width => 8.6.cm, :height => 5.4.cm) do
            generate_nametag_for(person, pdf)
          end
        end
      end
    end

Yay, nice, isn't it? Of course, inside generate_nametag_for() we have all the needed magic to position the text, resize the images and so on. All in all, a cute and nice library, even with Ruby's often strangely idiosyncratic culture.

Until we started checking for correctness. First, we hit Eddy Petrişor: the ş was showing as an unknown character. Of course, even though Prawn correctly understands UTF8, the built-in font does not handle Eastern European alphabets. No worries, pdf.font "/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf" got us out of the predicament. But then, Andrew Lee (李健秋) appeared as a second case of undisplayable characters. And yes, Andrew has every right in the world to expect his name, a proper UTF8 encoded string, to appear on the nametag!

…The problem is that all the fonts we could find that work for CJK fail for non-US-ASCII Latin characters. Isn't Unicode supposed to solve this? Yes, fonts need to properly implement the correct encoding… Jonas explained that Prawn (as well as any libraries dealing with fonts) should really use Fontconfig so multiple fonts can be specified, falling back in case some codepoints are not specified in them. But Prawn does not support Fontconfig.

To make matters worse, most Asian fonts (the Arphic family) are now shipped as TrueType Collections (TTC) instead of TrueType Fonts (TTF), in order to save space due to the tremendous similarity they have. And, you guessed it, Prawn does not yet understand TTCs (or I couldn't find how to).

All sorts of ideas were brought up. After playing a bit trying to change the font being used when detecting Asian encoding (and failing at it), I threw up my hands and decided I'd just change the font whenever the name contained the "Andrew Lee" string. Dirty and ugly idea, yes, but would work.
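
A minimal sketch of the kind of per-name font switch attempted above, in plain Ruby (1.9 or later, where regular expressions understand Unicode script properties). The helper name font_for is hypothetical, the two paths are the ones mentioned in this post, and the check only tells Han ideographs apart from everything else; it is not real per-glyph fallback of the kind Fontconfig would give.

```ruby
# encoding: utf-8

# Font paths as mentioned in the post; adjust to what is installed locally.
LATIN_FONT = '/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf'
CJK_FONT   = '/usr/share/fonts/truetype/arphic/uming.ttf'

# Hypothetical helper: choose a font file depending on whether the
# UTF-8 name contains at least one Han ideograph.
def font_for(name)
  name =~ /\p{Han}/ ? CJK_FONT : LATIN_FONT
end

puts font_for('Eddy Petrişor')
puts font_for('Andrew Lee (李健秋)')
```

The result could be handed to pdf.font before drawing each name. A name mixing Han characters with something like ş would still need a single font covering both, which is exactly the problem described here.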

Just as I was about to do it, Andrew came back jumping with joy — He gave me an Arphic font which contains TTF files. And, lo, it properly renders Eastern European characters. All set, yay!

Honestly. What a pain. I hope humanity loses alphabets forever and goes back to the stone age. That's the only sane way to leave all the multialphabet, multiencoding, multifont, multipain behind.

જલધર's picture

Arphic doesn't correctly support Indic scripts. Aren't you glad I didn't make it to debconf this year :-)

gwolf's picture

Never!

I'll never be glad you didn't come. I'd always be able to special-case you!

James Healy's picture

Thanks

Thanks for the constructive feedback.

It would be *awesome* if we found time to add support for more font formats and fontconfig. In the short term, it seems unlikely, but one can hope!

Cek's picture

Clean and clear ;)

It was a very clean and clear way of generating the PDFs, RoR r00lez!

gwolf's picture

Note that this was not RoR

...It is just Plain Ol' Ruby :)

James's picture

The state of text rendering

Behdad wrote an interesting paper http://behdad.org/text/ on the state of text rendering at the moment, and where it's going in the future, albeit with a focus on GUI apps.

yosch's picture

the debian weekly font review

Hi Gunnar,

Alphabets are quite a challenge :-) and no single open font can cater to the needs of an international community...

You may find it useful to dig into the weekly review we run in the pkg-fonts team (Debian Fonts Task Force):
http://pkg-fonts.alioth.debian.org/review/

Especially the part about the Unicode coverage for each font currently in the archive.

Also for testing the availability of a glyph in a particular font there's http://davyd.livejournal.com/234390.html Quite handy.

cheers.

Shot (Piotr Szotkowski)'s picture

Ruby nitpick

Apologies for nitpicking, but I really love Ruby (and especially its iterators) – and so propose a humble patch to the above, assuming people is a Hash:

@@ -1,10 +1,8 @@
 pdf = Prawn::Document.new(:page_size => 'A4')
 pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
-while !people.empty?
+people.sort.each do |key, person|
   5.times do |row|
     2.times do |col|
-      person = people.delete(people.keys.sort.first)
-      next if person.nil?
       pdf.bounding_box([pdf.bounds.left + (col) * 8.6.cm,
                        pdf.bounds.top - (row) * 5.4.cm],
                        :width => 8.6.cm, :height => 5.4.cm) do

gwolf's picture

Not exactly right

…I felt quite strange/dirty when writing a while in Ruby. At the time, my brain was toasted, so I didn't look at it again. Still, this change would have the effect of printing 10 nametags for each name, which is a little bit too much. That's the reason I just checked that the hash still had elements, and went shifting them out of it (with delete).
Of course, instead of having a while for people, the loop could be over the column/row positions, or I could just use modulo arithmetic (i.e. set a position counter at the beginning, increment it by one after each nametag, and use modulo 2 to determine the column and modulo 5 to determine the row). I didn't think about it earlier on ;-) But if you get me thinking about it, it proves there is a better way :)

Shot (Piotr Szotkowski)'s picture

Argh. :)

Argh, right, stupid me. That’s what you get for nitpicking after a hard day’s work in PHP, the one you do on your last GSoC day. :)

What would you say to the below? I’m not sure how Prawn does pagination, but from your snippet it looks like it’s auto-handled anyway.

@@ -1,15 +1,10 @@
 pdf = Prawn::Document.new(:page_size => 'A4')
 pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
-while !people.empty?
-  5.times do |row|
-    2.times do |col|
-      person = people.delete(people.keys.sort.first)
-      next if person.nil?
+people.sort.each_with_index do |(key, person), i|
+  row, col = i.divmod 2
       pdf.bounding_box([pdf.bounds.left + (col) * 8.6.cm,
                        pdf.bounds.top - (row) * 5.4.cm],
                        :width => 8.6.cm, :height => 5.4.cm) do
         generate_nametag_for(person, pdf)
                        end
-    end
-  end
 end

gwolf's picture

Wow, divmod is kewl!

I had never used divmod before. It is perfect for this task! Well, almost: i ≥ 10 will yield rows of 5 and above, which is not useful. So I came up with this one. The diff is longer than the code, so I'll just post the code ;-)

    pdf = Prawn::Document.new(:page_size => 'A4')
    pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
    i = 0  # counts only the nametags actually printed
    people.sort.each do |key, person|
      next if person.nil?  # skip void records
      pdf.bounding_box([pdf.bounds.left + (i%2) * 8.6.cm,
                        pdf.bounds.top - (i%5) * 5.4.cm],
                       :width => 8.6.cm, :height => 5.4.cm) do
        generate_nametag_for(person, pdf)
      end
      i += 1
      pdf.start_new_page if i % 10 == 0
    end

(BTW, in this blog you can say <code lang="ruby">(...)</code> and it will be nicely colored and all. Or lang="diff" in your case).

I didn't test it, but it should do. Ah, and why am I still using an explicitly handled counter instead of each_with_index? Because of the next if person.nil?: several records in the input were void, so I just skipped them.

Oh, and regarding the new page: well, I half-copied, half-made-up the code for the original posting. I did miss the pdf.start_new_page outside the dirty, disgusting double-nested cycle; it is now done via a nicer modulo.

Any further comments, if only for academic purposes? :-}

Shot (Piotr Szotkowski)'s picture

Academic purposes are the best, of course. :)

First, i % 5 coupled with i % 2 (which were my first guesses as well) won’t work:

    (0..7).map { |i| [i, i % 2, i % 5] }
    # => [[0, 0, 0], [1, 1, 1], [2, 0, 2], [3, 1, 3], [4, 0, 4], [5, 1, 0], [6, 0, 1], [7, 1, 2]]

Second, it would make more sense to me if nil people were pre-deleted (with people.delete_if { |k,p| p.nil? }) or rejected on-the-fly – hence another attempt, with a clearer logic:

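    people.reject { |k, p| p.nil? }.sort.each_with_index do |(key, person), i|
      row, col = i.divmod 2
      row %= 5  # wrap to the row within the current page (0..4)
      pdf.bounding_box([pdf.bounds.left + col * 8.6.cm,
                        pdf.bounds.top - row * 5.4.cm],
                       :width => 8.6.cm, :height => 5.4.cm) do
        generate_nametag_for person, pdf
      end
      pdf.start_new_page if row == 4 and col == 1
    end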

gwolf's picture

Ok, let's keep dissecting this bugger…

Regarding your (0..7) map — Precisely, that is what I intended to do. Each nametag appears in its col×row position in the page, where 0≤col≤1 and 0≤row≤4 (for a total of 10 possible locations).
As for the reject: I agree, that would do perfectly. It is even clearer if we select instead. So, merging, I think we can finally arrive at:

    pdf = Prawn::Document.new(:page_size => 'A4')
    pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
    people.select {|k,p| p}.sort.each_with_index do |(key, person),i|
      pdf.bounding_box([pdf.bounds.left + (i%2) * 8.6.cm,
                        pdf.bounds.top - (i%5) * 5.4.cm],
                       :width => 8.6.cm, :height => 5.4.cm) do
        generate_nametag_for(person, pdf)
      end
      # break the page after every tenth nametag (i counts from 0)
      pdf.start_new_page if i % 10 == 9
    end

Agree?

Shot (Piotr Szotkowski)'s picture

‘Regarding your (0..7) map — Precisely, that is what I intended to do. Each nametag appears in its col×row position in the page, where 0≤col≤1 and 0≤row≤4 (for a total of 10 possible locations)’ – that’s exactly what I assumed, but I did assume you want to have them in order (why sort otherwise?).

I.e., I assumed you want to have (col 0, row 0), (col 1, row 0), (col 0, row 1), (col 1, row 1) sequence, not – check my example – (col 0, row 0), (col 1, row 1), (col 0, row 2), (col 1, row 3) yielded by your approach…

Hence my row, col = i.divmod 2; row %= 5 approach instead of your (implied) row, col = i % 5, i % 2.
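
To make the difference concrete, here is a small plain-Ruby comparison of the two index-to-cell mappings for one page of ten nametags (a sketch only; the variable names are made up for illustration, and the divmod variant wraps the row back to 0..4 on every page). Both fill each of the ten cells exactly once, because 2 and 5 share no common factor; they differ only in the order in which the cells are visited.

```ruby
# [col, row] for the i-th nametag of a page under each scheme.
by_modulo = (0...10).map { |i| [i % 2, i % 5] }
by_divmod = (0...10).map do |i|
  row, col = i.divmod(2)
  [col, row % 5]
end

# Both schemes use every one of the ten cells exactly once...
p by_modulo.uniq.size              # => 10
p by_modulo.sort == by_divmod.sort # => true

# ...but only divmod walks the page left to right, top to bottom.
p by_modulo.first(4)  # => [[0, 0], [1, 1], [0, 2], [1, 3]]
p by_divmod.first(4)  # => [[0, 0], [1, 0], [0, 1], [1, 1]]
```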

gwolf's picture

Oh, the order of the nametags

Yes, I keep mistaking the generic problem to solve for the specific problem I had to solve ;-)

In this case, having them pseudo-sorted was more than enough — Each page is cut in 10, so it is enough to know more or less where each nametag is. As they are cut and gathered by hand, any specific ordering is lost anyway.

Bah... I would even be glad if it were that easy — At DebConf, nametags got all messed up, so they were 100% unsorted at some point in time. Not fun, believe me.

Shot (Piotr Szotkowski)'s picture

Argh, you probably want to reset row to 0 for every page; a ‘row %= 5’ placed after the divmod call should do that.

Also, if you happen to use ActiveSupport (it ships with Rails, alongside ActiveRecord), there’s an in_groups_of method that could be useful here.
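
Plain Ruby offers each_slice for the same kind of grouping, with no Rails needed. The sketch below uses made-up data and only records where each nametag would go; in the real script the inner block would presumably hold the pdf.bounding_box call, with pdf.start_new_page issued before every page except the first.

```ruby
# Hypothetical data: 23 attendees keyed by a sortable id.
people = Hash[(1..23).map { |n| ["person%02d" % n, "Person #{n}"] }]

layout = []
# Ten nametags per page; within a page, divmod gives row 0..4, col 0..1.
people.sort.each_slice(10).each_with_index do |page, page_no|
  page.each_with_index do |(key, person), i|
    row, col = i.divmod(2)
    layout << [page_no, row, col, person]
  end
end

# Show the last cell on the first page and the first on the second.
p layout[9]
p layout[10]
```

Because each slice restarts its own index at 0, the row never needs to be wrapped by hand.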