Encodings, fonts, PDF and pain
I volunteered to work on producing the DebConf nametags, and worked on it closely with César (cek) for most of the afternoon. The process clearly shows that no single database is comprehensive enough to base DebConf on. All in all, we made very good progress, integrating the data on who sleeps where, how each person eats, and so on. And slightly before lunchtime, we had the final listing. Joy!
…Until I asked César what script we would use for turning the data into nice, printable nametags. He plainly replied, «none that I know of». Ok, so on to produce the layout. The first idea was, as has been done at other DebConfs, to make a LaTeX layout, but the LaTeX-fu of both of us is heavily limited. Well, to hell with it, that’s why I recently packaged and uploaded Prawn (Fast, Nimble PDF Generation For Ruby), which is currently sitting in the NEW queue. Prawn has two main characteristics which are making me migrate some systems away from PDF::Writer to it: <ul><li>Proper UTF8 support</li><li>Cool, easy relative positioning in bounding boxes</li></ul>
So, yes, populating each page with the ten nametags it will hold is quite simple:
pdf = Prawn::Document.new(:page_size => 'A4')
pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
while !people.empty?
  5.times do |row|
    2.times do |col|
      person = people.delete(people.keys.sort.first)
      next if person.nil?
      pdf.bounding_box([pdf.bounds.left + (col) * 8.6.cm,
                        pdf.bounds.top - (row) * 5.4.cm],
                       :width => 8.6.cm, :height => 5.4.cm) do
        generate_nametag_for(person, pdf)
      end
    end
  end
end
Yay, nice, isn’t it? Of course, inside generate_nametag_for() we have all the needed magic to position the text, resize the images and so on. All in all, a cute and nice library, even with Ruby’s often strangely idiosyncratic culture.
Until we started checking for correctness. First, we hit Eddy Petrişor: the ş was showing as an unknown character. Of course, even though Prawn correctly understands UTF8, the built-in font does not handle Eastern European alphabets. No worries, pdf.font '/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf' got us out of the predicament. But then, Andrew Lee (李健秋) appeared as a second case of undisplayable characters. And yes, Andrew has every right in the world to expect his name, a proper UTF8 encoded string, to appear on the nametag!
…The problem is that all the fonts we could find that work for CJK fail for non-US-ASCII Latin characters. Isn’t Unicode supposed to solve this? Yes, but each font still needs to actually cover the codepoints in question… Jonas explained that Prawn (as well as any library dealing with fonts) should really use Fontconfig so multiple fonts can be specified, falling back to the next one when some codepoints are not present in the first. But Prawn does not support Fontconfig.
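To make the fallback idea concrete, here is a minimal sketch in plain Ruby. It is neither Fontconfig nor anything Prawn offers: FALLBACK_CHAIN and font_runs are made-up names, and the "coverage" of each font is only a rough guess expressed as a Unicode script regex (a real implementation would look into the font's character map). The font paths are the ones mentioned in this post.

```ruby
# Hypothetical fallback chain: each entry pairs a font path with a rough
# guess (an assumption, not real font data) of the characters it covers.
FALLBACK_CHAIN = [
  ['/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf', /[\p{Latin}\p{Common}]/],
  ['/usr/share/fonts/truetype/arphic/uming.ttf',          /\p{Han}/]
].freeze

# Split a string into [font, substring] runs. Each character goes to the
# first font in the chain that claims it; unclaimed characters get :no_font.
def font_runs(text)
  text.each_char
      .chunk { |ch| (FALLBACK_CHAIN.find { |_, covers| ch.match?(covers) } || [:no_font]).first }
      .map { |font, chars| [font, chars.join] }
end

font_runs('Andrew Lee (李健秋)').each do |font, piece|
  puts "#{piece.inspect} -> #{font}"
end
```

Each run could then be drawn with its own font. That is the part Fontconfig-aware libraries do for you.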
To make matters worse, most Asian fonts (the Arphic family) are now shipped as TrueType Collections (TTC) instead of TrueType Fonts (TTF), in order to save space due to the tremendous similarity they have. And, you guessed it, Prawn does not yet understand TTCs (or I couldn’t find how to).
All sorts of ideas were brought up. After playing a bit trying to change the font being used when detecting Asian encoding (and failing at it), I threw up my hands and decided I’d just change the font whenever the name contained the “Andrew Lee” string. Dirty and ugly idea, yes, but would work.
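For the record, switching the font per name does not need a hardcoded string. The following is only a sketch of what I was fumbling towards, not what ended up in the script: font_for is a hypothetical helper, and the script list in the regex is my assumption of what "Asian" should mean here.

```ruby
# Font paths as mentioned in this post; whether they exist depends on the system.
LATIN_FONT = '/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf'
CJK_FONT   = '/usr/share/fonts/truetype/arphic/uming.ttf'

# Assumed set of scripts that need the CJK font.
CJK_SCRIPTS = /\p{Han}|\p{Hiragana}|\p{Katakana}|\p{Hangul}/

# Hypothetical helper: choose one font for the whole name.
def font_for(name)
  name.match?(CJK_SCRIPTS) ? CJK_FONT : LATIN_FONT
end

puts font_for('Eddy Petrişor')
puts font_for('Andrew Lee (李健秋)')
```

Inside generate_nametag_for one would then call pdf.font font_for(name) before writing the name. It still only works if the CJK font also covers the Latin part of the name, which was exactly our problem.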
Just as I was about to do it, Andrew came back jumping with joy — He gave me an Arphic font which contains TTF files. And, lo, it properly renders Eastern European characters. All set, yay!
Honestly. What a pain. I hope humanity loses alphabets forever and goes back to the stone age. That’s the only sane way to leave all the multialphabet, multiencoding, multifont, multipain behind.
Comments
Cek 2009-07-19 09:46:42
Clean and clear ;)
It was a very clean and clear way of generating the PDFs, RoR r00lez!
gwolf 2009-07-19 14:51:00
Never!
I’ll never be glad you didn’t come. I’d always be able to special-case you!
gwolf 2009-07-19 14:52:00
Note that this was not RoR
…It is just Plain Ol’ Ruby :)
gwolf 2009-08-11 21:49:51
Not exactly right
…I felt quite strange/dirty when writing a while in Ruby. At the time, my brain was toasted, so I didn’t look at it again. Still, this change would have the effect of printing 10 nametags for each name, which is a little bit too much. That’s the reason I just checked that the list still had elements, and went shifting (with delete) from it. Of course, instead of having a while for people it could be for the column/row position, or I could just use modulo arithmetic (i.e. set the position to 1 at the beginning and increment it by one after each nametag, using modulo 2 to determine the column and modulo 5 to determine the row). I didn’t think about it earlier on ;-) But if you get me thinking about it, that is proof that there is a better way :)
gwolf 2009-08-12 19:52:56
Wow, divmod is kewl!
I had never before used divmod. It is perfect for this task! Well, almost — i > 10 will yield rows over 5, which is not useful. So, I came up with this one — The diff is longer than the code, so I’ll just post the code ;-)
pdf = Prawn::Document.new(:page_size => 'A4')
pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
i = 0
# Plain each with a hand-kept counter, so skipped (nil) records do not use up a slot
people.sort.each do |key, person|
  next if person.nil?
  pdf.bounding_box([pdf.bounds.left + (i%2) * 8.6.cm,
                    pdf.bounds.top - (i%5) * 5.4.cm],
                   :width => 8.6.cm, :height => 5.4.cm) do
    generate_nametag_for(person, pdf)
    i += 1
    pdf.start_new_page if i % 10 == 0
  end
end
(BTW, in this blog you can say <code lang="ruby">(…)</code> and it will be nicely colored and all. Or lang="diff" in your case).
I didn’t test it, but it should do. Ah, and why am I still using an explicitly handled counter instead of each_with_index? Because of the next if person.nil?: several records in the input were void, so I just skipped them.
Oh, and regarding the new page: Well, I half-copied, half-made-up the code for the original posting. I did miss the pdf.start_new_page outside the dirty, disgusting double-nested loop; it is now done via a nicer modulo.
Any further comments, if at all for academic purposes? :-}
gwolf 2009-08-13 16:08:59
Ok, let’s keep dissecting this bugger…
Regarding your (0..7) map — Precisely, that is what I intended to do. Each nametag appears in its col×row position in the page, where 0≤col≤1 and 0≤row≤4 (for a total of 10 possible locations).
As for the reject: I agree, that would perfectly do. It is even clearer if we select instead. So, merging, I think we can finally arrive at:
pdf = Prawn::Document.new(:page_size => 'A4')
pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
people.select {|k,p| p}.sort.each_with_index do |(key, person),i|
  pdf.bounding_box([pdf.bounds.left + (i%2) * 8.6.cm,
                    pdf.bounds.top - (i%5) * 5.4.cm],
                   :width => 8.6.cm, :height => 5.4.cm) do
    generate_nametag_for(person, pdf)
    # i is zero-based here, so the page is full once (i + 1) is a multiple of 10
    pdf.start_new_page if (i + 1) % 10 == 0
  end
end
Agree?
gwolf 2009-08-15 19:44:08
Oh, the order of the nametags
Yes, I keep mistaking the generic problem for the specific problem I had to solve ;-)
In this case, having them pseudo-sorted was more than enough — Each page is cut in 10, so it is enough to know more or less where each nametag is. As they are cut and gathered by hand, any specific ordering is lost anyway.
Bah… I would even be glad if it were that easy — At DebConf, nametags got all messed up, so they were 100% unsorted at some point in time. Not fun, believe me.
James Healy 2009-07-19 09:00:23
Thanks
Thanks for the constructive feedback.
It would be awesome if we found time to add support for more font formats and fontconfig. In the short term, it seems unlikely, but one can hope!
James 2009-07-19 14:17:00
The state of text rendering
Behdad wrote an interesting paper http://behdad.org/text/ on the state of text rendering at the moment, and where it’s going in the future, albeit with a focus on GUI apps.
Shot (Piotr Szotkowski) 2009-08-11 08:26:07
Ruby nitpick
Apologies for nitpicking, but I really love Ruby (and especially its iterators) – and so propose a humble patch to the above, assuming people is a Hash:
@@ -1,10 +1,8 @@
 pdf = Prawn::Document.new(:page_size => 'A4')
 pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
-while !people.empty?
+people.sort.each do |key, person|
   5.times do |row|
     2.times do |col|
-      person = people.delete(people.keys.sort.first)
-      next if person.nil?
       pdf.bounding_box([pdf.bounds.left + (col) * 8.6.cm,
                         pdf.bounds.top - (row) * 5.4.cm],
                        :width => 8.6.cm, :height => 5.4.cm) do
Shot (Piotr Szotkowski) 2009-08-11 23:11:49
Argh. :)
Argh, right, stupid me. That’s what you get for nitpicking after a hard day’s work in PHP, the one you do on your last GSoC day. :)
What would you say to the below? I’m not sure how Prawn does pagination, but from your snippet it looks like it’s auto-handled anyway.
@@ -1,15 +1,10 @@
 pdf = Prawn::Document.new(:page_size => 'A4')
 pdf.font '/usr/share/fonts/truetype/arphic/uming.ttf'
-while !people.empty?
-  5.times do |row|
-    2.times do |col|
-      person = people.delete(people.keys.sort.first)
-      next if person.nil?
+people.sort.each_with_index do |(key, person), i|
+  row, col = i.divmod 2
       pdf.bounding_box([pdf.bounds.left + (col) * 8.6.cm,
                         pdf.bounds.top - (row) * 5.4.cm],
                        :width => 8.6.cm, :height => 5.4.cm) do
         generate_nametag_for(person, pdf)
       end
-    end
-  end
 end
Shot (Piotr Szotkowski) 2009-08-12 05:23:14
Argh, you probably want to
Argh, you probably want to reset row to 0 for every page; a ‘row %= 5’ placed after the divmod call should do that.
Also, if you happen to use Rails’ ActiveSupport, there’s an in_groups_of method that could be useful here.
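Without Rails, plain Ruby's Enumerable#each_slice gives a similar grouping (without the nil padding in_groups_of adds). A minimal sketch, with placeholder data standing in for the real people list and puts standing in for the drawing code:

```ruby
# Placeholder data; in the post, people is a Hash of key => person.
people = (1..23).map { |n| "person #{n}" }

per_page = 10 # 2 columns x 5 rows

# One array of at most ten entries per page.
pages = people.each_slice(per_page).to_a

pages.each_with_index do |page, page_no|
  page.each_with_index do |person, slot|
    row, col = slot.divmod(2) # slot restarts at 0 on every page
    puts "page #{page_no}, row #{row}, col #{col}: #{person}"
  end
end
```

Because slot restarts on every page, row never needs to be wrapped by hand, and a new page is simply started once per outer iteration.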
Shot (Piotr Szotkowski) 2009-08-13 01:30:00
Academic purposes are the
Academic purposes are the best, of course. :)
First, i % 5 coupled with i % 2 (which were my first guesses as well) won’t work:
(0..7).map { |i| [i, i % 2, i % 5] }
=> [[0, 0, 0], [1, 1, 1], [2, 0, 2], [3, 1, 3], [4, 0, 4], [5, 1, 0], [6, 0, 1], [7, 1, 2]]
Second, it would make more sense to me if nil people were pre-deleted (with people.delete_if { |k,p| p.nil? }) or rejected on-the-fly – hence another attempt, with a clearer logic:
people.reject{|k,p| p.nil?}.sort.each_with_index do |(key, person), i|
  row, col = i.divmod 2
  row %= 5  # wrap the row back to 0..4 on every page
  pdf.bounding_box(…) do
    generate_nametag_for person, pdf
  end
  pdf.start_new_page if row == 4 and col == 1
end
Shot (Piotr Szotkowski) 2009-08-14 06:09:59
‘Regarding your (0..7) map —
‘Regarding your (0..7) map — Precisely, that is what I intended to do. Each nametag appears in its col×row position in the page, where 0≤col≤1 and 0≤row≤4 (for a total of 10 possible locations)’ – that’s exactly what I assumed, but I did assume you want to have them in order (why sort otherwise?).
I.e., I assumed you want to have (col 0, row 0), (col 1, row 0), (col 0, row 1), (col 1, row 1) sequence, not – check my example – (col 0, row 0), (col 1, row 1), (col 0, row 2), (col 1, row 3) yielded by your approach…
Hence my row, col = i.divmod 2; row %= 5 approach instead of your (implied) row, col = i % 5, i % 2.
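A minimal sketch of the in-order layout discussed in this thread, in plain Ruby with no Prawn involved. slot_for is a hypothetical helper name; it maps a running index to the page, row and column where the nametag lands, filling each row left to right before moving down.

```ruby
COLS = 2
ROWS = 5
SLOTS_PER_PAGE = COLS * ROWS

# Hypothetical helper: map a zero-based index to [page, row, col].
def slot_for(i)
  page, slot = i.divmod(SLOTS_PER_PAGE) # which page, and which slot on it
  row, col = slot.divmod(COLS)          # row 0..4, col 0..1 within the page
  [page, row, col]
end

(0..11).each { |i| p [i, *slot_for(i)] }
```

The col and row values would feed the bounding_box offsets, and a new page is due whenever the page component changes.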
yosch 2009-07-20 03:55:44
the debian weekly font review
Hi Gunnar,
Alphabets are quite a challenge :-) and no single open font can cater to the needs of an international community…
You may find it useful to dig into the weekly review we run in the pkg-fonts team (Debian Fonts Task Force): http://pkg-fonts.alioth.debian.org/review/
Especially the part about the Unicode coverage for each font currently in the archive.
Also for testing the availability of a glyph in a particular font there’s http://davyd.livejournal.com/234390.html Quite handy.
cheers.
જલધર 2009-07-19 08:57:17
Arphic doesn’t correctly
Arphic doesn’t correctly support Indic scripts. Aren’t you glad I didn’t make it to debconf this year :-)