Reporting progress on the translation infrastructure

Some days ago, I blogged asking for pointers to get started with the translation of Made with Creative Commons. Thank you all for your pointers and ideas, whether you answered via private mail, via IRC, or via comments on the blog! We have made quite a bit of progress so far, but I want to test some things before actually sending out a call for help. What do we have?

Git repository set up
I had already set up a repository at GitLab; right now, its contents are far from useful and merely document what I have done so far. I have started talking with my Costa Rican friend Leo Arias, who is also interested in putting some muscle behind this translation, and we are both admins of this project.
Talked with the authors
Sarah is quite enthusiastic about us doing this! I asked her to hold off a little before officially announcing that work is ongoing... I want to get bits of infrastructure ironed out first. Important: when we talked, she described the tools they used for authoring the book. It made me less of a purist :) Instead of starting from something "pristine", our master source will be the ODT export of the Google Docs document.
Markdown conversion
Given that translation tools work on chunks of plain text, we want to work with the "plainest" rendition of the document, which is Markdown. I found that Pandoc produces a very good approximation of what we need (that is, it introduces very few "ugly" markup elements). Converting the ODT into Markdown is as easy as:
$ pandoc -f odt MadewithCreativeCommonsmostup-to-dateversion.odt -t markdown > MadewithCreativeCommonsmostup-to-dateversion.md
Of course, I want to fine-tune this as much as possible.
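One possible direction for that fine-tuning, as a rough sketch only: the helper names md_name and odt_to_md are made up for illustration, and the --wrap=none option assumes a reasonably recent Pandoc. Turning off line wrapping keeps each paragraph on a single line, which should make paragraph-level diffs easier to follow later on.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): derive the Markdown
# file name from the ODT file name by swapping the extension.
md_name() {
    printf '%s\n' "${1%.odt}.md"
}

# Hypothetical wrapper around the pandoc call shown above.
# Assumption: --wrap=none is available in the installed Pandoc; it asks
# Pandoc not to re-wrap paragraphs into fixed-width lines.
odt_to_md() {
    src=$1
    out=$(md_name "$src")
    pandoc -f odt -t markdown --wrap=none -o "$out" "$src"
}
```

With these functions loaded, running odt_to_md MadewithCreativeCommonsmostup-to-dateversion.odt would write the .md file next to the source document.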
Producing a translatable .po file
I have used Gettext to translate user interfaces; it is a tool very well crafted for that task. Translating a book is quite different: How and where should the text be broken up and joined? How are paragraphs "strung" together into chapters, parts, a book? That's a task for PO 4 Anything (po4a). As simple as this:
$ po4a-gettextize -f text -m MadewithCreativeCommonsmostup-to-dateversion.md -p MadewithCreativeCommonsmostup-to-dateversion.po -M utf-8
I tested the resulting file with my good ol' trusty poedit, and it works... Very nicely!
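The way back, from a translated .po file to a translated Markdown document, would presumably go through po4a-translate. The following is only a sketch under assumptions: the <name>.<lang>.po and <name>.<lang>.md naming scheme and both helper functions are hypothetical, and nothing here has been decided for the project.

```shell
#!/bin/sh
# Hypothetical naming convention: <basename>.<lang>.<ext>
# $1 = master file, $2 = language code, $3 = extension
localized_name() {
    base=${1%.*}
    printf '%s.%s.%s\n' "$base" "$2" "$3"
}

# Hypothetical wrapper: render the translated document from the master
# Markdown file plus the .po file for the given language.
# -k 0 lowers the "keep" threshold so that a file is written even when
# the translation is still incomplete (po4a's default threshold is 80%).
render_translation() {
    master=$1
    lang=$2
    po4a-translate -f text -m "$master" -M utf-8 \
        -p "$(localized_name "$master" "$lang" po)" \
        -l "$(localized_name "$master" "$lang" md)" \
        -k 0
}
```

For example, render_translation MadewithCreativeCommonsmostup-to-dateversion.md es would read a file ending in .es.po and write one ending in .es.md, with untranslated paragraphs left in English.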

What is left to do?