Reporting progress on the translation infrastructure
Some days ago, I blogged asking for pointers to get started with the translation of Made with Creative Commons. Thank you all for your pointers and ideas! To the people that answered via private mail, via IRC, via comments on the blog. We have made quite a bit of progress so far; I want to test some things before actually sending a call for help. What do we have?
- Git repository set up
- I had already set up a repository at GitLab; right now, the contents are far from useful, they merely document what I have done so far. I have started talking with my Costa Rican friend Leo Arias, who is also interested in putting some muscle behind this translation, and we are both the admins to this project.
- Talked with the authors
- Sarah is quite enthusiastic about us making this! I asked her to hold a little bit before officially announcing there is work ongoing... I want to get bits of infrastructure ironed out first. Important — Talking with her, she discussed the tools they used for authoring the book. It made me less of a purist :) Instead of starting from something "pristine", our master source will be the PDF export of the Google Docs document.
- Markdown conversion
- Given that translation tools work over the bits of plaintext, we want to work with the "plainest" rendition of the document, which is Markdown. I found that Pandoc does a very good approximation to what we need (that is, introduces very little "ugly" markup elements). Converting the ODT into Markdown is as easy as:
$ pandoc -f odt MadewithCreativeCommonsmostup-to-dateversion.odt -t markdown > MadewithCreativeCommonsmostup-to-dateversion.md
Of course, I want to fine-tune this as much as possible.
- Producing a translatable
- I have used Gettext to translate user interfaces; it is a tool very well crafted for that task. Translating a book is quite different: How and where does it break and join? How are paragraphs "strung" together into chapters, parts, a book? That's a task for PO 4 Anything (po4a). As simple as this:
po4a-gettextize -f text -m MadewithCreativeCommonsmostup-to-dateversion.md -p MadewithCreativeCommonsmostup-to-dateversion.po -M utf-8
I tested the resulting file with my good ol' trusty
poedit, and it works... Very nicely!
What is left to do?
- I made an account and asked for hosting at Weblate. I have not discussed this with Leo, so I hope he will agree ;-) Weblate is a Web-based infrastructure for collaborative text translation, provided by Debian's Michal Čihař. It integrates nicely with version control systems, preserves credit for each translated string (and I understand, but might be mistaken, that it understands the role of "editors", so that Leo and I will be able to do QA on the translation done by whoever joins us, trying to have a homogeneous-sounding result. I hope the project is approved for Weblate soon!
- Work on reconstructing the book. One thing is to deconstruct, find paragraphs, turn them into translatable strings... And a very different one is to build a book again from there! I have talked with some people to help me get this in shape. It is basically just configuring Pandoc — But as I have never done that, any help will be most, most welcome!
- Setting translation policies. What kind of language will we use? How will we refer to English names and terms? All that important stuff to give proper quality to our work
- Of course, the long work itself: Performing the translations ☺