Reporting progress on the translation infrastructure

Submitted by gwolf on Mon, 06/12/2017 - 23:28

Some days ago, I blogged asking for pointers to get started with the translation of Made with Creative Commons. Thank you all for your pointers and ideas, whether you answered via private mail, via IRC, or via comments on the blog! We have made quite a bit of progress so far; I want to test some things before actually sending a call for help. What do we have?

Git repository set up
I had already set up a repository at GitLab; right now, the contents are far from useful, as they merely document what I have done so far. I have started talking with my Costa Rican friend Leo Arias, who is also interested in putting some muscle behind this translation, and we are both admins of this project.
Talked with the authors
Sarah is quite enthusiastic about us making this! I asked her to hold off a little bit before officially announcing there is work ongoing... I want to get bits of infrastructure ironed out first. Important: while talking with her, she described the tools they used for authoring the book. It made me less of a purist :) Instead of starting from something "pristine", our master source will be the ODT export of the Google Docs document.
Markdown conversion
Given that translation tools work over bits of plaintext, we want to work with the "plainest" rendition of the document, which is Markdown. I found that Pandoc produces a very good approximation of what we need (that is, it introduces very few "ugly" markup elements). Converting the ODT into Markdown is as easy as:
$ pandoc -f odt MadewithCreativeCommonsmostup-to-dateversion.odt -t markdown > MadewithCreativeCommonsmostup-to-dateversion.md
Of course, I want to fine-tune this as much as possible.
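One possible direction for that fine-tuning, as a minimal sketch: wrap the conversion in a small shell function so options can be tried one at a time. The function name odt_to_md and the PANDOC override variable are my own illustration, not something already in the repository; --wrap=none is a standard Pandoc option that stops it from re-wrapping paragraphs, which I assume (but have not verified on this book) gives the translation tools cleaner strings.

```shell
# Hypothetical helper: the name and the PANDOC override are illustrative only.
# Setting PANDOC=echo previews the command line without running Pandoc.
PANDOC="${PANDOC:-pandoc}"

odt_to_md() {
    # $1: input .odt file, $2: output .md file
    # --wrap=none tells Pandoc not to re-wrap lines, so every paragraph
    # stays on a single line in the Markdown output.
    "$PANDOC" -f odt -t markdown --wrap=none -o "$2" "$1"
}
```

It would be called as: odt_to_md MadewithCreativeCommonsmostup-to-dateversion.odt MadewithCreativeCommonsmostup-to-dateversion.md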
Producing a translatable .po file
I have used Gettext to translate user interfaces; it is a tool very well crafted for that task. Translating a book is quite different: How and where does it break and join? How are paragraphs "strung" together into chapters, parts, a book? That's a task for PO 4 Anything (po4a). As simple as this:
$ po4a-gettextize -f text -m MadewithCreativeCommonsmostup-to-dateversion.md -p MadewithCreativeCommonsmostup-to-dateversion.po -M utf-8
I tested the resulting file with my good ol' trusty poedit, and it works... Very nicely!
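Besides opening the file in poedit, a quick command-line sanity check is possible with Gettext's own msgfmt. The sketch below is only an illustration: the function name po_check and the MSGFMT override variable are hypothetical, while --statistics and -o are regular msgfmt options.

```shell
# Hypothetical helper: the name and the MSGFMT override are illustrative only.
# Setting MSGFMT=echo previews the command line without running msgfmt.
MSGFMT="${MSGFMT:-msgfmt}"

po_check() {
    # $1: the .po file to inspect
    # --statistics reports how many messages are translated, fuzzy and
    # untranslated; -o /dev/null discards the compiled catalogue, since
    # only the parse result and the counts are of interest here.
    "$MSGFMT" --statistics -o /dev/null "$1"
}
```

It would be called as: po_check MadewithCreativeCommonsmostup-to-dateversion.po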

What is left to do?

  • I made an account and asked for hosting at Weblate. I have not discussed this with Leo, so I hope he will agree ;-) Weblate is a Web-based infrastructure for collaborative text translation, provided by Debian's Michal Čihař. It integrates nicely with version control systems and preserves credit for each translated string. I also understand (but might be mistaken) that it supports the role of "editors", so that Leo and I will be able to do QA on the translation done by whoever joins us, trying to have a homogeneous-sounding result. I hope the project is approved for Weblate soon!
  • Work on reconstructing the book. It is one thing to deconstruct it, find paragraphs, and turn them into translatable strings... and a very different one to build a book again from there! I have talked with some people to help me get this in shape. It is basically just configuring Pandoc, but as I have never done that, any help will be most, most welcome!
  • Setting translation policies. What kind of language will we use? How will we refer to English names and terms? All that important stuff to give proper quality to our work.
  • Of course, the long work itself: Performing the translations ☺
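
For the reconstruction step, the rough shape I expect is the reverse of the two commands above: po4a-translate merges a translated .po file back into the master Markdown, and Pandoc then builds the book from the result. This is an untested sketch under several assumptions: the function name rebuild_book, the override variables and the example file names are hypothetical, and the real Pandoc configuration (templates, styling, table of contents) is still to be worked out.

```shell
# Hypothetical helper: names and override variables are illustrative only.
# Setting both overrides to "echo" previews the commands without running them.
PO4A_TRANSLATE="${PO4A_TRANSLATE:-po4a-translate}"
PANDOC="${PANDOC:-pandoc}"

rebuild_book() {
    # $1: master Markdown file      $2: translated .po file
    # $3: translated Markdown file  $4: final book file (format guessed
    #                                   by Pandoc from the extension)
    # -k 0 keeps the output even while few strings are translated;
    # by default po4a-translate discards files below an 80% threshold.
    "$PO4A_TRANSLATE" -f text -m "$1" -p "$2" -l "$3" -M utf-8 -k 0 &&
    # -s asks Pandoc for a standalone document rather than a fragment.
    "$PANDOC" -f markdown -s -o "$4" "$3"
}
```

It would be called with something like: rebuild_book book.md es.po book.es.md book.es.epub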
