Converting incoherent sets of data between charsets

Submitted by gwolf on Tue, 02/05/2008 - 04:33.

Dato complains that converting changelogs to be UTF8-clean is not always as simple as running iconv - One of the reasons that took me so long to migrate my blog is that, due to having migrated (at different points in time) my previous CMS (Jaws 0.4->0.5->0.7), its underlying database (MySQL 3.4 -> 4.0 -> 5.0 IIRC), the distribution (Debian Woody -> Sarge -> Etch+backports), several reinstalls and all... Well, I had a completely mixed-up database, with some tables in UTF, some tables in latin1, some tables with mixed rows, some tables that for some strange reason had double-mixed rows (that is, that had UTF8 misrepresented as latin1 and then re-encoded into UTF8)... No, it was not fun to sort out.

( categories: )

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <br> <b> <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <blockquote> <img> <h1> <h2> <h3> <tt> <pre> <strike>
  • Lines and paragraphs break automatically.
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>. Beside the tag style "<foo>" it is also possible to use "[foo]".

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
  ____    ____    ____    _      __   __  _____ 
/ ___| | _ \ | _ \ | |__ \ \ / / | ____|
\___ \ | |_) | | |_) | | '_ \ \ V / | _|
___) | | _ < | __/ | | | | | | | |___
|____/ |_| \_\ |_| |_| |_| |_| |_____|
Enter the code depicted in ASCII art style.