Upgrade utf8 content from a latin1 store to a UTF8 store

March 02, 2012

When upgrading to ruby 1.9.3 I noticed that the behaviour of interpreting characters from mysql tables has changes. This is due to the character encoding behaviour differing from 1.8.7.

The symptoms being strange characters appear in your content fetched from the db.

Often in ubuntu mysql server installs the default is latin1 when rails stored everything in utf8. In some cases (like mine) you end up having utf8 data stored in a latin1 table.

Dirty conversion script follows.

  DBFROM=site_production
  DBTO=new_site_production
  LOGIN=site
  PASS=XXX
  mysqldump  --extended-insert=FALSE --default-character-set=latin1  -u $LOGIN -p$PASS $DBFROM >site.sql
  cat site.sql |sed  -e 's/DEFAULT CHARSET=latin1;/DEFAULT CHARSET=utf8 COLLATE utf8_bin;/'>site2.sql
  cat site2.sql |sed  -e 's/SET NAMES latin1/SET NAMES utf8/'>site3.sql
  echo " drop database $DBTO; create database $DBTO character set utf8 collate utf8_bin;"|mysql -u $LOGIN -p$PASS
  mysql -u $LOGIN -p$PASS $DBTO <site3.sql