Many content management systems and forums were originally created using the latin1 character set and the latin1_swedish_ci collation in MySQL. The problem many of these systems are facing today is the growing demand for multi-language content using special characters that cannot be accurately represented in the latin1 character set. This is where utf8 comes in.
The problem with simply converting a database from latin1 to utf8 is in the data itself. When you convert a database or table to a different character set or collation, it does not convert the content held within the tables. What happens is users end up with ‘strange’ characters in their data.
You can prevent this by taking steps to protect the data before you convert the database or tables.
An overview of the steps:
- Backup the database. No, really, do it.
- Alter the string field types to their binary equivalents in all tables. This will protect the data during conversion.
• VARCHAR to VARBINARY
• CHAR to BINARY
• LONGTEXT to LONGBLOB
• MEDIUMTEXT to MEDIUMBLOB
• TEXT to BLOB
• TINYTEXT to TINYBLOB
- Convert the database and tables from latin1 to utf8 character set and latin1_swedish_ci to utf8_general_ci collation.
- Convert the binary field types back to their original string field types.
- Backup the resulting database.
Ex. Generic convert column to BLOB from TEXT
Ex. Generic convert table to new character set
Ex. Generic convert database to new character set utf8