Unicode, PHP and MySQL
This weekend shodan.co.za switched hosts, which of course meant that some migration had to take place. There is quite a platform difference between the hosts: the previous one was Windows/IIS and the new one is FreeBSD/Apache. Everything went rather smoothly save for one issue. Because I have a few posts and comments with Japanese characters, I need UTF-8 support both on the MySQL back-end and in the PHP scripts that connect to it. I made sure the configuration was identical between hosts where possible, but I still got garbage characters or plain question marks where the special characters were expected, even though the MySQL character set and connection collation are set to UTF-8 on the server. It seems a minor hack to the PHP scripts, in this case one of the WordPress (2.0.2) files, fixed it sufficiently as far as I can tell. I added mysql_query("SET NAMES 'utf8'") to the database constructor in wp-db.php:
//==================================================================
// DB Constructor - connects to the server and selects a database
function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
    $this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
    if (!$this->dbh) {
        $this->bail("Error text if connection attempt failed");
    }
    // UTF-8 support: run on this connection only once it is known to be open
    mysql_query("SET NAMES 'utf8'", $this->dbh);
    $this->select($dbname);
}
SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'utf8' tells the server “future incoming messages from this client are in character set utf8.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.) Keep in mind that this edit may be overwritten every time WordPress is upgraded, until the developers fix it.
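As a rough illustration of the paragraph above: the MySQL manual describes SET NAMES as shorthand for setting three session variables. The sketch below only builds those three statements as strings so they can be inspected. The helper name set_names_equivalent is made up for this example and is not part of WordPress or PHP.

```php
<?php
// Hypothetical helper (not a WordPress or PHP function): expands
// SET NAMES <charset> into the three statements it is shorthand for.
function set_names_equivalent(string $charset): array
{
    // Accept only plain identifiers such as utf8 or latin1.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException("Unexpected character set name: $charset");
    }
    return [
        "SET character_set_client = $charset",     // what the client sends
        "SET character_set_results = $charset",    // what the server sends back
        "SET character_set_connection = $charset", // what the server converts statements to
    ];
}

// Print the statements so they can be compared with the server's variables.
foreach (set_names_equivalent('utf8') as $sql) {
    echo $sql, ";\n";
}
```

Running SHOW VARIABLES LIKE 'character_set%' on the same connection before and after the change is a simple way to confirm which of these values actually changed.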
If you still don’t get non-ASCII characters despite this, it might be because the connection character set is still the default latin1. After a connection is established (with host, name, password), use the following two lines in your application:
SET NAMES utf8;
SET CHARACTER SET utf8;
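To see why a latin1 connection produces either question marks or garbage, here is a small stand-alone sketch. It does not talk to MySQL at all. It only imitates the two kinds of mis-conversion in PHP, and it assumes the mbstring extension is installed. The sample text and variable names are my own.

```php
<?php
// Requires the mbstring extension. The sample text is written with
// Unicode escapes so the result does not depend on this file's encoding.
$original = "\u{65E5}\u{672C}\u{8A9E}"; // three Japanese characters, nine bytes in UTF-8

// Case 1: the text is converted to latin1, which has no Japanese characters.
// Each character that cannot be represented is replaced by a substitute.
$lost = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Case 2: the UTF-8 bytes are mistaken for latin1 and converted to UTF-8 again.
// Every byte becomes a character of its own, which shows up as garbage.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo $lost, "\n";
echo mb_strlen($original, 'UTF-8'), ' characters became ',
     mb_strlen($garbled, 'UTF-8'), "\n";
```

The first case cannot be undone once the data is stored, because the original characters are gone. The second case is usually recoverable, since the original bytes are all still there.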
If you're wondering “Hang on, I still see a lot of question marks”, it could be because:
• Your browser doesn’t support Unicode
• Your browser is not using UTF-8 encoding to view the data
• You are not using fonts that support Unicode, particularly Japanese.
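Before blaming the browser or the fonts, it can help to check what the database actually returned. The function below is only a sketch: diagnose_text is a made-up name, and it assumes the mbstring extension is available. If a value that should contain Japanese comes back as “ASCII only”, the characters were lost before they ever reached the browser, and no browser setting will bring them back.

```php
<?php
// Hypothetical helper: gives a rough classification of a string
// as it was fetched from the database.
function diagnose_text(string $bytes): string
{
    // Broken or truncated byte sequences are not valid UTF-8.
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        return 'not valid UTF-8';
    }
    // Any byte above 0x7F means real multi-byte characters are present.
    if (preg_match('/[^\x00-\x7F]/', $bytes)) {
        return 'valid UTF-8 with non-ASCII characters';
    }
    // Only plain ASCII, so any "?" is a literal question mark in the data.
    return 'ASCII only';
}

echo diagnose_text('???'), "\n";
echo diagnose_text("\u{65E5}\u{672C}\u{8A9E}"), "\n";
```

A “valid UTF-8” result does not prove the text is correct, because text that was converted twice is also valid UTF-8. It only means the browser and font explanations in the list above are worth checking next.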
See Also:
MySQL Connection Character Sets and Collations
Refer to these links if you want to enable internationalization in your browser:
Character sets and codepages
I18n - Browsers and fonts
Unicode fonts for Windows computers
Setting up Microsoft Windows NT, 2000 or Windows XP to support Unicode supplementary characters
Setting up Windows Internet Explorer 5, 5.5 and 6 for Multilingual and Unicode Support
Encoding Support










