Mysql REPLACE() issue with Latin1 multi-byte character

Problem:

I’ve a database with server collation of utf8 / utf8_bin. DB have arabic utf8 text with accent chars (zabar / kasar etc). I want to get text without accent chars.

For e.g. SELECT replace(field1,0x[CODE],”) — where [CODE] is the accent char.

REPLACE() is working fine when i write static text in place of field1. But it does not replace when it run at database field.

Solution:

After reading an article, i come to know there are 2 (client & server) encodings used with mysql. My server encoding (table charset/collation) was utf8, but client’s encoding was latin1 and the time of INSERT.

So data stored in DB is somewhat mixed, due to which REPLACE() was not working as expected.

FIX:

1. i fetched the data using php,
2. applied bin2hex() function on data,
3. set client’s encoding to utf8 (set names utf8) and
4. updated same record using mysql’s UNHEX function.

Here is sample code …

$q=mysql_query("SELECT field1 FROM table1 where id = 4");
$rs = mysql_fetch_assoc($q);
$v = bin2hex($rs["field1"]);
mysql_query("SET NAMES UTF8");
mysql_query("UPDATE table1 SET field1=UNHEX('$v') WHERE id = 4");

This fixed my client’s encoding to utf8 that matches with server encoding, now REPLACE() is removing that accent hex code and working as expected .

Following article really helped …
http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html