Optimizing Performance of Legacy PHP, MySQL Application

When your webpage load time increases randomly and drastically, you need to benchmark performance if you want a scalable website. The challenge I faced was server page generation time, or you could say time to first byte. My application was using servers like Memcache, Sphinx, MySQL and Apache. I went to dig into the issues and, thanks to Syed Alam & Muhammad Wasif, I discovered a really useful monitoring app, New Relic. This application has wonderful features, but it definitely has some cost too; I think it is worth it if you can manage it.

So, for those who want a free, manual solution, the immediate step would be to check MySQL query timings. If you are using some database wrapper function or class, it is very easy to just track the time before and after the mysql_query() call (using a timer library). It will give you insight into where you are actually losing time.
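As a minimal sketch of that idea: the wrapper below times whatever callable you hand it and records the result in a log. The names timed_query and $GLOBALS['query_log'] are hypothetical, and usleep() stands in for a real mysql_query() call so the example is self-contained:

```php
<?php
// Minimal query-timing wrapper: times a callable and logs the duration.
// In a real app the $runner callable would invoke mysql_query().
$GLOBALS['query_log'] = array();

function timed_query($sql, $runner)
{
    $start   = microtime(true);   // high-resolution start time
    $result  = $runner($sql);     // run the actual query
    $elapsed = microtime(true) - $start;

    // record which query ran and how long it took
    $GLOBALS['query_log'][] = array('sql' => $sql, 'seconds' => $elapsed);
    return $result;
}

// Stand-in for mysql_query(): pretend the query takes ~20 ms
$fake_runner = function ($sql) { usleep(20000); return true; };

timed_query("SELECT * FROM users", $fake_runner);

printf("%d queries, slowest took %.3fs\n",
    count($GLOBALS['query_log']),
    max(array_map(function ($q) { return $q['seconds']; }, $GLOBALS['query_log'])));
```

Dumping or sorting $GLOBALS['query_log'] at the end of the request then shows which queries dominate the page generation time.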

But if you are working on tuning a legacy code base in which no mysql_query() wrapper is used, it is a big cost to wrap it or apply benchmarks all over the place. This was the challenge I faced recently while working on a project. I searched the web for options and found that there are two libraries available that can override core PHP functions with your own implementations:

Runkit’s runkit_function_redefine()
APD’s override_function()

My experience with both of them was not very good: although they did override the mysql_query() function with my custom function, they randomly produced segmentation faults and blank pages. Definitely not suitable for production. But if you can diagnose issues in a local environment, you can take this risk on a dev machine.

Solution:

Finally, thanks to Zeeshan Shah, I found another clone of Runkit that actually fixed those segmentation fault issues and is much more stable than the PHP.net PECL version.

It really gave me insight, on my local environment, into how many queries run on a page and which ones are not performing well. I won’t recommend using it on live as it may also have some glitches, but one can try it on a dev machine. It can be integrated with PHP Quick Profiler for more presentable results.

Following is some sample code to play with:

<?php

// Custom implementation of mysql_query
function custom_mysql_query($query, $link = null)
{
    // put your timer/benchmarking code here
    // (e.g. microtime(true) before and after the call)

    return $link === null
        ? core_mysql_query($query)
        : core_mysql_query($query, $link);
}

// Make a copy of the actual mysql_query function
runkit_function_copy('mysql_query', 'core_mysql_query');

// Now redefine the base function so it calls custom_mysql_query()
runkit_function_redefine('mysql_query', '$query, $link_identifier = null', 'return custom_mysql_query($query, $link_identifier);');

Mysql Fulltext Secret Bug … I mean feature.

I was recently doing a full-text search that used the keyword "eight", and it didn’t show a single result. It was working for all other cases.
Finally, after a little googling, I found this document.

http://dev.mysql.com/tech-resources/articles/full-text-revealed.html#stopwords

It says ….

Suppose you want to search for a text that contains "you want to".

 SELECT ... FROM table1
WHERE MATCH(textcolumn) AGAINST ('"you want to"' IN BOOLEAN MODE)

This is awful, for two reasons:

  • "You" and "want" and "to" are all stopwords if you’re using the default stopword list (see the list of stopwords at the end of this article). In fact this particular query always fails because all words are stopwords.
  • Even if they weren’t stopwords, there is nothing in a key that tells you what the relative position of the words is within the text (see the index file key structure at the start of this article).

So MySQL can’t get the answer just by looking in the index. The method has to be: first try to filter out the candidate rows by looking for whatever words aren’t stopwords, then examine each candidate row in the data file and make sure the words are there in the correct order. This means exact phrase searching can be a little slower than ordinary searching. On the other hand, MySQL won’t find "Sam was tall" if you search for "Sam grew tall". (If "was" and "grew" are both stopwords, there’s another DBMS that would find "Sam was tall". We think that looks like a bug.)

And … "eight" is also in this stopword list … (I guess, for no reason).

My Solution:

1. I replaced the DB value "eight" with "_eight" …

2. and before doing the MATCH … AGAINST, I replaced "eight" with "_eight" in the search string too.
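The second step is just a string substitution on the search input before building the query. A minimal sketch (rewrite_stopwords() is a hypothetical helper name):

```php
<?php
// Rewrite words that collide with MySQL's default full-text stopword
// list so they match the rewritten values stored in the DB.
function rewrite_stopwords($search)
{
    // "eight" is on the default stopword list, so both the stored
    // column value and the search input are rewritten to "_eight"
    return preg_replace('/\beight\b/i', '_eight', $search);
}

echo rewrite_stopwords("crazy eight tricks"); // prints "crazy _eight tricks"
```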

Mysql REPLACE() issue with Latin1 multi-byte character

Problem:

I have a database with a server collation of utf8 / utf8_bin. The DB has Arabic UTF-8 text with accent characters (zabar / kasar etc.). I want to get the text without the accent characters.

For example: SELECT REPLACE(field1, 0x[CODE], '') — where [CODE] is the accent character’s byte sequence.

REPLACE() works fine when I write static text in place of field1, but it does not replace anything when run against the database field.

Solution:

After reading an article, I came to know that MySQL uses two encodings: client and server. My server encoding (table charset/collation) was utf8, but the client’s encoding was latin1 at the time of the INSERT.

So the data stored in the DB was somewhat mixed, due to which REPLACE() was not working as expected.

FIX:

1. I fetched the data using PHP,
2. applied the bin2hex() function to the data,
3. set the client’s encoding to utf8 (SET NAMES utf8), and
4. updated the same record using MySQL’s UNHEX() function.

Here is sample code …

// Fetch the raw (mixed-encoding) bytes while the client is still latin1
$q  = mysql_query("SELECT field1 FROM table1 WHERE id = 4");
$rs = mysql_fetch_assoc($q);

// Convert the bytes to hex so they survive re-insertion untouched
$v = bin2hex($rs["field1"]);

// Switch the client encoding to utf8, then write the bytes back verbatim
mysql_query("SET NAMES utf8");
mysql_query("UPDATE table1 SET field1 = UNHEX('$v') WHERE id = 4");

This re-stored the data with the client’s encoding set to utf8, matching the server encoding; now REPLACE() removes that accent character’s hex code and works as expected.

The following article really helped:
http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html


Distributed searching with SPHINX

When we perform full-text searching across a number of large indexes, it is important that we get results in the shortest time possible. To ensure this, Sphinx provides a “Distributed Searching” feature.

Today we will look at Sphinx’s ability to perform distributed searching. If you don’t know what Sphinx is, please refer to my other post, Introduction to Sphinx Search.

For example, if you have a large index, you can easily distribute it. You create the index in chunks and assign each chunk to a Sphinx agent. You query this one index, and Sphinx does the searching in parallel across the chunks and gives you the final results. This dramatically improves the speed of searching. The concept is similar to table partitioning in MySQL.

Here is an example of setting up a distributed index:
index mycompleteindex
{
    type  = distributed
    local = chunk0
    agent = localhost:3312:chunk2
    agent = localhost:3312:chunk3
    agent = localhost:3312:chunk4
}

Here, type = distributed tells Sphinx that this is not a normal index. As we have only one searchd instance installed, we are using the same instance, localhost:3312, and declaring it as an agent.

With each agent we have specified a chunk to be served by that agent. The example shows distributed searching on one system; the same can be achieved by using a separate server for each chunk.
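As a sketch of the multi-server variant, only the agent lines change; the hostnames server2/server3/server4 are hypothetical machines, each running its own searchd that serves one chunk:

index mycompleteindex
{
    type  = distributed
    local = chunk0
    agent = server2.example.com:3312:chunk2
    agent = server3.example.com:3312:chunk3
    agent = server4.example.com:3312:chunk4
}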

Note that this post is only an introduction to this feature; for further details, refer to the Sphinx documentation.

http://wasimasif.wordpress.com/2009/09/09/distributed-searching-with-sphinx/

PHP: Mysql Auto Increment Id Issue / Limit

Today, I learned a new limitation of the PHP function mysql_insert_id() and its behavior with the BIGINT data type. Due to the nature of my current project, we have a BIGINT data type for the auto-increment id.

The documentation says something very crucial:

mysql_insert_id() converts the return type of the native MySQL C API function mysql_insert_id() to a type of long (named int in PHP).

If your AUTO_INCREMENT column has a column type of BIGINT, the value returned by mysql_insert_id() will be incorrect.

Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query.

Note: Because mysql_insert_id() acts on the last performed query, be sure to call mysql_insert_id() immediately after the query that generates the value.
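A minimal sketch of the workaround, assuming the legacy mysql_* extension from the rest of this post, an open connection, and a hypothetical table big_table with a BIGINT AUTO_INCREMENT id:

// INSERT into a table whose id column is BIGINT AUTO_INCREMENT
mysql_query("INSERT INTO big_table (name) VALUES ('test')");

// Avoid mysql_insert_id(): it passes through the C "long" type and can
// truncate 64-bit ids. Ask the server directly instead, immediately
// after the INSERT and on the same connection:
$row = mysql_fetch_row(mysql_query("SELECT LAST_INSERT_ID()"));
$id  = $row[0]; // comes back as a string, so the full 64-bit value survives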

Hope it helps someone.