Monday, December 14, 2009

A Hard Look Into Replication

For some time now I've been struggling with a slave that invariably stays behind its master. I have been looking at every detail I can possibly think of, and in the process I discovered a number of replication details I wasn't aware of until now. I haven't found much information about them in the documentation, but they can affect the way you look at your slaves.

Seconds Behind Master

This is the first value to look at when evaluating replication; most of the monitoring systems I know of rely on it. According to the manual:
When the slave SQL thread is actively running
(processing updates), this field is the number of
seconds that have elapsed since the timestamp of the
most recent event on the master executed by that thread.
On fast networks this is usually an accurate estimate of replication status, but often you'll see this value in the tens of thousands of seconds, and less than a minute later it falls back to 0. In a chain of masters and slaves, the number on the last slave measures how far behind it is from the master at the top of the chain. Under heavy load on the top master, it can even swing back and forth wildly.

Because of this, I've learned not to trust this value alone. It is a good idea to compare other variables as well, for example Master_Log_File / Read_Master_Log_Pos vs. Relay_Master_Log_File / Exec_Master_Log_Pos. The second pair points to the last statement executed on the slave, expressed in terms of the master's binary log file (keep in mind that the statements are actually being executed from the Relay Log file). The first pair points to the latest statement read from the master and copied into the Relay Log. Checking all these variables in context will tell you the real status of the slaves.
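As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you already have one row of SHOW SLAVE STATUS output as a dict; the function name replication_gap and the sample values are made up for the example. It pairs Master_Log_File with Read_Master_Log_Pos (what the I/O thread has read) and Relay_Master_Log_File with Exec_Master_Log_Pos (what the SQL thread has executed).

```python
def replication_gap(status):
    """Summarise how far the SQL thread trails the I/O thread.

    `status` is a dict holding one row of SHOW SLAVE STATUS output.
    """
    # Coordinates of the last event read from the master (I/O thread).
    read_file = status["Master_Log_File"]
    read_pos = int(status["Read_Master_Log_Pos"])
    # Coordinates of the last event executed on the slave (SQL thread).
    exec_file = status["Relay_Master_Log_File"]
    exec_pos = int(status["Exec_Master_Log_Pos"])

    same_file = read_file == exec_file
    return {
        "read": (read_file, read_pos),
        "executed": (exec_file, exec_pos),
        "same_binlog": same_file,
        # A byte offset difference only makes sense within one binlog file.
        "bytes_pending": read_pos - exec_pos if same_file else None,
        "seconds_behind_master": status.get("Seconds_Behind_Master"),
    }


# Hypothetical sample values, not taken from a real server.
sample = {
    "Master_Log_File": "mysql-bin.000042",
    "Read_Master_Log_Pos": "98765",
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": "12345",
    "Seconds_Behind_Master": "0",
}
print(replication_gap(sample)["bytes_pending"])  # 86420
```

In this made-up sample, Seconds_Behind_Master reads 0 while tens of kilobytes of already-read events are still waiting to be executed, which is the kind of disagreement worth a closer look.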

Sidenote: These are the variables in the slave snapshot in sar-sql. Let me know which ones you monitor to make sure your slaves are healthy.

Binary Log Format

This item is important and covers which format you choose for replication. In the case I am working on, it was set to STATEMENT. An initial look revealed that the master had bursts of very high traffic, after which the slaves started lagging behind significantly. Most likely (I'm still trying to prove this), this is because a number of big INSERTs and UPDATEs are processed concurrently on the master but are inevitably serialized on the slaves. Without going into the details, switching to ROW solved most of the delays.
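To get a feel for why serialization alone can produce lag, here is a toy Python model. It is only a sketch under loud assumptions: statements cost the same on both servers, the master spreads them evenly over a fixed number of threads, and locking and I/O contention are ignored. The function name and the numbers are hypothetical.

```python
def slave_backlog_seconds(durations, master_threads):
    """Toy estimate of the lag a burst leaves behind on a slave.

    durations      -- execution time of each statement, in seconds
    master_threads -- how many statements the master runs concurrently
    """
    # Master: hand each statement to whichever thread is least busy.
    workers = [0.0] * master_threads
    for d in durations:
        idx = workers.index(min(workers))
        workers[idx] += d
    master_wall = max(workers)

    # Slave: a single SQL thread applies everything one after another.
    slave_wall = sum(durations)
    return slave_wall - master_wall


# Eight 2-second statements arriving together (hypothetical burst).
burst = [2.0] * 8
print(slave_backlog_seconds(burst, master_threads=8))  # 14.0
print(slave_backlog_seconds(burst, master_threads=1))  # 0.0
```

With eight concurrent threads the master finishes the burst in 2 seconds while the slave needs 16, leaving 14 seconds of backlog in this simplified model.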

Although binlog_format is a dynamic variable, the change will not take place right away. It will be applied only to newly created threads/connections, which means that if you have connection pooling in place (very common with web applications), it might take a while until the change actually happens. If you want to force the change as soon as possible, you will have to find a mechanism friendly to your particular environment to regenerate the connections.
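If your own code can reach the pooled connections (which was not possible in my case), one alternative to recycling them is to set the session value on each one, as a reader points out in the comments. The Python sketch below assumes DB-API 2.0 style connection objects; the function name set_session_binlog_format is hypothetical, and depending on the server version the account may need elevated privileges such as SUPER to run the statement.

```python
def set_session_binlog_format(connections, fmt="ROW"):
    """Run SET SESSION binlog_format on every connection passed in.

    `connections` is any iterable of DB-API 2.0 style connections
    (objects with a cursor() method). Returns how many were updated.
    """
    allowed = {"ROW", "STATEMENT", "MIXED"}
    if fmt not in allowed:
        raise ValueError("unsupported binlog format: %r" % (fmt,))

    updated = 0
    for conn in connections:
        cursor = conn.cursor()
        try:
            # fmt was validated against a fixed set, so inlining it is safe.
            cursor.execute("SET SESSION binlog_format = '%s'" % fmt)
        finally:
            cursor.close()
        updated += 1
    return updated
```

How you obtain the list of live connections depends entirely on the pool in use, so that part is left out on purpose.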

Another issue that came up is that, in a replication tree, no matter what the binlog_format variable is set to on the slaves in the middle of the chain, the binary log format of the top master will be used across the chain.

Status Variables and Logs

As you may know, SHOW GLOBAL STATUS includes a number of counters that track how many times each command type was issued. So Com_insert will tell you how many INSERTs were issued since the server started. That is, without counting the replication thread. So you may issue thousands of INSERTs on the master, and while Com_insert will be updated accordingly there, it won't change on the slave. This was very frustrating when I tried to evaluate whether the INSERT rate on the slave matched the rate on the master. The general log has a similar issue; it won't record any statement executed by the slave threads.
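For reference, this is roughly the calculation I had in mind: take two samples of SHOW GLOBAL STATUS a known number of seconds apart and turn the counter deltas into rates. The Python sketch below assumes each sample is a dict of variable name to value; the function name counter_rates and the sample numbers are hypothetical. Which counters actually reflect replicated work on a slave is the open question described above, so the names are passed in rather than fixed.

```python
def counter_rates(before, after, elapsed_seconds,
                  names=("Com_insert", "Questions")):
    """Per-second rates from two SHOW GLOBAL STATUS samples.

    before, after   -- dicts mapping status variable name to its value
    elapsed_seconds -- time between the two samples
    names           -- counters to compare (must exist in both samples)
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")

    rates = {}
    for name in names:
        # Values arrive as strings from most client libraries.
        delta = int(after[name]) - int(before[name])
        rates[name] = delta / elapsed_seconds
    return rates


# Hypothetical samples taken 60 seconds apart.
first = {"Com_insert": "1000", "Questions": "5000"}
second = {"Com_insert": "1600", "Questions": "8000"}
print(counter_rates(first, second, 60))
# {'Com_insert': 10.0, 'Questions': 50.0}
```

Running the same calculation on the master and on the slave over the same interval gives two sets of rates to compare side by side.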

Conclusion

Although I understand that these limitations may originate from the way MySQL replication works, they do frustrate me, since they really limit the types of tests and diagnostics that can be set up to find what's causing the issues on these servers.

I have to point out that MySQL Sandbox is an invaluable tool to test the different replication scenarios with minimum preparation work.

5 comments:

  1. Hi there,

    I don't want to be a jerk; I like this article, but I have some notes as well:

    - slave delay:
    It's in seconds; every query is logged to the binlog with a timestamp. For example, you can see a delay if the system time differs between the master and the slave.
    But you are right about the rest.

    - binlog format:
    I guess moving to row-based didn't solve your problem just because it's "better". You were partly right about the why. If lots of inserts/updates come in, the master can execute them in order as they arrive or, if they all come together, execute them at the same time, but replication is single-threaded and has a strict order. And if an update takes 1 sec on the master but your slave is slower and takes 2 sec, there you go: a 2 sec delay :)
    Row-based is actually not good if you are using queries like update table set smtng = X where y=z limit 100. This is one query in statement-based and 100 in row-based. Even if the execution is fast, row-based replication, which is also single-threaded, could take up a lot of time.

    - status variable and logs:
    True, but you can do the calculation from the Questions and/or (version dependent) Queries variables :)

    Thanks for your article. I think we have been missing one like this for a long time, and I can only encourage you to go on :)

  2. Hi Istvan, thank you for your feedback and you don't sound like a jerk.

    Regarding 'binlog format', I don't favor one format over the other without looking at the transaction profile and understanding the application. In our case ROW format helped because of some TRIGGER processing.

    In the example you mention, STATEMENT might be better if it weren't for the use of LIMIT, which may cause issues with STATEMENT replication.

  3. Very insightful and useful post! Thanks a lot. I added it to the MySQL Librarian as well!

  4. Hi Gerry,

    It is true that the global value of binlog_format only takes effect for new sessions, as is the case for all server variables that have both a global and a session flavor.

    However, if you have a bunch of session threads in a connection pool, you can manually set the value of the session version of binlog_format by just sending the command:

    SET SESSION BINLOG_FORMAT=ROW;

  5. @Mats, that is correct.

    In the case I was working on, the connections were created by the application server for its connection pool. There was no alternative but to wait for those connections to be recycled.

    Thanks for your comment.
