Monday, December 14, 2009

A Hard Look Into Replication

For some time now I've been struggling with a slave that invariably stays behind its master. I have been looking at every detail I can possibly think of, and in the process discovered a number of replication details I wasn't aware of until now. I haven't found much information about them in the documentation, but they can affect the way you look at your slaves.

Seconds Behind Master

This is the first value to look at when evaluating replication; most of the monitoring systems I know of rely on it. According to the manual:
When the slave SQL thread is actively running
(processing updates), this field is the number of
seconds that have elapsed since the timestamp of the
most recent event on the master executed by that thread.
In fast networks, most of the time, this is an accurate estimate of replication status, but many times you'll see this value jump into the tens of thousands of seconds and, not a minute later, fall back to 0. In a chain of master and slaves, the number on the last slave measures how far behind it is from the master at the top of the chain. Under heavy load on the top master, it can even go back and forth wildly. Because of this, I've learned not to trust this value alone. It is a good idea then to compare other variables as well, for example: Master_Log_File / Read_Master_Log_Pos vs. Relay_Master_Log_File / Exec_Master_Log_Pos. The second pair points to the last statement executed on the slave in relation to the master's binary log file (keep in mind that the statements are actually being executed from the relay log file). The first pair points to the latest statement read from the master and being copied into the relay log. Checking all these variables in context will tell you the real status of the slaves.
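As an illustration (the file names and positions below are made up), here is what a trimmed SHOW SLAVE STATUS could look like on a slave whose I/O thread has fallen behind:

SHOW SLAVE STATUS\G
        Master_Log_File: mysql-bin.000139
    Read_Master_Log_Pos: 83218
  Relay_Master_Log_File: mysql-bin.000139
    Exec_Master_Log_Pos: 83218
  Seconds_Behind_Master: 0

Meanwhile, SHOW MASTER STATUS on the master reports mysql-bin.000142 at position 104858372. The SQL thread has executed everything it has received (Exec_Master_Log_Pos equals Read_Master_Log_Pos), so Seconds_Behind_Master reports 0, yet the slave is still three binary logs behind the master.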

Sidenote: these are the variables in the slave snapshot in sar-sql; let me know which ones you monitor to make sure your slaves are healthy.

Binary Log Format

This item is important because it determines which format is used for replication. In the case I am working on, it was set to STATEMENT. An initial look revealed that the master had bursts of very high traffic, after which the slaves started lagging behind significantly. Most likely (I'm still trying to prove this) a number of big INSERTs and UPDATEs were being processed concurrently on the master, and inevitably got serialized on the slaves. Without going into the details, switching to ROW solved most of the delays.

Although binlog_format is a dynamic variable, the change will not take effect right away; it only applies to newly created threads/connections. This means that if you have connection pooling in place (very common with web applications), it might take a while until the change actually happens. If you want to force the change as soon as possible, you will have to find a mechanism friendly to your particular environment to regenerate the connections.
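As a quick, hedged sketch, checking the current format, switching it, and then verifying what each connection actually uses could look like this:

SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';
SELECT @@global.binlog_format, @@session.binlog_format;

A session opened before the SET GLOBAL will still report its old session value in the last query; only connections created afterwards pick up ROW, which is exactly why pooled connections need to be recycled.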

Another issue that came up is that, in a replication tree, no matter what the binlog_format variable establishes on the slaves in the middle of the chain, the binary log format of the top master will be used across the chain.

Status Variables and Logs

As you may know, SHOW GLOBAL STATUS includes a number of counters that track how many times each command type was issued. Com_insert, for example, will tell you how many INSERTs were issued since the server started, but it does not count statements executed by the replication thread. So you may issue thousands of INSERTs on the master, and while Com_insert is updated accordingly there, it won't change on the slave. Very frustrating when I tried to evaluate whether the INSERT rate on the slave matched the rate on the master. The general log has a similar issue: it won't record any statement executed by the slave threads.
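A minimal way to see this in action (test.t is just a hypothetical table used for illustration):

On the master:
SHOW GLOBAL STATUS LIKE 'Com_insert';
INSERT INTO test.t (id) VALUES (1), (2), (3);
SHOW GLOBAL STATUS LIKE 'Com_insert';

On the slave, once the rows have replicated:
SHOW GLOBAL STATUS LIKE 'Com_insert';
SELECT COUNT(*) FROM test.t;

The master's counter grows, but the slave's stays flat even though COUNT(*) confirms the rows arrived.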

Conclusion

Although I understand that these limitations originate in the way MySQL replication works, they do frustrate me, since they really limit the type of tests and diagnostics that can be set up to find what's causing the issues on these servers.

I have to point out that MySQL Sandbox is an invaluable tool to test the different replication scenarios with minimum preparation work.

Tuesday, December 8, 2009

sar-sql New Alpha Release

I just uploaded a new tarball for sar-sql containing a few bug fixes and overall code improvements. I also added options to get a partial snapshot of SHOW SLAVE STATUS and SHOW MASTER STATUS. I chose only a few columns to avoid overcomplicating the project.

I plan one more round of heavy code changes, but no new features until I can stabilize the code enough to release it as beta.


Feel free to visit the project page in Launchpad to comment on the Blueprints, report new bugs and participate through the Answers section.

Thank you very much to Patrick Galbraith who provided some ideas on the best way to solve some of the coding issues.

Enjoy the download.

Wednesday, December 2, 2009

About CSV Tables

Like most MySQL users, I have often ignored what I'd like to call the minor storage engines. MyISAM, InnoDB and NDB (aka MySQL Cluster) are well covered in several articles; but what about engines like CSV or ARCHIVE? As part of some internal projects, we have been playing around with these two, with some interesting results. In this article I'll concentrate on CSV.

Scenario

Currently we have a few servers that are storing historical data that will eventually be migrated into Oracle. Two things need to happen before we can finally decommission them: 1) export the data to CSV so it can be imported in bulk into Oracle and 2) keep the data online so it can be queried as needed until the migration is finalized. I thought it would be interesting if we could solve both issues simultaneously and decided to try the CSV engine. Here's a description of the process.

Understanding CSV Engine

The CSV engine, as far as I can tell, was an example storage engine included with earlier MySQL versions to illustrate the storage engine API. Since the data is stored in plain text files, there are some limitations that need to be considered before using it:
  1. No support for indexes
  2. No NULL columns are allowed

Exporting Data To CSV

So, my first step was to determine what it would take to export data from a regular table to CSV. These are the basic steps:

1. Create the seed for the CSV table based on an existing table

CREATE TABLE test_csv LIKE test;

If you feel adventurous, use CREATE TABLE ... SELECT ..., in which case you may be able to skip the next two steps (see the sketch below). The engine for the new table will be redefined at the end of the process.
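A minimal sketch of that shortcut, using the same test / test_csv names:

CREATE TABLE test_csv SELECT * FROM test;

CREATE TABLE ... SELECT does not copy the indexes, so step 2 can usually be skipped, but any NULLable columns will still come across as NULLable and will need the treatment described in step 3 before the engine can be switched to CSV.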

2. Get rid of the indexes

ALTER TABLE test_csv DROP PRIMARY KEY, DROP KEY test_idx;

Modify this statement to include all existing indexes.

3. Get rid of NULL columns

ALTER TABLE test_csv MODIFY test_char VARCHAR(10) NOT NULL DEFAULT '', MODIFY test_date TIMESTAMP NOT NULL DEFAULT '0000-00-00';

The DEFAULT values need to be reviewed so the application can still recognize that these values stand in for NULL. Numeric columns can be tricky, since you may not find a suitable replacement for NULL. Adapt to your particular case.
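For a numeric column, a hedged sketch could look like this (test_amount and the 0 default are just placeholders for illustration):

ALTER TABLE test_csv MODIFY test_amount DECIMAL(10,2) NOT NULL DEFAULT 0;

If 0 is a legitimate value in your data, pick a sentinel the application will never produce, or document the convention clearly.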

4. Convert to CSV

ALTER TABLE test_csv ENGINE CSV;

This step will create an empty CSV file in the schema data directory.

5. Export the data

INSERT INTO test_csv SELECT * FROM test WHERE ...

This will export the desired portion of the data from the existing table into the CSV table/file.

At this point your data is all stored in a CSV file called test_csv.CSV under the data subdirectory that corresponds to the schema, and the table can be queried like any other regular MySQL table, which is what I was looking for at the beginning of the project.

You could even update the table. If you need to load this file to any other application, just copy the file.

Keep in mind that we are talking about regular text files, so they are not adequate for large numbers of rows or frequent write operations.

Importing Data From CSV

If you have to import some CSV data from another application, as long as the data is formatted properly, you can just create an empty table with the right columns and then copy the CSV file into place, named after the table. Example:

In MySQL:
use test
CREATE TABLE test_import LIKE test_csv;
In the OS shell:
cp data.csv /var/lib/mysql/test/test_import.CSV
Now if you do: SELECT * FROM test_import LIMIT 2; it should show you the data on the first 2 lines of the CSV file.

Import Data Example

Many banks allow you to export the list of transactions from your online statement as a CSV file. You could easily use this file as explained above to consult and/or manipulate the data using regular SQL statements.
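A hedged sketch of that idea; the column layout is made up, so adjust it to whatever your bank actually exports:

In MySQL:
CREATE TABLE bank_txn (
  txn_date DATE NOT NULL,
  description VARCHAR(100) NOT NULL DEFAULT '',
  amount DECIMAL(10,2) NOT NULL DEFAULT 0
) ENGINE=CSV;
In the OS shell:
cp statement.csv /var/lib/mysql/test/bank_txn.CSV
Back in MySQL:
SELECT description, SUM(amount) FROM bank_txn GROUP BY description ORDER BY SUM(amount);

If the server doesn't see the new contents right away, a FLUSH TABLES bank_txn should take care of it.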

Conclusion

The CSV engine provides a very flexible mechanism to move data in and out of regular text files, as long as you have proper access to the data directories. You can easily generate these tables from existing data. You can also easily manipulate the data with any editor and (re)import it at will. I see it as an invaluable tool to move information around, especially in development and testing environments.

Do you have interesting use cases? I'd like to hear about them.

Tuesday, November 24, 2009

ZRM for MySQL Issues

I really like ZRM for MySQL Community Edition for backups. I find it very simple to install and configure, and it gets the job done. Its default settings suit most installations. The challenges arise once you need to adapt it to more complex organizations. Please note that all the issues described in this article refer to Linux installations.

What If DBAs Have No sudo?

This was the first challenge I ever faced. ZRM requires full root access at least during installation, and in some cases it requires some sudo permissions to manage its files as well. Some organizations don't grant these permissions to the DBAs, which complicates the installation. When I looked into the scripts, I didn't see any reason why they wouldn't run in user space, other than a few hard-coded paths. I posted the question in the Zmanda forums and got an answer with some instructions on how to patch the scripts. The instructions are pretty straightforward, but I haven't had a chance to test them. Maybe someone else could try them and post the results as a comment? Just follow the link to the original post and read through the thread.

Install From RPM or Tarball?

So far I have always used the tarball, since it offers the most flexibility. However, this requires properly setting up permissions for the different directories and subdirectories. Although the locations of the main files are listed in the documentation, the tarball includes a script (run automatically if you use the deb or rpm packages) that sets the ownerships properly. Example:
sudo mysql-zrm-migrate-file-ownership --user mysql --group mysql
For multi-DBA organizations, you can easily modify the script to also grant group access to the directories using chmod as needed.

More on Permissions

Once the access permissions have been set properly, the scripts will run OK; however, a new problem arises. No matter how you set the permissions, the directory holding the backup files will only be created with user access. With the example above, that means only the mysql user will have access, effectively leaving the DBAs with no proper access to the files and reports, even if they are included in the mysql group. I haven't been able to locate where to patch the scripts to avoid this issue. For now, writing a wrapper that runs chmod -R g+rx /path/to/backups after the backup works as an easy solution (see the sketch below).
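A minimal wrapper sketch; the backup set name matches the crontab example further down, and /path/to/backups is a placeholder for your actual destination directory:

#!/bin/sh
# run the ZRM backup, then open group read access to the resulting files
mysql-zrm --action backup --backup-set backup
chmod -R g+rx /path/to/backups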

Crontab Entries

The most traditional method to run any process periodically is to use crontab, but modern distributions are favoring the use of files in the /etc/cron* subdirectories that will be run by run-parts. Chances are your system already has some daily tasks scheduled, which can be listed using:
run-parts --list /etc/cron.daily/
The security setup in many of these modern systems doesn't allow the use of crontab anymore. In these cases the mysql-zrm-scheduler utility can't be used. In its place you'll need to create a file in /etc/cron.d (or a wrapper script in the /etc/cron.daily subdirectory) containing the proper entry in a format similar to the traditional crontab. The main difference is that you need to specify the user it will execute as. A typical file would look like:
# /etc/cron.d/zrm-backup: crontab entries for mysql-zrm

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# add other environment variables you might need, like: PERL5LIB

30 1  * * *   mysql    mysql-zrm --action backup --backup-set backup

Conclusion

ZRM for MySQL continues to be one of my favorite MySQL tools. None of these issues are showstoppers; however, the fact remains that Zmanda hasn't released any major update to these scripts in a long time, and it's starting to show. If they don't do something about it, someone else will, stealing their thunder. Such is the nature of Open Source.

Friday, November 6, 2009

My MySQL Tool Chest

Every time I need to install or reconfigure a new workstation, I review the set of tools I use. It's an opportunity to refresh the list, reconsider the usefulness of old tools and review new ones. During my first week at Open Market I got one of these opportunities. Here is my short list of free (as in 'beer') OSS tools and why they have a place in my tool chest.

Testing Environments

Virtual Box


Of all the virtual machine packages out there, I consider Virtual Box to be the easiest to use. Since I first looked into it while I was still working at Sun/MySQL, this package has been improved constantly. It's a must-have to stage high availability scenarios or run tools that are not available in your OS of choice.

MySQL Sandbox

Did you compile MySQL from source and want to test it without affecting your current installation? Will replication break when you try a new feature? Will the upgrade work as expected? There is no easier way to test these scenarios than MySQL Sandbox. It's a must-have for anyone working with MySQL regularly.

Backup

ZRM for MySQL

Many people have asked me why I always suggest going this way when (insert tool of preference) gets the job done. ZRM for MySQL has plenty of features that go beyond taking the actual backup, making it a breeze to manage the backup sets. In most cases, if you use (insert tool of preference), you are still left with the additional tasks around the backups (i.e. scheduling, rotation, copying backups off site, backup reports, etc.). Why reinvent the wheel?

Tuning


The two scripts below are simple tools that can quickly give you an overview of the current status of any MySQL server and assist you in making proper adjustments to improve efficiency.

mysqlsla


I like to call mysqlsla the Slow Query Reality Check. I have found that many times developers and DBAs start scanning the slow query log to find the slowest queries and optimize them to increase overall performance. Many fail to recognize that quick queries that run hundreds or thousands of times in a short period of time can have a much greater impact on performance than a dozen complex, long-running ones. mysqlsla can scan the query log, slow or general, and rank the queries based on accumulated run and lock times (among other values). This way it's easy to identify the queries that will really impact overall performance. It might be a "SELECT COUNT(*) FROM table WHERE status = ?" instead of a query with a 5-table JOIN.

mysqltuner

Running mysqltuner is like giving a MySQL server a physical exam, whether you do it the first time you get into a server or after any change to its configuration and/or environment. The script will very quickly point to the low-hanging fruit in terms of configuration parameters. The most common issue I've caught with it is memory over-allocation. This is a nasty situation that, by its very nature, goes undetected until it shows up at the very worst moment: under heavy load.

mytop

mytop will show you in real time what is going on in the server. Doing load tests? Trying to catch deadlocks? Fire up your test case while keeping an eye on mytop's screen.

Other

MySQL Workbench

At this point, I haven't been able to find any tool other than MySQL Workbench to get proper DB diagrams for MySQL schemas. The ideal situation would be for every DBA to have these diagrams accessible, but the truth is they rarely exist.

sar-sql

I know, I'm beating my own drum here, but it works, and combined with some other tools it can provide a great deal of information with negligible overhead. I wish I had more time to write about more use cases.

Wildcard

myterm

I just read about myterm in a recent blog post. I am really intrigued by it but haven't had any time to test it. If it works as advertised, it would be a great companion to sar-sql.

Thursday, October 29, 2009

About This Blog

If you are wondering about the title, it refers to a special tea very popular in Argentina: yerba mate (maté \ˈmä-ˌtā\ in English). In Argentina, matear (drinking mate) is a social activity, something you often do among friends while sharing stories. To avoid confusion, I'll write mate in italics in the rest of this article.

In Between Mates

In Spanish, entre mate y mate refers to the stories that are shared in a circle of friends while drinking the infusion. The person who has the mate always pauses the conversation, sips the mate and passes it back to the cebador before continuing. Everyone else patiently waits and keeps listening, sipping from the mate when it's their turn. The conversation then takes place in between mates: entre mate y mate.

MySQL Mates

This refers to a different topic. This blog will be mostly about MySQL but, as I have been doing for a while, I will cover topics and packages that are part of the MySQL ecosystem. In the past I have posted articles about MySQL Sandbox, ZRM for MySQL and SphinxSE, as well as topics that, although not MySQL specific, affect it in one way or another. In short, MySQL companions or "mates".

This blog is intended, then, as a conversation between you, the reader, and myself. I don't pretend to know everything, and I expect to learn from your comments.

See you around.