Showing posts with label dba. Show all posts
Showing posts with label dba. Show all posts

Wednesday, October 12, 2011

TIL: Lookout For DEFINER

The Issue


I haven't blogged in a while an I have a long TODO list of things to publish: The repository for the SNMP Agent, video and slides of my OSCON talk and a quick overview of MHA master-master support. In the meantime, here's a little fact that I didn't know from MySQL CREATE VIEW documentation:

Although it is possible to create a view with a nonexistent DEFINER account, an error occurs when the view is referenced if the SQL SECURITY value is DEFINER but the definer account does not exist.
How can this be possible?

The Problem

For a number of reasons we don't have the same user accounts on the master than we have on the slaves (ie: developers shouldn't be querying the master). Our configuration files include the following line:
replicate-ignore-table=mysql.user
So if we create a user on the master, the user definition doesn't go through the replication chain.

So a VIEW can be created in the master, but unless we run all the proper GRANT statements on the slave as well, the VIEWs won't be effective on the slaves. Example from our slave (output formatted for clarity):

show create view view3\G
*************************** 1. row ***************************
                View: view3
         Create View: CREATE ALGORITHM=UNDEFINED 
               DEFINER=`app`@`192.168.0.1` 
               SQL SECURITY DEFINER VIEW `view3` AS select 
[...]

show grants for `app`@`192.168.0.1`;
ERROR 1141 (42000): There is no such grant defined 
for user 'app' on host '192.168.0.1'

The Solution

Once again, Maatkit's to the rescue with mk-show-grants on the master:
mk-show-grants | grep 192.168.0.1
-- Grants for 'app'@'192.168.0.1'
GRANT USAGE ON *.* TO 'app'@'192.168.0.1' 
IDENTIFIED BY PASSWORD '*password_hash';
GRANT DELETE, EXECUTE, INDEX, INSERT, SELECT, 
SHOW VIEW, UPDATE ON `pay`.* TO 'app'@'192.168.0.1';
A simple copy from the master and paste onto the slave fixed it.

Conclusion

Every now developers come to me with unusual questions. In this case it was: How come I can access only 2 out of 3 views?. In cases like these, it usually pays off to not overthink the issue and look into the details. A SHOW CREATE PROCEDURE on the 3 views quickly showed that one had a different host for the DEFINER. A quick read through the documentation and an easy test confirmed the mistake. That's why I have 3 mantras that I keep repeating to whomever wants to listen:
  • Keep it simple
  • Pay attention to details
  • RTFM (F is for fine)
It constantly keeps me from grabbing some shears and going into yak shaving mode.

Wednesday, August 17, 2011

MySQL HA Agent Mini HowTo

Why This Post


While testing Yoshinori Matsunobo's MHA agent I found that although the wiki has a very complete documentation, it was missing a some details. This article intends to close that gap and bring up some issues to keep in mind when you do your own installation. At the end of the article I added a Conclusions section, if you're not interested in the implementation details, but to read my take on the project, feel free to jump straight to the end from here.

My Test Case


Most of our production environments can be simplified to match the MHA's agent most simple use case: 1 master w/ 2 or more slaves and at least one more slave in an additional tier:

Master A --> Slave B
         +-> Slave C --> Slave D

As noted in the documentation, in this case the MHA agent will be monitoring A, B & C only. I found that unless you have a dedicated manager node, a slave on the 3rd tier (Slave D above) is suitable for this role. All 4 servers were setup as VMs for my evaluation / tests. It makes it easier to simulate hard failure scenarios in a controlled environment. Once this is in place the fun begins.

1st Step: User Accounts


In all the examples in the documentation it uses root to login into MySQL and the OS. I prefer to create specific users for each application, so I created a specific MySQL user for the MHA agent and used the linux' mysql user (UID/GID = 27/27 in RedHat / CentOS).

MySQL Credentials

Reviewing the code, I was able to determine that the agent requires to run some privileged commands like: SET GLOBAL variable, CHANGE MASTER TO ..., FLUSH LOGS ..., SHOW SLAVE STATUS, etc. and creates internal working tables to be used during the master fail over. The easiest way to set it up was using:
GRANT ALL PRIVILEGES ON *.* TO mha_user@'ip address'  
IDENTIFIED BY password;
This should be repeated on all 4 servers using the IP addresses for all the potential manager nodes. Yes, it would be possible to use wildcards, but I consider restricting access from specific nodes a safer practice.

The MySQL replication user needs to be set up to connect from any other server in the cluster, since any of the slaves in the group could be promoted to be master, and have the rest of them connecting to it.

Linux User

As I mentioned before I use the default RedHat / CentOS definition for the mysql user. Keep in mind that if you installed from the official Oracle packages (ie: RPMs), they may not follow this criteria and could result in mismatching UID/GIDs between servers. The UIDs/GIDs for the mysql user and group have to be identical on all 4 servers. If this is not the case, you may use the following bash sequence/script as root to correct the situation:

#!/bin/bash 
# stop mysql
/etc/init.d/mysql stop
 
# Change ownership for all files / directories
find / -user mysql -exec chown -v 27 {} \;
find / -group mysql -exec chgrp -v 27 {} \;
 
# remove old user / group and rename the new ones
# might complain about not being able to delete group.
groupdel mysql
userdel mysql 

# Add the new user / group
groupadd -g 27 mysql
useradd -c "MySQL User" -g 27 -u 27 -r -d /var/lib/mysql mysql
 
# restart MySQL
/etc/init.d/mysql start

Once the mysql user is properly setup, you'll have to create password-less shared keys and authorize them on all the servers. The easiest way to do it is to create it in one of them, copy the public key to the authorized_keys file under the /var/lib/mysql/.ssh directory and then copy the whole directory to the other servers.

I use the mysql user to run the scripts since for most distributions it can't be used to login directly and there is no need to worry about file permissions, which makes it a safe and convenient user.

2nd Step: Follow The Documentation to Install and Configure


Once all the users have been properly setup, this step is straight forward. Check the Installation and Configuration sections of the wiki for more details.

For the placement of the configuration files I deviated a little bit from documentation, but not much:

  1. Used a defaults file: /etc/masterha_default with access only for user mysql since it includes the MHA agent password:
    -rw------- 1 mysql mysql 145 Aug 11 16:36 masterha_default.cnf
  2. The application settings were placed under /etc/masterha.d/ this way they're easy to locate and won't clutter the /etc directory.
For simplicity, I didn't include any of the optional scripts and checks (ie: secondary check) in the configurate. You may want to check the documentation and source code of these scripts. Some of them are not even code complete (ie: master_ip_failover). Unless you are implementing some of the more complicated use cases, you won't even need them. If you do, you'll need to write your own following the examples provided with the source code.

Once you have everything in place, run the following checks as the mysql user (ie: sudo su - mysql):
  1. masterha_check_ssh: Using my configuration files the command line looks like:
    masterha_check_ssh --conf=/etc/masterha_default.cnf --conf=/etc/masterha.d/test.cnf
  2. masterha_check_repl: This test will determine whether the agent can identify all the servers in the group and the replication topology. The command line parameters are identical to the previous step.

Both should show and OK status at the end. All utilities have verbose output, so if something goes wrong it's easy to identify the issue and correct it.

3rd Step: Run the Manager Script


If everything is OK, on the MHA node (Server D in my tests) run the following command as user mysql (ie: sudo su - mysql):

masterha_manager --conf=/etc/masterha_default.cnf --conf=/etc/masterha.d/test.cnf

You have to keep in mind that should the master fail, the agent will fail over to one of the slaves and stop running. This way it'll avoid split brain situations. You will either have to build the intelligence in the application to connect to the right master when failing or use a virtual IP. In both cases you'll might need to use customized IP failover scripts. The documentation provides more details.

Read the section about running the script in the background to choose the method that best fits your practice.

You will have to configure the notification script to get notified of the master failure. The failed server will have to be removed from the configuration file before re-launching the manager script, otherwise it will fail to start.

You can restart the failed server and set it up as a slave connected to the new master and reincorporate it to the replication group using masterha_conf_host.

Conclusion


This tool solves a very specific (and painful) problem which is: make sure all the slaves are in sync, promote one of them and change the configuration of all remaining slaves to replicate off the new master and it does it fairly quickly. The tool is simple and reliable and requires very little overhead. It's easy to see it is production ready.

The log files are pretty verbose, which makes it really easy to follow in great detail all the actions the agent took when failing over to a slave.

I recommend to any potential users to start with a simple configuration and add the additional elements gradually until it fits your infrastructure needs.

Although the documentation is complete and detailed, it takes some time to navigate and to put all the pieces of the puzzle together.

I would like the agent to support master-master configurations. This way it would minimize the work to re-incorporate the failed server into the pool. Yoshinori, if you're reading this, know that I'll volunteer to test master-master if you decide to implement it.

Thursday, July 21, 2011

My MySQL SNMP Agent

Back in February I wrote an article titled A Small Fix For mysql-agent. Since then we did a few more fixes to the agent and included a Bytes Behind Master (or BBM) chart. For those who can't wait to get their hands on the code, here's the current version: MySQL SNMP agent RPM. For those who'd like to learn about it's capabilities and issues, keep reading.

What to Expect From this Version


The article I quoted above pretty much describes the main differences with the original project, but we went further with the changes while still relying on Masterzen's code for the data collection piece.

The first big change is that we transformed Masterzen's code into a Perl module, this way we can easily plug in a new version without having to do massive editing to ours.

The 2nd change is that we added the code to calculate how many bytes behind is a slave, which should be cross checked always with seconds behind master to get replication's full picture. When a slave is just a few bytes behind, the script calculates the difference straight out of the SHOW SLAVE STATUS information. If the SQL thread is executing statements that are in a binary log file older than the one being updated by the I/O thread, then the script logs into the master to collect the sizes of the previous binary logs and make an accurate calculation of the delta.

For this change we hit another bug in CentOS 5 SNMP agent, by which 64bit counters were being truncated. The solution is to upgrade to CentOS 6 (not anytime soon, but that's another story) or a work around. We decided for the latter and display a variable flagging this value roll over. This is not needed for non-CentOS 5 platforms as far as we know.

By now I expect that many of you would have a question in your mind:

Why Not Branch / Fork?

Why provide an RPM instead of creating a branch/fork in the original project? There are many reasons, but I'll limit myself to a couple. I trust that before you write an enraged comment you'll keep in mind that this is a personal perception, which might be in disagreement with yours.

This code is different enough from the original that creating a branch to the original project would be too complicated to maintain. For example: we are using a completely different SNMP protocol and created a module out of the original code. We don't have the resources to follow behind all of Masterzen's possible patches and I wouldn't expect him to adopt my changes.

If we would've created a fork (a new project derived from the original), I believe at this point, it would divert the attention from the original project or others like PalominoDB's Nagios plugin.

What's Next

We plan to continue maintaining this RPM driven by our specific needs and keep sharing the results this way. If at some point we see it fit to drive the merge into another project or create a new fork of an existing one, we'll do it.

I will be presenting the project at OSCON next week. If you're going to be around, please come to my talk: Monitoring MySQL through SNMP and we can discuss issues like: why use pass_persist, why not use information schema instead of the current method, why not include your personal MySQL instrumentation pet peeve, I'd be glad to sit down with you and personally chat about it.

In the meantime, enjoy, provide feedback and I hope to get to know you at OSCON next Thursday.

Thursday, May 5, 2011

Some More Replication Stuff

Listening to the OurSQL podcast: Repli-cans and Repli-can’ts got me thinking, what are the issues with MySQL replication that Sarah and Sheeri didn’t have the time to include in their episode. Here’s my list:

Replication Capacity Index

This is a concept introduced by Percona in last year’s post: Estimating Replication Capacity which I revisited briefly during my presentation at this year’s MySQL Users Conference. Why is this important? Very simple: If you use your slaves to take backups, they might be outdated and will fall further behind during the backups. If you use them for reporting, your reports may not show the latest data. If you use it for HA, you may not start writing to it until the slave caught up.
Having said that, measuring replication capacity as you set up slaves is a good way to make sure that the slave servers will be able to catch up with the traffic in the master.

More On Mixed Replication

The podcast also discussed how mixed replication works and pointed to the general criteria that the server applies to switch to STATEMENT or ROW based. However there is one parameter that wasn’t mentioned and it might come back and haunt you: Transaction Isolation Level. You can read all about it in the MySQL Documentation: 12.3.6. SET TRANSACTION Syntax and in particular the InnoDB setting innodb_locks_unsafe for binlog.

Keep Binary Logs Handy

Today I found this article from SkySQL on Planet MySQL about Replication Binlog Backup, which is a really clever idea to keep your binary logs safe with the latest information coming out of the master. It offers a method of copying them without the MySQL server overhead. If you purge binary logs automatically to free space using the variable expire_logs_days, you will still have the logs when you need them for a longer time than your disk capacity on the master might allow.

Seconds Behind Master (SBM)

Again, another topic very well explained in the podcast, but here’s another case where this number will have goofy values. Lets say you have a master A that replicates master-master with server B and server C is a regular slave replicating off A. The application writes to A and B serves as a hot stand-by master.
When we have a deployment that requires DDL and/or DML statements, we break replication going from B to A (A to B keeps running to catch any live transactions) and apply the modifications to B. Once we verify that everything is working OK on B, we switch the application to write to B and restore replication going back to A. This offers a good avenue for rolling back in case the deployment breaks the database in any way (ie: rebuild B using the data in A). What we frequently see is, if the DDL/DML statement takes about 30min (1800 sec) on B, once we restore replication as explained, the slave C will show outrageous numbers for SBM (ie: >12hs behind, I really don’t know how does the SBM arithmetic works to explain this). So it’s a good idea to complement slave drifts monitoring with mk-heartbeat, which uses a timestamp to measure replication drifts.

Conclusion

This episode of the OurSQL podcast is a great introduction to replication and its quirks. I also believe that MySQL replication is one of the features that made the product so successful and wide spread. However, you need to understand its limitations if your business depends on it.

These are my $.02 on this topic, hoping to complement the podcast. I wanted to tweet my feedback to @oursqlcast, but it ended up being way more than 140 characters.

Wednesday, September 22, 2010

A Replication Surprise

While working on a deployment we came across a nasty surprise. In hindsight it was avoidable, but it never crossed our minds it could happen. I'll share the experience so when you face a similar situation, you'll know what to expect.

Scenario

To deploy the changes, we used a pair of servers configured to replicate with each other (master-master replication). There are many articles that describe how to perform an ALTER TABLE with minimum or no downtime using MySQL replication. The simple explanation is:
  1. Set up a passive master of the database you want to modify the schema. 
  2. Run the schema updates on the passive master.
  3. Let replication to catch up once the schema modifications are done.
  4. Promote the passive master as the new active master.
The details to make this work will depend on each individual situation and are too extensive for the purpose of this article. A simple Google search will point you in the right direction.

The Plan

The binlog_format variable was set to MIXED. While production was still running on the active master, we stopped replication from the passive to the active master so we would still get all the DML statements on the passive master while running the alter tables. Once the schema modifications were over, we could switch the active and passive masters in production and let the new passive catch up with the table modifications once the replication thread was running again.

The ALTER TABLE statement we applied was similar to this one:
ALTER TABLE tt ADD COLUMN cx AFTER c1;
There were more columns after cx and c1 was one of the first columns. Going through all the ALTER TABLE statements takes almost 2 hour, so it was important to get the sequence of event right.

Reality Kicks In

It turns out that using AFTER / BEFORE or changing column types broke replication when it was writing to the binlog files in row based format, which meant that we couldn't switch masters as planned until we had replication going again. As a result we had to re-issue an ALTER TABLE to revert the changes and then repeat them without the AFTER / BEFORE.

The column type change was trickier and could've been a disaster, fortunately this happened on a small table (~400 rows which meant the ALTER TABLE took less than 0.3sec). In this case we reverted the modification on the passive master and run the proper ALTER TABLE on the active master. Should this have happened with a bigger table, there was no other alternative than either rollback the deployment or deal with the locked table while the modification happened.

Once this was done we were able to restart the slave threads, let it catch up and and everything was running as planned ... but with a 2hr delay.

Unfortunately, using STATEMENT replication wouldn't work in this case for reasons that would need another blog article to explain.

Happy Ending

After the fact, I went back to the manual and I found this article: Replication with Differing Table Definitions on Master and Slave. I guess we should review the documentation more often, the changes happened after 5.1.22. I shared this article with the development team, so next time we won't have surprises.

Friday, March 26, 2010

My Impressions About MONyog

At work we have been looking for tools to monitor MySQL and at the same time provide as much diagnosis information as possible upfront when an alarm is triggered. After looking around at different options, I decided to test MONyog from Webyog, the makers of the better known SQLyog. Before we go on, the customary disclaimer: This review reflects my own opinion and in no way represents any decision that my current employer may or may not make in regards of this product.

First Impression

You know what they say about the first impression, and in this where MONyog started with the right foot. Since it is an agent-less system, it only requires to install the RPM or untar the tarball in the server where you're going to run the monitor and launch the daemon to get started. How much faster or simpler can it be? But in order to start monitoring a server you need to do some preparations on it. Create a MONyog user for both the OS and the database. I used the following commands:

For the OS user run the following command as root (thank you Tom):
groupadd -g 250 monyog && useradd -c 'MONyog User' -g 250 -G mysql -u 250 monyog && echo 'your_os_password' | passwd --stdin monyog
For the MySQL user run:
GRANT SELECT, RELOAD, PROCESS, SUPER on *.* to 'adm_monyog'@'10.%' IDENTIFIED BY 'your_db_password';
Keep in mind that passwords are stored in the clear in the MONyog configuration database, defining a MONyog user helps to minimize security breaches. Although for testing purposes I decided to go with a username/password combination to SSH into the servers, it is possible to use a key which would be my preferred setting in production.

The User Interface

The system UI is web driven using Ajax and Flash which makes it really thin and portable. I was able to test it without any issues using IE 8 and Firefox in Windows and Linux. Chrome presented some minor challenges but I didn't dig any deeper since I don't consider it stable enough and didn't want to get distracted with what could've been browser specific issues.

In order to access MONyog you just point your browser the server where it was installed with an URL equivalent to:
http://monyog-test.domain.com:5555 or http://localhost:5555
You will always land in the List of Servers tab. At the bottom of this page there is a Register a New Server link that you follow and start adding servers at will. The process is straight forward and at any point you can trace your steps back to make any corrections as needed (see screenshot). Once you enter the server information with the credentials defined in the previous section, you are set. Once I went through the motions, the first limitation became obvious: You have to repeat the process for every server, although there is an option to copy from previously defined servers, it can become a very tedious process.

Once you have the servers defined, to navigate into the actual system you need to check which servers you want to review, select the proper screen from a drop down box at the bottom of the screen and hit Go. This method seems straight forward, but at the beginning it is a little bit confusing and it takes some time to get used to it.

Features

MONyog has plenty of features that make it worth trying if you're looking for a monitoring software for MySQL. Hopefully by now you have it installed and ready to go, so I'll comment from a big picture point of view and let you reach your own conclusions.

The first feature that jumps right at me is its architecture, in particular the scripting support. All the variables it picks up from the servers it monitors are abstracted in JavaScript like objects and all the monitors, graphics and screens are based on these scripts. One the plus side, it adds a a lot of flexibility to how you can customize the alerts, monitors, rules and Dashboard display. On the other hand, this flexibility present some management challenges: customize thresholds, alerts and rules by servers or group of servers and backup of customized rules. None of these challenges are a showstopper and I'm sure MONyog will come up with solutions in future releases. Since everything is stored in SQLite databases and the repositories are documented, any SQLite client and some simple scripting is enough to get backups and workaround the limitations.

The agent-less architecture requires the definition of users to log into the database and the OS in order to gather the information it needs. The weak point here is that the credentials, including passwords, are stored in the clear in the SQLite databases. A way to secure this is to properly limit the GRANTs for the MySQL users and ssh using a DSA key instead of password. Again, no showstopper for most installations, but it needs some work from Webyog's side to increase the overall system security.

During our tests we ran against a bug in the SSH library used by MONyog. I engaged their Technical Support looking forward to evaluate their overall responsiveness. I have to say it was flawless, at no point they treated me in a condescending manner, made the most of the information I provided upfront and never wasted my time with scripted useless diagnostic routines. They had to provide me with a couple of binary builds, which they did in a very reasonably time frame. All in all, a great experience.

My Conclusion

MONyog doesn't provide any silver bullet or obscure best practice advice. It gathers all the environment variables effectively and presents it in an attractive and easy to read format. It's a closed source commercial software, the architecture is quite open through scripting and with well documented repositories which provides a lot of flexibility to allow for customizations and expansions to fit any installations needs. For installations with over 100 servers it might be more challenging to manage the servers configurations and the clear credentials may not be viable for some organizations. If these 2 issues are not an impediment, I definitively recommend any MySQL DBA to download the binaries and take it for a spin. It might be the solution you were looking for to keep an eye on your set of servers while freeing some time for other tasks.

Let me know what do you think and if you plan to be at the MySQL UC, look me up to chat. Maybe we can invite Rohit Nadhani from Webyog to join us.

Monday, March 8, 2010

Speaking At The MySQL Users Conference

My proposal has been accepted, yay!

I'll be speaking on a topic that I feel passionate about: MySQL Server Diagnostics Beyond Monitoring. MySQL has limitations when it comes to monitoring and diagnosing as it has been widely documented in several blogs.

My goal is to share my experience from the last few years and, hopefully, learn from what others have done. If you have a pressing issue, feel free to comment on this blog and I'll do my best to include the case in my talk and/or post a reply if the time allows.

I will also be discussing my future plans on sarsql. I've been silent about this utility mostly because I've been implementing it actively at work. I'll post a road map shortly based on my latest experience.

I'm excited about meeting many old friends (and most now fellow MySQL alumni) and making new ones. I hope to see you there!

Tuesday, December 8, 2009

sar-sql New Alpha Release

I just uploaded a new tarball for sar-sql containing a few bug fixes, overall code improvements. I also added options to get a partial snapshot of SHOW SLAVE STATUS and SHOW MASTER STATUS. I chose only a few columns to avoid over complicating the project.

I plan one more round of heavy code changes, but no new features until I can stabilize the code enough to release it as beta.


Feel free to visit the project page in Launchpad to comment on the Blueprints, report new bugs and participate through the Answers section.

Thank you very much to Patrick Galbraith who provided some ideas on the best way to solve some of the coding issues.

Enjoy the download.

Wednesday, December 2, 2009

About CSV Tables

As most of MySQL users, I have often ignored what I'd like to call the minor storage engines. MYISAM, InnoDB and NDB (aka MySQL Cluster) are well covered in several articles; but what about engines like CSV or ARCHIVE? As part of some internal projects, we have been playing around with these 2 with some interesting results. On this article I'll concentrate on CSV.

Scenario

Currently we have a few servers that are storing historical data that will eventually be migrated into Oracle. Two things need to happen until we can finally decommission them: 1) export the data to CSV so it can be imported in bulk into Oracle and 2) keep the data online so it can be queried as needed until the migration is finalized. I thought it would be interesting if we could solve both issues simultaneously and decided to try the CSV engine. Here's a description of the process.

Understanding CSV Engine

The CSV engine, as far as I can tell, was and example storage engine that was included with earlier MySQL versions to illustrate the storage engine API. Since the data is stored in plain text files there are some limitations that need to be considered before using it:
  1. No support for indexes
  2. No NULL columns are allowed

Exporting Data To CSV

So, my first step was to determine what would it take to export data from a regular table to CSV. These are the basic steps:

1. Create the seed for the CSV table based on an existing table

CREATE TABLE test LIKE test_csv;

If you feel adventurous, use CREATE TABLE LIKE ... SELECT ... in which case you may be able to skip the next 2 steps. The engine for the new table will be redefined at the end of the process.

2. Get rid of the indexes

ALTER TABLE test_csv DROP PRIMARY KEY, DROP KEY test_idx;

Modify this statement to include all existing indexes.

3. Get rid of NULL columns

ALTER TABLE test_csv MODIFY test_char VARCHAR(10) NOT NULL DEFAULT '', MODIFY test_date TIMESTAMP NOT NULL DEFAULT '0000-00-00';

The DEFAULT values need to be reviewed so the application makes no mistake that these should be NULL. Numeric values could be tricky since you may not find a suitable replacement for NULL. Adapt to your particular case.

4. Convert to CSV

ALTER TABLE test_csv ENGINE CSV;

This step will create an empty CSV file in the schema data directory.

5. Export the data

INSERT INTO test_csv SELECT * FROM test WHERE ...

This would allow you to export the portion of the data from an existing table into the CSV table/file.

At this point your data is all stored in a CSV file called test_csv.CSV under the data subdirectory that corresponds to the schema and the table can be queried as any other regular MySQL table, which is what I was looking for at the beginning of the project.

You could even update the table. If you need to load this file to any other application, just copy the file.

Keep in mind that we are talking about regular text files, so they are not adequate for big number of rows and frequent write operations.

Importing Data From CSV

If you have to import some CSV data from another application, as long as the data formatted properly, you can just create an empty table with the right columns and then copy the CSV file with the proper table name. Example:

In MySQL:
use test
CREATE TABLE test_import LIKE test_csv;
In the OS shell:
cp data.csv /var/lib/mysql/test/test_import.CSV
Now if you do: SELECT * FROM test_import LIMIT 2; it should show you the data on the first 2 lines of the CSV file.

Import Data Example

Many banks allow you to export the list of transactions from your online statement as a CSV file. You could easily use this file as explained above to consult and/or manipulate the data using regular SQL statements.

Conclusion

The CSV engine provides a very flexible mechanism to move data in and out of regular text files as long as you have proper access the data directories. You can easily generate these tables from existing data. You can also easily manipulate the data with any editor and (re) import it at will. I see it as an invaluable tool to move around information, especially in development and testing environments.

Do you have interesting use cases? I'd like to hear about them.

Friday, November 6, 2009

My MySQL Tool Chest

Every time I need to install or reconfigure a new workstation, I review the set of tools I use. It's an opportunity to refresh the list, reconsider the usefulness of old tools and review new ones. During my first week at Open Market I got one of these opportunities. Here is my short list of free (as in 'beer') OSS tools and why they have a place in my tool chest.

Testing Environments

Virtual Box


Of all the Virtual Machines out there, I consider Virtual Box to be the easiest to use. Since I first looking into it while I was still working at Sun/MySQL, this package has been improved constantly. It's a must have to stage High Availability scenarios or run tools that are not available in your OS of choice.

MySQL Sandbox

Did you compile MySQL from source and want to test it without affecting your current installation? Will replication break when you try a new feature? Will the upgrade work as expected? There is no other way to easily test this other than MySQL Sandbox. It's a must have for anyone working with MySQL regularly.

Backup

ZRM for MySQL

Many people have asked me why do I always suggest going this way when using (insert tool of preference) gets the job done. ZRM for MySQL has plenty of features that go beyond taking the actual backup, making it a breeze to actually manage the backup sets. In most cases if you use (insert tool of preference), you are still left with the additional tasks around the backups (ie: scheduling, rotation, copying backup off site, backup reports, etc). Why reinvent the wheel?

Tuning


These are simple scripts that can quickly give you an overview of the current status of any MySQL server and assist you in making proper adjustments to improve efficiency.

mysqlsla


I like to call mysqlsla the Slow Query Reality Check. I found that many times developers and DBAs start scanning the slow query log to find the slowest queries and start optimizing them to increase overall performance. Many fail to recognize that quick queries that are run hundreds or thousands of times in a short period of time, can have a much greater impact on performance than a dozen complex long running ones. mysqlsla can scan the query log, slow or general, and rank the queries based on accumulated run and lock times (among other values). This way it's easy to identify the the queries that will really impact overall performance. It might be a "SELECT COUNT(*) FROM table WHERE status = ?" instead of a query with a 5 table JOIN.

mysqltuner

Running mysqltuner is like taking a physical exam of a MySQL server. Whether you do it the first time you get into a server or after any changes to its configuration and/or environment. The script will very quickly point to the low hanging fruit in terms of configuration parameters. The most common issue I've caught with it is memory over allocation. This is a nasty situation that, by its very nature, if undetected it will show up in the very worst moment: under heavy load.

mytop

mytop will show you in real time what is going on in the server. Doing load tests? Trying to catch deadlocks? Fire up your test case while keeping an eye on mytop's screen.

Other

MySQL Workbench

At this point, I haven't been able to find any tool, other than MySQL Workbench, to get proper DB diagrams for MySQL schemas. The ideal situation would be that every DBA would have these diagrams accessible, but the truth is, they rarely exist.

sar-sql

I know, this is beating my own drum, but it works and combined with some other tools, it can provide a great deal of information with negligible overhead. I wish I had more time to write about more use cases.

Wildcard

myterm

I just read about myterm in a recent blog. I am really intrigued by it, but haven't had any time to test it. If it works as advertised, it is a great companion to sar-sql.