Showing posts with label diagnosis. Show all posts

Thursday, July 21, 2011

My MySQL SNMP Agent

Back in February I wrote an article titled A Small Fix For mysql-agent. Since then we have made a few more fixes to the agent and added a Bytes Behind Master (BBM) chart. For those who can't wait to get their hands on the code, here's the current version: MySQL SNMP agent RPM. For those who'd like to learn about its capabilities and issues, keep reading.

What to Expect From this Version


The article I quoted above pretty much describes the main differences from the original project, but we went further with the changes while still relying on Masterzen's code for the data collection piece.

The first big change is that we transformed Masterzen's code into a Perl module. This way we can easily plug in a new version without having to do massive editing to ours.

The second change is that we added code to calculate how many bytes behind the master a slave is. This value should always be cross-checked with Seconds Behind Master to get the full picture of replication. When a slave is just a few bytes behind, the script calculates the difference straight from the SHOW SLAVE STATUS output. If the SQL thread is executing statements from a binary log file older than the one being updated by the I/O thread, the script logs into the master to collect the sizes of the previous binary logs and makes an accurate calculation of the delta.
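
The calculation described above can be sketched as follows. This is a minimal illustration in Python, not the code shipped in the RPM (which is Perl). The four field names are the standard SHOW SLAVE STATUS columns; the function name and the list of (log name, size) pairs, as SHOW BINARY LOGS would report them on the master, are assumptions made for the example.

```python
def bytes_behind_master(slave_status, master_log_sizes):
    """Return how many bytes the SQL thread trails the I/O thread.

    slave_status: dict holding the SHOW SLAVE STATUS fields used below.
    master_log_sizes: list of (log_name, size_in_bytes) pairs in the order
        the master lists its binary logs; only consulted when the two
        threads are working on different files.
    """
    io_file = slave_status["Master_Log_File"]         # file the I/O thread reads
    io_pos = int(slave_status["Read_Master_Log_Pos"])
    sql_file = slave_status["Relay_Master_Log_File"]  # file the SQL thread executes
    sql_pos = int(slave_status["Exec_Master_Log_Pos"])

    # Simple case: both threads are on the same binary log file.
    if io_file == sql_file:
        return io_pos - sql_pos

    # Otherwise add up: the rest of the SQL thread's file, every whole
    # file in between, and what the I/O thread has read of its own file.
    names = [name for name, _ in master_log_sizes]
    sizes = dict(master_log_sizes)
    start, end = names.index(sql_file), names.index(io_file)
    remaining_in_sql_file = sizes[sql_file] - sql_pos
    whole_files_between = sum(sizes[name] for name in names[start + 1:end])
    return remaining_in_sql_file + whole_files_between + io_pos
```

The second branch is the one that needs a connection to the master, since the slave itself does not know the final size of binary logs it has not finished applying.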

For this change we hit another bug in the CentOS 5 SNMP agent, in which 64-bit counters were being truncated. The solution is either to upgrade to CentOS 6 (not anytime soon, but that's another story) or to use a workaround. We chose the latter and expose a variable that flags when this value rolls over. As far as we know, this is not needed on platforms other than CentOS 5.

By now I expect many of you have a question in mind:
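
How the rollover variable is implemented is not described here, so the following is only one possible reading, sketched in Python: publish the low 32 bits of the value together with a second variable counting how many times it has wrapped past 2^32. Both function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps at this value


def split_for_32bit_agent(value):
    """Split a large value into (low 32 bits, number of 32-bit rollovers)."""
    rollovers, low = divmod(value, COUNTER32_MODULUS)
    return low, rollovers


def rebuild(low, rollovers):
    """Recombine the two published variables into the original value."""
    return rollovers * COUNTER32_MODULUS + low
```

With both variables graphed, a monitoring system can tell a real drop in the value from a wrap caused by truncation.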

Why Not Branch / Fork?

Why provide an RPM instead of creating a branch/fork in the original project? There are many reasons, but I'll limit myself to a couple. I trust that before you write an enraged comment you'll keep in mind that this is a personal perception, which might be in disagreement with yours.

This code is different enough from the original that a branch of the original project would be too complicated to maintain. For example: we are using a completely different SNMP extension mechanism (pass_persist instead of agentx) and created a module out of the original code. We don't have the resources to follow all of Masterzen's possible patches and I wouldn't expect him to adopt my changes.

If we had created a fork (a new project derived from the original), I believe that at this point it would divert attention from the original project or from others like PalominoDB's Nagios plugin.

What's Next

We plan to continue maintaining this RPM driven by our specific needs and keep sharing the results this way. If at some point we see fit to merge it into another project or create a new fork of an existing one, we'll do it.

I will be presenting the project at OSCON next week. If you're going to be around, please come to my talk, Monitoring MySQL through SNMP, where we can discuss questions such as why use pass_persist, why not use the information schema instead of the current method, or why not include your personal MySQL instrumentation pet peeve. I'd be glad to sit down with you and chat about it in person.

In the meantime, enjoy, provide feedback and I hope to get to know you at OSCON next Thursday.

Tuesday, February 1, 2011

A Small Fix For mysql-agent

If you're already using an SNMP monitoring tool like OpenNMS, mysql-agent is a great way to add a number of graphs using Net-SNMP. However, mysql-agent has a small bug that drove me crazy. I will try to highlight how I discovered it (and hence fixed it), since the process involved learning about SNMP, how to diagnose it and eventually, once all the pieces came together, how simple it is to write your own agents.

Although versions are not that important, for the sake of completeness: we were using CentOS 5.5, MySQL 5.5.8 Community RPMs, Net-SNMP version 5.3.22 and OpenNMS Web Console 1.8.7.

The Problem

I followed the directions on the mysql-agent blog only to find that I was facing the only open issue listed on mysql-agent's GitHub repository (spoiler alert, the solution is at the bottom). The setup has several components, which makes it difficult to diagnose:
  • mysql-agent
  • snmpd + agentx
  • OpenNMS server
Running snmpwalk on the MySQL host, as suggested in the mysql-agent article, worked fine (as far as we could tell). However, OpenNMS wasn't getting the data and the graphs weren't showing up.

It turns out that, once you have completed the OpenNMS configuration as described in the article, it's a good idea to run snmpwalk remotely as well, from the server running OpenNMS. You need to specify your MySQL hostname instead of localhost:
snmpwalk -m MYSQL-SERVER-MIB -v 2c -c public mysql-host enterprises.20267

In our case, it failed. Unfortunately the logs didn't offer much information; whatever was failing was inside agentx.

The Alternative

Since the NetSNMP Perl class hides a lot of the details of the Net-SNMP API, we decided to use an alternative method to write the agent: pass_persist. The beauty of this method is that you only need to write a filter script: SNMP requests arrive on standard input (stdin) and the responses are printed to standard output (stdout). As a consequence, the agent can be tested straight from the command line before deploying it. A nice article about pass_persist can be found here. The pass_persist protocol is fully documented in the snmpd.conf man page.
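
To give an idea of how little a pass_persist filter needs, here is a minimal sketch in Python (our agent is Perl and uses SNMP::Persist, so this is not its code). It answers PING, get and getnext as described in the snmpd.conf man page. The OID table is hypothetical: its two entries simply mirror the sample session shown later in this post, and a real agent would fill it from MySQL.

```python
import sys

# Hypothetical OID table: OID -> (type, value).
VALUES = {
    ".1.3.6.1.4.1.20267.200.1.1.0": ("counter", "21"),
    ".1.3.6.1.4.1.20267.200.1.2.0": ("counter", "16"),
}


def oid_key(oid):
    """Turn '.1.3.6' into (1, 3, 6) so OIDs sort numerically, not as text."""
    return tuple(int(part) for part in oid.strip(".").split("."))


SORTED_OIDS = sorted(VALUES, key=oid_key)


def answer(command, oid=None):
    """Return the lines to print for one request."""
    if command == "PING":
        return ["PONG"]
    if command == "get":
        target = oid if oid in VALUES else None
    elif command == "getnext":
        later = [o for o in SORTED_OIDS if oid_key(o) > oid_key(oid)]
        target = later[0] if later else None
    else:
        target = None  # anything else (e.g. set) is not supported here
    if target is None:
        return ["NONE"]
    kind, value = VALUES[target]
    return [target, kind, value]


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read requests until end of input, answering each one immediately."""
    while True:
        command = stdin.readline()
        if not command:
            break
        command = command.strip()
        # get and getnext are followed by a second line carrying the OID.
        oid = stdin.readline().strip() if command in ("get", "getnext") else None
        for line in answer(command, oid):
            stdout.write(line + "\n")
        stdout.flush()  # snmpd waits for the reply, so never leave it buffered
```

Calling serve() at the end of the script turns it into an agent that can be typed at from a terminal or launched by snmpd. The sketch does not validate malformed OIDs, which a production agent should.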

To follow this route we had to tweak the script a little. The tweaks included:
  • No daemonizing: since the script uses stdin/stdout, it needs to run interactively.
  • All values need to be returned as strings. It was the only workaround we found to deal with 64-bit values that otherwise weren't interpreted correctly.
  • stderr needs to be redirected to a file to avoid breaking the script's returned values while you run it interactively (add 2>/tmp/agent.log to the end of the command line).
  • Use the SNMP::Persist Perl module to handle the SNMP protocol.
Once the changes were implemented (I promise to publish the alternative mysql-agent script after some clean up) these are the steps I followed to test it (for now I'll leave the -v option out, along with the stderr redirection).
  1. Invoke the agent as you would've done originally, keeping in mind that now it'll run interactively. On your MySQL server:
    mysql-agent-pp -c /path/to/.my.cnf -h localhost -i -r 30
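  2. Test if the agent is working properly (you type PING, the script answers PONG):
    PING
    PONG
  3. Does it actually provide the proper values? (You type the command and the OID on the next line; the script answers with an OID, a type and a value.)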
    get
    .1.3.6.1.4.1.20267.200.1.1.0
    .1.3.6.1.4.1.20267.200.1.1.0
    Counter32
    21
    getnext
    .1.3.6.1.4.1.20267.200.1.1.0
    .1.3.6.1.4.1.20267.200.1.2.0
    Counter32
    16
Note that case matters: PING must be uppercase, while get and getnext must be lowercase. Once you know it works you'll need to add the pass_persist line to the snmpd.conf file and restart snmpd:
# Line to use the pass_persist method
pass_persist .1.3.6.1.4.1.20267.200.1 /usr/bin/perl /path/to/mysql-agent -c /path/to/.my.cnf -h localhost -i -r 30
Now execute snmpwalk remotely and if everything looks OK, you're good to go.

On our first runs, snmpwalk failed after the 31st value. I retried those specific values, and a few after them, with get and getnext, and it became obvious that for some of them the responses weren't the expected ones.

The Bug and The Fix

So now, having identified the failing values, it was time to dig into the source code.

First came the data gathering portion, which fortunately is well documented inside the source code. I found ibuf_inserts and ibuf_merged as the 31st and 32nd values (note that with get you can check other values further down the list, which I did to confirm that the issue was specific to some variables and not a generic problem). A little grepping revealed that these values were populated from the SHOW INNODB STATUS output, which in 5.5 didn't include the line expected by the program logic; hence, the corresponding values stayed undefined. A patch to line 794 of the original script fixed this particular issue by setting the value to 0 for undefined values.

794c794
<             $global_status{$key}{'value'} = $status->{$key};
---
>             $global_status{$key}{'value'} = (defined($status->{$key}) && $status->{$key} ne '' ? $status->{$key} : 0);
This fix can be used for the original script and the new pass_persist one. I already reported it upstream on GitHub.
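
For readers who don't speak Perl, the rule the patch applies can be restated in Python. This is only an illustration: the function name is hypothetical, and ibuf_inserts and ibuf_merged are the two variables mentioned above.

```python
def normalize_status(status, expected_keys):
    """Return a value for every expected key, using 0 when none was collected."""
    normalized = {}
    for key in expected_keys:
        value = status.get(key)  # None when the collector never set the key
        normalized[key] = 0 if value is None or value == "" else value
    return normalized
```

The point is that every OID the agent advertises must always answer with some value; a missing or empty one is what broke the walk.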

The original script still failed, though. OpenNMS relies on getbulk requests (explained in the Net-SNMP documentation), which agentx fails to convert into getnext requests. This can be reproduced using snmpbulkwalk instead of snmpwalk (note: it took some tcpdump + Wireshark tricks to catch the getbulk requests). The current beta of the pass_persist version of mysql-agent has been in place for a while without issues.
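
Conceptually, the conversion that has to happen is simple: a getbulk is a request for up to max-repetitions consecutive getnext results in one round trip. The Python sketch below shows that idea over an in-memory table. It is a simplification (it ignores non-repeaters and the end-of-MIB marker a real agent returns), and the function names and table contents are assumptions for the example.

```python
def oid_key(oid):
    """Turn '.1.3.6' into (1, 3, 6) so OIDs compare numerically."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def get_next(table, oid):
    """Return (oid, value) for the first entry after oid, or None at the end."""
    later = sorted((o for o in table if oid_key(o) > oid_key(oid)), key=oid_key)
    return (later[0], table[later[0]]) if later else None


def get_bulk(table, oid, max_repetitions):
    """Emulate getbulk by chaining getnext calls, stopping early at the end."""
    results = []
    current = oid
    for _ in range(max_repetitions):
        found = get_next(table, current)
        if found is None:
            break
        results.append(found)
        current = found[0]  # the next lookup starts from the OID just returned
    return results
```

This is why a walk done with snmpwalk (getnext) can succeed while snmpbulkwalk (getbulk) fails against the same agent: the data is identical, only the request type differs.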

Conclusion

I'm not going through the whole process since it was long and complicated, but I learned a few things during this time that I'd like to point out:

Look Around Before Looking for New Toys

If you're using OSS, you may already have in house most of what you need. This project started when we decided to use OpenNMS (already in place to monitor our infrastructure) to track the MySQL data we wanted to monitor closely. A simple Google search pointed us to mysql-agent right away.

Embrace OSS

All the tools that we used in this case are Open Source, which made it extremely easy to inspect the source code when pertinent, try alternatives, benefit from the collective knowledge, make corrections and contribute them back to the community. A full evaluation of commercial software, plus the interaction with tech support to get to the point where we needed a patch, would've been as involved as this one and the outcome wouldn't have been guaranteed either. I'm not against commercial software, but you need to evaluate whether it will add any real value compared to the open source alternatives.

SNMP is Your Friend

Learning about the SNMP protocol, in particular the pass_persist method, was very useful. It took the mystery out of it, and writing agents in any language (even bash) is far from difficult. I'm looking forward to going deeper into MySQL monitoring using this technology.

I'm hoping this long post encourages you to explore the use of SNMP monitoring for MySQL on your own.

Credit: I need to give credit to Marc Martinez who did most of the thinking and kept pointing me in the right direction every time I got lost.

NOTE: I'm not entirely satisfied with the current pass_persist version of mysql-agent I have in place, although it gets the job done. Once I have the reviewed version, I plan ... actually promise to publish it either as a branch of the existing one or separately.

Friday, March 26, 2010

My Impressions About MONyog

At work we have been looking for tools to monitor MySQL and at the same time provide as much diagnostic information as possible upfront when an alarm is triggered. After looking around at different options, I decided to test MONyog from Webyog, the makers of the better known SQLyog. Before we go on, the customary disclaimer: this review reflects my own opinion and in no way represents any decision that my current employer may or may not make regarding this product.

First Impression

You know what they say about first impressions, and this is where MONyog started off on the right foot. Since it is an agent-less system, you only need to install the RPM or untar the tarball on the server where you're going to run the monitor and launch the daemon to get started. How much faster or simpler can it be? But in order to start monitoring a server you need to do some preparation on it: create a MONyog user for both the OS and the database. I used the following commands:

For the OS user run the following command as root (thank you Tom):
groupadd -g 250 monyog && useradd -c 'MONyog User' -g 250 -G mysql -u 250 monyog && echo 'your_os_password' | passwd --stdin monyog
For the MySQL user run:
GRANT SELECT, RELOAD, PROCESS, SUPER on *.* to 'adm_monyog'@'10.%' IDENTIFIED BY 'your_db_password';
Keep in mind that passwords are stored in the clear in the MONyog configuration database, so defining a dedicated MONyog user helps limit the damage of a security breach. Although for testing purposes I decided to go with a username/password combination to SSH into the servers, it is possible to use a key, which would be my preferred setting in production.

The User Interface

The system UI is web driven using Ajax and Flash which makes it really thin and portable. I was able to test it without any issues using IE 8 and Firefox in Windows and Linux. Chrome presented some minor challenges but I didn't dig any deeper since I don't consider it stable enough and didn't want to get distracted with what could've been browser specific issues.

In order to access MONyog you just point your browser to the server where it was installed, with a URL equivalent to:
http://monyog-test.domain.com:5555 or http://localhost:5555
You will always land in the List of Servers tab. At the bottom of this page there is a Register a New Server link that you follow to start adding servers at will. The process is straightforward and at any point you can trace your steps back to make any corrections as needed (see screenshot). Once you enter the server information with the credentials defined in the previous section, you are set. Once I went through the motions, the first limitation became obvious: you have to repeat the process for every server. Although there is an option to copy from previously defined servers, it can become a very tedious process.

Once you have the servers defined, to navigate into the actual system you need to check which servers you want to review, select the proper screen from a drop down box at the bottom of the screen and hit Go. This method seems straightforward, but at the beginning it is a little confusing and it takes some time to get used to it.

Features

MONyog has plenty of features that make it worth trying if you're looking for monitoring software for MySQL. Hopefully by now you have it installed and ready to go, so I'll comment from a big picture point of view and let you reach your own conclusions.

The first feature that jumps out at me is its architecture, in particular the scripting support. All the variables it picks up from the servers it monitors are abstracted as JavaScript-like objects, and all the monitors, graphs and screens are based on these scripts. On the plus side, it adds a lot of flexibility to how you can customize the alerts, monitors, rules and Dashboard display. On the other hand, this flexibility presents some management challenges: customizing thresholds, alerts and rules per server or group of servers, and backing up customized rules. None of these challenges is a showstopper and I'm sure MONyog will come up with solutions in future releases. Since everything is stored in SQLite databases and the repositories are documented, any SQLite client and some simple scripting is enough to get backups and work around the limitations.
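
As an example of the kind of simple scripting I mean, here is a minimal sketch in Python that copies one SQLite database to a backup file using the standard sqlite3 module (Connection.backup requires Python 3.7 or later). The function name is hypothetical, and the paths are left as parameters because the location of MONyog's files depends on your installation.

```python
import sqlite3


def backup_sqlite(source_path, backup_path):
    """Copy every page of the source database into the backup file."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # Uses SQLite's online backup API, so the copy is a consistent
        # snapshot even if another process has the source database open.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Run from cron against each repository file, something like this would cover the backup of customized rules until the product offers it natively.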

The agent-less architecture requires the definition of users to log into the database and the OS in order to gather the information it needs. The weak point here is that the credentials, including passwords, are stored in the clear in the SQLite databases. A way to mitigate this is to properly limit the GRANTs for the MySQL users and to SSH using a DSA key instead of a password. Again, no showstopper for most installations, but it needs some work from Webyog's side to increase the overall system security.

During our tests we ran into a bug in the SSH library used by MONyog. I engaged their Technical Support, hoping to evaluate their overall responsiveness. I have to say it was flawless: at no point did they treat me in a condescending manner, they made the most of the information I provided upfront and they never wasted my time with useless scripted diagnostic routines. They had to provide me with a couple of binary builds, which they did in a very reasonable time frame. All in all, a great experience.

My Conclusion

MONyog doesn't provide any silver bullet or obscure best practice advice. It gathers all the environment variables effectively and presents them in an attractive and easy to read format. Although it is closed-source commercial software, the architecture is quite open thanks to scripting and well documented repositories, which provides a lot of flexibility for customizations and expansions to fit any installation's needs. For installations with over 100 servers it might be more challenging to manage the server configurations, and the clear-text credentials may not be viable for some organizations. If these 2 issues are not an impediment, I definitely recommend that any MySQL DBA download the binaries and take it for a spin. It might be the solution you were looking for to keep an eye on your set of servers while freeing some time for other tasks.

Let me know what you think and, if you plan to be at the MySQL UC, look me up to chat. Maybe we can invite Rohit Nadhani from Webyog to join us.

Monday, March 8, 2010

Speaking At The MySQL Users Conference

My proposal has been accepted, yay!

I'll be speaking on a topic that I feel passionate about: MySQL Server Diagnostics Beyond Monitoring. MySQL has limitations when it comes to monitoring and diagnosing, as has been widely documented in several blogs.

My goal is to share my experience from the last few years and, hopefully, learn from what others have done. If you have a pressing issue, feel free to comment on this blog and I'll do my best to include the case in my talk and/or post a reply if time allows.

I will also be discussing my future plans for sarsql. I've been silent about this utility mostly because I've been implementing it actively at work. I'll post a road map shortly based on my latest experience.

I'm excited about meeting many old friends (and most now fellow MySQL alumni) and making new ones. I hope to see you there!