Friday, January 14, 2011

About InnoDB Index Size Limitations

This is mostly a reflection on a limitation in InnoDB that, in my opinion, has persisted for too long. I founded while reviewing the Amarok media player. The player uses MySQL in the backend, embedded or regular server, so it makes for a great source of real life data.

The Issue

By default, Amarok uses MyISAM tables. This means that if it crashes or stops unexpectedly (a logout while playing music may cause this), the latest updates to the DB are all lost. So I've been looking into using InnoDB instead to avoid loosing my playlists or player statistics.

The Problem

The limitation that bothers me is this one: "Index key prefixes can be up to 767 bytes" which has been in place for several years.
Take this Amarok table for example:
CREATE TABLE urls (
    id int(11) NOT NULL AUTO_INCREMENT,
    deviceid int(11) DEFAULT NULL,
    rpath varchar(324) COLLATE utf8_bin NOT NULL,
    directory int(11) DEFAULT NULL,
    uniqueid varchar(128) COLLATE utf8_bin DEFAULT NULL,

    PRIMARY KEY (id),
    UNIQUE KEY uniqueid (uniqueid),
    UNIQUE KEY urls_id_rpath (deviceid, rpath),
    KEY urls_uniqueid (uniqueid)
) ENGINE=MyISAM AUTO_INCREMENT=314
DEFAULT CHARSET=utf8 COLLATE=utf8_bin
The result of an ALTER TABLE to convert it to InnoDB:
alter table urls engine=InnoDB;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

The Rant

Note that the maximum key length is in bytes, not characters. So lets review the rpath column. This column stores the path to the media file in the catalog, which could easily be something like: /very/long/path/to/find/a/file/with/music.mp3. If it only uses English alphabet characters it's not very long, but as soon as you start using some multi-byte characters (ie: ñ, ç, ü, etc) the length of the string starts to increase in bytes (ie: ó = 4 bytes). A simple query shows the diference:

select id, rpath, bit_length(rpath) / 8 as bytes, 
char_length(rpath) as chars 
from urls limit 1;
+----+-----------------------------------------+---------+-------+
| id | rpath                                   | bytes   | chars |
+----+-----------------------------------------+---------+-------+
|  1 | ./home/gnarvaja/Music/Dodododódodo.mp3 | 41.0000 |    39 |
+----+-----------------------------------------+---------+-------+

So how big can the index be in bytes?

I let MySQL answer this question for me. I created a similar test table with only a PRIMARY and then recreated the index on rpath. Here's the command sequence and it's output:
CREATE TABLE urls_test (
  id int(11) NOT NULL AUTO_INCREMENT,
  rpath varchar(324) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE INDEX rpath_idx ON urls_test (rpath);
Query OK, 0 rows affected, 2 warnings (0.32 sec)

show warnings;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |
+---------+------+---------------------------------------------------------+
2 rows in set (0.00 sec)

SHOW CREATE TABLE urls_test\G
*************************** 1. row ***************************
       Table: urls_test
Create Table: CREATE TABLE `urls_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `rpath` varchar(324) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `rpath_idx` (`rpath`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
1 row in set (0.00 sec)

So down to 255 characters from the original 324. In this case (a music player) it may not be a significant loss, but in a world where applications should support an increasing number of International character sets, in particular Asian ones, this limitation could potentially become serious.

I'm not a source code expert, so I'm not sure what it would take to remove, or at least expand the maximum key size. InnoDB's maximum key length (see URL quoted at the beginning of the article) seems like a good number: 3500 bytes ... or I may be overestimating the need for keys bigger than 255 bytes ... your call.