The Data Charmer: partitioning


Monday, September 07, 2015

Sample employees database migrated to GitHub

It's migration time. One more project that I use often was still on Launchpad. The Sample Employees Database is now on GitHub, under the same license it had before (CC A-SA 3).

Figure 1 - Employees database
This database is interesting because it is not too small (like Sakila) and not too big. It has enough data to allow you to test in a non-trivial way.

Replication and sandbox talks on the road - San Francisco, Santa Clara, Orlando

In a few weeks, I will be on the road, for an intense set of presentations in the USA.

San Francisco, April 7th

I will start the tour at the San Francisco MySQL User Group. On April 7, at 6pm I will talk about Advanced MySQL replication for the masses. This talk will explore topics such as bi-directional replication, multiple sources, parallel replication, seamless failover, with the help of Tungsten replicator.

Santa Clara, April 11, 12

The MySQL conference starts on Monday, April 11th, with several three-hour tutorials.

April 11, 9:00am MySQL Replication Advanced Techniques. This will cover MySQL replication beyond the basics: everything you can do with replication, including tips, tricks, and gotchas, plus some really advanced things that you can achieve with third-party tools.
On tutorial day, at 7pm, there will be the 2011 MySQL community dinner (west), organized by Sarah Novotny and probably attended by the best MySQLers in town. If you want to get the latest gossip and socialize with the top MySQL geeks, this is the place to be.
April 12, 11:55am The Art of Sandboxing - Reducing Complex Systems to Manageable Boxes. After the State of the Dolphin, here's a chance to learn a few practical tips on how to build respectful systems.

Robert Hodges and Edward Archibald will also be presenting at the MySQL Conference.

April 11, 1:30pm Learn how to cure MySQL replication deprivation with Tungsten!
April 13, 2pm Preparing for the Big Oops: How to Build Disaster Recovery Sites for MySQL
TBD Build your own PaaS for MySQL with Tungsten Enterprise

On Tuesday evening I will fly to Orlando, to attend (part of) Collaborate11.

Orlando, April 13

April 13, 1pm Dealing with large data with MySQL partitioning and replication. This talk will explain how to combine replication and partitioning to achieve performance, without any external tool.
April 13, 4pm MySQL Sandbox: a toolkit for productive laziness. One of my favorite subjects: how to use the MySQL Sandbox to achieve quick results with minimal effort.

Saturday, June 19, 2010

Welcome googleCL

I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by Blogger. And I am uploading this post to my Blogger account using GoogleCL, a tool that lets you use Google services from the command line.
I am a command line geek, and as soon as I saw the announcement, I installed it on my laptop. The mere fact that you are reading this blog post shows that it works.

GoogleCL is an apparently simple application. If you install it on a Mac using MacPorts, you realize how many dependencies it has and how much complexity it hides under the hood.
With an easy-to-understand syntax, it puts your blog, pictures, calendar, contacts, videos, and online documents at your fingertips.
For example, let's query my blog for partitioning:


$  google blogger --blog="The Data Charmer" --title=partitioning list "title,url"

Hmm. No results. The manual doesn't help much, but something happened during this query. The first thing is that I was asked to authorize the script to access my blog, and that was done by activating a key that I got in the command line. So far, so good. The second thing was a message informing me that a default configuration file was created in my home directory. Looking at that file, I saw an option saying "regex = True". Aha! So the title supports regular expressions. Let's try:


$  google blogger --blog="The Data Charmer" --title=".*partitioning" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1  Improving ARCHIVE performance with partitioning

OK. This gives me everything with the word "partitioning" in the title. But I know that some titles are missing. Comparing with the results that I get online, I see that the titles where "partitioning" is capitalized are not reported. So the search is case sensitive. What I need to do is to tell the regular expression that I want a case insensitive search. Fortunately, I know how to speak regular expressions. Let's try again.


$  google blogger --blog="The Data Charmer" --title="(?i).*partitioning.*" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
Partitioning with non integer values using triggers
Tutorial on Partitioning at the MySQL Users Conference 2009
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1  Improving ARCHIVE performance with partitioning

Now I feel confident enough to make some changes to my online contents.
To create this blog post, I used some of GoogleCL's capabilities. After I created an image, I uploaded it to my Picasa album using this command:


$ google picasa post -n "Blogger Pictures" -t googlecl ~/Desktop/google_cl.png

Then I asked Picasa to give me the URL of the image:


$ google picasa list -n "Blogger Pictures" --query googlecl title,url_direct 
google_cl.png,https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigTW09cp-kY2qB2l0zbiCMfM3LTN-ZbjlrhKH0wlxFgOqodQ_Mf5D8aSlq_w2Hv4A2aJiHgqbQR3J8gNiUmmWgTAAPO4AHvyHDHUx6GbCZBlIZQL7QYO6JBxepsk9XuDx8q3Q8/

And then I inserted that URL in this blog post. Finally, I uploaded the blog post with this command:


$ google blogger --blog="The Data Charmer" --draft --title "Welcome googleCL" --tags="google,mysql,partitioning,command line,blogging" post ~/blog/welcome_googlecl.html

(Now writing online) And after I checked that the post was looking as I wanted it, I hit the "PUBLISH POST" button.
Welcome, GoogleCL!

Thursday, May 06, 2010

Two quick performance tips with MySQL 5.1 partitions

While I was researching for my partitions tutorial, I came across two hidden problems, which may happen often, but are somehow difficult to detect and even more difficult to fix, unless you know what's going on, and why. I presented both cases during my tutorial, but there were no pictures to convey the mechanics of the problem. Here is the full story.

TO_DAYS() prunes two partitions instead of one

If you are partitioning by date, chances are that you are using TO_DAYS(). And depending on how you have partitioned your table, your queries are as fast as you expect them to be. However, there are cases where your query takes twice as long as it should, and of course this will not make you happy.

For example, in a table partitioned by month, when your query searches for values within one specific month, EXPLAIN PARTITIONS tells you that the search involves two partitions (see figure above). This means that, instead of searching through 1,000,000 rows in one partition, the partitioning engine is searching through 2,000,000 rows in two partitions.
But why? The reasoning, as reported by the developers, is that

This is not a bug, since TO_DAYS() returns NULL for invalid dates, it needs to scan the first partition as well (since that holds all NULL values) for ranges.

Bug#49754: Partitioning by RANGE with TO_DAYS always includes first partition when pruning
This makes sense, from a developer's standpoint. From a user's experience, though, it's a bug.
Anyway, it doesn't do us any good to rant about it. Our query is still twice as slow as we want it. We need to take action. The workaround is to create an empty partition in first position. If we are creating a new table, it's simple. Just say

PARTITION p000 VALUES LESS THAN (0)

and all will be well. The partition pruning mechanism will still find two partitions, but since the first one is empty, it won't impact the performance.
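As a minimal sketch of the new-table case, here is how the empty first partition could look in a complete definition. The table t1, its columns, and the monthly boundaries are assumptions made up for illustration; they are not the table used in the tutorial.

```sql
-- Hypothetical table: names and date ranges are assumptions for illustration.
CREATE TABLE t1 (
    id INT NOT NULL,
    dt DATE NOT NULL
)
PARTITION BY RANGE (TO_DAYS(dt)) (
    -- Empty on purpose: TO_DAYS() of a valid date is always positive,
    -- so only NULL results (invalid dates) could ever land here.
    PARTITION p000 VALUES LESS THAN (0),
    PARTITION p001 VALUES LESS THAN (TO_DAYS('2001-02-01')),
    PARTITION p002 VALUES LESS THAN (TO_DAYS('2001-03-01')),
    PARTITION p003 VALUES LESS THAN (MAXVALUE)
);

-- The partitions column should list p000 next to the month being searched.
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM t1
WHERE dt BETWEEN '2001-02-01' AND '2001-02-28';
```

The pruning result is the same as before (two partitions), but one of the two holds no rows, so the extra scan costs next to nothing.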
If you have an existing table that is already partitioned, like the one in our example, then you need a different operation: reorganize the current first partition, splitting off an empty partition in front of it.
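A sketch of one way to do it, using REORGANIZE PARTITION. The table name t1_old, the name of the existing first partition (p001), and its upper boundary are assumptions; replace them with the values from your own table definition.

```sql
-- Assumption: the existing first partition is named p001 and ends at
-- TO_DAYS('2001-02-01'). The new boundary list must cover the same range
-- as the partition being reorganized.
ALTER TABLE t1_old REORGANIZE PARTITION p001 INTO (
    PARTITION p0   VALUES LESS THAN (0),
    PARTITION p001 VALUES LESS THAN (TO_DAYS('2001-02-01'))
);
```

Keep in mind that reorganizing a partition copies its rows, so on a large first partition this statement can take a while.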

Now we have a different first partition, with no records. When we issue the same query, the partition pruning will look at partition p0, but it will skip it because there are no records.

Inserting single rows in partitions is slow

This bug, too, is sometimes difficult to detect. If you want to test partitions in MySQL 5.1, you will probably take an existing table and convert it to a partitioned one, or create a new table and load the contents from a dump. Either way, you are unlikely to insert millions of records with single INSERT statements. These single inserts are slower than bulk inserts in the first place, but with partitions there is an additional penalty. Whenever you insert a record, the partitioning engine locks the entire table: it locks all partitions before the insert, and unlocks them after the insert, whether the statement carries one record or thousands. If you have 500 partitions, that's 500 locks and 500 unlocks for every statement. Ouch!
It's a design problem, and it is not likely to be fixed without turning around the whole architecture of partitions. In this case, too, there is a bug report, Partitioning performance drops drastically with hundreds of partitions, although nobody claims that this is a feature.
What can you do, then? You have several choices:

You can use a bulk insert. Instead of single statements, use INSERT with multiple records, or LOAD DATA INFILE.
Explicitly LOCK the table before inserting and UNLOCK it after you finish with all the inserts. This will avoid the overhead, although it won't make your table concurrently accessible until you finish.
If you use partitioning only to facilitate heavy queries, consider using a non-partitioned table on the master, and partitioned ARCHIVE tables on the slaves (see figure below).
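The first two choices can be sketched as follows. The table t1 with columns id and dt is a hypothetical partitioned table, used here only to show the shape of the statements.

```sql
-- Hypothetical partitioned table t1 (id INT, dt DATE); names are assumptions.

-- Choice 1: one statement, many rows. The partitions are locked and
-- unlocked once for the whole batch instead of once per row.
INSERT INTO t1 (id, dt) VALUES
    (1, '2001-01-15'),
    (2, '2001-02-15'),
    (3, '2001-03-15');

-- Choice 2: keep the single-row statements, but take the lock only once.
LOCK TABLES t1 WRITE;
INSERT INTO t1 (id, dt) VALUES (4, '2001-01-16');
INSERT INTO t1 (id, dt) VALUES (5, '2001-02-16');
INSERT INTO t1 (id, dt) VALUES (6, '2001-03-16');
UNLOCK TABLES;
```

With the explicit lock, other sessions cannot read or write t1 until UNLOCK TABLES, so this fits batch loads better than live traffic.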

As I have said many times in my presentations, always benchmark before using partitions in production. Whether you think that it will boost your performance or that it will slow things down, don't trust your instincts, and test. You may be in for a surprise.

Monday, January 11, 2010

MySQL user group meeting in Dubai, January 13th

On January 13th I will be in Dubai, UAE, on my way to Wellington, New Zealand.
I will be at the MySQL User Group, hosted at Sun Microsystems offices at 6pm.
I will talk about boosting performance with MySQL 5.1 partitions, covering the recent 5.5 additions.
The meeting is open to all. If you want to attend, please register at the meetup page.

Thursday, December 24, 2009

Holiday gift - A deep look at MySQL 5.5 partitioning enhancements

Half a day into my vacation, I managed to finish an article on a topic that has been intriguing me for a while.
Since several colleagues were baffled by the semantics of the new enhancements of MySQL 5.5 partitions, after talking at length with the creator and the author of the manual pages, I produced this article: A deep look at MySQL 5.5 partitioning enhancements.
Happy holidays!

UPDATE This matter was more tricky than it appeared at first sight. As Bug#49861 shows, several MySQL engineers were initially fooled by the multiple column partitions. Also I wrote something wrong in the article, and I updated the text to explain more accurately the behavior of the partitioning engine.

Tuesday, December 15, 2009

Getting started with MySQL 5.5

Some time ago, we announced a new release model for MySQL. Like all new things, it had some initial hiccups (with MySQL 5.4 we were still getting acquainted with the new model), but now it seems to be in full swing.
By the time you read these lines, MySQL 5.5 will be available. If the mirrors aren't seeded yet, the impatient can compile and use the new version from the Launchpad source tree.

Overview

What's this new release anyway? I'll leave it to Kaj's blog to give you the full description. Suffice it to say that this release is the second milestone of the current installment. It is of beta quality, and it will mature to RC quality. There will be yet another milestone before we release a GA in mid 2010.
One thing that this milestone shows is that there are no dead weights. If a feature doesn't make the deadline, i.e. it doesn't reach beta quality by the scheduled date, it will be dropped, and eventually rescued at the next milestone.
With the introduction of the milestone model, we have also increased our internal QA, especially thanks to the Random Query Generator, which finds bugs in early stages of the code faster than any other method. ⁽¹⁾

Built-in InnoDB plugin

The InnoDB plugin 1.0.5 is included in the distribution, and, unlike MySQL 5.1, it's built-in. There is no need to load and register the plugin. The performance enhancements developed for MySQL 5.4 are now available together with the other enhancements available with the InnoDB plugin. This was already available in the previous milestone, but it's worth mentioning it now, because not many people are aware of that.

Semi-synchronous replication

Of all the new features, this one is probably the most relevant. It is based on a patch made by Google to the InnoDB engine, and adapted by MySQL developers to make it engine-independent.
In short, it's a safety device that establishes some internal communication between master and slaves, and makes sure that at least one slave has received the changes being committed. That is, before committing, the master waits until at least one slave has acknowledged that it has received the portion of binary log necessary to reproduce the transaction.
UPDATE As Harrison Fisk rightly notes, there is a mistake in my description. The acknowledgment of the transaction being sent to the slave only happens after the master's commit to the binary log.
Some caveats apply:

It's called semi-synchronous replication, because it doesn't necessarily apply to all the slaves. Although you can manually check if the replication has worked for all the slaves, it's enough for the master to make sure that at least one slave has got the goods.
Received by a slave doesn't mean "executed". The slave acknowledges when it has got the binary log, even if the SQL thread is busy or stopped.
If there is no slave that can acknowledge receipt (e.g. slaves are all down or stopped), then the master reverts to the normal asynchronous operation.

To use this feature, you need to install two plugins: one for the master and one for each slave. No need to compile anything, though. They are provided with the binaries. All you need to do is load the appropriate plugin for each server.

master >  INSTALL PLUGIN rpl_semi_sync_master SONAME 'libsemisync_master.so';

slave1 > INSTALL PLUGIN rpl_semi_sync_slave SONAME 'libsemisync_slave.so';

Additionally, there are a few variables that you must set, either in the options file or online.

master > SET GLOBAL rpl_semi_sync_master_enabled=1;
slave1 > SET GLOBAL rpl_semi_sync_slave_enabled=1;
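Before going on, it may help to check that the setup took effect. The following is a sketch using the status and system variables that ship with the semi-synchronous plugins; the names are the ones I know from MySQL 5.5, so treat them as an assumption if you run a different milestone.

```sql
-- On each slave: if replication was already running when the plugin was
-- enabled, restart the I/O thread so that it registers as semi-synchronous.
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- On the master: ON means semi-synchronous mode is active,
-- OFF means the master is running asynchronously.
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';

-- On the master: number of slaves currently connected as semi-synchronous.
SHOW STATUS LIKE 'Rpl_semi_sync_master_clients';

-- On the master: how long (in milliseconds) to wait for an acknowledgment
-- before falling back to asynchronous replication.
SHOW VARIABLES LIKE 'rpl_semi_sync_master_timeout';
```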

Now that the system is ready, let's see how to use it.
Before doing anything, we ask for the value of two important status variables:

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 0     |
+-----------------------------+-------+

The first one is the number of failed synchronized transactions, the second one is the number of successful ones. Since nothing has happened so far, they are both zero.

create table t1 (i int not null primary key) engine=innodb;
Query OK, 0 rows affected (0.13 sec)

 SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 1     |
+-----------------------------+-------+

The first operation (a table creation) was successfully transferred to a slave. Let's do one more.

set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

insert into t1 values (1);
Query OK, 1 row affected (0.00 sec)

COMMIT;
Query OK, 0 rows affected (0.00 sec)

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 2     |
+-----------------------------+-------+

This one was successful, too.
Now, let's try something sneaky. On each slave, we execute "STOP SLAVE SQL_THREAD". Normal replication would not work, but semi-synchronous replication will go on.


insert into t1 values (2);
Query OK, 1 row affected (0.01 sec)

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 3     |
+-----------------------------+-------+
2 rows in set (0.00 sec)

The semi-synchronous replication has worked. However, if we query both master and slaves, only the master has the new record. The slaves have it only in their relay logs, which you can easily ascertain with mysqlbinlog.

Enhanced partitioning syntax

About one year ago, I briefly announced that this feature was in the making. With some interface improvement, it is now part of the regular partitioning. It's an extension of partitioning BY RANGE. As you know, until now you could only partition on one column value, and only on INTEGER columns. Both these restrictions were lifted in 5.5, with a syntax change that makes the code more readable and the overall feature more usable.
You can now partition by date, datetime, varchar, and char columns, not just integers, and you can use more than one column in your list. The most immediate usage of this extension is the ability of using dates without resorting to functions that convert the dates into integers. For example:

CREATE TABLE t2 
(dt date,a int, b int, c int)  
PARTITION BY RANGE COLUMNS (dt)  
(
PARTITION p0 VALUES LESS THAN ('2007-01-01'),
PARTITION p1 VALUES LESS THAN ('2008-01-01'),
PARTITION p2 VALUES LESS THAN ('2009-01-01'),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);

The COLUMNS keyword does the trick. The manual has more examples.
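Since more than one column is allowed, here is a small sketch of the multi-column case. The table t3 and its boundaries are made up for illustration. The point to remember is that the boundaries are compared as tuples, from left to right, not column by column.

```sql
-- Hypothetical table; the column names and boundaries are assumptions.
CREATE TABLE t3 (a INT, b INT, c INT)
PARTITION BY RANGE COLUMNS (a, b) (
    PARTITION p0 VALUES LESS THAN (10, 20),
    PARTITION p1 VALUES LESS THAN (20, 30),
    PARTITION p2 VALUES LESS THAN (MAXVALUE, MAXVALUE)
);

-- Boundaries are compared as tuples, left to right:
--   (5, 99)  < (10, 20) because 5 < 10                 -> p0
--   (10, 19) < (10, 20) because 10 = 10 and 19 < 20    -> p0
--   (10, 20) is not less than (10, 20),
--            but it is less than (20, 30)              -> p1
INSERT INTO t3 VALUES (5, 99, 0), (10, 19, 0), (10, 20, 0);

-- Check where the rows went.
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE() AND table_name = 't3';
```

Note that the row (5, 99, 0) ends up in p0 even though 99 is greater than 20: the second column only matters when the first one ties with the boundary.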

The partition helper has been updated to handle this new feature and generate partitions accordingly.

SIGNAL and RESIGNAL

If you have used stored routines extensively, you will certainly have asked yourself "why isn't there any way of raising an exception?" In the SQL standard, exception handling is implemented using the SIGNAL and RESIGNAL keywords, which were notably missing in MySQL 5.0 and 5.1 stored routines.
There have been many clever hacks by several community members to emulate the missing SIGNAL, but none were quite satisfactory. After a long wait, here we have SIGNAL and RESIGNAL, which make stored routines programming much more robust and easier to debug. An authoritative example on how to use the new syntax is available in Roland Bouman's blog.
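For a quick taste, here is a minimal sketch of SIGNAL inside a stored procedure. The procedure name and the message are hypothetical; the DELIMITER lines are commands for the mysql command line client, not SQL.

```sql
DELIMITER //

-- Hypothetical procedure: the name and the message are illustrative.
CREATE PROCEDURE check_positive(IN val INT)
BEGIN
    IF val <= 0 THEN
        -- SQLSTATE '45000' is the generic "unhandled user-defined exception".
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'val must be positive';
    END IF;
END//

DELIMITER ;

CALL check_positive(1);   -- returns normally
CALL check_positive(-1);  -- fails with the message set above
```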

There is more. For the complete list of features, have a look at the official manual.
Happy hacking!

UPDATE Added more partitions to the example, as suggested by Jon.

⁽¹⁾ For the more technologically savvy, here's how Philip Stoev, one of my distinguished QA colleagues, describes the enhancements:

Historically, most of the MySQL tests have been manually created, however a modern database is so complex that it is impossible to test manually even a tiny percentage of the available functionality. Therefore for Betony [codename for MySQL 5.5], and the upcoming Celosia [5.6], the majority of our testing effort was concentrated around stochastic testing, using random data and millions of random queries to validate the behavior of the server across a wide range of scenarios and workloads.
For each new feature, and some existing ones, we automatically generated tests that attempt to cover all relevant SQL constructs, including the interaction between the feature being tested and existing code within the server. For features that have concurrency implications, we ran the random queries as a stress test or along with concurrent DDL statements. For areas such as the partitioning, we used the random queries to functionally validate the new code, by comparing the result from each query to a reference source, such as a previous version of the server.

Thursday, October 15, 2009

Spider and vertical partition engines with new goodies

The Spider storage engine should be already known to the community. Its version 2.5 has recently been released, with new features, the most important of which is that you can execute remote SQL statements in the backend servers. The method is quite simple. Together with Spider, you also get a UDF that executes SQL code in a remote server. You send a query with parameters saying how to connect to the server, and check the result (1 for success, 0 for failure). If the SQL involves a SELECT, the result can be sent to a temporary table. Simple and effective.

In addition to the Spider engine, Kentoku SHIBA has also created the vertical partitioning engine. Instead of splitting tables by record, you split them by columns. You can define a table with column A and column B, with primary key K, and another table with column C and column D, with primary key K. The vertical partition engine allows you to define a table with columns K, A, B, C, D, which looks to the user like a regular table. The backend tables can be of any engine.
There is a MySQL University session about the Spider and VP engines on November 26th at 15:00 CEST. Free attendance!
The slides are online: Sharding for the masses

Tuesday, September 15, 2009

Partitioning with non integer values using triggers

Looking at Bug#47310, which is a feature request that I hear frequently when I talk about partitions, I wrote a comment, suggesting triggers to work around the limitation.
The reason for the limitation is that allowing arbitrary functions for partitioning was too complex and it was provoking crashes and other unpleasant side effects (see the discussion under bug#18198).
But if you use a trigger, the resulting column is a plain integer, and many of the side effects disappear. The drawback is that you need to add a column to your table, and you need to use that column when searching for data. With that in mind, you can implement the workaround quite easily.


USE test;
DROP TABLE IF EXISTS users;

CREATE TABLE users (
        user_id int(10) NOT NULL,
        username varchar(25) DEFAULT NULL,
        dummy INT not null,
        PRIMARY KEY (user_id, dummy),
        UNIQUE KEY username(username,dummy)
) ;

CREATE TRIGGER users_bi
BEFORE INSERT ON users
FOR EACH ROW
SET NEW.dummy = ASCII(LOWER(LEFT(NEW.username,1))); 

ALTER TABLE users PARTITION BY RANGE (dummy) (
        PARTITION p0 VALUES LESS THAN (102),      # below f (ASCII 102)
        PARTITION p1 VALUES LESS THAN (109),      # f to l
        PARTITION p2 VALUES LESS THAN (115),      # m to r
        PARTITION p3 VALUES LESS THAN (MAXVALUE)  # s and above, z included
); 

INSERT INTO users (user_id, username)
VALUES (1,'Joe'), (2,'Sam'),(3,'Abe'),(4,'Rich');

EXPLAIN PARTITIONS SELECT * FROM users 
where username = 'Abe'; 
# This simple query doesn't use partition pruning. 
# This is to be expected.

EXPLAIN PARTITIONS SELECT * FROM users 
where dummy = ASCII('a') and username = 'Abe'; 
# Here, the partition pruning kicks in, at the price of an extra
# condition in the query.
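One thing the example above does not cover is a change of username after the row has been inserted. A possible companion trigger, sketched under the assumption that the users table and the users_bi trigger exist as defined above (the name users_bu is my own choice):

```sql
-- Companion to users_bi: keeps dummy in sync when username changes.
-- The trigger name users_bu is an assumption, chosen to match users_bi.
CREATE TRIGGER users_bu
BEFORE UPDATE ON users
FOR EACH ROW
SET NEW.dummy = ASCII(LOWER(LEFT(NEW.username,1)));

-- Without the trigger, dummy would keep the value computed for 'Abe'.
UPDATE users SET username = 'Tom' WHERE user_id = 3;

-- The row can now be found through the partition that matches 't'.
EXPLAIN PARTITIONS SELECT * FROM users
where dummy = ASCII('t') and username = 'Tom';
```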

Thursday, August 20, 2009

MySQL Sandbox and Spider at FrOSCon and OpenSQLCamp

FrOSCon and the OpenSQLCamp are about to start.
I am packing for Sankt Augustin, where I will attend the fourth edition of FrOSCon and the second OpenSQLCamp. I will have two sessions: Sharding for the masses, about the Spider storage engine, and MySQL Sandbox 3, about one of my favorite tools.

The program is very rich. There will be several tracks in the main event and in the associated conferences. If you have any involvement or simply some curiosity in open source matters, you will find something interesting at FrOSCon.

Monday, July 06, 2009

Sharding for the masses: the spider storage engine

In my previous article about the Spider storage engine, I made some tests and I saw that the engine has potential. I also identified some problems, which were promptly addressed by the author. I have looked at the latest version (0.12), and the results are very encouraging.

Installing the Spider storage engine is not trivial but not extremely difficult either. My previous article has a step-by-step explanation. Before installing, though, you need to apply one patch to the server, to enable condition pushdown to the partitions engine. What is it about? In short, when you issue a query from the Spider table, with a WHERE clause, the query is passed to a backend database without the WHERE clause. True, it's filtered by partition, but each partition may have a large data set, and this means a full table scan on the backend server. In a few cases, this is not a big deal, because the general performance of the Spider engine is better than a non-partitioned table, but it can be improved.
The lack of condition pushdown is a problem already known in the Federated engine. With the above patch applied, the Spider engine pushes the condition down to the backend, and the query is executed with the WHERE clause intact.
Let's consider this simple query:

select * from salaries where from_date='1996-01-10' limit 10;

Before applying the patch, the backend database would execute this:

select `emp_no`,`salary`,`from_date`,`to_date`
from `employees`.`salaries` limit 0,9223372036854775807

After the patch, the backend receives and executes the more sensible

select `emp_no`,`salary`,`from_date`,`to_date`
from `employees`.`salaries`
where (`from_date` = '1996-01-10')
limit 0,9223372036854775807

A final observation. While I was testing the latest version, I noticed something that should have been clear from the beginning, i.e. that a set of Spider backends can have multiple front-ends. This means that you can create a Spider table from a different host, and access the same backends that another host is already using. In theory, it means more flexibility when dealing with large amounts of data. In practice, we'll have to test it more and see what happens.

Monday, June 01, 2009

MySQL University - Boosting performance with partitions

Mark your calendars: a MySQL University session about Boosting performance with MySQL 5.1 partitions will take place on Thursday, June 4th at 13:00 UTC (8am CDT (Central) / 9am EDT (Eastern) / 14:00 BST / 15:00 CET / 17:00 MSD (Moscow) / 18:30 IST (India)).
The session will be conducted through DimDim, a system that allows you to follow the audio and visuals of a presentation from your browser, without any additional settings.

Attendance is free. Please follow the instructions given in the MySQL University main page.

Wednesday, April 29, 2009

Test driving the Spider storage engine - sharding for the masses

At the MySQL Conference 2009 I attended a session about the Spider storage engine, an engine with built-in sharding features.
The talk was notable for the speaker wearing a Spider-Man costume, and for some language barrier that made the talk less enjoyable than it should have been. That's a pity, because the engine is very intriguing, and deserves some exploration.

What is the Spider engine, then? In short, it's an extension to the partitioning engine with the ability of connecting to remote servers. Basically, partitions + federated, except that Federated is explicitly removed during the compilation. Additionally, the spider engine should remove current limitations, such as single thread for replication and single source replication, but due to lack of specific documentation, I will limit my current experiment to the sharding features.
The only documentation available is the slide deck from the presentation, and some reference parameters that come with the source code. I show here what I found by practical inspection.

Building the engine

To compile the engine, you need the source code for MySQL 5.1.31 (as required by the documentation, but it works fine with later versions as well).
Download the source code from the launchpad repository and expand it. You will get a ./spider directory, which you need to move under the ./storage directory in the source. Then you compile, with these instructions:

autoconf
automake
./configure \
 --prefix=/usr/local/mysql \
 --localstatedir=/usr/local/mysql/data \
 --libexecdir=/usr/local/mysql/bin \
 --enable-thread-safe-client \
 --enable-local-infile --with-pic \
 --with-fast-mutexes \
 --with-client-ldflags=-static \
 --with-mysqld-ldflags=-static \
 --with-zlib-dir=bundled \
 --with-big-tables --with-ssl \
 --with-readline --with-embedded-server \
 --with-partition --without-innodb \
 --without-ndbcluster \
 --without-archive-storage-engine \
 --without-blackhole-storage-engine \
 --with-csv-storage-engine \
 --without-example-storage-engine \
 --without-federated-storage-engine \
 --with-extra-charsets=complex
make
./scripts/make_binary_distribution

Now we will use the MySQL Sandbox to create a testing environment. Let's start with a simple case: 1 main server and 3 remote ones.

make_sandbox $PWD/mysql-5.1.31-osx10.5-i386.tar.gz

This creates a sandbox under $HOME/sandboxes/msb_5_1_31, which is our main server. Before using it, we need to create some additional tables and load the plugin. (These queries are in the spider instructions, but they are hard to cut and paste. This is much easier for that purpose.)


create table if not exists mysql.spider_xa(
  format_id int not null default 0,
  gtrid_length int not null default 0,
  bqual_length int not null default 0,
  data char(128) not null default '',
  status char(8) not null default '',
  primary key (data, format_id, gtrid_length),
  key idx1 (status)
) engine=MyISAM default charset=utf8 collate=utf8_bin;
create table if not exists mysql.spider_xa_member(
  format_id int not null default 0,
  gtrid_length int not null default 0,
  bqual_length int not null default 0,
  data char(128) not null default '',
  scheme char(64) not null default '',
  host char(64) not null default '',
  port char(5) not null default '',
  socket char(64) not null default '',
  username char(64) not null default '',
  password char(64) not null default '',
  primary key (data, format_id, gtrid_length, host, port, socket)
) engine=MyISAM default charset=utf8 collate=utf8_bin;
create table if not exists mysql.spider_tables(
  db_name char(64) not null default '',
  table_name char(64) not null default '',
  priority bigint not null default 0,
  server char(64) default null,
  scheme char(64) default null,
  host char(64) default null,
  port char(5) default null,
  socket char(64) default null,
  username char(64) default null,
  password char(64) default null,
  tgt_db_name char(64) default null,
  tgt_table_name char(64) default null,
  primary key (db_name, table_name),
  key idx1 (priority)
) engine=MyISAM default charset=utf8 collate=utf8_bin;

install plugin spider soname 'ha_spider.so';
select engine,support,transactions,xa 
from information_schema.engines;
+------------+---------+--------------+------+
| engine     | support | transactions | xa   |
+------------+---------+--------------+------+
| SPIDER     | YES     | YES          | YES  | 
| MRG_MYISAM | YES     | NO           | NO   | 
| CSV        | YES     | NO           | NO   | 
| MyISAM     | DEFAULT | NO           | NO   | 
| MEMORY     | YES     | NO           | NO   | 
+------------+---------+--------------+------+

Spider is now enabled

Preparing the remote servers

The servers used as remote shards can be conveniently replaced, for this experiment, by sandboxes. Let's create three of them, to illustrate the concept.

make_multiple_sandbox --group_directory=spider_dir \
  --sandbox_base_port=6000 --check_base_port  5.1.31

Now we have three sandboxes under $HOME/sandboxes/spider_dir, with ports ranging from 6001 to 6003.
What we need to do is to create, in each server, a table with the same structure as the one that is being sharded in the main server.

$ cd $HOME/sandboxes/spider_dir
$ cat tablea.sql 
drop schema if exists   myspider;
create schema myspider;
use myspider;

Create table tbl_a( 
        col_a int, 
        col_b int, 
        primary key(col_a) 
);

$ ./use_all "source tablea.sql"

The main server

Now that all the components are in place, we can create the table on the main server.

drop schema if exists   myspider;
create schema myspider;
use myspider;

Create table tbl_a( 
        col_a int, 
        col_b int, 
        primary key(col_a) 
) engine = Spider 
Connection ' table "tbl_a", user "msandbox", password "msandbox" ' 
partition by range( col_a ) 
( 
        partition pt1 values less than (1000) 
          comment 'host "127.0.0.1", port "6001"', 
        partition pt2 values less than (2000)
          comment 'host "127.0.0.1", port "6002"', 
        partition pt3 values less than (MAXVALUE)
          comment 'host "127.0.0.1", port "6003"' 
);

Compared to classic partitions, there is some new ground to cover. The "CONNECTION" clause is used to define the table name in the remote server. The schema name is assumed to be the same as the one in the main server.
For each partition, we can add a "COMMENT" clause, with the connection parameters. Since we are using sandboxes in the same host, we connect to 127.0.0.1, and use the port corresponding to each sandbox.
From now on, we can use this table almost transparently.
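
To make the range rule concrete, here is a minimal Python sketch of which sandbox a given col_a value ends up in. It only mirrors the "VALUES LESS THAN" bounds and ports from the table definition above; the names BOUNDS, PORTS and backend_port are made up for illustration, and this is not how the Spider engine is implemented internally.

```python
from bisect import bisect_right

# Upper bounds ("VALUES LESS THAN") and backend ports copied from the
# CREATE TABLE statement above. The last partition uses MAXVALUE, so it
# has no finite bound and catches everything else.
BOUNDS = [1000, 2000]          # pt1, pt2
PORTS = [6001, 6002, 6003]     # pt1, pt2, pt3 (illustrative names)


def backend_port(col_a: int) -> int:
    """Return the port of the sandbox that should store this col_a value."""
    # bisect_right counts how many bounds are <= col_a, which is the index
    # of the first partition whose "LESS THAN" bound is still above col_a.
    return PORTS[bisect_right(BOUNDS, col_a)]
```

With these bounds, backend_port(1) gives 6001, backend_port(1001) gives 6002 and backend_port(2001) gives 6003. A value of exactly 1000 goes to 6002, because 1000 is not less than 1000.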

Using the spider engine

# on the main server
./use myspider

insert into tbl_a values (1,1);
Query OK, 1 row affected (0.00 sec)

insert into tbl_a values (1001,2);
Query OK, 1 row affected (0.01 sec)

insert into tbl_a values (2001,3);
Query OK, 1 row affected (0.00 sec)

select * from tbl_a;
+-------+-------+
| col_a | col_b |
+-------+-------+
|     1 |     1 | 
|  1001 |     2 | 
|  2001 |     3 | 
+-------+-------+
3 rows in set (0.01 sec)

So far, so good. No surprises, at least on the surface.
Now, where is the data? A close inspection of the files in the data directory shows that the data is not in the main server.
The data is stored in the "remote" servers, as we can check easily:


# in the spider_dir path
./use_all "select * from myspider.tbl_a"

# server: 1: 
col_a col_b
1 1
# server: 2: 
col_a col_b
1001 2
# server: 3: 
col_a col_b
2001 3

Now, let's apply some curiosity. What happens in the remote server when I insert a row in the main server? Probably the general log can give me an answer.

# spider_dir
./use_all 'set global general_log=1'

# main server
insert into tbl_a values (2002,3);
Query OK, 1 row affected (0.00 sec)

# spider_dir
$ tail node3/data/mysql_sandbox6003.log
090429 17:27:28  299 Connect    msandbox@localhost on  
090429 17:27:42  299 Query  set session sql_log_off = 1

Ah! No curious people allowed.
Well. This can stop a common user, but not a determined one.
MySQL Proxy to the rescue! There is a Lua script that handles logs.

./node2/proxy_start --proxy-lua-script=$PDW/logs.lua --log-level=warning

Let's change the main table definition:

...
        partition pt3 values less than (MAXVALUE)
        comment 'host "127.0.0.1", port "4040"'

And now we can see what happens.

# main server
insert into tbl_a values (2004,3);
Query OK, 1 row affected (0.00 sec)

$ tail /tmp/mysql.log
2009-04-29 18:03:04    305 -- set session sql_log_off = 1 >{0} 
2009-04-29 18:03:04    305 -- set session autocommit = 1 >{0} 
2009-04-29 18:03:04    305 -- start transaction >{0} 
2009-04-29 18:03:04    305 -- set session transaction isolation level repeatable read >{0} 
2009-04-29 18:03:04    305 -- insert into `myspider`.`tbl_a`(`col_a`,`col_b`)values(2004,3) >{0} 
2009-04-29 18:03:04    305 -- commit >{0}

Hmmm. I don't like the sight of it. autocommit=1 and then start transaction, set session transaction and commit. At the very least, it's wasting three queries. This needs some explanation from the author, I guess. Let's try some data retrieval.

# main server
select * from tbl_a;
+-------+-------+
| col_a | col_b |
+-------+-------+
|     1 |     1 | 
|  1001 |     2 | 
|  2003 |     3 | 
|  2004 |     3 | 
|  1001 |     2 | 
|  2003 |     3 | 
|  2004 |     3 | 
+-------+-------+
7 rows in set (0.01 sec)

$ tail /tmp/mysql.log
2009-04-29 18:01:07    303 -- set session sql_log_off = 1 >{0} 
2009-04-29 18:01:07    303 -- set session autocommit = 1 >{0} 
2009-04-29 18:01:07    303 -- start transaction >{0} 
2009-04-29 18:01:07    303 -- set session transaction isolation level repeatable read >{0} 
2009-04-29 18:01:07    303 -- show table status from `myspider` like 'tbl_a' >{0} 
2009-04-29 18:01:07    303 -- select `col_a`,`col_b` from `myspider`.`tbl_a` limit 0,9223372036854775807 >{0} 
2009-04-29 18:01:07    303 -- commit >{0}

Scarier than the previous one. The LIMIT clause spells trouble.
And this latest experiment made me try something more ambitious.
I installed a group of 20 sandboxes and loaded the employees test database (4.2 million records), spreading two partitioned tables across the backend servers.
Performance is better than using a single table, but slower than a normal partitioned table on a single server. And I think I know why.

# main server
select count(*) from salaries where from_date between '1995-01-01' and '1995-12-31';
+----------+
| count(*) |
+----------+
|   201637 | 
+----------+
1 row in set (0.76 sec)

$ tail /tmp/mysql.log
2009-04-29 18:09:51    307 -- set session sql_log_off = 1 >{0} 
2009-04-29 18:09:51    307 -- set session autocommit = 1 >{0} 
2009-04-29 18:09:51    307 -- start transaction >{0} 
2009-04-29 18:09:51    307 -- set session transaction isolation level repeatable read >{0} 
2009-04-29 18:09:51    307 -- show table status from `employees` like 'salaries' >{0} 
2009-04-29 18:09:51    307 -- select `emp_no`,`from_date` from `employees`.`salaries` order by `emp_no`,`from_date` limit 0,9223372036854775807 >{0}

This is definitely a waste. The WHERE condition is not sent to the backend, so every row travels to the main server to be filtered there. The problem is similar to what happens with the Federated engine. But here, we also get an "ORDER BY" clause that is unnecessary, to say the least.
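
To put a number on the waste, here is a small Python sketch that compares the two strategies. The shards are plain in-memory lists with invented rows (SHARDS, START, END and both function names are hypothetical); it says nothing about what the Spider engine can or cannot be configured to do, it only shows how many results cross the wire in each case.

```python
from datetime import date

# Hypothetical stand-ins for backend servers: each shard is a list of
# (emp_no, from_date) rows. Real shards would be MySQL servers.
SHARDS = [
    [(10001, date(1994, 6, 1)), (10002, date(1995, 3, 15))],
    [(10003, date(1995, 7, 4)), (10004, date(1996, 1, 1))],
    [(10005, date(1995, 12, 31))],
]
START, END = date(1995, 1, 1), date(1995, 12, 31)


def count_by_fetching_rows(shards):
    """What the log shows: pull every row, sorted, and filter afterwards."""
    transferred = 0
    matches = 0
    for shard in shards:
        for _emp_no, from_date in sorted(shard):  # the ORDER BY on the backend
            transferred += 1
            if START <= from_date <= END:
                matches += 1
    return matches, transferred


def count_by_pushdown(shards):
    """Alternative: each backend filters and counts, returning one number."""
    partial = [sum(1 for _e, d in shard if START <= d <= END) for shard in shards]
    return sum(partial), len(partial)
```

Both functions return the same count, but the first one moves one result per stored row, while the second moves one result per shard.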

Bugs

During the tests, I spotted at least two serious bugs.
When you drop a partition, the data in the remote server is not removed. If you recreate the partition and insert something, you get a "duplicate key" error.

When you drop a database, the table mysql.spider_tables does not get updated, with the result that you can't recreate the table with the same name unless you remove the corresponding entries manually.

That was a lot of information for one session. Please try it and comment. Don't expect me to explain the reasoning behind the spider engine's behavior. I hope the author chimes in and clarifies the muddy matter.

Sunday, January 04, 2009

Q&A on MySQL 5.1

Listening to Sheeri's presentation on MySQL 5.1, I saw that there are a few questions left unanswered. I am listing here some of the questions that I found interesting, plus a few from an early webinar on the same topic.

Q: Does Partitioning physically split data?: A: No. Some engines (MyISAM, Archive) do a physical split, but this is not necessary, as you see if you apply partitioning to an InnoDB table. Partitioning is a logical split of data, for easy retrieval. It is completely transparent to the user.
Q: Can you set partitions to different servers?: A: No. Partitions are logical parts of one table within one server. Partitioning through the Federated engine is not supported.
Q: How efficient are Row-Based Replication operations compared to Statement based ones?: A: RBR is faster when the insert or update is the result of an expensive operation. Otherwise, the efficiency for insertion and deletion is roughly equivalent. Updates on multiple records are usually more expensive with Row-Based Replication.
Q: Is the event scheduler polluting the Error Log?: A: Yes, unfortunately. But it has been fixed in 5.1.31 (see also Bug#38066). As you can see from the bug report page, it was the object of a long and intense discussion.
Q: Can you send email through the Event Scheduler?: A: No. But you can integrate it with a hack using MySQL Proxy through Federated tables.
Q: Is there an equivalent to SHOW FULL PROCESSLIST from the INFORMATION SCHEMA?: A: Yes. Actually, SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST is equivalent to SHOW FULL PROCESSLIST. See the manual.
Q: Are partitions supported in replication?: A: Yes. Partitions are fully supported in replication. The only problem you may have is when using the "DATA DIRECTORY" and "INDEX DIRECTORY" clauses, if the slave does not have the same directory structure and OS privileges as the master.
Q: Is the event scheduler supported in replication?: A: Yes. The event definition is replicated but left inactive, and the effects of the event scheduler are replicated as any other statement. When promoting a slave to master, it is necessary to manually activate the events.
Q: What happened to RENAME DATABASE? Why is it not available anymore?: A: RENAME DATABASE was a command implemented in the early stages of MySQL 5.1, for the specific purpose of helping the upgrade script to set the database name with the appropriate charset. It had a deadly side effect, though. It removed all the objects associated with the database. Thus, it was removed and renamed in such a way that nobody would use it for simply renaming a database.
Q: How does the slow query log work with microseconds?: A: To catch slow queries with duration measured in less than 1 second, you can set the long_query_time global variable to a fractional value. E.g. set global long_query_time=0.5; will enable queries taking more than 1/2 second to be logged to the slow queries log.

Thursday, December 18, 2008

Tutorial on Partitioning at the MySQL Users Conference 2009

Another approval has arrived. I will be co-presenting a tutorial at the MySQL Users Conference 2009: Partitioning in MySQL 5.1.
My co-speaker is Sarah Sproehnle, a well known MySQL trainer. The topic is one of the flagship features of MySQL 5.1.

The tutorial will cover the basics, several practical scenarios, tips and tricks, and some sneak peeks of the latest enhancements being developed.

Wednesday, December 17, 2008

Backing up MySQL partitioned tables is nothing special

In case you were wondering, after seeing this self-defined tutorial, backing up MySQL 5.1 partitioned tables does not require any particular additional technology.
This tutorial presents MySQL 5.1 in a short paragraph, and then starts talking about a commercial backup application, as if it were the only way of backing up partitioned tables.
The title, for sure, is misleading: Backing Up MySQL Partitioned Tables. Reading such a title, readers expect either a list of possible solutions or the only existing way of dealing with this task. Instead, the only instructions that you get are how to use the above mentioned tool, which is definitely not the only way of getting the job done. It is also doubly misleading because, since partitioning is a new technology, the casual reader may think "Oh? What's peculiar about partitioned tables? Do I need to take extra care to back them up?" Instead, there's nothing special about partitioned tables. Depending on the backup method, handling of partitioned tables is completely transparent for the user.
I don't doubt that this tool can do the job, but there are native tools with MySQL that can accomplish the task just as well.
There are several ways of backing up a partitioned table. But the first one that springs to mind is mysqldump.
The backup is as simple as:

mysqldump --opt db_name table_name > file.dump

The restore is equally simple:

mysql db_name < file.dump

Now, to be fair, I know one case where the backup of partitioned tables is problematic, but the one presented in the FUD tutorial was not it. The only possible trouble happens when you define partitions with different data directories and want to restore the table to a server that has a different directory structure. I had this problem two years ago, and I solved it by splitting the dump to keep the DDL and the DML in separate scripts. I wonder if there is a method that can detect and fix this problem on the spot.
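
As a rough illustration of that workaround, here is a minimal Python sketch that separates a dump into DDL and DML and removes the directory clauses from the DDL. It assumes that every INSERT statement sits on a single line, which is how mysqldump normally writes them; the function names split_dump and strip_directory_clauses are invented for this example. (mysqldump can also produce the two halves directly with its --no-data and --no-create-info options.)

```python
import re

# Matches "DATA DIRECTORY = '...'" and "INDEX DIRECTORY = '...'".
# The "=" is optional in MySQL syntax, so the pattern allows both forms.
_DIRECTORY_CLAUSE = re.compile(
    r"\s+(?:DATA|INDEX)\s+DIRECTORY\s*=?\s*'[^']*'",
    re.IGNORECASE,
)


def split_dump(dump_text):
    """Return (ddl, dml): INSERT lines go to dml, everything else to ddl."""
    ddl, dml = [], []
    for line in dump_text.splitlines():
        if line.lstrip().upper().startswith("INSERT INTO"):
            dml.append(line)
        else:
            ddl.append(line)
    return "\n".join(ddl), "\n".join(dml)


def strip_directory_clauses(ddl):
    """Drop DATA/INDEX DIRECTORY clauses so the table uses default paths."""
    return _DIRECTORY_CLAUSE.sub("", ddl)
```

The cleaned DDL can be loaded first, followed by the DML. Note that this is a text-level filter: a row whose data happens to contain a directory clause is safe only because the DML is kept apart and never passed through strip_directory_clauses.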

Update: Here's the original article I am talking about.

howto_forge article about partition backup

Monday, December 01, 2008

The partition helper - Improving usability with MySQL 5.1 partitioning

I talked several times about partitioning usability. In my many tests of partitioning I found myself in need of generating lists of partitions for a given range.
I made the Partition Helper (see code at the end of this post) to scratch this particular itch, i.e. making partitions reasonably quickly and without thinking too much.

The Partition Helper is a Perl script that converts some simple options into lots of partitions.
The detailed reference manual lists all the options and many examples.
Here, I would like to show just an example.
Say, you want to create partitions in table blog_posts by monthly intervals on column published. Given that you have data for the last two years, you use the partition helper like this:

./partitions_helper --table=blog_posts \
--column=published \
--interval=month \
--start=2007-01-01 \
--end=2009-01-01
ALTER TABLE blog_posts
PARTITION by range (to_days(published))
(
partition p001 VALUES LESS THAN (to_days('2007-01-01'))
, partition p002 VALUES LESS THAN (to_days('2007-02-01'))
, partition p003 VALUES LESS THAN (to_days('2007-03-01'))
, partition p004 VALUES LESS THAN (to_days('2007-04-01'))
, partition p005 VALUES LESS THAN (to_days('2007-05-01'))
, partition p006 VALUES LESS THAN (to_days('2007-06-01'))
, partition p007 VALUES LESS THAN (to_days('2007-07-01'))
, partition p008 VALUES LESS THAN (to_days('2007-08-01'))
, partition p009 VALUES LESS THAN (to_days('2007-09-01'))
, partition p010 VALUES LESS THAN (to_days('2007-10-01'))
, partition p011 VALUES LESS THAN (to_days('2007-11-01'))
, partition p012 VALUES LESS THAN (to_days('2007-12-01'))
, partition p013 VALUES LESS THAN (to_days('2008-01-01'))
, partition p014 VALUES LESS THAN (to_days('2008-02-01'))
, partition p015 VALUES LESS THAN (to_days('2008-03-01'))
, partition p016 VALUES LESS THAN (to_days('2008-04-01'))
, partition p017 VALUES LESS THAN (to_days('2008-05-01'))
, partition p018 VALUES LESS THAN (to_days('2008-06-01'))
, partition p019 VALUES LESS THAN (to_days('2008-07-01'))
, partition p020 VALUES LESS THAN (to_days('2008-08-01'))
, partition p021 VALUES LESS THAN (to_days('2008-09-01'))
, partition p022 VALUES LESS THAN (to_days('2008-10-01'))
, partition p023 VALUES LESS THAN (to_days('2008-11-01'))
, partition p024 VALUES LESS THAN (to_days('2008-12-01'))
, partition p025 VALUES LESS THAN (to_days('2009-01-01'))
);

The above command can also be expressed as

./partitions_helper --table=blog_posts --column=published --interval=month \
--start=2007-01-01 --partitions=25

Or you can shorten it, if you remember the short options:

./partitions_helper -t blog_posts -c published -i month -s 2007-01-01 -p 25
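
Why 25 partitions? The count includes both the start and the end month. The following Python sketch repeats the month arithmetic used by the script and rebuilds the partition clauses shown above. It is only an illustration: monthly_partitions is a made-up helper, and it assumes the start date falls on the first day of a month.

```python
def months_between(syear, smonth, eyear, emonth):
    """Inclusive count of months from the start month to the end month."""
    return (12 - smonth) + (eyear - syear - 1) * 12 + emonth + 1


def monthly_partitions(syear, smonth, count, prefix="p"):
    """Yield one 'partition ... VALUES LESS THAN' clause per month."""
    year, month = syear, smonth
    for n in range(1, count + 1):
        yield (
            f"partition {prefix}{n:03d} VALUES LESS THAN "
            f"(to_days('{year:04d}-{month:02d}-01'))"
        )
        # advance to the first day of the next month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```

For the example above, months_between(2007, 1, 2009, 1) is 11 + 12 + 1 + 1 = 25, and the generated clauses run from p001 (2007-01-01) to p025 (2009-01-01).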

More options are available. You can see them using

./partitions_helper --help

Enjoy!

Update: Here's the code to the Partition helper, published here after MySQL Forge was taken down.

#!/usr/bin/perl
#    The MySQL Partitions helper
#    Copyright (C) 2008, 2009 Giuseppe Maxia
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; version 2 of the License
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

#
# This program creates an ALTER TABLE statement to add or reorganize 
# date based partitions for MySQL 5.1 or later
#

use strict;
use warnings;
# use diagnostics;
use English qw( -no_match_vars ) ;
use Getopt::Long qw(:config no_ignore_case );
use Data::Dumper;

my $VERSION = '1.0.4';

#
# Parse options are fully qualified options with descriptive help,
# parse string for the command line, and sort order for the help
#
my %parse_options = (
    table           =>  {
                            value   => '',
                            parse   => 't|table=s',
                            help    => [
                                        'The table being partitioned',
                                        '(no default)'
                                       ],
                            so      =>  20,
                        },
    column          =>  {
                            value   => '',
                            parse   => 'c|column=s',
                            help    => [
                                        'The partitioning column',
                                        '(no default)',
                                       ],
                            so      =>  30,
                        },
    interval        =>  {
                            value   => 'month',
                            parse   => 'i|interval=s',
                            help    => [
                                        'The interval between partitions',
                                        'Accepted: "year", "month", "week", "day", or a number',
                                        '(default: month) ',
                                       ],
                            so      =>  40,
                        },
    partitions      =>  {
                            value   => 0,
                            parse   => 'p|partitions=i',
                            help    => [
                                        'How many partitions to create',
                                        '(default: 0) ',
                                       ],
                            so      =>  50,
                        },
    first_partition =>  {
                            value   => 1,
                            parse   => 'first_partition=i',
                            help    => [
                                        'the first partition to create',
                                        '(default: 1) ',
                                       ],
                            so      =>  60,
                        },
    reorganize      =>  {
                            value   => '',
                            parse   => 'reorganize=s',
                            help    => [
                                        'the partition(s) to reorganize',
                                        '(default: none) '
                                       ],
                            so      =>  70,
                        },
    
    start           =>  {
                            value   => '2001-01-01',
                            parse   => 's|start=s',
                            help    => [
                                        'the minimum partitioning value',
                                        '(default: 1 for numbers, 2001-01-01 for dates) '
                                       ],
                            so      =>  80,
                        },
    end             =>  {
                            value   => '',
                            parse   => 'e|end=s',
                            help    => [
                                        'The maximum partitioning value',
                                        'Used unless --partitions is used',
                                        '(no default) ',
                                       ],
                            so      =>  90,
                        },
    function             =>  {
                            value   => '',
                            parse   => 'f|function=s',
                            help    => [
                                        'The partitioning function to use in the "range" declaration',
                                        '(default: to_days, unless --list is used) ',
                                       ],
                            so      =>  100,
                        },
    list             =>  {
                            value   => 0,
                            parse   => 'l|list',
                            help    => [
                                        'Use the COLUMNS feature (versions >= 5.5)',
                                        '(default: no) ',
                                       ],
                            so      =>  110,
                        },
     maxvalue       =>  {
                            value   => 0,
                            parse   => 'x|maxvalue',
                            help    => [
                                        'Adds MAXVALUE as last partition',
                                        '(default: disabled) ',
                                       ],
                            so      =>  115,
                        },
    prefix           =>  {
                            value   => 'p',
                            parse   => 'prefix=s',
                            help    => [
                                        'prefix for partition name',
                                        '(default: p) ',
                                       ],
                            so      =>  120,
                        },
    explain          =>  {
                            value   => 0,
                            parse   => 'explain',
                            help    => [
                                        'show the current option values',
                                        '(default: no) ',
                                       ],
                            so      =>  130,
                        },

    version             =>  {
                            value   => 0,
                            parse   => 'version',
                            help    => [
                                        'display the version',
                                        '(default: none) ',
                                       ],
                            so      =>  400,
                        },
    help             =>  {
                            value   => 0,
                            parse   => 'h|help',
                            help    => [
                                        'display the help page',
                                        '(default: no) ',
                                       ],
                            so      =>  500,
                        },
);

# 
# convert parse options to simple options
#
my %options = map { $_ ,  $parse_options{$_}{'value'}}  keys %parse_options;

# 
# get the options from the command line
#
GetOptions (
    map { $parse_options{$_}{parse}, \$options{$_} }        
        grep { $parse_options{$_}{parse}}  keys %parse_options 
) or get_help();

get_help() if $options{help};

if ($options{version}) {
    print credits();
    exit 0;
}


# print Dumper(\%options) ; exit;

my %valid_intervals = (
    day   => 1,
    week  => 1,
    month => 1,
    year  => 1,
);

#
# start and end dates
#
my ($syear, $smonth, $sday) = ();
my ($eyear, $emonth, $eday) = ();

#
# deals with placeholder features
#

for my $op ( qw(operation) ) {
    if ($options{$op}) {
        die "option <$op> is not implemented yet\n";
    }
}

# 
# check that a table and column are given
#
unless ($options{table}) {
    die "table name required\n";
}

unless ($options{column} or $options{reorganize} ) {
    die "column name required\n";
}

#
# accept only one of --end or --partitions
#
if ($options{end} && $options{partitions}) {
    die "too many quantifiers. Use EITHER '--partitions' OR '--end' \n";
}

#
# check that we parsed a valid interval
#
if ( $options{interval} =~ /^\d+$/) {
    unless ($options{start} =~ /^\d+$/) {
        $options{start} = 1;
    }
    if ($options{end}) {
        unless ($options{end} =~ /^\d+$/) {
            die "the end value must be a number\n";
        } 
        if ($options{end} < $options{interval}) {
            die "the end value must be bigger than the interval\n";
        }
        if ($options{end} <= $options{start}) {
            die "the end value must be bigger than the start\n";
        }
        $options{partitions} = int( ($options{end} +1 - $options{start}) / $options{interval});
    }
}
else {
    unless (exists $valid_intervals{ $options{interval} } ) {
        die "invalid interval specified: $options{interval}\n";
    }
    #
    # for year,  month, or week the function must be to_days
    #
    unless ($options{list}) {
        $options{function} = 'to_days' unless $options{function};
    }

    #
    # check the start date
    #
    if ( $options{start} =~ /(\d{4})[\-\.\/](\d+)[\-\.\/](\d+)/) {
        ($syear, $smonth, $sday) = ($1, $2, $3);
        $smonth +=0;
        $sday +=0;
        #print "start $syear $smonth $sday\n";
    }
    else {
        die "invalid date $options{start}\n";
    }
    #
    # check the end date
    #
    if ($options{end}) {
        if ( $options{end} =~ /(\d{4})[\-\.\/](\d+)[\-\.\/](\d+)/) {
            ($eyear, $emonth, $eday) = ($1, $2, $3);
            $emonth +=0;
            $eday +=0;
            # print "end $eyear $emonth $eday\n";
        }
        else {
            die "invalid date $options{end}\n";
        }
        if ($options{interval} eq 'year') {
            $options{partitions} = $eyear - $syear +1;
        }
        elsif ($options{interval} eq 'month') {
            # same arithmetic as before, now shared with months_between()
            $options{partitions} = months_between($syear, $smonth,
                                                  $eyear, $emonth);
        }
        elsif ($options{interval} eq 'week') {
            $options{partitions} = weeks_between($syear, $smonth, $sday,
                                                 $eyear, $emonth, $eday);
        }
        elsif ($options{interval} eq 'day') {
            $options{partitions} = days_between($syear, $smonth, $sday,
                                                 $eyear, $emonth, $eday);
        }
    }       
}

#
# there must be a valid number of partitions
#

unless ($options{partitions} && ($options{partitions} =~ /^\d+$/) ) {
    die "number of partitions required. Use EITHER '--partitions' OR '--end'\n";
}

if ($options{partitions} > 1024) {
    die "too many partitions ($options{partitions}). Maximum allowed is 1024\n";
} 
else {
    print "# partitions: $options{partitions}\n";
}

#
# don't accept a function if COLUMNS is being used
#
if ( $options{function} && $options{list} ) {
    die "you must specify either list or function\n";
}

if ($options{explain}) {
    show_options();
}

# print Dumper(\%options) ; exit;

# -----------------------------------------
# start building the ALTER TABLE statement
# -----------------------------------------

print "ALTER TABLE $options{table} \n";
if ($options{reorganize} ) {
    print "REORGANIZE PARTITION $options{reorganize} INTO \n";
    $options{prefix} = 'pr';
}
else {
    print "PARTITION by range " ;

    if ($options{function}) {
        print "($options{function}(";
    }
    elsif ($options{list}) {
        print "columns(" 
    }
    else {
        print "("
    }

    print "$options{column}";

    if ($options{function}) {
        print ")";
    }

    print ")\n";
}

print "(\n";

make_partitions( $options{interval});

print ");\n";


# -----------------------------------------
# functions
# -----------------------------------------

sub make_partitions {
    my ($interval) = @_;
    my $partitions_done=0;
    my $p_year  = $syear;
    my $p_month = $smonth;
    my $p_day = $sday;
    my $func_start = 'to_days('; #$options{function};
    my $func_end = ")";
    if ($options{list}) {
        $func_start = "";
        $func_end = "";
    }
    for my $part ($options{first_partition} .. 
            $options{first_partition} + $options{partitions} -1 ) {
        if ($partitions_done) {
            print ", ";
        }
        else {
            print "  "
        }
        $partitions_done++;

        if ($interval =~ /^\d+$/) {
            printf "partition %s%03d VALUES LESS THAN (%d)\n", 
                $options{prefix},
                $partitions_done,
                ($options{start} + (($partitions_done - 1) * $interval)) + $interval;
        }
        else {
            printf "partition %s%03d VALUES LESS THAN (%s'%4d-%02d-%02d'%s)\n", 
                $options{prefix},
                $partitions_done,
                $func_start,
                $p_year,
                $p_month,
                $p_day,
                $func_end;
            if ($interval eq 'day') {
                ($p_year,$p_month,$p_day) = next_day($p_year, $p_month, $p_day);
            }
            elsif ($interval eq 'week') {
                ($p_year,$p_month,$p_day) = next_week($p_year, $p_month, $p_day);
            }
            elsif ($interval eq 'month') {
                ($p_year,$p_month) = next_month($p_year, $p_month);
            }
            elsif ($interval eq 'year') {
                ($p_year,$p_month) = next_year($p_year, $p_month);
            }
            else {
                die "unsupported interval\n";
            }
        }
    }
    if ($options{'maxvalue'}) {
            printf ", partition %s%03d VALUES LESS THAN (MAXVALUE)\n", 
                $options{prefix},
                ++$partitions_done;
    }
}

sub next_year {
    my ($y, $m) = @_;
    $y++;
    return ($y, $m);
}

sub next_week {
    my ($y, $m, $d) = @_;
    for my $i (1 .. 7) {
        ($y, $m, $d) = next_day($y, $m, $d);
    }
    return ($y, $m, $d);
}

sub next_day {
    my ($y, $m, $d) = @_;
    $d++;
    $m += 0;
    my $last_day = days_in_month($y, $m);
    if ($d > $last_day) {
        $d = 1;
        $m++;
    }
    if ($m > 12) {
        $m = 1;
        $y++;
    }
    return ($y, $m, $d);
}

sub is_leap_year {
    my ($y) = @_;
    if (($y % 400) == 0) {
        return 1;
    }
    elsif (($y % 100) == 0) {
        return 0;
    }
    elsif (($y % 4) == 0) {
        return 1
    }
    return 0
}

sub days_in_month {
    my ($y, $m) = @_;
    $m = $m +0;
    my %last_day = (
        1 => 31,
        2 => 28,
        3 => 31,
        4 => 30,
        5 => 31,
        6 => 30,
        7 => 31,
        8 => 31,
        9 => 30,
        10=> 31,
        11=> 30,
        12=> 31,
    );
    if (($m ==2) and (is_leap_year($y))) {
        $last_day{2} = 29;
    }
    return $last_day{$m};
}

sub days_to_year_end {
    my ($y, $m, $d) = @_;
    my $days = days_in_month($y,$m) - $d +1 ;
    for my $month ( $m +1 .. 12 ) {
        $days += days_in_month($y, $month);
    }
    return $days;
}

sub months_between {
    my ($syear, $smonth,
        $eyear, $emonth) = @_;
 
    my $months =   (12 - $smonth) 
         + ( ($eyear - $syear -1) * 12 )
         + $emonth + 1;
    return $months;
}
 
sub days_between {
    my ($syear, $smonth, $sday,
        $eyear, $emonth, $eday) = @_;
    # print "start $syear, $smonth, $sday\n end $eyear, $emonth, $eday\n";
    my $days =0;
    if (  sprintf ("%04d%02d%02d", $eyear, $emonth, $eday) 
          lt 
          sprintf("%04d%02d%02d", $syear, $smonth, $sday) ) 
    {
        die "start interval > end interval\n";
    }
    while (    ($syear < $eyear) 
            or ( ($syear == $eyear) and ($smonth < $emonth) ) 
            or ( ($syear == $eyear) and ($smonth == $emonth) and ($sday < $eday) ) 
        ) {
        if ($syear < $eyear) {
            $days += days_to_year_end($syear, $smonth, $sday);
            $syear++;
            $smonth=1;
            $sday=1;
        }
        elsif ($smonth < $emonth) {
            $days += days_in_month($syear, $smonth) - $sday;
            ($syear, $smonth) = next_month($syear, $smonth);
            $sday =1;
        }
        elsif ($sday < $eday) {
            $days += $eday - $sday +1;
            $sday = $eday;
        }
    }
    return $days;
}

sub weeks_between {
    my ($syear, $smonth, $sday,
        $eyear, $emonth, $eday) = @_;
   my $days = days_between ($syear, $smonth, $sday,
            $eyear, $emonth, $eday);
    # print $days, "\n"; exit;
    return int ($days / 7) +1;
}

sub next_month {
    my ($y, $m) = @_;
    if ($m  == 12) {
        $m = 1;
        $y++;
    }
    else {
        $m++;
    }
    return ($y, $m);
}

sub get_help {
    my ($msg) = @_;
    if ($msg) {
        warn "[***] $msg\n\n";
    }

    my $HELP_MSG = q{};
    for my $op ( 
                sort { $parse_options{$a}{so} <=> $parse_options{$b}{so} } 
                grep { $parse_options{$_}{parse}}  keys %parse_options  ) {
        my $param =  $parse_options{$op}{parse};
        my $param_str = q{    };
        my ($short, $long ) = $param =~ / (?: (\w) \| )? (\S+) /x;
        if ($short) {
            $param_str .= q{-} . $short . q{ };
        } 
        $long =~ s/ = s \@? / = name/x;
        $long =~ s/ = i / = number/x;
        $param_str .= q{--} . $long;
        $param_str .= (q{ } x (40 - length($param_str)) );
        my $text_items = $parse_options{$op}{help};
        for my $titem (@{$text_items}) {
            $HELP_MSG .= $param_str . $titem . "\n";
            $param_str = q{ } x 40;
        }
        if (@{$text_items} > 1) {
            $HELP_MSG .= "\n";
        }
        # $HELP_MSG .= "\n";
   }

   print credits(),
          "syntax: $PROGRAM_NAME [options] \n", 
          $HELP_MSG;
    exit( $msg ? 1 : 0 );
}

sub credits {
    my $CREDITS = 
          qq(    The Partition Helper,  version $VERSION\n) 
        . qq(    This program creates an ALTER TABLE statement to add or reorganize\n)
        . qq(    partitions for MySQL 5.1 or later\n)
        . qq(    (C) 2008-2010 Giuseppe Maxia\n);
    return $CREDITS;
}

sub show_options {
    printf "# %-20s %-20s %s\n", 'options', 'default', 'value';
    printf "# %-20s %-20s %s\n", '-' x 20, '-' x 20, '-' x 20;
    for my $op ( sort { $parse_options{$a}{so} <=> $parse_options{$b}{so} }
            keys %parse_options) {
        my $v = $options{$op};
        my $d = $parse_options{$op}{value};
        printf "# %-20s %-20s %s\n", 
                $op, 
                '(' . (defined $d ? $d : '') . ')',  
                defined $v ? $v : '' ;
    }
    print '# ', '-' x 62, "\n";
}

Monday, November 03, 2008

A quick usability hack with partitioning

A few days ago I was describing a common grievance when using partitions.
When you create a table like the following:

CREATE TABLE t1 ( d DATE ) 
PARTITION by range (to_days(d))
(
  partition p001 VALUES LESS THAN (to_days('2001-01-01')) 
, partition p002 VALUES LESS THAN (to_days('2001-02-01')) 
, partition p003 VALUES LESS THAN (to_days('2001-03-01'))
);

Then you have the problem of finding out the original values. SHOW CREATE TABLE doesn't help.

show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `d` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (to_days(d)) (PARTITION p001 VALUES LESS THAN (730851) ENGINE = MyISAM, PARTITION p002 VALUES LESS THAN (730882) ENGINE = MyISAM, PARTITION p003 VALUES LESS THAN (730910) ENGINE = MyISAM) */

I spent some time writing a MySQL Proxy module that, among other things, gives you the original values with SHOW CREATE TABLE. However, I was mostly reinventing the wheel in this case, because there is some useful data in the information_schema. Look:


select 
  PARTITION_NAME, 
  PARTITION_DESCRIPTION 
from 
  information_schema.PARTITIONS 
where 
  table_schema ='test' and table_name='t1';
+----------------+-----------------------+
| PARTITION_NAME | PARTITION_DESCRIPTION |
+----------------+-----------------------+
| p001           | 730851                | 
| p002           | 730882                | 
| p003           | 730910                | 
+----------------+-----------------------+

This is as informative as the output of SHOW CREATE TABLE, but having the data in a table lets us do the trick.


select 
  PARTITION_NAME , 
  from_days(PARTITION_DESCRIPTION) AS original_value 
from 
  information_schema.PARTITIONS 
where 
  table_schema ='test' and table_name='t1';
+----------------+----------------+
| PARTITION_NAME | original_value |
+----------------+----------------+
| p001           | 2001-01-01     | 
| p002           | 2001-02-01     | 
| p003           | 2001-03-01     | 
+----------------+----------------+
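The query above can be made a little more defensive. What follows is only a sketch, under a few assumptions that are not in the original example: the table still lives in the test schema, every boundary was defined with to_days, and there may be a catch-all partition, whose PARTITION_DESCRIPTION is the string MAXVALUE and should not be passed to from_days.

```sql
-- Sketch: decode the boundaries of RANGE partitions only, and leave a
-- MAXVALUE catch-all partition (if there is one) as it is.
SELECT
  PARTITION_NAME,
  PARTITION_EXPRESSION,
  CASE
    WHEN PARTITION_DESCRIPTION = 'MAXVALUE' THEN 'MAXVALUE'
    ELSE CAST(FROM_DAYS(PARTITION_DESCRIPTION) AS CHAR)
  END AS original_value
FROM
  information_schema.PARTITIONS
WHERE
  TABLE_SCHEMA = 'test'
  AND TABLE_NAME = 't1'
  AND PARTITION_METHOD = 'RANGE'
ORDER BY
  PARTITION_ORDINAL_POSITION;
```

Showing PARTITION_EXPRESSION next to the decoded value makes it easy to spot tables where the boundary is not a to_days() result, in which case the from_days conversion would be meaningless.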

More on this subject when I finish working on my Proxy partition helper.

Thursday, October 30, 2008

Partitions usability I - blues and a preview

Pop quiz. Look at the picture below. Quick: can you tell me to which year and month partition P001 refers?

I couldn't tell without asking the database itself to reverse the output of to_days:


select from_days(723180);
+-------------------+
| from_days(723180) |
+-------------------+
| 1980-01-01        | 
+-------------------+

Just to make the pain clearer: you write something like this, clean and neat, and you think you have done a good job.


CREATE TABLE t1 ( d DATE ) 
PARTITION by range (to_days(d))
(
  partition p001 VALUES LESS THAN (to_days('2001-01-01')) 
, partition p002 VALUES LESS THAN (to_days('2001-02-01')) 
, partition p003 VALUES LESS THAN (to_days('2001-03-01'))
);

And then what the database server keeps is really hard to use.


show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `d` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (to_days(d)) (PARTITION p001 VALUES LESS THAN (730851) ENGINE = MyISAM, PARTITION p002 VALUES LESS THAN (730882) ENGINE = MyISAM, PARTITION p003 VALUES LESS THAN (730910) ENGINE = MyISAM) */

If you made a mistake and you look at the code, you will need to convert all the values using FROM_DAYS to make sure that the dates were as intended.
An additional pain is the long, unbroken line of partition definitions (Bug#14326). A patch has been created, and it should be applied soon.

New features around the corner

Well, not really around the corner, but there is hope that this enhancement will be available soon.
The change is conceptually simple, but momentous for users. You will be allowed to use native data types, without needing to convert them to integers.
Here's how the above DDL will be rewritten:


CREATE TABLE t1 ( d DATE ) 
PARTITION by range column_list(d)
(
  partition p001 VALUES LESS THAN (column_list('2001-01-01')) 
, partition p002 VALUES LESS THAN (column_list('2001-02-01')) 
, partition p003 VALUES LESS THAN (column_list('2001-03-01'))
);

SHOW CREATE Table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `d` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE  COLUMN_LIST(d) 
(PARTITION p001 VALUES LESS THAN ( COLUMN_LIST('2001-01-01')) ENGINE = MyISAM,
PARTITION p002 VALUES LESS THAN ( COLUMN_LIST('2001-02-01')) ENGINE = MyISAM,
PARTITION p003 VALUES LESS THAN ( COLUMN_LIST('2001-03-01')) ENGINE = MyISAM) */

Not only will you use native data types without conversion, but the partition definition will also be human readable. If you made a mistake somewhere, you will find out without any recalculation.
When is the above marvel going to be available? Now, in a Launchpad tree, although I can't say when it will be released. It won't be in 5.1, because that is almost GA. It could be in a quick point release (5.2) or a major release (6.0).
And what about the error-prone procedure of creating partitions? If I need to make monthly partitions from 1985 to 2008, doing that manually is going to be a pain, and a source of errors.
Stay tuned. I have a solution handy.
Till next time.

Thursday, October 02, 2008

Using partitions to create constraints

A devilish quiz by Scott Noyes made me think about a side effect of partitioning.

The quiz

Given a table trickytable with an INT field, explain how these statements and results are possible:

INSERT IGNORE INTO trickyTable (id) VALUES (1);
Query OK, 0 rows affected (0.00 sec)

SELECT * FROM trickyTable;
Empty set (0.00 sec)

There are no blackhole or federated tables, no triggers, and no proxy involved.
I initially tried with the event scheduler, using an event that empties the table every second, but then the insert reports 1 row affected, while the quiz asked for 0 rows.

After a few minutes, I got it. The right answer was to use partitioning.

Let me show off a bit. :)
Table trickytable was created with partitioning by range or list, with no partition that accepts the value 1.

Either of the following will do the trick:

create table trickytable (id int)
partition by list (id) (partition p1 values in (2));

#or

create table trickytable (id int)
partition by range (id) (partition p1 values less than (1));

If you insert "1" in this table, without the IGNORE keyword, you get an error:

INSERT INTO trickytable (id) VALUES (1);
ERROR 1526 (HY000): Table has no partition for value 1

Using IGNORE means "don't react to errors". This is used, for example, when inserting duplicated values, to have them discarded without triggering errors.
Adding the IGNORE keyword therefore suppresses the error and discards the row, leaving nothing to insert. Hence the "0 rows affected" message and the resulting empty table.
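The effect is easier to see with a mixed batch. The following is a minimal sketch, reusing the list-partitioned definition shown above; the DROP TABLE line is only there so that the example can be rerun. As far as the documented behavior of INSERT IGNORE on partitioned tables goes, rows with no matching partition are skipped while matching rows are stored.

```sql
-- Sketch: recreate the quiz table with a single LIST partition
-- that accepts only the value 2.
DROP TABLE IF EXISTS trickytable;

CREATE TABLE trickytable (id INT)
PARTITION BY LIST (id) (PARTITION p1 VALUES IN (2));

-- The value 1 has no partition, so that row is discarded;
-- the value 2 matches p1, so that row is kept.
INSERT IGNORE INTO trickytable (id) VALUES (1), (2);

-- Only the row that found a partition should be listed.
SELECT * FROM trickytable;
```

Depending on the server version, the discarded row may or may not be reported as a warning, so SHOW WARNINGS after the insert is worth a look.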

Well spotted, Scott!

Practical use

Using the same concept, we can use partitioning to enforce a constraint.
For example, let's assume that we have a table with part_no and category_id, and we want only products belonging to categories 10, 25, 40, and 53 to be in this table.
Using a partition by list does the trick nicely:

create table special_products (
 part_no int not null,
 category_id int not null,
 primary key (part_no, category_id)
) engine = InnoDB
partition by list (category_id)
(
 partition p1 values in (10, 25, 40, 53)
);
insert into special_products values (1, 53), (2, 25);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

insert into special_products values (1, 23);
ERROR 1526 (HY000): Table has no partition for value 23

If we want to include category 23 in the list, we can use REORGANIZE PARTITION:

alter table special_products
reorganize partition p1 into
(
 partition p1 values in (10, 23, 25, 40, 53)
);

Or we can actually add more partitions, splitting the list between them:

alter table special_products
reorganize partition p1 into
(
 partition p1 values in (10, 23, 25),
 partition p2 values in (40, 53)
);

This is an efficient way of adding a constraint (within the limits of partitioning, which accepts only integer values and a fixed set of functions).
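To check which values such a constraint currently accepts, the information_schema can be queried instead of reading the table definition. This is a sketch that assumes special_products was created in the test schema, which the example above does not state; for list partitions, PARTITION_DESCRIPTION holds the comma-separated list of values.

```sql
-- Sketch: list the category values each partition accepts.
-- The schema name 'test' is an assumption; adjust it as needed.
SELECT
  PARTITION_NAME,
  PARTITION_DESCRIPTION AS allowed_category_ids
FROM
  information_schema.PARTITIONS
WHERE
  TABLE_SCHEMA = 'test'
  AND TABLE_NAME = 'special_products'
ORDER BY
  PARTITION_ORDINAL_POSITION;
```

Any category_id that does not appear in one of the returned lists will be rejected on insert with the "no partition for value" error.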