The Data Charmer: stored procedures


Saturday, January 22, 2011

Pitfalls of monitoring MySQL table activity with stored routines

A friend of mine needed to monitor a table that was frequently updated: several procedures were writing to it several times per second, and he needed a quick way of determining how many updates per second the table was getting.
The table has only one row, which includes a counter that is increased at each operation. Therefore, the simple plan was:

Let's get the counter from the table;
Allow N seconds to pass;
Get the counter again;
The difference between the second counter and the first counter, divided by the number of seconds gives the updates per second.
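The four steps can be sketched in Python. This is an illustration of the arithmetic, not the code my friend used, and it assumes a hypothetical `read_counter` callable that returns the current value of the counter column (for example by running `SELECT counter FROM mytable` with autocommit on).

```python
import time


def updates_per_second(read_counter, seconds, sleep=time.sleep):
    """Sample a counter twice, `seconds` apart, and return the update rate.

    read_counter is a hypothetical zero-argument callable returning the
    current counter. sleep is injectable so the function can be tested
    without actually waiting.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    start = read_counter()             # step 1: first reading
    sleep(seconds)                     # step 2: let N seconds pass
    finish = read_counter()            # step 3: second reading
    return (finish - start) / seconds  # step 4: difference divided by N
```

Injecting `sleep` keeps the function free of real waiting when it is exercised with canned readings.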

The plan makes sense, and if you run the above commands manually, you get what you want.

However, my friend wanted the update ratio to be available as a single operation, something like:


SELECT update_ratio();

So he wrote a simple function, following the four steps described above.


delimiter //
drop function if exists update_ratio //
create function update_ratio()
RETURNS INT
READS SQL DATA
begin
     declare sleep_wait int default 5;
     declare start int default 1;
     declare finish int default 1;

     set start = (select counter from mytable);
     do sleep(sleep_wait);
     set finish = (select counter from mytable);
     return (finish-start)/sleep_wait;
end //
delimiter ;

It seems OK. The function runs without errors, but it always returns zero.
Mystery! Running the statements manually always gives a sensible result. Using triggers to monitor the table shows that it is indeed updated many times per second, but the function always returns zero.
More puzzling still, if we convert the function to a procedure, it gives the expected result.

The solution to the mystery is found in the MySQL online manual:

A stored function acquires table locks before executing, to avoid inconsistency in the binary log due to mismatch of the order in which statements execute and when they appear in the log.

In other words, all tables referenced in a stored function are locked when the function starts. Therefore the external procedures that were updating the table have to wait until the function ends before they can update it. When the function reads from the table, it always gets the same counter value, because no updates can happen in the meantime. That's why the second read returns the same value as the first one, and the result is zero.
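To see why the result is zero, here is a toy Python model, not MySQL's actual locking code: a single `threading.Lock` stands in for the table lock, a writer thread plays the part of the updating procedures, and two hypothetical functions take the lock either once for the whole body (like the stored function) or once per read (like a procedure or separate statements).

```python
import threading
import time


class Table:
    """Toy single-row table guarded by one lock (a model, not MySQL)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.counter = 0


def writer(table, stop):
    # Mimics the procedures that increment the counter many times per second.
    while not stop.is_set():
        with table.lock:
            table.counter += 1
        time.sleep(0.001)


def ratio_function_style(table, wait):
    # Like a stored function: the lock is taken once and held for the whole body,
    # so the writer cannot change the counter between the two reads.
    with table.lock:
        start = table.counter
        time.sleep(wait)
        finish = table.counter
    return (finish - start) / wait


def ratio_procedure_style(table, wait):
    # Like separate statements: the lock is taken only for each read.
    with table.lock:
        start = table.counter
    time.sleep(wait)
    with table.lock:
        finish = table.counter
    return (finish - start) / wait


if __name__ == "__main__":
    t, stop = Table(), threading.Event()
    th = threading.Thread(target=writer, args=(t, stop), daemon=True)
    th.start()
    print("function-style:", ratio_function_style(t, 0.2))
    print("procedure-style:", ratio_procedure_style(t, 0.2))
    stop.set()
    th.join()
```

The function-style variant always reports zero, because the writer is blocked for exactly the interval being measured.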

What should you do then?
One option is to convert the function into a procedure:


delimiter //
drop procedure if exists show_update_ratio //
create procedure show_update_ratio()
begin
     declare sleep_wait int default 5;
     declare start int default 1;
     declare finish int default 1;
    
     select counter into start from mytable;
     do sleep(sleep_wait);
     select counter into finish from mytable;
     SET @UPS := (finish-start)/sleep_wait;
end //
delimiter ;

That gets the job done. If you want to get the result into a variable, you can do it with two statements.


call show_update_ratio();
select @UPS;

If you don't need the variable, change the last SET into a SELECT and just display the value.

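Another option is to run several SQL commands from your application. In this case too, make sure that you are NOT wrapping this code inside a transaction, or you will get the same result from both queries: if the table uses InnoDB, its default REPEATABLE READ isolation level makes every consistent read inside a transaction see the snapshot established by the first one.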

# WRONG!
     set autocommit=0;
     BEGIN;
     select counter into @start from mytable;
     do sleep(5);
     select counter into @finish from mytable;   # same snapshot: @finish = @start
     select (@finish - @start) / 5 as UPS;

If you go for this solution (or even the stored procedure), make sure that you are either using autocommit, or commit after each query if you must use a transaction.
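Why the transaction hides the updates can be shown with a toy Python model of InnoDB consistent reads under REPEATABLE READ, where the snapshot is fixed by the first read in a transaction. The classes `VersionedRow` and `Session` and the helper `measure` are hypothetical and ignore everything else InnoDB does.

```python
class VersionedRow:
    """Toy multi-version row: every committed write appends a new version."""

    def __init__(self, value=0):
        self.versions = [value]

    def commit_write(self, value):
        self.versions.append(value)

    def latest(self):
        return self.versions[-1]


class Session:
    """Toy session reading a VersionedRow like a consistent (snapshot) read."""

    def __init__(self, row, autocommit=True):
        self.row = row
        self.autocommit = autocommit
        self.snapshot = None  # version index fixed by the first read of a transaction

    def read(self):
        if self.autocommit:
            return self.row.latest()  # every statement is its own transaction
        if self.snapshot is None:
            self.snapshot = len(self.row.versions) - 1
        return self.row.versions[self.snapshot]

    def commit(self):
        self.snapshot = None  # the next read starts a new snapshot


def measure(session, row, concurrent_updates, commit_between=False):
    """Read the counter, let other sessions commit updates, read it again."""
    start = session.read()
    if commit_between:
        session.commit()
    for _ in range(concurrent_updates):
        row.commit_write(row.latest() + 1)
    finish = session.read()
    return finish - start
```

With autocommit, or with a COMMIT between the two reads, the second read sees the new versions; inside one open transaction it does not.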

Thursday, December 16, 2010

Some hidden goods in MySQL 5.5

The announcement of MySQL 5.5 GA outlined the improvements in this version, which has enough good new features to excite almost any user.
There are two additions, though, that were lost in the noise of the bigger features, and I would like to spend a few words on each of them.

The first addition is something that users of stored routines have been waiting for since MySQL 5.0. No, it is not SIGNAL and its close associate RESIGNAL, which have been publicized enough. I am talking about stored routine parameters, which now have a dedicated table in the information_schema.
Let's see an example, with a simple procedure that uses three parameters.


drop procedure if exists add_to_date ;
create procedure add_to_date(in d date, in i int, out nd date)
deterministic
    set nd = d + interval i day;

This works as expected in both 5.1 and 5.5. (Never mind that it's redundant. I know it is; it's only there to keep the example short.)


 call add_to_date('2010-12-15',10,@new_date);
Query OK, 0 rows affected (0.00 sec)

 select @new_date;
+------------+
| @new_date  |
+------------+
| 2010-12-25 |
+------------+
1 row in set (0.00 sec)

The difference starts to show when you want to deal with this procedure programmatically. If you need to find out which parameters are expected by this procedure, your only option in MySQL 5.1 is parsing the result of SHOW CREATE PROCEDURE add_to_date. Not terribly difficult in any scripting language, but a hassle in SQL.
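For the scripting-language route, here is a minimal Python sketch of that parsing. It assumes the parameter list is the first parenthesized group after the routine name and that it contains no comments or quoted strings; the function name `parse_parameters` is hypothetical.

```python
import re


def parse_parameters(create_statement):
    """Extract (name, mode, type) tuples from CREATE PROCEDURE text."""
    m = re.search(r"PROCEDURE\s+\S+?\s*\(", create_statement, re.IGNORECASE)
    if m is None:
        raise ValueError("no CREATE PROCEDURE header found")
    # Walk to the matching close parenthesis, tracking nesting so that
    # types such as DECIMAL(10,2) are not split at their inner comma.
    depth, i = 1, m.end()
    parts, current = [], m.end()
    while depth:
        ch = create_statement[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            parts.append(create_statement[current:i])
            current = i + 1
        i += 1
    parts.append(create_statement[current:i - 1])  # last parameter, without ")"
    params = []
    for part in parts:
        tokens = part.split()
        if not tokens:
            continue  # empty parameter list
        mode = "IN"  # procedure parameters default to IN
        if tokens[0].upper() in ("IN", "OUT", "INOUT"):
            mode = tokens.pop(0).upper()
        params.append((tokens[0].strip("`"), mode, " ".join(tokens[1:])))
    return params
```

This is exactly the kind of fragile string handling that the information_schema table makes unnecessary.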
In MySQL 5.5, instead, you can easily get the routine parameters with a simple query:


 select parameter_name, parameter_mode, data_type from information_schema.parameters where specific_schema='test' and specific_name='add_to_date' order by ordinal_position;
+----------------+----------------+-----------+
| parameter_name | parameter_mode | data_type |
+----------------+----------------+-----------+
| d              | IN             | date      |
| i              | IN             | int       |
| nd             | OUT            | date      |
+----------------+----------------+-----------+
3 rows in set (0.00 sec)
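As an example of what the query makes easy, here is a Python sketch of a hypothetical helper, `build_call`, that turns those rows into a CALL statement. IN parameters become `%s` placeholders, as accepted by common Python MySQL drivers, and OUT parameters become `@name` user variables read back with a SELECT. INOUT parameters are deliberately not handled.

```python
def build_call(routine, params):
    """Build CALL/SELECT statements from information_schema.parameters rows.

    params: (parameter_name, parameter_mode, data_type) tuples ordered by
    ordinal_position. Returns (call_sql, select_sql_or_None).
    """
    args, out_vars = [], []
    for name, mode, _data_type in params:
        if mode == "IN":
            args.append("%s")
        elif mode == "OUT":
            args.append("@" + name)
            out_vars.append("@" + name)
        else:
            raise NotImplementedError("INOUT parameters are not handled here")
    call = "CALL {}({})".format(routine, ", ".join(args))
    select = "SELECT " + ", ".join(out_vars) if out_vars else None
    return call, select
```

For add_to_date this yields `CALL add_to_date(%s, %s, @nd)` followed by `SELECT @nd`.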

Speaking of the information_schema, there are more goodies that were not emphasized enough. The InnoDB engine that ships with the 5.5 server is the evolution of the InnoDB plugin that ships with MySQL 5.1, except that it is now built in. What many people forget to mention is that the plugin (and thus the InnoDB engine in 5.5) comes with its own InnoDB-specific instrumentation tables in the information_schema.


show tables like 'innodb%';
+----------------------------------------+
| Tables_in_information_schema (innodb%) |
+----------------------------------------+
| INNODB_CMP_RESET                       |
| INNODB_TRX                             |
| INNODB_CMPMEM_RESET                    |
| INNODB_LOCK_WAITS                      |
| INNODB_CMPMEM                          |
| INNODB_CMP                             |
| INNODB_LOCKS                           |
+----------------------------------------+
7 rows in set (0.00 sec)

This is the same set of tables that you may have seen if you have worked with the InnoDB plugin in 5.1. In short, you can get a lot of the info that you used to look at in the output of SHOW ENGINE INNODB STATUS. For more information, you should look at what the InnoDB plugin manual says on this topic.
I don't know if the tables can replace the SHOW ENGINE INNODB STATUS. Perhaps someone can comment on this issue and provide more information?

Tuesday, December 15, 2009

Getting started with MySQL 5.5

Some time ago, we announced a new release model for MySQL. Like all new things, it had some initial hiccups (with MySQL 5.4 we were still getting acquainted with the new model), but now it seems to be in full swing.
By the time you read these lines, MySQL 5.5 will be available. If the mirrors aren't seeded yet, the impatient can compile and use the new version from the Launchpad source tree.

Overview

What's this new release anyway? I'll leave it to Kaj's blog to give you the full description. Suffice it to say that this release is the second milestone of the current installment. It is of beta quality, and it will mature to RC quality. There will be yet another milestone before we release a GA in mid 2010.
One thing that this milestone shows is that there is no dead weight. If a feature doesn't make the deadline, i.e. it doesn't reach beta quality by the scheduled date, it is dropped, and possibly rescued in the next milestone.
With the introduction of the milestone model, we have also increased our internal QA, especially thanks to the Random Query Generator, which finds bugs in early stages of the code faster than any other method. ⁽¹⁾

Built-in InnoDB plugin

The InnoDB plugin 1.0.5 is included in the distribution and, unlike in MySQL 5.1, it's built in: there is no need to load and register the plugin. The performance enhancements developed for MySQL 5.4 are now available together with the other enhancements of the InnoDB plugin. This was already true in the previous milestone, but it's worth mentioning again, because not many people are aware of it.

Semi-synchronous replication

Of all the new features, this one is probably the most relevant. It is based on a patch made by Google to the InnoDB engine, and adapted by MySQL developers to make it engine-independent.
In short, it's a safety device that establishes some internal communication between master and slaves, and makes sure that at least one slave has received the changes being committed. That is, before committing, the master waits until at least one slave has acknowledged that it has received the portion of binary log necessary to reproduce the transaction.
UPDATE As Harrison Fisk rightly notes, there is a mistake in my description. The acknowledgment of the transaction being sent to the slave only happens after the master's commit to the binary log.
Some caveats apply:

It's called semi-synchronous replication, because it doesn't necessarily apply to all the slaves. Although you can manually check if the replication has worked for all the slaves, it's enough for the master to make sure that at least one slave has got the goods.
Received by a slave doesn't mean "executed". The slave acknowledges when it has got the binary log, even if the SQL thread is busy or stopped.
If there is no slave that can acknowledge receipt (e.g. slaves are all down or stopped), then the master reverts to the normal asynchronous operation.

To use this feature, you need to install two plugins: one for the master and one for each slave. No need to compile anything, though. They are provided with the binaries. All you need to do is load the appropriate plugin for each server.

master >  INSTALL PLUGIN rpl_semi_sync_master SONAME 'libsemisync_master.so';

slave1 > INSTALL PLUGIN rpl_semi_sync_slave SONAME 'libsemisync_slave.so';

Additionally, there are a few variables that you must set, either in the options file or online.

master > SET GLOBAL rpl_semi_sync_master_enabled=1;
slave1 > SET GLOBAL rpl_semi_sync_slave_enabled=1;

Now that the system is ready, let's see how to use it.
Before doing anything, we ask for the value of two important status variables:

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 0     |
+-----------------------------+-------+

The first one is the number of failed synchronized transactions, the second one is the number of successful ones. Since nothing has happened so far, they are both zero.

create table t1 (i int not null primary key) engine=innodb;
Query OK, 0 rows affected (0.13 sec)

 SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 1     |
+-----------------------------+-------+

The first operation (a table creation) was successfully transferred to a slave. Let's do one more.

set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

insert into t1 values (1);
Query OK, 1 row affected (0.00 sec)

COMMIT;
Query OK, 0 rows affected (0.00 sec)

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 2     |
+-----------------------------+-------+

This one was successful too.
Now, let's try something sneaky. On each slave, we execute "STOP SLAVE SQL_THREAD". The slaves no longer apply the changes, but semi-synchronous replication goes on, because the IO thread still receives and acknowledges the events.


insert into t1 values (2);
Query OK, 1 row affected (0.01 sec)

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 3     |
+-----------------------------+-------+
2 rows in set (0.00 sec)

Semi-synchronous replication has worked. However, if we query both master and slaves, only the master has the new record. The slaves have it only in their relay logs, as you can easily verify with mysqlbinlog.
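The sequence above can be replayed with a toy Python model, not the plugin's implementation. A commit counts as "yes" when at least one slave's IO thread stores the event in its relay log, whether or not the SQL thread applies it, and as "no" when nobody acknowledges. The model skips the actual waiting: in the real server, how long the master waits before falling back to asynchronous mode is governed by rpl_semi_sync_master_timeout. The class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    name: str
    io_running: bool = True   # receives binary log events and acknowledges them
    sql_running: bool = True  # applies relay log events to the tables
    relay_log: list = field(default_factory=list)
    rows: list = field(default_factory=list)

    def receive(self, event):
        """Return True (acknowledge) if the IO thread stored the event."""
        if not self.io_running:
            return False
        self.relay_log.append(event)
        if self.sql_running:
            self.rows.append(event)  # applied; the acknowledgment does not depend on it
        return True


@dataclass
class Master:
    slaves: list
    yes_tx: int = 0  # plays the role of Rpl_semi_sync_master_yes_tx
    no_tx: int = 0   # plays the role of Rpl_semi_sync_master_no_tx

    def commit(self, event):
        acks = [s.receive(event) for s in self.slaves]
        if any(acks):        # at least one slave is enough
            self.yes_tx += 1
        else:                # nobody acknowledged: asynchronous fallback
            self.no_tx += 1
```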

Enhanced partitioning syntax

About one year ago, I briefly announced that this feature was in the making. With some interface improvements, it is now part of regular partitioning. It's an extension of partitioning BY RANGE. As you know, in MySQL 5.1 you can partition only on a single expression, and that expression must return an INTEGER. Both these restrictions were lifted in 5.5, with a syntax change that makes the code more readable and the overall feature more usable.
You can now partition by date, datetime, varchar, and char columns, not just integers, and you can use more than one column in your list. The most immediate use of this extension is the ability to partition on dates without resorting to functions that convert them into integers. For example:

CREATE TABLE t2 
(dt date,a int, b int, c int)  
PARTITION BY RANGE COLUMNS (dt)  
(
PARTITION p0 VALUES LESS THAN ('2007-01-01'),
PARTITION p1 VALUES LESS THAN ('2008-01-01'),
PARTITION p2 VALUES LESS THAN ('2009-01-01'),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);

The COLUMNS keyword does the trick. The manual has more examples.
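How a row finds its partition can be sketched in Python, under simplifying assumptions: each VALUES LESS THAN bound is a tuple compared element by element (Python tuple ordering, matching row comparison for multi-column bounds), partitions are scanned in order, and MAXVALUE is supported only as a whole bound, not per column. `choose_partition` and the `MAXVALUE` sentinel are hypothetical names.

```python
from datetime import date

MAXVALUE = object()  # stand-in for the MAXVALUE keyword

# Bounds copied from the t2 example: PARTITION BY RANGE COLUMNS (dt)
T2_PARTITIONS = [
    ("p0", (date(2007, 1, 1),)),
    ("p1", (date(2008, 1, 1),)),
    ("p2", (date(2009, 1, 1),)),
    ("p3", MAXVALUE),
]


def choose_partition(value, partitions):
    """Return the first partition whose LESS THAN bound is greater than value."""
    for name, bound in partitions:
        if bound is MAXVALUE or value < bound:
            return name
    raise ValueError("no partition for value {!r}".format(value))
```

Note that a date equal to a bound goes to the next partition, because LESS THAN is strict; with two columns, (4, 100) still lands before (5, 10), since the first column decides.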

The partition helper has been updated to handle this new feature and generate partitions accordingly.

SIGNAL and RESIGNAL

If you have used stored routines extensively, you will certainly have asked yourself "why isn't there any way of raising an exception?" In the SQL standard, exception handling is implemented using the SIGNAL and RESIGNAL keywords, which were notably missing in MySQL 5.0 and 5.1 stored routines.
There have been many clever hacks by several community members to emulate the missing SIGNAL, but none were quite satisfactory. After a long wait, here we have SIGNAL and RESIGNAL, which make stored routine programming much more robust and easier to debug. An authoritative example of how to use the new syntax is available in Roland Bouman's blog.

There is more. For the complete list of features, have a look at the official manual.
Happy hacking!

UPDATE Added more partitions to the example, as suggested by Jon.

⁽¹⁾ For the more technologically savvy, here's how Philip Stoev, one of my distinguished QA colleagues, describes the enhancements:

Historically, most of the MySQL tests have been manually created, however a modern database is so complex that it is impossible to test manually even a tiny percentage of the available functionality. Therefore for Betony [codename for MySQL 5.5], and the upcoming Celosia [5.6], the majority of our testing effort was concentrated around stochastic testing, using random data and millions of random queries to validate the behavior of the server across a wide range of scenarios and workloads.
For each new feature, and some existing ones, we automatically generated tests that attempt to cover all relevant SQL constructs, including the interaction between the feature being tested and existing code within the server. For features that have concurrency implications, we ran the random queries as a stress test or along with concurrent DDL statements. For areas such as the partitioning, we used the random queries to functionally validate the new code, by comparing the result from each query to a reference source, such as a previous version of the server.

Friday, October 10, 2008

Community at work - SIGNAL

A long time ago, I saw a blog post by Jorge Bernal with a simple implementation of SIGNAL for MySQL stored procedures. If you have ever tried to write MySQL stored procedures, you know how dearly this feature is missed.
I discussed this feature internally, and everyone told me "don't bother, we're going to implement SIGNAL in MySQL 6.1". And indeed, the full implementation for SIGNAL and RESIGNAL is in the roadmap.

What does that mean? Should we wait two or three more years to have a SIGNAL feature? Wouldn't it be nice to have a minimal SIGNAL working today, and get the full thing when it's ready?

A bird in the hand ...

I would like to have SIGNAL available today, and therefore I tested Jorge's patch against the latest 5.1.28 code. It needed some more tweaking to get it working. If you want to try it, make sure you have the current patch.
Once you apply the patch and compile the code, you will have a server that accepts a simple "SIGNAL" command, without any parameter.
Here's a simple usage example:
First, you create two stored procedures:


delimiter //

drop procedure if exists less_than_10 //
create procedure less_than_10(i int )
deterministic
begin
    if ( i < 10)
    then
        set @comment = concat('a valid number was entered (', i,')');
    else
        set @comment = concat('number too high (', i,')') ;
        SIGNAL ;
    end if;
    select concat('everything fine [',@comment,']')  as comment;
end//

drop procedure if exists trapping //
create procedure trapping ()
deterministic
begin
    declare mystatus varchar(20) default 'ok';
    declare continue handler for SQLSTATE '38503'
        set mystatus = 'ERROR';
    call less_than_10(7);
    select mystatus, @comment;
    call less_than_10(17);
    select mystatus, @comment;
    select 'and this is after the error';
end //

delimiter ;

Then we try out the simple case, by calling the procedure containing the SIGNAL directly.


call less_than_10(2);
+--------------------------------------------------+
| comment                                          |
+--------------------------------------------------+
| everything fine [a valid number was entered (2)] |
+--------------------------------------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

select @comment;
+--------------------------------+
| @comment                       |
+--------------------------------+
| a valid number was entered (2) |
+--------------------------------+
1 row in set (0.00 sec)

call less_than_10(12);
ERROR 1340 (38503): Exception generated from user-defined function/procedure

select @comment;
+----------------------+
| @comment             |
+----------------------+
| number too high (12) |
+----------------------+
1 row in set (0.00 sec)

Next, we call the procedure that traps the exception. This one has full control of the execution flow, with SIGNAL and HANDLER.


call trapping();
+--------------------------------------------------+
| comment                                          |
+--------------------------------------------------+
| everything fine [a valid number was entered (7)] |
+--------------------------------------------------+
1 row in set (0.00 sec)

+----------+--------------------------------+
| mystatus | @comment                       |
+----------+--------------------------------+
| ok       | a valid number was entered (7) |
+----------+--------------------------------+
1 row in set (0.00 sec)

+----------+----------------------+
| mystatus | @comment             |
+----------+----------------------+
| ERROR    | number too high (17) |
+----------+----------------------+
1 row in set (0.00 sec)

+-----------------------------+
| and this is after the error |
+-----------------------------+
| and this is after the error |
+-----------------------------+
1 row in set (0.00 sec)
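For readers more at home with exceptions in general-purpose languages, the control flow of these two procedures maps onto Python roughly as follows. This is an analogy, not how the patch works: `SqlSignal` stands in for the condition raised by SIGNAL, a dict stands in for the `@comment` user variable, and the try/except plays the CONTINUE HANDLER for SQLSTATE '38503'.

```python
class SqlSignal(Exception):
    """Stands in for a condition raised by SIGNAL, carrying its SQLSTATE."""

    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate


def less_than_10(i, session):
    """Mirror of less_than_10; session["comment"] plays the role of @comment."""
    if i < 10:
        session["comment"] = "a valid number was entered ({})".format(i)
    else:
        session["comment"] = "number too high ({})".format(i)
        # SIGNAL: abandon the rest of the routine
        raise SqlSignal("38503", "Exception generated from user-defined function/procedure")
    return "everything fine [{}]".format(session["comment"])


def trapping(session):
    """Mirror of trapping(): a CONTINUE handler for SQLSTATE '38503'."""
    mystatus = "ok"
    output = []

    def call(i):
        nonlocal mystatus
        try:
            output.append(less_than_10(i, session))
        except SqlSignal as exc:
            if exc.sqlstate != "38503":
                raise  # no handler declared for other states
            mystatus = "ERROR"  # handler action; execution continues after the CALL

    call(7)
    output.append((mystatus, session["comment"]))
    call(17)
    output.append((mystatus, session["comment"]))
    output.append("and this is after the error")
    return output
```

As in the SQL version, the user variable set before the SIGNAL survives it, and the caller resumes with the statement after the failing CALL.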

This patch of less than 30 lines is much better than many of the hacks that we've seen published over the years.
What's your take? Would you like to have this feature in MySQL 5.1, while the full thing is being developed for 6.1?

Tuesday, March 11, 2008

Reason #4 to attend the MySQL UC2008

Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#4 A Tour of External Language Stored Procedures for MySQL

Eric Herman and Antony Curtis are hard workers. You don't see them bragging about this and that. They do long and thorough research, and then they prototype. At that point, they claim victory, and usually with reason.
Their work on stored procedures using external languages is one of the most refreshing pieces of new technology applied to MySQL since the announcement of MySQL 5.0.
If you are dissatisfied with the cumbersome heaviness of the standard stored routines, come to this session. You will be surprised by the interesting things you can do NOW with a MySQL server.


Wednesday, April 18, 2007

MySQL Stored routines at the Users Conference

The Users Conference and Expo is approaching fast. As a last minute assignment, I will be speaking about Stored routines for MySQL administration. The session will cover the basics of stored routines and their specifics for administration.

Coincidentally, a review was published today of an excellent book, MySQL Stored Procedure Programming, by Guy Harrison and Steven Feuerstein. I warmly recommend this book to anyone using stored routines with MySQL.

Wednesday, April 11, 2007

Handling multiple data sets in stored procedures from Perl

MySQL stored procedures can use SELECT statements in their body, returning a data set to the caller. The manual also explains that multiple data sets can be returned from such a stored procedure.
For example, assume you have this stored procedure:

DELIMITER //
CREATE PROCEDURE p1(x INT, y INT)
    DETERMINISTIC
    BEGIN
        SELECT 
            x ;
        SELECT 
            x AS first_param, 
            y AS second_param;
        SELECT 
            x, 
            y, 
            x + y AS sum_xy, 
            x * y AS prod_xy; 
        SELECT *  FROM t1;
    END //
DELIMITER ;

There are four SELECT statements, with different headers, and a variable number of rows returned.
Early versions of DBD::mysql could not handle multiple data sets. Actually, prior to version 3.0004 they could not even handle a single data set from a stored procedure. Starting with version 4.001, single and multiple data sets from stored procedures are handled gracefully.
The trick is to call $sth->more_results after retrieving each dataset.
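The same loop in Python is a sketch, assuming a DB-API 2.0 cursor that implements the optional `nextset()` method (PEP 249 makes it optional; it returns a true value when another result set is available). `FakeCursor` is a hypothetical stand-in that replays canned result sets shaped like those of `p1(3, 4)`, with invented rows for `t1`, so the example runs without a server. Sets without a description, such as the row-less status result that a CALL typically ends with, are skipped.

```python
def fetch_all_result_sets(cursor):
    """Collect (column_names, rows) for every result set of the last statement."""
    result_sets = []
    while True:
        if cursor.description is not None:  # skip sets that carry no rows
            columns = [col[0] for col in cursor.description]
            result_sets.append((columns, cursor.fetchall()))
        if not cursor.nextset():  # like $sth->more_results in DBD::mysql
            break
    return result_sets


class FakeCursor:
    """Hypothetical cursor replaying canned result sets (illustration only)."""

    def __init__(self, sets):
        self._sets = list(sets)
        self._pos = 0

    @property
    def description(self):
        columns, _rows = self._sets[self._pos]
        return None if columns is None else [(c,) for c in columns]

    def fetchall(self):
        return list(self._sets[self._pos][1])

    def nextset(self):
        if self._pos + 1 < len(self._sets):
            self._pos += 1
            return True
        return None
```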
See the complete example at Multiple data sets in MySQL stored procedures

For those interested, the MySQL Users Conference and Expo includes a BoF about DBD::mysql on April 24th at 7:30pm.
