Tuesday, March 19, 2019

dbdeployer community: Part 3 - MySQL Cluster (NDB)

I remember wanting to create MySQL Cluster sandboxes several years ago. At the time, however, MySQL-Sandbox technology was not flexible enough to allow an easy inclusion, and the cluster software itself was not as easy to install as it is today. Thus, I kept postponing the implementation, until I started working with dbdeployer.

I included the skeleton of support for MySQL Cluster from the beginning (by keeping a range of ports dedicated to this technology), but I didn’t do anything until June 2018, when I made public my intention to add support for NDB in dbdeployer with issue #20 (Add support for MySQL Cluster). The issue contained just a bare idea, but I needed help from someone, as my expertise with NDB was limited, and outdated.

Help came in November, when Daniël van Eeden started giving me bits of information on how to put together a cluster sandbox. I still resisted trying my hand at the implementation, because by then I had realised that my method of checking the database server version to know whether it supported a given feature was inadequate to support anything other than vanilla MySQL or fully compliant forks.

The game changer was the cooperation with TiDB, which opened the way for supporting Percona XtraDB Cluster. Even though these technologies are quite different from MySQL Cluster, they forced me to improve dbdeployer’s code, making it more flexible and easier to enhance.

When I finally decided to start working on NDB, it took me only a few days to implement it, because I had all the pieces ready for this technology to become part of dbdeployer.

Following Daniël’s instructions, I had a prototype working, which I submitted to the #dbdeployer channel on the MySQL Community Slack. In that channel, I got help again from Daniël van Eeden, and then Frédéric Descamps summoned two more experts (Ted Wennmark and Bernd Ocklin), who gave me feedback, advice, and some quick lessons on how the cluster should work, which allowed me to publish a release (dbdeployer 1.23.0) this past weekend.

The implementation has some peculiarities for both users of dbdeployer and users of MySQL Cluster. For those used to dbdeployer, the biggest change is that we are deploying two entities: the main one is an NDB cluster, with its own directories and processes, while the MySQL servers are just the visible part of the cluster and are, in fact, only cluster clients. Still, the cluster works smoothly within the dbdeployer paradigm: it is deployed (like Group Replication or PXC) as a replication topology, and as such we can run the standard replication test and expect to get the same result that we would see when checking another multi-source deployment.

For people used to NDB, though, seeing NDB as “replication” feels odd, because the cluster is seen as a distinct entity, and replication is when we transfer data between two clusters. If we were developing a dedicated tool for NDB clusters, this is probably what we would have done, but since we want dbdeployer integration, we must play by the general rules of the tool, where “single” is a stand-alone server instance, and we can’t have dbdeployer deploy single --topology=ndb, because a single instance doesn’t have a topology, which is a property of a group of entities. Therefore, the price to pay for dbdeployer support is accepting to see a MySQL Cluster deployment as replication.

Now that we have covered the philosophical angle, it’s time to show an example. Unlike PXC, which requires Linux, MySQL Cluster also runs on macOS, which makes my testing much easier.

The first step to run a cluster in dbdeployer is to download a tarball from dev.mysql.com/downloads/cluster, and then expand it in our usual directory ($HOME/opt/mysql):

$ dbdeployer unpack --prefix=ndb --flavor=ndb \  
    ~/Downloads/mysql-cluster-gpl-7.6.9-macos10.14-x86_64.tar.gz  
Unpacking tarball $HOME/Downloads/mysql-cluster-gpl-7.6.9-macos10.14-x86_64.tar.gz to $HOME/opt/mysql/ndb7.6.9  
[...]  
Renaming directory $HOME/opt/mysql/mysql-cluster-gpl-7.6.9-macos10.14-x86_64 to $HOME/opt/mysql/ndb7.6.9

We can repeat the same operation for MySQL Cluster 8.0.14, and in the end we will have two expanded tarballs, named ndb7.6.9 and ndb8.0.14. With these we can install a few clusters on the same host:

$ dbdeployer deploy replication ndb7.6 --topology=ndb --concurrent  
# ndb7.6 => ndb7.6.9  
$HOME/sandboxes/ndb_msb_ndb7_6_9/initialize_nodes  
MySQL Cluster Management Server mysql-5.7.25 ndb-7.6.9  
2019-03-18 23:47:15 [ndbd] INFO     -- Angel connected to 'localhost:20900'  
2019-03-18 23:47:16 [ndbd] INFO     -- Angel allocated nodeid: 2  
2019-03-18 23:47:16 [ndbd] INFO     -- Angel connected to 'localhost:20900'  
2019-03-18 23:47:16 [ndbd] INFO     -- Angel allocated nodeid: 3  
executing 'start' on node 1  
................ sandbox server started  
executing 'start' on node 2  
.. sandbox server started  
executing 'start' on node 3  
.. sandbox server started  
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb7_6_9  
run 'dbdeployer usage multiple' for basic instructions'


$ dbdeployer deploy replication ndb8.0 --topology=ndb --concurrent  
# ndb8.0 => ndb8.0.14  
$HOME/sandboxes/ndb_msb_ndb8_0_14/initialize_nodes  
MySQL Cluster Management Server mysql-8.0.14 ndb-8.0.14-dmr  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel connected to 'localhost:21400'  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel allocated nodeid: 2  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel connected to 'localhost:21400'  
2019-03-18 23:45:53 [ndbd] INFO     -- Angel allocated nodeid: 3  
executing 'start' on node 1  
........ sandbox server started  
executing 'start' on node 2  
... sandbox server started  
executing 'start' on node 3  
.. sandbox server started  
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb8_0_14  
run 'dbdeployer usage multiple' for basic instructions'

If we look at the sandbox directories, we will see a few more subdirectories than we usually have with other topologies. For example:

 ndb_conf   # cluster configuration  
 ndbnode1   # management node (1)  
 ndbnode2   # data node (2)  
 ndbnode3   # data node (3)  
 node1      # MySQL node 1  
 node2      # MySQL node 2  
 node3      # MySQL node 3

The clusters are well framed into dbdeployer’s architecture, and they respond to standard commands like any other sandbox:

$ dbdeployer sandboxes --full-info  
.-------------------.------.-----------.----------------------------------------------.--------.-------.--------.  
|       name        | type |  version  |                    ports                     | flavor | nodes | locked |  
+-------------------+------+-----------+----------------------------------------------+--------+-------+--------+  
| ndb_msb_ndb7_6_9  | ndb  | ndb7.6.9  | [20900 27510 27511 27512 ]                   | ndb    |     3 |        |  
| ndb_msb_ndb8_0_14 | ndb  | ndb8.0.14 | [21400 28415 38415 28416 38416 28417 38417 ] | ndb    |     3 |        |  
'-------------------'------'-----------'----------------------------------------------'--------'-------'--------'

$ dbdeployer global status  
# Running "status_all" on ndb_msb_ndb7_6_9  
MULTIPLE  /Users/gmax/sandboxes/ndb_msb_ndb7_6_9  
node1 : node1 on  -  port   27510 (27510)  
node2 : node2 on  -  port   27511 (27511)  
node3 : node3 on  -  port   27512 (27512)

# Running "status_all" on ndb_msb_ndb8_0_14  
MULTIPLE  /Users/gmax/sandboxes/ndb_msb_ndb8_0_14  
node1 : node1 on  -  port   28415 (28415)  
node2 : node2 on  -  port   28416 (28416)  
node3 : node3 on  -  port   28417 (28417)

$ dbdeployer global test-replication  
# Running "test_replication" on ndb_msb_ndb7_6_9  
# master 1  
# master 2  
# master 3  
# slave 1  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 2  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 3  
ok - '3' == '3' - Slaves received tables from all masters  
# pass: 3  
# fail: 0

# Running "test_replication" on ndb_msb_ndb8_0_14  
# master 1  
# master 2  
# master 3  
# slave 1  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 2  
ok - '3' == '3' - Slaves received tables from all masters  
# slave 3  
ok - '3' == '3' - Slaves received tables from all masters  
# pass: 3  
# fail: 0

Like other topologies, the NDB cluster also has a script that shows the status of the nodes:

$ ~/sandboxes/ndb_msb_ndb7_6_9/check_nodes  
+---------+-----------+---------------+--------+---------+-------------+-------------------+  
| node_id | node_type | node_hostname | uptime | status  | start_phase | config_generation |  
+---------+-----------+---------------+--------+---------+-------------+-------------------+  
|       2 | NDB       | localhost     |     58 | STARTED |           0 |                 1 |  
|       3 | NDB       | localhost     |     58 | STARTED |           0 |                 1 |  
|       1 | MGM       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       4 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       5 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       6 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
|       7 | API       | localhost     |   NULL | NULL    |        NULL |              NULL |  
+---------+-----------+---------------+--------+---------+-------------+-------------------+  
Connected to Management Server at: localhost:20900  
Cluster Configuration  
---------------------  
[ndbd(NDB)] 2 node(s)  
id=2    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0, *)  
id=3    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)  
id=1    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)

[mysqld(API)]   4 node(s)  
id=4    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)  
id=5    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)  
id=6    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)  
id=7 (not connected, accepting connect from localhost)
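
Since the MySQL servers are just clients of the same NDB cluster, a table created through one SQL node should be immediately visible from the others. Here is a minimal sketch, assuming the per-node use scripts behave as in other dbdeployer sandboxes and pass their arguments straight to the mysql client (schema and table names are made up):

$ cd ~/sandboxes/ndb_msb_ndb7_6_9
$ ./node1/use -e "CREATE SCHEMA IF NOT EXISTS test"
$ ./node1/use -e "CREATE TABLE test.t1 (id INT PRIMARY KEY) ENGINE=NDBCLUSTER"
$ ./node1/use -e "INSERT INTO test.t1 VALUES (1)"
$ ./node2/use -e "SELECT * FROM test.t1"   # the row inserted through node1 should appear here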

It is possible that we will need more iterations to make the deployment more robust. When testing it, keep in mind that this deployment is only for testing, and it probably won’t have all the performance that you may find in a well-deployed production cluster. Still, compared to other topologies, the replication tests ran faster than I expected.

Friday, April 20, 2018

MySQL adjustment bureau


When maintaining any piece of software, we usually deal with two kinds of actions:

  • bug fixing,
  • new features.

Bugs and features

A bug happens when there is an error in the software, which does not behave according to the documentation or the specifications. In short, it's a breach of contract between the software maintainer and the users. The promise, i.e. the software API that was published at every major version, is broken, and the software must be reconciled with the expectations and fixed, so that it behaves again as the documentation says. When we fix a bug in this way, we increment the revision number of the software version (e.g. 1.0.0 to 1.0.1. See semantic versioning).

New features, in turn, can be of two types:

  • Backward-compatible enhancements, which add value to the software without breaking the existing functionality. This is the kind of change that requires an increment of the minor indicator in the version (for example: 1.1.15 to 1.2.0.)
  • Incompatible changes that break the existing behavior and require users to change their workflow. This kind of change requires bumping up the major number in the version (as in 2.1.3 to 3.0.0.)

Not a bug, nor a feature, but an adjustment.

The above concepts seem simple enough: you either fix something that's broken or add new functionality.

However, when maintaining a tool whose purpose is to help users deal with another piece of software (as is the case with dbdeployer, which helps users deploy MySQL databases), there is yet another category of changes that doesn't fall into the standard categories: it's what happens when the software being helped (MySQL) changes its behavior in a way that would break the normal functioning of the helping tool, giving the maintainer a difficult choice:

  • shall I modify the tool's interface to adapt to the new behavior, breaking existing procedures?
  • or shall I adapt the tool's functioning behind the scenes to keep the interface unchanged?

My philosophy with dbdeployer (and MySQL-Sandbox before it) is to preserve the tool's interface, so that users don't have to change existing procedures. I call this kind of change an adjustment, because it is not a bug, as it is not the consequence of a coding error, and not a feature, as the intervention is not a conscious decision to add new functionality, but an emergency operation to preserve the status quo. You can think of this category as a capricious change in specifications, which so often happens to software developers, with the difference that the one changing the specs is not the user, but a third party who doesn't know, or care, about our goal of preserving the API's integrity.

For example, from MySQL 8.0.3 to 8.0.4 there was a change in the default authentication plugin. Instead of mysql_native_password, MySQL 8.0.4 uses caching_sha2_password. The immediate side effect for MySQL-Sandbox and dbdeployer was that replication didn't work out of the box. A possible solution would have been to force the old authentication plugin, but this would not allow users to test the new one. Since the main reason to use a tool like dbdeployer is to experiment with new releases safely, I had to keep the default behavior. Thus, I left the default plugin in place, and changed the way replication works. It's an ugly workaround, actually, but it allows users to see the new behavior without losing existing functionality.
To complete the adjustment, I added a new option, --native-auth-plugin, which deploys using the old mysql_native_password. In total, the adjustment consists of a behind-the-scenes change, almost undetectable by users, and a new option to keep using the familiar authentication for users who want it.
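
For example, a deployment that keeps the familiar plugin would be invoked more or less like this (the version label is illustrative and must correspond to a tarball already unpacked for dbdeployer):

$ dbdeployer deploy replication 8.0.4 --native-auth-plugin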

From the point of view of semantic versioning, this kind of change is a backward-compatible modification of the API, which warrants an increase of the minor number of the version.

Another example: when MySQL went from 8.0.4 to 8.0.11, it introduced a deal-breaking change: the X Plugin is now loaded by default. This is convenient for users of MySQL as a document store, as they don't need to enable the plugin manually, but bad news for everyone else, as the server opens a port and a socket that many users may not choose to open voluntarily. What's worse, when installing several sandboxes of version 8.0.11 on the same host (for example in replication), one of them will succeed in reserving the plugin port and socket, while the others will have their error logs populated with surprising errors about a socket already being in use.

The solution is similar to the previous one. When dbdeployer detects MySQL 8.0.11 or newer, it adds options to customize the mysqlx plugin port and socket, thus allowing a frictionless deployment where the new functionality is available to brave experimenters. At the same time, I added a new option (--disable-mysqlx) for those who really don't want an extra port and socket on their servers, not even for testing.
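
As a sketch of that opposite choice (again with an illustrative version label), a deployment without the X Plugin port and socket would look like this:

$ dbdeployer deploy replication 8.0.11 --disable-mysqlx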

These adjustments are usually costly additions. While the added code is not much, they require extra tests, which are often complex and take more time to write and execute. The process of adding an adjustment goes mostly like this:

  • I dedicate my morning walk to thinking about the fix. Sometimes the fix requires several walks, while I decide on the least intrusive solution.
  • If the walk has been fruitful, writing the code takes just a few minutes. If I missed something, I iterate.
  • Then comes the more difficult part: writing meaningful tests that prove that the adjustment is correct and doesn't introduce side effects in any MySQL version. And of course, the option that reintroduces the old behavior must be tested too.
  • A positive side effect of this exercise is that I often realize I was missing a test for an important behavior, and then I write that down as well. The test suite included 6,000+ tests a month ago, and now it has almost doubled.

Thursday, May 06, 2010

Two quick performance tips with MySQL 5.1 partitions

While I was researching for my partitions tutorial, I came across two hidden problems, which may happen often, but are somewhat difficult to detect, and even more difficult to fix, unless you know what's going on, and why. I presented both cases during my tutorial, but there were no pictures to convey the mechanics of the problem. Here is the full story.

TO_DAYS() prunes two partitions instead of one


If you are partitioning by date, chances are that you are using TO_DAYS(). And depending on how you have partitioned your table, your queries are as fast as you expect them to be. However, there are cases where your query takes twice as long as it should, and of course this will not make you happy.

For example, in a table partitioned by month, when your query searches for values within one specific month, EXPLAIN PARTITIONS tells you that the search involves two partitions (as in the sketch below). This means that, instead of searching through 1,000,000 rows in one partition, the partitioning engine is searching through 2,000,000 rows in two partitions.
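
Here is a minimal sketch of that check, with made-up table and column names (only the partitions column of the output matters here):

EXPLAIN PARTITIONS
SELECT COUNT(*) FROM t1
WHERE d BETWEEN '2010-02-01' AND '2010-02-28'\G
-- the partitions column lists two partitions: the first partition of the table
-- plus the one that actually holds February 2010
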
But why? The reasoning, as reported by the developers, is that
This is not a bug, since TO_DAYS() returns NULL for invalid dates, it needs to scan the first partition as well (since that holds all NULL values) for ranges.

Bug#49754: Partitioning by RANGE with TO_DAYS always includes first partition when pruning
This makes sense from a developer's standpoint. From a user's perspective, though, it's a bug.
Anyway, it doesn't do us any good to rant about it. Our query is still twice as slow as we want it to be. We need to take action. The workaround is to create an empty partition in the first position. If we are creating a new table, it's simple. Just say
PARTITION p000 VALUES LESS THAN (0)
and all will be well. The partition pruning mechanism will still find two partitions, but since the first one is empty, it won't impact the performance.
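
For instance, a new table partitioned by month with the empty guard partition in first position could be sketched like this (table, column, and partition names, as well as the dates, are illustrative):

CREATE TABLE t1 (
    d   DATE NOT NULL,
    val INT
)
PARTITION BY RANGE (TO_DAYS(d)) (
    PARTITION p000 VALUES LESS THAN (0),                      -- empty guard partition
    PARTITION p001 VALUES LESS THAN (TO_DAYS('2010-02-01')),  -- January 2010
    PARTITION p002 VALUES LESS THAN (TO_DAYS('2010-03-01')),  -- February 2010
    PARTITION p003 VALUES LESS THAN (MAXVALUE)
);
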
If you have an existing table already partitioned, like in our example, then you need to perform a different operation
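A minimal sketch of such an operation uses ALTER TABLE ... REORGANIZE PARTITION to split the current first partition into an empty guard partition plus the original one, assuming an existing table whose first partition, p001, currently holds January 2010 (names and dates are again illustrative):

ALTER TABLE t1
REORGANIZE PARTITION p001 INTO (
    PARTITION p000 VALUES LESS THAN (0),
    PARTITION p001 VALUES LESS THAN (TO_DAYS('2010-02-01'))
);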

Now we have a different first partition, with no records. When we issue the same query, the partition pruning will still include the first partition, but it will skip it because there are no records.

Inserting single rows in partitions is slow


This bug, too, is sometimes difficult to detect. If you want to test partitions in MySQL 5.1, you will probably take an existing table and convert it to a partitioned one, or create a new table and load the contents from a dump. Either way, you are unlikely to insert millions of records with single INSERT statements. These single inserts are slower than bulk inserts in the first place, but with partitions there is an additional penalty. Whenever you insert a record, the partitioning engine locks the entire table: it locks all partitions before the insert, and unlocks them after the insert. If you have 500 partitions, that's 500 locks and 500 unlocks for every statement. Ouch!
It's a design problem, and it is not likely to be fixed without overhauling the whole architecture of partitioning. In this case, too, there is a bug report, Partitioning performance drops drastically with hundreds of partitions, although nobody claims that this is a feature.
What can you do, then? You have several choices:
  • You can use a bulk insert. Instead of single statements, use INSERT with multiple records, or LOAD DATA INFILE (see the sketch after this list).
  • Explicitly LOCK the table before inserting and UNLOCK it after you finish with all the inserts. This will avoid the overhead, although it won't make your table concurrently accessible until you finish.
  • If you use partitioning only to facilitate heavy queries, consider using a non-partitioned table on the master, and partitioned ARCHIVE tables on the slaves.
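
Here is a minimal sketch of the first two options, with made-up table and column names:

-- (1) bulk insert: many rows in a single statement
INSERT INTO t1 (d, val) VALUES
    ('2010-01-01', 1),
    ('2010-01-02', 2),
    ('2010-01-03', 3);

-- (2) explicit lock around a batch of single-row inserts
LOCK TABLES t1 WRITE;
INSERT INTO t1 (d, val) VALUES ('2010-01-04', 4);
INSERT INTO t1 (d, val) VALUES ('2010-01-05', 5);
-- ... thousands more ...
UNLOCK TABLES;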


As I have said many times in my presentations, always benchmark before using partitions in production. Whether you think that it will boost your performance or that it will slow things down, don't trust your instincts, and test. You may be in for a surprise.

Friday, November 28, 2008

Yet another replication trap


Replication and engines

When I filed Bug#39197 replication breaks with large load with InnoDB, flush logs, and slave stop/start, I genuinely thought that it was a serious problem. I was a bit puzzled, to tell the truth, because the scenario that I was using seemed common enough for this bug to be found already.
Anyway, it was verified independently, but there was a catch. The script on the master was using SET storage_engine=InnoDB to create the tables necessary for the test. That looked good enough to me. The script was indeed creating InnoDB tables on the master. The trouble was that the "SET" command is not replicated. Thus the slaves were using the default storage engine (i.e. MyISAM). That reduces the scenario to a case where I was splitting a transaction and trying to apply it to a non-transactional table. Not a bug, as Andrei Elkin and Sveta Smirnova correctly found out, but not an easy error to spot either.
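
Here is a minimal sketch of the trap, with made-up table names (statement-based replication assumed, as was the default at the time):

-- On the master: the session variable changes the engine only locally
SET storage_engine = InnoDB;              -- this SET is not replicated
CREATE TABLE t1 (id INT PRIMARY KEY);     -- replicated exactly as written, with no ENGINE clause

-- On the slave, t1 is therefore created with the slave's default engine
-- (MyISAM), so transactions applied to it behave as non-transactional.
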
According to the manual, using the storage_engine variable allows for a certain degree of flexibility in replication. The fact that this instruction, which affects DDL commands, is not replicated is described as a feature. In my experience, it's a bug, by design even, but this is my personal opinion.