December 7, 2012

Day 7 - Bacon Preservation with ZFS

This was written by Bryan Horstmann-Allen

An Intermediate Guide to Saving Your Butt

ZFS is a pooled storage filesystem with many advanced features. This article will describe several circumstances where trivial ZFS usage can aid you, as a systems administrator or developer, immensely.

Everything in this article is applicable to any version of ZFS, whether on Oracle Solaris, an illumos distribution (collectively known as "Solarish" systems), or the FreeBSD or Linux ZFS ports.

(There has been a fair amount of branching between Oracle ZFS and the open version of ZFS curated by illumos, but none of those changes will be relevant in this article.)

Terminology

In ZFS, we refer to a single set of storage as a "pool." The pool can be one disk, or a group of disks, or several groups of disks in any number of configurations.

A single group of disks is referred to as a "vdev."

The pool contains "datasets." A dataset may be a native ZFS filesystem or it may be a block device (referred to as a "zvol.")

ZFS supports mirrors and various RAID levels. The latter are referred to as "RAIDZ1", "RAIDZ2", and "RAIDZ3". The number denotes how many disks the vdev can lose before the pool becomes corrupt.

Comparable Stacks

If you're more familiar with the Linux storage stack, ZFS condenses the facilities offered by standard filesystems (ext, XFS, and so on), md, and LVM into a single package. However, it also contains many features simply not found in that stack. On Linux, btrfs has been trying to catch up, but ZFS has been around for coming up on a decade, so it has a long way to go yet.

Checksumming

By default, ZFS enables block-level checksumming: each block in the pool has an associated checksum. If you have data silently corrupted by disk firmware, neutrons moseying by, or co-workers playing with dd, you’ll hear about it.

If you are working in a redundant pool configuration (and in production, you will be), a zpool scrub will auto-heal any corrupted data.


If an application tries to read a corrupted block and you’re running redundant, ZFS will go read a good copy instead. And because ZFS loves you, it will then quietly repair the bad block.
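
Kicking off a scrub and checking its results is a one-liner each way (the pool name "tank" matches the examples later in this article):

# Walk every block in the pool, verify checksums, and repair from redundancy where needed:
zpool scrub tank

# Check scrub progress and see any errors that were found or repaired:
zpool status tank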

Visibility

ZFS was developed at Sun Microsystems, where engineers had a deep and somewhat disturbing love of numbers. They loved keeping metrics and stats for everything, and then giving you, the administrator, access to them. Typically these are exposed via the generic kstats facility:

# kstat -l | wc -l
   42690

ZFS is no different. It exposes several hundred metrics to kstats.

Ben Rockwood’s arc_summary.pl is something I keep handy on all my Solarish systems. There is an overview of the ARC below.

I also use OmniTI’s resmon memstat plugin to generate graphs.

You can also use command-line utilities on Solarish derivatives like fsstat zfs and, more recently, arcstat to get live usage information.

For other versions of ZFS, look to the locally preferred method of exposing kernel stats to userland for your ARC stats (/proc or sysctl, for instance.)
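
If you just want to eyeball the raw ARC counters, they're easy to get at directly. A couple of examples, assuming a stock Solarish system and the ZFS-on-Linux port respectively:

# On a Solarish system, the ARC counters live in the zfs kstat module:
kstat -m zfs -n arcstats | head -20

# On Linux, the same counters are exposed under /proc:
head -20 /proc/spl/kstat/zfs/arcstats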

When New Users Say ZFS Sucks

You may notice some behaviors with ZFS that you won't find with other filesystems: it tends to expose poorly behaving HBAs, flapping disks, bad RAM, and subtly busted firmware. Other filesystems will not surface these problems. The way these issues are exposed tends to be hung disks, or pools whose files regularly become corrupted (even if ZFS recovers from those issues).

Some new users complain about this ("It works fine with foofs!"), but it is far better to be aware that your hardware is having problems. Run zpool scrub periodically. Get your due diligence in, and sleep more soundly as a result.

Blissful ignorance stops being so blissful when you're blissfully losing customer data.

Bottlenecking on Disk I/O

In serverland, you’ll find you tend to have more CPU than I/O if your application is iops heavy. Your applications will end up waiting on disk operations instead of doing useful work, and your CPU will sit around watching Vampire Diaries in its free cycles rather than crunching numbers for your customers.

When your applications and users are suffering, it's good to have options. Depending on your workload, ZFS gives you several easy performance wins.

Write Log

ZFS provides a Separate Log Device (the “slog” or “write log”) to offload the ZFS Intent Log to a device separate from your ZFS pool.

zpool add tank log c1t7d0p1

In effect, this allows you to ship all your synchronous writes to a very fast storage device (SSD), rather than waiting for your I/O to come back from a slower backing store (SATA or SAS). The slog tends to not get very full (a few dozen megabytes, at most) before it flushes itself to the backing store, but your customers won’t feel that. Once the data hits the slog, the application returns, and the customer doesn’t feel the latency of your slower but much larger SATA disks.

ZFS also batches async writes together into an atomic Transaction Group every 5 or 30 seconds, depending on your version. This not only ensures data should always be consistent on disk (though perhaps not immediately up to date!), but it gives you a heavy performance boost for applications not calling fsync.

If the txg fails to write due to a power outage or the system panicking or so forth, you’ll get the most recently known-good transaction. Thus, no fsck in ZFS.

Filesystem Cache

ZFS also has a main memory filesystem cache called the Adaptive Replacement Cache. The ARC stores recently accessed data in main memory, and also looks at disk usage patterns and prefetches data into RAM for you. The ARC will use all available memory on the system (it's not doing anything else anyway), but will shrink when applications start allocating memory.

You can also add extra cache devices to a ZFS pool, creating a Layer 2 ARC (L2ARC):

zpool add tank cache c2d0

The caveat for the L2ARC is that it consumes main memory for housekeeping. So even if you attach a very fast, battery-backed Flash device as an L2ARC, you may still lose out if L2ARC consumes too many blocks of main memory as L2ARC pointers. (My rule of thumb, which may be out of date, is that each 100GB of L2ARC will utilize 1GB of ARC. So keep that in mind.)

Much like the main ARC, L2ARC is volatile cache: It's lost on reboot, and will take some time to re-warm.

You can view both slog and L2ARC usage in the output of zpool iostat -v.

Compression

However, there is an even simpler way to get more performance out of your disks:

zfs set compression=on tank

You can enable compression on a per-dataset level, or at the pool level. The latter will cause all child datasets to inherit the value.

ZFS supports two compression algorithms: lzjb, which is a light-weight but very fast streaming block compression algorithm, and gzip. I enable lzjb on all my pools by default, and have since 2007.

Modern CPUs are ridiculously fast, and disks (even 6Gb/s 15k SAS) are rather slow comparatively. If you’re doing a lot of I/O, you can get a simple but impressive performance win here.

You also get a nice bonus: more usable capacity. On a simple RAIDZ1 SmartOS compute node storing mostly VM block devices, I’m getting a 1.48x compression ratio using lzjb. So out of my 667GB SAS pool, I’m actually going to get around a terabyte of actual capacity.

The default gzip level is 6 (as you’d get by running the command itself). For my logserver datasets, I enable gzip and get an impressive compression ratio of 9.59x. The stored value? 1TB. Actual uncompressed? Almost 10TB. I could enable gzip-9 there and get more disk space at the cost of CPU time.
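
Checking what you're getting is a one-liner (the pool name "tank" here is the example pool used elsewhere in this article):

# See the compression setting and the achieved ratio for a dataset or pool:
zfs get compression,compressratio tank

# Or survey every dataset in the pool at once:
zfs get -r compressratio tank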

A couple years ago at a previous gig, we were rewriting 30,000 sqlite files as quickly as possible from a relatively random queue. For each write, you read the whole file into memory, modify it, and then write the whole thing out.

As initially deployed, this process was taking 30-60m to do a complete run. Users were not too happy to have their data be so far out of date, as when they actually needed something it tended to be an item less than a few minutes old.

Once we enabled compression, well, the difference was dramatic.

A job going from around an hour to a minute or less by running one command? Not bad. We also minimized the I/O workload for this job, which was very helpful for a highly multi-tenant system.

We later parallelized the process, so it now it takes only a few seconds to complete a run.

The caveat with compression is that when you send a compressed stream (described below), you lose the compression. You can compress inline through a pipe, but the blocks will be written uncompressed on the other side. Something to keep in mind when moving large datasets around!

Compression only works on new writes. If you want old data to be compressed, you'll need to move the files around yourself.
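
One blunt way to do that, sketched here under the assumption that nothing is writing to the directory while you do it (paths are placeholders):

# Copy the data so the new blocks get written compressed, then swap it into place.
cp -a /tank/olddata /tank/olddata.recompressed
rm -rf /tank/olddata
mv /tank/olddata.recompressed /tank/olddata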

Snapshots

ZFS gives you an unlimited number of atomic dataset-level snapshots. You can also do atomic recursive snapshots for a parent dataset and all its children.

zfs snapshot tank/kwatz@snapshot

For application data, I tend to take snapshots every five minutes via a cron job. Depending on the backing disk space and how often the data changes, this means I can keep snapshots -- local to the application -- around for a few hours, or days, or months.
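
A minimal version of that cron job might look like this (tank/app is a placeholder dataset; -r makes the snapshot recursive, covering child datasets as described above):

# crontab entry: recursive snapshot of the application dataset every 5 minutes
*/5 * * * * /sbin/zfs snapshot -r tank/app@$(/bin/date +\%Y\%m\%d-\%H\%M)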

For simple centralized host backups, I tend to use something like this:

#!/bin/bash

source /sw/rc/backups.common

now=`/bin/date +%Y%m%d-%H%M`

HOSTS="
host1
host2
host3
...
"

DIRS="/etc /root /home /export/home /mnt/home /var/spool/cron /opt"

for HOST in $HOSTS ; do

  echo "==> $HOST"

  /sbin/zfs create -p $BACKUP_POOL/backups/hosts/$HOST
  /sbin/zfs snapshot $BACKUP_POOL/backups/hosts/$HOST@$now

  for DIR in $DIRS; do
    rsync $RSYNC_OPTIONS --delete root@$HOST:$DIR /var/backups/hosts/$HOST/
  done

  /sw/bin/print_epoch > /var/run/backups/host-$HOST.timestamp
done

/sw/bin/print_epoch > /var/run/backups/hosts.timestamp

So the root of your backups is always the most recent version (note rsync --delete). Not only are we only transferring the changed files, we're only storing the changed blocks in each snapshot.

We also touch some local files when the backup completes, so we can both graph backup latency and alert on hung or stale backup jobs.

Getting access to the snapshots is trivial as well: There is a hidden .zfs/snapshot/ directory at the root of every dataset. If you go looking in there, you’ll find all your snapshots and the state of your files at that snapshot.

# cd /var/backups/hosts/lab-int
# ls -l .zfs/snapshot | head
total 644
drwxr-xr-x   7 root     root           7 Aug 31 22:04 20120901-2200/
drwxr-xr-x   7 root     root           7 Sep  1 22:03 20120902-2200/
drwxr-xr-x   7 root     root           7 Sep  2 22:03 20120903-2200/
...

# ls -l etc/shadow
----------   1 root     root        2043 Oct 12 00:22 etc/shadow
# ls -l .zfs/snapshot/20120901-2200/etc/shadow
----------   1 root     root        1947 Jul 30 13:13 .zfs/snapshot/20120901-2200/etc/shadow

It makes building recovery processes rather painless. If you have customers who often delete files they’d rather not, for instance, this is a very simple win for both you (whose mandate as the administrator is to never lose customer data) and the customer (whose mandate is to lose data that is most valuable to them at the least opportune moment).

Make sure you set up your purging scripts, however, or months down the line you might find you've used up all your disk space with snapshots. They're both additive and addictive.
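
A purge script doesn't need to be fancy. Here's a minimal sketch that destroys snapshots older than a couple of weeks, assuming GNU date and the YYYYMMDD-HHMM snapshot naming used above (the dataset name and retention period are placeholders):

#!/bin/bash
# Destroy snapshots of a dataset (and its children) older than KEEP_DAYS,
# relying on the lexically sortable YYYYMMDD-HHMM snapshot names.

DATASET=tank/app
KEEP_DAYS=14
cutoff=$(date -d "-${KEEP_DAYS} days" +%Y%m%d-%H%M)

zfs list -H -t snapshot -o name -r "$DATASET" | while read snap; do
  stamp=${snap##*@}
  # String comparison works because the timestamp format sorts lexically.
  if [[ "$stamp" < "$cutoff" ]]; then
    zfs destroy "$snap"
  fi
done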

Replicating snapshots

So local snapshots are awesome, but ZFS does you one better:

zfs send tank/kwatz@snapshot | ssh backups1 zfs recv -vdF tank

That will send that one snapshot to another system. That particular command will overwrite any datasets named kwatz on the target.

However, why only keep one snapshot, when you can ship every snapshot you’ve taken of a dataset and all of its children, off-system or off-site entirely?

And you don’t actually want to send the entire dataset every time, for obvious reasons, so ZFS handily provides deltas in the form of ZFS incremental sends:

#!/bin/bash -e

REMOTE_POOL=tank2
LOCAL_POOL=tank
TARGET_HOST=foo

LAST_SYNCED=$( ssh $TARGET_HOST zfs list -t snapshot -o name -r $REMOTE_POOL/zones/icg_db/mysql | tail -1 )
echo "r: $LAST_SYNCED"

LAST_SNAPSHOT=$( zfs list -t snapshot -o name -r $LOCAL_POOL/zones/icg_db/mysql | tail -1 )
echo "l: $LAST_SNAPSHOT"

# In case the target/source pool names are different.
RENAMED_STREAM=$( echo $LAST_SYNCED | sed -e "s/$REMOTE_POOL/$LOCAL_POOL/" )
echo "s: $RENAMED_STREAM : $LAST_SNAPSHOT -> $REMOTE_POOL"

zfs send -vI $RENAMED_STREAM $LAST_SNAPSHOT | ssh $TARGET_HOST zfs recv -vdF $REMOTE_POOL

I tend to ship all my snapshots to a backup host. Mail stores, databases, user home directories, everything. It all constantly streams somewhere. The blocks tend to already be hot in the ARC, so performance impact is generally very light.

It’s also trivial to write a rolling replication script that constantly sends data to another host. You might use this technique when your data changes so often (I have one application that writes about 30GB of data every run) you can’t actually store incremental snapshots.

Here’s a very naive example that has served me pretty well over the years.
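
In that spirit, here's a minimal sketch of what such a rolling loop can look like (host, pool, and dataset names are placeholders, and it assumes at least one snapshot of the dataset already exists on both sides):

#!/bin/bash
# Naive rolling replication: snapshot, send the delta, drop the old snapshot.

TARGET_HOST=backups1
DATASET=tank/bigdata

while true; do
  prev=$(zfs list -H -t snapshot -o name -r "$DATASET" | tail -1)
  now="$DATASET@$(date +%Y%m%d-%H%M%S)"

  zfs snapshot "$now"
  zfs send -I "$prev" "$now" | ssh "$TARGET_HOST" zfs recv -dF tank

  # Only the latest snapshot is kept locally, so the churn doesn't pile up.
  zfs destroy "$prev"
  sleep 60
done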

Finally, need full offsite backups? Recursive incremental sends from your backup host.

Clones

By this point I hope you’re getting the idea that ZFS provides many facilities -- all of them easy to understand, use, and expand upon -- for saving your butt, and your customers’ data.

In addition to snapshots, ZFS lets you create a clone of a snapshot. In version control terminology, a clone is a branch. You still have your original dataset, and you’re still writing data to it. And you have a snapshot -- a set of blocks frozen in time -- and now you can create a clone of those frozen blocks, modify them, destroy them.

This gives you a way of taking live data and easily testing against it. You can perform destructive or time-consuming actions without impacting production. You can time how long a database schema change might take, or you can ship a snapshot of your data to another system, clone it, and perform analysis without impacting performance on your production systems.

Eric Sproul gave a talk at ZFS Days this year about just that topic.

Database Snapshots and Cloning

In the same vein, one of my favorite things is taking five minute snapshots of MySQL and Postgres, shipping all those snapshots off-system, and keeping them forever.

For most production databases, I can also keep about a day’s worth of snapshots local to the master... so if someone does a “DROP DATABASE” or something, I can very quickly revert to the most recent snapshot on the system, and get the database back up.

We only lose a few minutes of data, someone has to buy some new undies, and you don’t have to spend hours (or days) reimporting from your most recent dump.

The best part about this bacon-saving process is how trivial it is. Here’s a production MySQL master:

# zfs list tank/zones/icg_db/mysql
NAME                      USED  AVAIL  REFER  MOUNTPOINT
tank/zones/icg_db/mysql  54.8G   184G  46.8G  /var/mysql

# zfs list -t snapshot -r tank/zones/icg_db/mysql | tail -1
tank/zones/icg_db/mysql@20121202-1005  15.6M      -  46.8G  -

# zfs clone tank/zones/icg_db/mysql@20121202-1005 tank/database

# zfs list tank/database
NAME            USED  AVAIL  REFER  MOUNTPOINT
tank/database     1K   184G  46.8G  /tank/database

# zfs set mountpoint=/var/mysql tank/database

So we've got our data cloned and mounted. Now we need to start MySQL. Once we do so, InnoDB will run through its crash recovery and replay from its journal.

# ./bin/mysqld_safe --defaults-file=/etc/my.cnf 
# tail -f /var/log/mysql/error.log
121202 10:11:37 mysqld_safe Starting mysqld daemon with databases from /var/mysql
...
121202 10:11:41  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
...
121202 10:11:50 [Note] /opt/mysql/bin/mysqld: ready for connections.

MySQL is now running with the most recent snapshot of the database we have.

# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3 to server version: 5.5.27-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> 

This entire process took under a minute by hand.

Being able to very quickly spin up snapshots of any journaled datastore has been incredibly helpful over the last few years, for accident remediation, troubleshooting, and performance analysis.

My Next Projects

At work, we’re building a malware lab on top of Joyent’s SmartDatacenter (which runs SmartOS). There are many pieces involved here, but one of the biggest ones is the ability to take a Windows VM image, install software on it, and then clone it 20 times and run various malware through it.

With ZFS, this is as trivial as taking a snapshot of the dataset, cloning it 20 times, and booting the VMs stored on those volumes.
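
A sketch of what that can look like (the dataset names are placeholders; on SmartOS the actual VM provisioning goes through vmadm, as mentioned below, rather than raw zfs commands):

# Freeze the prepared Windows image, then stamp out twenty clones of it.
zfs snapshot zones/windows-golden@deploy
for i in {1..20}; do
  zfs clone zones/windows-golden@deploy zones/malware-vm$i
done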

(There are many other facilities that aid in this process in SmartOS, namely the way they’ve wrapped Solaris Zones with vmadm, but that’s perhaps for another article!)

This facility would also make it trivial for us to implement something a lot like AWS’s Elastic MapReduce:

Spin up a master node and 20 slaves, and just keep track of which job(s) they’re working on. When it’s done, terminate the VMs the ZFS datasets are backing, and destroy the clones.

Wash, rinse, repeat, and all the lower-level heavy lifting is done with a handful of ZFS commands.

Conclusion

These processes have all saved multiple butts. More importantly, they have helped us ensure services customers rely upon are not impacted by system failure or accidents.

ZFS’s power lies not only in its many features, safety mechanisms or technical correctness, but in the ways it exposes its features to you.

ZFS is a UNIX tool in the truest sense. It allows you to build powerful and flexible solutions on top of it, without the gnashing of teeth and tedium you might find in other solutions.

(Much thanks to @horstm22, @rjbs, @jmclulow, and @richlowe for help with this article.)

Further Reading

December 6, 2012

Day 6 - Watching out for Vendor Lock-In

This was written by Matt Simmons (blog)

Welcome to the dystopia your parents warned you about.

Vendor lock-in used to mean that your data was stuck in a proprietary format, requiring you to buy expensive services to migrate to another provider. With [PS]aaS, it means that your entire company can disappear in a puff of smoke if you weren't careful about your choices.

Let's figure out how to avoid that outcome.

System Administrators are a combination of maintenance staff, standing army, and consigliere. Not only do we keep things running smoothly, we guard against invaders, and we act as trusted advisors to the people who make corporate policies. It's unavoidable that at some point we will need to advise our organizations to rely on outside sources for IT services, and when we do that, the onus is on us for ensuring our company's data can out-survive the service provider we choose.

Here are some rules to take into consideration when choosing a provider:

No Roach Motel

There can never be the scenario where data checks in, but it doesn't check out. Data needs to be able to be programmatically extracted from the remote service. If raw data dumps aren't available, make sure that there's an API that can be utilized which provides a way to access all of the data that you entered, including any important metadata.

For example, I had a load balancer (actually several, because who just buys one load balancer?) that worked perfectly well. It had all kinds of interfaces to allow me to do everything I needed. I enjoyed using it because it was especially helpful with regard to generating Certificate Signing Requests (CSRs) and doing certificate management. The downside was that if you used the key that it generated to sign the CSRs, you couldn't actually export the key. It's not like it advertised that fact - it just didn't give you the option to do it. The "certificate backup and recovery" process used encrypted tarballs, and you could import into another of the company's load balancers, but you couldn't do anything else with the certificate. Talk about annoying...

Authentication, Access Control, and Accounting

You use centralized authentication to maintain your users. Use a cloud service that will allow you to automate Moves, Adds, and Changes (MACs) of accounts on their end. Also, ensure that the service offers sufficiently fine-grained access control to company resources, and ensure that when people make changes, those changes are recorded. Too many cloud providers don't offer field-level logging of data, and when a user changes a field maliciously or by accident, it can be difficult or impossible to investigate using their tools.

Do you like running email servers? Me neither. Spam, defense from blacklisting, and half a dozen other irritants, plus the fact that our existing software of choice couldn't do the advanced calendaring our users wanted, led us to consider building an Exchange infrastructure. After fully considering things, we determined that being a primarily-Linux shop, combined with the license fees of building an Exchange infrastructure, meant we would be better off outsourcing our email services. Because user complaints had risen to a clamor by that time, we assented to their demands and went with an affordable Exchange provider.

Unfortunately, we didn't warrant enough users to have our own dedicated server, so we were stuck in a shared environment. That also led to us having to administer our users through a broken, under-featured, over-complex web interface that had little or nothing to do with the underlying Exchange server. Plus, the company didn't support importing Active Directory users and groups, nor was there an API that would allow us to "pretend". It was a miserable experience for everyone involved.

Be Aware of Provider Limitations

Don't rely on a service provider with a lesser infrastructure than your own. A chain's only as strong as its weakest link. You may use multiple AWS regions, or maybe multiple data sites, but a bad choice of SaaS provider can ruin all of your carefully laid plans. Investigate and decide accordingly.

You know exactly how much work and effort you spent on developing a solid, stable infrastructure. You know that you have disaster recovery plans, and that you test your failsafes and failovers. You don't know about the SaaS provider's infrastructure until you ask. One of the first things I did with companies I was evaluating for hosted services was have an in-depth discussion with an engineer from their side who could talk with me about things like infrastructure and service uptime, SLAs, and so on.

Essentially, I interviewed the companies like I interviewed potential employees, because there are a lot of similarities. Both are working for you, both can screw up and cost you uptime, and firing both is harder than you'd like it to be. On the other hand, the right company (and the right employee) can both make your life immeasurably easier, too. Good service is rare, so when you find it, treasure it.

Further Reading

December 5, 2012

Day 5 - Following the White Rabbit

This was written by Kent C. Brodie

Have you ever worked with vendor support and, after much back and forth, ended up with the answer, "works for us, so it must be something with your setup, sorry!"? This is such a story. And like most similar situations, I learned some good lessons worth sharing.

The Background (our environment)

I work in the Human & Molecular Genetics Center, a large center within a private medical school. We currently deal with whole genome analysis, and let me tell you, it's pretty fun. There's lots of data, lots of PhD-type people doing complicated analysis, lots of servers, and lots of tools. Did I mention there's lots of data?

The foundation of all of this is a Sun Grid Engine (SGE) cluster and a software package from Illumina, Inc., that does nifty genetic things like "demultiplexing", "alignment", "variant calling", and several other sexy scientist things. The cluster isn't huge, but it's powerful enough to get the processing done in a reasonable amount of time, between 8 and 14 hours depending on the data.

Fast servers. Dedicated 10gb network. Industry-standard cluster software. What could go wrong?

The Problem

This clustered processing job was failing, but only sometimes. Specifically, the demultiplexing and variant calling steps always worked fine, but the alignment step did not. To make debugging harder, the process would run for 12 or 14 hours before blowing up.

In the resulting error log file, I found several instances of this kind of error:

AlignJob.e22203:error: commlib error: got read error (closing "baku/shepherd_ijs/1")

I also saw errors like this:

[2012-10-13 00:47:38]   [kiev.local]  ERROR: The ELAND extended file for the mate 2 reads 001_eland_extended.txt.oa) could not be found.
[2012-10-13 00:47:38]   [kiev.local]  qmake: *** [205P_GCCAAT_L001_001_pair.xml] Error 1

It's important to point out that the software worked just fine in all cases on a single node. It's just in the SGE environment that it failed. The Illumina pipeline is essentially nothing more than a handful of binaries and a few Makefiles. Many of you are familiar with "make" and its use - SGE introduces a new flavor of that called "qmake", which is like "make", but runs distributed when the code is designed to take advantage of it. In my case, "make" on a single node worked fine, "qmake" in the cluster did not.

Getting Some Help

The first place I turned to was, of course, Illumina, our sequencing equipment and software vendor. Because of the complicated nature of the setup, they really could not provide any clear answers. Their support responses were along the lines of "it's a cluster issue", "contact your grid engine company", "it's a race condition, you probably have issues with your switches", and so on. They support their software tools completely, but can really only guarantee help in single-server installations. Due to the multiple variables in a typical cluster setup, Illumina cannot support customer-built clusters. (Illumina sells a specific cluster setup, but we do not have that).

I had a few leads from their support, but I was basically on my own.

And So It Begins

Being a bit new to the SGE environment, I figured I had probably missed something. There ARE several installation options to choose from, and so I went off to test each and every possible option.

When debugging a problem, most of us sysadmins dive in, execute a task, analyze the result, and repeat until the problem is solved. This is how I worked on this - no surprise there, but remember, I had to wait at LEAST 12 hours or more to even find out if my changes had any effect. Let me tell you, that's not fun. The emotional roller-coaster of THINKING I had solved the problem, only to find out the NEXT DAY that I had not, was incredibly difficult and took its toll.

Tick, tock, tick, tock.

I tried one change at a time, but in some cases, multiple changes – all depending on gut feeling and experience. The combinations of changes got pretty crazy. All of my attempts ended with frustration - no matter WHAT I did, I got the same errors. Here is the abbreviated list:

  • Various versions and implementations of Grid Engine (official Oracle SGE, open source "Son of Grid Engine", and multiple versions of each)
  • Various 10-gig switch settings (enable/disable STP/portfast, flow control, etc)
  • Physical connections to the switches (all nodes on one switch vs multiple switches, etc)
  • Grid engine options dealing with spooling (local? centralized? classic(flat) vs BerkeleyDB?)
  • Number of actual processes per node when running the job

Finally, a Breakthrough

At this point, over a month has passed in this troubleshooting marathon. I am losing sleep. The boss is cranky. He wants to replace the cluster with several standalone 48-core machines to run the Illumina pipeline, which means he's lost faith I can ever solve this. (A horrible solution by the way, because even on a 48-core server, a single alignment job takes 40 hours.) It WORKS, but it takes 40 hours: four times longer than when clustered.

In my daily Google searches about grid engines and configurations and such, I eventually come across the default configuration for ROCKS clusters with SGE – and notice something I've not yet tried. (ROCKS is a software distribution to rapidly build physical and virtual clusters, pre-loaded with all sorts of goodies).

By default, SGE uses an internal RSH mechanism to make all of the connections to the other hosts and/or back to the grid master. But in the ROCKS distribution, SSH is used. Why SSH? Because for extremely large clusters OR for jobs that require a boatload of communication streams, RSH will run out of connections because it only uses ports below 1024.

Lather, rinse, repeat. Wait 12 hours.

Bingo. Well, sort of. The "commlib" errors are now gone. Seems the commlib error was trying to tell me, "can't establish connection because there's no more ports left". I am now left with the missing file(s) error.

I am closer.

The situation seems simple - there are files that a job on a node out there is expecting to be there, and it isn't. OK, so the vendor's claim of some "race condition" seems possible. The testing continues.

Nothing. Nada. Zip.

Another 3 weeks pass. I have made zero progress on the final error - the famous missing file. I have even gone so far as to acquire a demo Force10 switch from my partner/reseller to see if that solves the problem. (Our cluster installation has a Dell 8024 10-gig switch, a choice made by a former manager; it was not on our storage vendor's approved-hardware list.) The new switch makes no difference. Despite that, I am extremely thankful to my reseller and Dell for allowing me to test the switch.

A pair of breakthroughs?

Finally, in week 8, two things come together, both pointing to issues with Illumina software and "qmake".

Use the Source, Luke

First, I find some interesting comments in two of the complicated Makefiles in Illumina's code. Due to licensing restrictions on the code, I cannot post the comments or code here, but the guts of it boil down to that (this is paraphrased), "due to the limitations of qmake ... we have to do it this way".

This is interesting. Even one of the authors of the code is sort of acknowledging that he or she has had issues with qmake – the core utility I'm using in SGE.

A funny thing happened on the way to the Forum

Second, at about the same time, I got an answer from an online forum for people doing sequencing analysis ("seqanswers"). As a Hail Mary pass, I had posted specific questions about my issue, and from more than one person I heard back: "use distmake".

Distmake is an interesting animal. It's a distributed make. It's kind of like qmake, but it BEHAVES like "make" does on a single server. It's a little hard to explain, but the difference showed up in my logfiles. When I ran the alignment job using "qmake", the log was peppered with log entries from every node in the cluster. Node A did this step, node C did that step, and so on. With "distmake", every single log entry is from ONE node. The job distribution works behind the scenes, but it also works within the SGE framework. This is critical, since we depend on SGE for the scheduling of all of the jobs.

And.. it worked!

(Personal note, as soon as this happened, I took 2 days of vacation to celebrate and basically do NOTHING except clean my garage and catch up on my favorite tv shows).

What was wrong? Why this failed in our environment, but works ok "at Illumina", I can't say. The type and size of the data? The number of nodes in the cluster? The sequencing/alignment options? Endless possibilities. My own conclusion is that under many circumstances, the Illumina code and qmake, the heart of SGE, simply do not behave well together. I have had difficulty convincing the vendor of that, and have to settle for the fact that I have a working solution. A solution several other sites are using, by the way.

Lessons learned

I learned several lessons while going through this process. Some are technical tidbits specific to SGE, and some are more general advice. Some of these seem obvious now, but during those two months, it wasn't.

Sun Grid Engine lessons

  • In my opinion, Son of Grid Engine is where it's at. It is a much more active project and gets quick bug fixes. Oracle DOES still support their own SGE product, but updates are slow, and I wouldn't bank on long-term SGE support from them. It was, after all, a Sun product.
  • SGE spooling: go with local spooling. Classic (flat) mode is just fine; only perhaps the largest clusters (hundreds of nodes) really need BerkeleyDB.
  • SGE ports: Despite what the documentation says, do not depend on scripts and such to define the ports used for qmaster or execd communication. Always, always, use /etc/services entries (see the example after this list).
  • SGE communication: Use SSH. ROCKS has it right.
  • SGE error messages blow chunks. Just sayin'.
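
For reference, these are the standard (IANA-registered) Grid Engine entries; adjust the port numbers if your site uses different ones:

# /etc/services
sge_qmaster     6444/tcp        # Grid Engine qmaster
sge_execd       6445/tcp        # Grid Engine execution daemon

With those in place, qmaster and execd on every node agree on ports without any environment-variable or script games.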

Other Lessons

  • Do your own troubleshooting: The most important thing I learned from this experience was that I let myself be led, accidentally, by a software vendor. Following only their suggestions had me distracted with red herrings for too much time. Without predetermined suggestions for what might be wrong, I may have uncovered the flawed code earlier.
  • Problems with one process are not necessarily related. I initially thought the COMMLIB and MISSING FILE errors were related. They were not.
  • There is a support forum for EVERYTHING: I can't explain why I didn't stumble upon SEQANSWERS earlier. I just missed it. I'm a sysadmin for a relatively small community (institutions that do their own genetic sequencing), and I'm guessing the handful of us that are in this are WAY too busy to be real active on the Internet :-) . I guess I was looking for the wrong community (SGE users), when the proper community was there all along (sequencing analysis users). http://seqanswers.com/
  • distmake is basic, but really cool.

I hope you enjoyed my little tale! Now go finish your Christmas shopping. There's only 19 shopping days left.

Further Reading

(Author note, if you do nothing else, at least watch the NOVA episode. It's awesome. Really)

December 4, 2012

Day 4 - ZooKeeper for Distributed Coordination

This was written by Adam Compton (blog)

Sysadvent 2008 covered using lockfiles to protect cron jobs as well as how to use lock files in general. This year, let's take a look at how you can do distributed task control!

The company I work for has a large high-performance compute cluster (20 PB/day the last time I checked), and I’m one of the people responsible for maintaining it. When one of the servers in that cluster is performing poorly or needs maintenance, we put it through a decommissioning process that salvages any data that can be recovered safely to minimize the risk of data loss in the filesystem. Although we can decommission several servers simultaneously, if too many are going out of business at the same time it slows down the rest of the cluster.

A while back, I wanted to make this process a little easier for us. My goal was to build a “fire-and-forget” mechanism where we could queue up a bunch of servers for decommissioning, and let them work it out themselves without impacting the cluster.

Constraints

As I said before, having too many servers decommissioning at the same time will overload the cluster because other healthy servers have to both try to copy data down from the broken servers and pick up their compute-processing slack. Also, I had an eye on someday setting up a watchdog-type system, where servers could regularly inspect their own performance metrics and automatically decommission themselves (and file a ticket to let us know) if they were getting too badly broken. In that event, I had to make 100% sure that there was no possible bug or other case that could cause every server in the cluster to decide to decommission itself at the same time, since that would completely stop all work on the cluster.

These constraints led me to the solution of using Apache Zookeeper to store a small pool of distributed lockfiles as decommissioning slots, which the servers could fight over for their turn to decommission themselves.

Zookeeper

In case you’re not familiar with Zookeeper, I’ll let them explain themselves:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

Basically, Zookeeper is a hierarchical key-value data store that has some very useful properties:

  • You can store most any reasonable data in the nodes.
  • Most common operations are completely atomic.
  • For instance, if two processes try to create the same node at the same time, it’s guaranteed that only one of them will succeed (and both of them will know about the outcome).
  • Hosts can register "watchers" that subscribe to a node; when that node changes, the watchers are notified (without requiring polling).
  • It automatically and transparently spans multiple service hosts, so it’s very highly-available and fault-tolerant.
  • It has bindings for many languages.

Our Solution

As it turned out, we already had a ZooKeeper service operating on our cluster for a different purpose; I appropriated a small corner of the hierarchy for this project and went to town.

First, I wrote some simple scripts to create and delete the Zookeeper nodes I wanted to use as lock files; to keep it simple, we’ll call them /lock1, /lock2, and /lock3. Then I adapted our cluster maintenance tools to use those scripts for potentially destructive processes like decommissioning.
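
A simplified sketch of what such a lock-grabbing script can look like, using the stock zkCli.sh client (the ensemble address, lock paths, and run_decommission are placeholders, not the actual production tooling):

#!/bin/bash
# Try to grab one of the three decommissioning "slots" in ZooKeeper.

ZK=zk1:2181

for lock in /lock1 /lock2 /lock3; do
  # 'create' is atomic: only one host can successfully create a given node.
  if zkCli.sh -server "$ZK" create "$lock" "$(hostname)" 2>&1 | grep -q "Created $lock"; then
    echo "acquired $lock, starting decommission"
    run_decommission                        # placeholder for the real work
    zkCli.sh -server "$ZK" delete "$lock"   # free the slot when done
    exit 0
  fi
done

echo "no free slots, will retry later"
exit 1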

The Workflow

The workflow looks like this:

  • Hosts A, B, C, D, E, and F all decide that they need to be decommissioned at the same time.
  • Each host tries to create the Zookeeper node /lock1, populating the node with its own hostname for future reference.
  • In this example, we’ll say Host E wins the race and gets to create the node. Its decommissioning script begins to run.
  • Having failed to create /lock1, the remaining five hosts all attempt to create /lock2 in the same fashion.
  • Let’s say Host B wins this time; it starts decommissioning itself alongside Host E.
  • Repeat this process for /lock3, which Host A wins.
  • For the remaining three hosts that didn’t get a lock, sleep for a while and then start trying again with /lock1.

At this point, hosts A, B, and E are all chugging along. The other three hosts are patiently waiting their turn, and they will continue doing so until one of the other machines finishes and its decommissioning job deletes the lockfile Zookeeper node it was holding. When that happens, for instance if E finishes its work:

  • Host E‘s decommissioning script deletes the Zookeeper node /lock1. Host E is now officially out of service and will do no other work at all until somebody investigates, fixes it, and brings it back to life.
  • The decommissioning script has been hanging out on Hosts C, D, and F this whole time. On their next passes through the loop, each tries to create /lock1.
  • Say Host F wins; it starts decommissioning itself and the other two hosts keep cooling their heels in fervent anticipation of the day they might too get to start their work.

Other Integration

Since this process is so hands-off, I wanted to make sure we didn’t wind up in a situation where we had several nodes that were stuck decommissioning and keeping everybody else out of the party. I wrote a Nagios check plugin that would read an arbitrary Zookeeper node and report its existence (and the node’s contents, if any). This is where storing the hostname in the node when it’s created comes in handy - you can check the age of the node and report if it’s more than a few days old.

This monitoring check has come in handy for other people too, so it was well worth my time to have written even aside from its use in this project.

Conclusion

ZooKeeper provided a fairly simple solution to a rather annoying problem. I wanted to be able to decommission servers at will and have them do so in a way that did not impact the cluster. Using ZooKeeper allowed me to limit the number of active decommissions, effectively solving this problem without requiring any special scheduling tooling or special baby-sitting by humans.

Further Reading

December 3, 2012

Day 3 - Zero-Downtime MySQL Schema Changes

This was written by Bob Feldbauer.

It's time to deploy a new version of our awesome application, but this time we're changing database stuff. Can we do it without an outage?

Schema changes can often lock your database, stall your application, and cause an outage, so you'll want to be careful in how you design the infrastructure to permit database changes.

Achieving a zero-downtime schema deployment can be done with a load balancer and a technique called Blue-Green deployment.

"Blue-Green deployment" is a fancy term that basically means you have two sets of an application stack. You start with Blue (version N) and Green (N-1), deploy version N+1 to the Green stack, and then cutover to the Green stack.

Traffic for a high-availability application architecture might flow like this:

  • Incoming requests hit the main loadbalancers
  • That traffic is routed to a pair of caching proxies
  • Then shipped to another set of loadbalancers
  • Before hitting the application servers themselves, which have Blue-Green versions (N and N-1 respectively).
  • The application servers read and write to a database.

Changing the load balancers to point to different web or application servers with new versions to implement Blue-Green is generally trivial; however, deployments with database schema changes aren't always trivial. Schema changes often lock the database which means an outage for your application during the change.

You can implement blue-green deployments for mysql using Master-Master. Many people have tried Master-Master MySQL over the years and found it to be a painful experience. The traditional Master-Master MySQL setup involves two active database servers (we'll call this "Active-Active"). The problem is that both servers can accept reads and writes, and conflicting writes will cause replication to break.

Instead of the traditional Active-Active approach, we can use an Active-Passive MySQL setup to achieve many of the same benefits while avoiding the danger of conflicting writes breaking replication, and it will still allow us to do zero downtime deployments with database schema changes.

In an Active-Passive setup, there are two database servers, but only one can accept writes at any given time. (The one that can accept writes is the "Active" server, while the read-only server is "Passive".) To achieve this, we simply add an additional load balancer layer between the application and database tiers.
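
The article doesn't prescribe a particular load balancer for this layer; as one possible illustration, here's a minimal HAProxy sketch (names, addresses, and the check user are placeholders). The backup keyword keeps all connections on the Active server until you swap the roles:

listen mysql
    bind 0.0.0.0:3306
    mode tcp
    option mysql-check user haproxy_check
    server db-active  10.0.0.11:3306 check
    server db-passive 10.0.0.12:3306 check backup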

Traffic in our new example architecture would swap the last step ("The application servers read and write to a database") for these two new steps:

  • The application servers connect to a pair of high-availability load balancers for their database queries
  • The load balancer sends database connections to the Active MySQL server

MySQL Replication Details

Each server (both Active and Passive) has a MySQL master and slave running on it, just like a Master-Master (Active-Active) setup. Changes occur as follows:

  • Changes get written to the active server's binary log and flow through replication to the passive server's relay log
  • The passive server executes the query and writes the event to its own binary log
  • The active server retrieves the same change via replication into its relay log, but ignores it because the server ID in the event matches its own

Active-Passive MySQL Server Configuration

Just make sure to set server-id to unique values for both servers (e.g. 1 for server X, 2 for server Y), and this is all you really need:

server-id=1
log_bin=/var/lib/mysql/mysql-bin.log
sync_binlog=1
log_slave_updates=1
log_bin_index=/var/lib/mysql/mysql-bin.index
relay_log=/var/lib/mysql/slave-relay.log
relay_log_index=/var/lib/mysql/slave-relay-log.index
binlog_do_db=your_database_name

Zero Downtime Database Schema Changes

Let's get down to the nitty-gritty details of how zero downtime database schema changes actually work:

  • Run STOP SLAVE on both the Active and Passive servers
  • Run SQL for the schema change on the Passive server
  • Run START SLAVE on the Active server
  • Wait for replication lag on the Active server to become small enough (ideally about a second). You can check replication lag with the Seconds_Behind_Master column of SHOW SLAVE STATUS, although that isn't 100% reliable and you are better off with something like Percona Toolkit's pt-heartbeat.
  • Run LOCK TABLES on the Active server for the final replication catchup
  • Ensure replication lag is zero on the Active server
  • Modify your proxy configuration to change the Active/Passive designations
  • Unlock the new Passive server
  • Run START SLAVE on the new Active server
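
Transcribed into concrete statements, the steps above might look something like this (the ALTER is a placeholder schema change, and the LOCK TABLES step is shown as FLUSH TABLES WITH READ LOCK, one way to take a global read lock):

-- 1. On both the Active and Passive servers:
STOP SLAVE;

-- 2. On the Passive server only, apply the schema change:
ALTER TABLE widgets ADD COLUMN color VARCHAR(16);

-- 3. On the Active server:
START SLAVE;
-- 4. ...and wait for the replication lag to get small:
SHOW SLAVE STATUS\G

-- 5. On the Active server, freeze writes for the final catch-up:
FLUSH TABLES WITH READ LOCK;
-- 6. Confirm lag is zero, then swap Active/Passive in the proxy config.

-- 7. On the new Passive server (the old Active):
UNLOCK TABLES;

-- 8. On the new Active server:
START SLAVE;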

Required Rules for Schema Changes

One small caveat to the whole process is that you must be able to follow/enforce two basic rules for schema changes to work:

  1. The new schema must be backwards compatible with the previous schema:
  • Add new columns with triggers rather than modifying in place
  • New columns cannot be required immediately, or old writes will not replicate appropriately
  • No use of server-generated data functions (UUID, NOW, RAND, etc)
  2. It cannot conflict with pending writes:
  • No auto-increment INSERT unless the application doesn't insert to that table
  • No DROP COLUMN nor DELETE rows if they are used in the previous schema version

Conclusion

Don't forget about your databases!

Using an Active-Passive MySQL setup allows zero downtime deployments to become a reality. Active-Passive is much less scary than the traditional Active-Active, Master-Master MySQL setup you may have tried in the past.

Further Reading

December 2, 2012

Day 2 - Building Community for Fun and Profit

This was written by Brandon Burton.

Community usually refers to a village that shares common values. In human communities, intent, belief, resources, preferences, needs, risks, and a number of other conditions may be present and common, affecting the identity of the participants and their degree of cohesiveness.

Since the advent of the Internet, the concept of community has less geographical limitation, as people can now gather virtually in an online community and share common interests regardless of physical location. Prior to the internet, virtual communities (like social or academic organizations) were far more limited by the constraints of available communication and transportation technologies.

Taken from http://en.wikipedia.org/wiki/Community

what community means to me

Community has always been a critical part of my lifelong obsession with computers and the Internet. At every stage, it was some community that took me to a new level of my computer usage, starting with BBSes and gaming clubs, through the community around Enlightenment, particularly its irc channel #e (now on Freenode), and the online open source community, that at the time revolved around Slashdot, Freshmeat, and Themes.org. All of which I got involved in back in High School. It was through these sites and the #e irc channel that I encountered people who got paid! to manage Linux boxes all day. I knew immediately that this is what I wanted to do for a living, too. I loved Linux. I loved building servers at home. My path was set.

Along the way, I've become involved in many communities around various open source projects, particularly libraries or web frameworks, commercial technologies, and geographic location, and through these, I've grown as a technologist, as a professional, and as a person. I've made many long-lasting friendships, and community relationships are responsible for my last three jobs.

I suspect most, if not all, of you are having a me too moment in reading this, because community is a big part of both our world of technology and the world at large. Community involvement is a critical piece of being an Operations professional, and I hope to provide some examples of communities you can join or help build as I discuss what I've learned about building community over my last 14 years of being involved in open source and ops/sysadmin communities.

dat devops

I've had the privilege of becoming part of the DevOps community over the last two years, including helping organize and speak at last year's devops days Mountain View conference. The DevOps community to me is a perfect example of how a great community is built. It has grown organically over the last three years because people around the world found common ground in a topic and wanted the opportunity to discuss, evolve, and share that topic.

Some of the key things the DevOps community has done are:

  • created free or very cheap conferences at numerous locations around the world
  • created Google groups for discussion
  • created irc channels for discussion
  • accepted that marketing and sales will become a part of any vibrant community
  • encouraged civil discourse and respect (see "the No Asshole rule")
  • been inclusive by default, encouraging "noobs" with passion to step up and be a part
  • "led the herd" by speaking, blogging, tweeting, ircing, posting to newsgroups
  • busted ass to create real change for the topic at hand, not worrying about fame or fortune

hangops history lesson

In the past three months I've been lucky enough to work with Jordan Sissel in building a new community: hangops.com.

Hangops started as an idea I had one day and proposed to Jordan, that he and I do a weekly Google Hangout to have coffee virtually and shoot the shit. I had been working from home for six months and was starting to miss the socialization of being in an office. He liked the idea and said "why not make it public?" I could see no reason not to, and following the recent Twitter trend of #hugops and #dadops, we thought using the hashtag #hangops would be a fun moniker, since the idea is to enable remote Ops folks (dev, qa, security, etc are welcome too!) to hang out via a Google Hangout and shoot the shit.

So we decided on a day/time and did a couple of tweets about it. We had a good turnout for the first one and decided to make it a weekly event. After a few weeks of good times, we decided to get official. I registered hangops.com, put up a small static site, registered @hangops, and we started using Google Hangout's On Air feature to live stream the sessions, and record them for people that can't make a session. I put the hangops.com site up as a repo on Github and encouraged people to mention topic ideas by making issues against the repo, which also got a nice response. And we started an irc channel, #hangops on Freenode.

We had a great response to these things and in the last month I've started getting specific people from the greater (dev)Ops community to be special guests as we discuss topics like Puppetconf/Surgecon recap, AWS, Sensu, Career Paths and hiring, and an upcoming session on #monitoringlove.

There has been an amazing response from people in the community and coworkers at Mozilla about hangops. People love the style of the sessions and the material, and they keep asking for more. We even saw a live hangops at Puppetconf.

Reflecting on the growth of the hangops community, I think the key things were:

  • Ops people who work from home also like to socialize
  • Ops people like to hear from smart and experienced folks on interesting topics
  • I had fun with it
  • I was willing to invest time and money into it, with no guarantee it would be more than Jordan and myself sharing cat gifs every week.
  • I've worked hard to make it an open community and encourage people to join the hangout even if they're unsure what they may add
  • I've made the format a mix of roundtable and topical discussions
  • encouraging feedback whenever possible

Things I am looking for the future are:

  • a better site
  • a google calendar
  • making a podcast mp3 feed and a video feed
  • show notes
  • transcription
  • a proper logo

Conclusion

The key to building a community is a mix of passion and fearlessness. You have to really be invested in the idea of the community you want to help build, but you also have to be willing to fail.

You have a huge array of communication tools available today: things like Twitter, Google Hangouts, IRC channels, and a fun hashtag make it really easy to organically grow and promote a new community as well.

If you're someone looking for a community about ops and sysadmin things, I'd love to see you join the #hangops community, and if you're someone trying to build your own community, I hope this article has given you some ideas and hopefully some motivation for your own endeavour.

About the author

Brandon Burton is currently a webops engineer at Mozilla where he spends his time herding Apache servers, writing Puppet manifests, and mostly posting gifs in IRC channels.

He can be found on Twitter as @solarce, posting cat gifs, memes, and sometimes interesting links.

December 1, 2012

Day 1 - Easy Visualizations with Spreadsheets

This was written by Jordan Sissel.

On the 8th day of the first sysadvent, I talked about ways to get graphs from arbitrary data, but I was never really satisfied with the result since I find gnuplot to be a bit cumbersome (though it is powerful).

Since then, technology and tools have improved greatly. For one, Google's got some pretty neat features in their Google Spreadsheets product. Bonus, it's free to use and if your company already uses Google Apps, you've got an easy way to share data and spreadsheets easily among coworkers.

So why care? Well, the spreadsheets product has some excellent statistical and visual tools.

The first time I was exposed to this tool was when I worked at a web advertising company: When debugging some odd user tracking data, the workflow usually included dumping the logs to csv, loading into Excel, doing some magic, and somehow the answer seemed to reveal itself. My first times watching this process reminded me of those 'enhance that photo!' scenes in some crime dramas, but this wasn't fiction. Sometimes the person driving Excel moved so quickly my face had this "are you a wizard?" expression on it.

Load the data, do some grouping, sort, filter, summarize, "enhance" ... Bam. Answer!

Let's figure out how to do that, but first we need a data set to play with.

Mail server activity

There are a bunch of mail servers at work. Let's look at yesterday's log file sizes and compare them in a spreadsheet (sounds exciting, I know!)

(
  # ssh into a few servers and get the file sizes of certain logs
  printf "host\tfile\tsize\n"
  for i in server1 server2 server3 server4 ; do
    ssh $i du -sb /var/log/{mail.log,auth.log,syslog}.1 \
    | awk '{ OFS="\t"; print "'$i'", $2, $1 }'
  done
) > /tmp/maildata.tsv

The output is hostname, logfile, size-in-bytes; tab-delimited. In general, most spreadsheet tools can import data that is comma or tab-separated quite easily. My data looks like this:

host  file  size
mailer-1  /var/log/mail.log.1 1789031327
mailer-1  /var/log/auth.log.1 2352800
mailer-1  /var/log/syslog.1 1799335420
mailer-12 /var/log/mail.log.1 2066206745
...

Import CSV

Loading this into a spreadsheet is easy. In Google Spreadsheets, File -> Import will let you do it.

Once imported, I get a nice spreadsheet with three columns:

As you see, each line in the imported file becomes a row.

Pivot Tables

Pivot tables let you group and aggregate data.

To make one, select all the data in your spreadsheet, then choose Data -> Pivot table report from the menu.

Let's try to answer some questions with a pivot table.

Which server has the largest total logs?

On the right of the spreadsheet, you'll see "Report Editor" where you can add rows, columns, and values to your pivot table.

To see which server has the largest total logs:

  • click 'Rows - Add field' and choose the 'host'
  • click 'Columns - Add field' and choose 'file'
  • click 'Values - Add Field' and choose 'size'

At the end of each column and row will be a 'Grand Total' entry which summarizes the whole column or row.

Since I'm looking for 'largest total logs', for the 'Group by: host' panel on the right, choose 'Sort by -> SUM of size in...' 'Grand Total' - which results in this nicely sorted display:

Which log is largest across all servers?

Create a new pivot table, but this time specify 'file' as the rows, don't add any columns, and specify 'size' for the values. The result is a table showing total sum by each log file:

Visualization

Often, problems aren't easy to solve if your only method is to eye-ball a table full of numbers. A big table of numbers is indistinguishable from noise, so you need a better way to represent the data.

Graphs are nice, right? Simply select the data in the pivot table (or the spreadsheet) and choose Insert -> Chart from the menu. How about a bar chart comparing log sizes across servers?

Or a pie chart?

There are two main points to make here. First, that this tool gives you a wide array of tools to mold your data into something that answers your questions. Second, that the minimum number of steps required are usually small.

Select some data, graph it. Select more data, choose rows/columns/values to view in aggregate, and maybe make a graph on that.

It's pretty awesome.

Leveling Up with Forms

Spreadsheets has this other neat feature called Forms (New -> Form from google docs). A form is basically just a customizable input form that inserts to a spreadsheet when submitted.

What if you created a form and had a computer write to it, kinda like logging to your spreadsheet? When creating the form, there is access control that requires login by default, but you can turn that off - uncheck 'Require sign-in to view this form'.

For fun, here's a sample form I made. The interesting part here isn't that you can type stuff in as a human, but that you can submit with curl if you wanted to!

How to submit to a Google Form with curl:

  • Take the 'formkey' and put it on this url: https://docs.google.com/spreadsheet/formResponse?formkey=FORMKEY
  • curl -XPOST https://docs.google.com/spreadsheet/formResponse?formkey=FORMKEY -d "entry.0.single=first&entry.1.single=second&submit=Submit"

The HTTP POST payload is form url-encoded, with 'entry.N.single' being each field's name (in your browser, 'inspect element' on the form inputs to see the names). You must also include 'submit=Submit' in the POST or Google Docs won't record the submission.

Here's a full example using the sample form I made (linked above):

echo -n "What is your name? "; read name
echo -n "How are you? "; read status
url="https://docs.google.com/spreadsheet/formResponse?formkey=dE9EOTROMzBIeG92UDZ2cG9XaHRucFE6MQ"
curl -s -XPOST "$url" -d "entry.0.single=${name}&entry.1.single=${status}&submit=Submit"

In the output of curl, you should see something like "Your response has been recorded".

You can view the results of form submissions to this specific form here: sysadvent sample form spreadsheet

The spreadsheet updates in near-real-time with form postings. Any charts you are using are also updated when the spreadsheet changes. Smells like this could be useful for light logging and metric recording, right? I think so!

Looking back at the 'mail logs size' data set above, we can use forms to automate this. Set up a daily cron job that publishes the size of each log file to a form and you can trend usage patterns over time. If you don't have a graphing system available right now, like Graphite or Ganglia, this Forms solution could be just the right tool for you.
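
A minimal sketch of that cron job, reusing the form-submission format above (FORMKEY and the entry.N.single layout are placeholders; inspect your own form to get the real field names):

#!/bin/bash
# Post each log file's size to a Google Form once a day (run this from cron).

url="https://docs.google.com/spreadsheet/formResponse?formkey=FORMKEY"
host=$(hostname)

for log in /var/log/mail.log.1 /var/log/auth.log.1 /var/log/syslog.1; do
  size=$(du -sb "$log" | awk '{print $1}')
  curl -s -XPOST "$url" \
    -d "entry.0.single=${host}&entry.1.single=${log}&entry.2.single=${size}&submit=Submit"
done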

Conclusion

Spreadsheets in general are really useful tools because they let you treat your data like Play-Doh - squish and shape your data into whatever form is most useful for you. Google Docs is an easy way to get these spreadsheet and forms features.

Further Reading