Friday, July 31, 2009

Lake Sabrina backpacking

I haven't posted anything for a week; sorry about that. But I have a good excuse, as I was unplugged and unwinding at 11,000 feet in the Eastern Sierra Nevada mountains.

We took a 4-day, 3-night backpack trip out of the Lake Sabrina trailhead above Bishop. The trailhead is at 9,180 feet, our trail crested at 11,200 feet, and we spent most of our time at 10,975-foot Baboon Lake. Our trip route was similar to the trip marked on this map.

Although we had a few problems with mosquitos, the weather was nearly perfect, we had the lakes almost completely to ourselves, and the setting was as beautiful as any I've seen.


As a reward for putting up with the mosquitos, we were treated to vistas of snowpacks, foaming and churning waterfalls, and meadows of gorgeous wildflowers.
See more pictures of the trip here.

And here's a slideshow, if you wish.

Tuesday, July 21, 2009

Perforce pure Java API

Perforce have released a 100% pure Java client-side API (in beta status).

I've used several of the other Perforce client APIs before, including the Ruby and Python APIs. Generally, these APIs work by invoking the Perforce command-line tool (p4) in "-ztag" mode, and then parsing the returned output.

Based on those ideas, I built my own small client-side Java API by spawning p4 from my own Java code, and parsing the output. It is simple and straightforward and works great. However, it is code that I have to maintain myself.
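
For flavor, here's a minimal sketch of that spawn-and-parse approach (P4Tagged is my own hypothetical helper, error handling omitted; it assumes p4 is on the PATH):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class P4Tagged {
    // Runs "p4 -ztag <args>" and collects the "... key value" output lines.
    public static Map<String, String> run(String... args) throws Exception {
        String[] cmd = new String[args.length + 2];
        cmd[0] = "p4";
        cmd[1] = "-ztag";
        System.arraycopy(args, 0, cmd, 2, args.length);
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        Map<String, String> fields = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("... ")) {
                String[] parts = line.substring(4).split(" ", 2);
                fields.put(parts[0], parts.length > 1 ? parts[1] : "");
            }
        }
        p.waitFor();
        return fields;
    }
}

A call like P4Tagged.run("describe", "-s", "12345") then yields a map with keys such as user and desc, ready for formatting.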

Unlike these other APIs, the new Java API appears to be a complete implementation of the client-side Perforce networking protocol: it speaks directly to the server, with no need for the command-line tool to be installed or spawned by the API libraries.

I don't currently need to write many wrappers and tools that automate Perforce commands, and my current code is pretty stable, so there's no rush. But the next time I need to write code like this, I will certainly investigate the Perforce P4Java API in more detail; it looks quite nice.

Here's the javadoc.

For example, a common task that I do with my current code is to get a short description of a submitted changelist, and format it nicely for display in my UI. It seems like this would be quite straightforward in the new API, requiring little more than:


P4Server p4Svr = P4ServerFactory.getServer(...);
P4Changelist myCL = p4Svr.getChangelist(12345);
// ... then access myCL.getDescription(), myCL.getFiles(), etc. ...

This is about as clean as I could possibly want; off the top of my head it looks ideal. Yay Perforce!

Java performance slides from Cliff Click

Cliff Click has posted the slides from his talks at the 2009 JavaOne conference.

If you aren't already familiar with Cliff's work, he's with Azul, the company which makes the custom servers for ultra-high-end Java applications, and he is deeply involved with Java performance issues, particularly those which involve multi-threaded systems.

This year's presentations from Cliff include a talk on modern hardware and a talk on Java benchmarking.

The hardware talk basically makes the point that single-processor performance has pretty well maxed out, and that all the action now is in multiprocessor machines, so the important questions are:
  • How well does application software use many CPUs?
  • Can the hardware guys provide an adequate memory subsystem ("memory is the new disk", says Cliff)?
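
On the first question, the standard Java answer is to decompose the work across a thread pool sized to the machine. Here's a tiny, self-contained illustration (my own toy example, not from the slides):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UseAllCpus {
    public static void main(String[] args) throws Exception {
        final int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        List<Future<Long>> parts = new ArrayList<Future<Long>>();
        for (int i = 0; i < cpus; i++) {
            final int offset = i;
            // Each task sums a strided slice of the range [0, 100M).
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    for (long n = offset; n < 100000000L; n += cpus) sum += n;
                    return sum;
                }
            }));
        }
        long total = 0;
        for (Future<Long> part : parts) total += part.get();
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
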
The benchmarking slides are a great review of the problems of trying to design and run a decent Java benchmark.
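
As a taste of why decent benchmarking is hard, consider this naive microbenchmark (again my own toy example, not from the slides), annotated with some of the classic pitfalls:

public class NaiveBench {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100000000; i++) {
            sum += i;
        }
        long elapsed = System.nanoTime() - start;
        // Pitfall 1: no warm-up, so we partly time the interpreter and
        // the JIT compiler itself, not the compiled code.
        // Pitfall 2: if 'sum' were never printed, the JIT could
        // dead-code-eliminate the entire loop.
        // Pitfall 3: a single run; no accounting for GC pauses or
        // run-to-run variance.
        System.out.println("sum = " + sum + ", elapsed ns = " + elapsed);
    }
}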

This description of the typical performance tuning cycle is all too true:
Typical Performance Tuning Cycle
  • Benchmark X becomes popular
  • Management tells Engineer: “Improve X's score!”
  • Engineer does an in-depth study of X
  • Decides optimization “Y” will help
      ◦ And Y is not broken for anybody
      ◦ Possibly helps some other program
  • Implements & ships a JVM with “Y”
  • Management announces score of “X” is now 2*X
  • Users yawn in disbelief: “Y” does not help them
Also, given our discussion a few weeks ago about the odd sizing of application memory, it was interesting to read that Azul are running benchmarks with 350 GB heaps.

Anyway, the slides are fascinating, even though (as is often the case) it is hard to read presentation slides without having the speaker explain them to you. But they're well worth reading, so: Enjoy!

Thursday, July 16, 2009

Ubuntu boot speed rocks!

The boot-up speed of Ubuntu 9 is remarkable!

I have a variety of machines that I use regularly.

Several are RedHat Linux machines that reside in a machine closet. I leave those machines always-running; it's not uncommon for 9 months to elapse between reboots.

Many are Windows desktop and laptop machines. I shut these machines down routinely, but never happily, because they take 2, 5, sometimes 7 or more minutes to boot. If I can avoid it, I don't shut them down at all, because booting is so painful.

I have an old laptop (a Dell Latitude 610) which runs Ubuntu 9, after years of painfully running Windows XP.

This machine boots up at lightning speed! It gets to the login prompt in 6-7 seconds, and to the full Ubuntu desktop in about 5 more seconds.

I'm not sure how they accomplished this (though I recall reading a detailed article several years ago by a team which was focusing on improving startup speed, so I suspect the answer is simple: hard, sustained work), but I sure am happy that they did it!

Tuesday, July 14, 2009

To String.intern, or not to intern?

I don't have a lot of hands-on experience with String.intern.

This function has been around for a long time, but I recently started thinking about it as a possible tool for controlling memory usage. As you'll see, the Sun documentation describes the function as a tool for altering the behavior of String comparisons:


Returns a canonical representation for the string object.

A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification.
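
A tiny sketch makes the ==-versus-equals consequence concrete:

public class InternDemo {
    public static void main(String[] args) {
        String a = new String("hello"); // a fresh object, not the pooled literal
        String b = "hello";             // a literal, interned automatically

        System.out.println(a == b);          // false: distinct objects
        System.out.println(a.equals(b));     // true:  same characters
        System.out.println(a.intern() == b); // true:  intern returns the pooled copy
    }
}
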
In my particular case, we maintain a large cache of object graphs, where the object data is retrieved from a database. Furthermore, it so happens that these object graphs contain a large number of strings which are used and re-used quite frequently.

So I was recently pawing through an enormous memory dump, skimming the list of all the active String objects, and I was struck by how much duplication there was. That made me wonder whether or not we were using String.intern appropriately.

So I did some research, and found several quite interesting essays on the topic.

My reaction so far is that:
  • Yes, it looks like String intern'ing could really help.
  • Unfortunately, the need to potentially configure PermGen space is a bummer.
  • And, it seems important to have a really good handle on which strings are worth interning. Too few, and I've just changed a bunch of code to no real effect. Too many, and I've exchanged a memory waste problem for a PermGen configuration problem, plus possibly burdened the VM by making it do more work on allocations for little gain.
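
One mitigation I've seen discussed for the PermGen worry is to skip String.intern entirely and keep a private pool on the ordinary heap, which the collector can reclaim when the cache goes away. A minimal sketch (StringPool and canonicalize are my own names):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StringPool {
    private final ConcurrentMap<String, String> pool =
            new ConcurrentHashMap<String, String>();

    // Returns the pool's copy of s, adding s to the pool if it's new.
    public String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
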
In general, given my vague understanding of the state of the art in JVMs nowadays, it seems like the JVM teams are working on making memory allocation fast and cheap.

And, as we've discussed previously in this blog, memory is becoming cheap and widely available.

So, it doesn't seem to be immediately obvious that intern'ing will be worth it, because in general it seems like a bad strategy to be asking the CPU to be doing more work in order to conserve memory, unless we have a strong reason to believe that we have a lot of memory duplication and the memory savings are either
  • so substantial that they will outweigh the extra expense and hassle of managing the intern pool, or
  • so substantial that the conservation of that much memory will open up a broad new range of applications for the code (e.g., we can now handle some problem sizes that were just way too large for us to handle without interning).
So I think that for now I will read some more, and think about this some more, but I'm not going to race to start planting a lot of intern calls in the code.

Are there profiling tools that can watch a benchmark run and analyze whether or not interning would have helped?
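
Absent such a tool, a crude first cut is easy to hand-roll: walk the strings you control (say, everything in the object cache) and count equal-but-distinct copies. A sketch (names are mine):

import java.util.HashMap;
import java.util.Map;

public class DuplicationCheck {
    // Estimates how many characters are tied up in duplicate strings.
    public static long duplicatedChars(Iterable<String> strings) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String s : strings) {
            Integer n = counts.get(s);
            counts.put(s, n == null ? 1 : n + 1);
        }
        long wasted = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            wasted += (long) (e.getValue() - 1) * e.getKey().length();
        }
        return wasted; // each duplicated char is roughly two bytes of heap
    }
}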

Threads for JavaScript!

I recently upgraded to Firefox 3.5, and was browsing the release notes.

In the release notes, there was a quiet reference to the new WebWorkers feature:
Support for native JSON, and web worker threads.
Somehow, I just skimmed over this, but then a separate posting to hacks.mozilla.org woke me up and made me pay more attention:

Web Workers, which were recommended by the WHATWG, were introduced in Firefox 3.5 to add concurrency to JavaScript applications without also introducing the problems associated with multithreaded programs. Starting a worker is easy - just use the new Worker interface.
Reading a bit of background material, it seems as though the Web Workers feature was partially present in previous releases of Firefox, but wasn't quite ready for prime time. Now it seems that the Bespin guys have been actively using it, and have proven it out, and so it's become a Real Feature of Firefox 3.5.

The master documentation at the WHATWG is quite thorough, and contains a lot of examples, including:

  • The simplest use of workers is for performing a computationally expensive task without interrupting the user interface. In this example, the main document spawns a worker to (naïvely) compute prime numbers, and progressively displays the most recently found prime number.
  • In this example, the main document spawns a worker whose only task is to listen for notifications from the server, and, when appropriate, either add or remove data from the client-side database.
  • In this example, the main document uses two workers, one for fetching stock updates at regular intervals, and one for performing search queries that the user requests.
  • In this example, multiple windows (viewers) can be opened that are all viewing the same map. All the windows share the same map information, with a single worker coordinating all the viewers. Each viewer can move around independently, but if they set any data on the map, all the viewers are updated.
  • With multicore CPUs becoming prevalent, one way to obtain better performance is to split computationally expensive tasks amongst multiple workers. In this example, a computationally expensive task that is to be performed for every number from 1 to 10,000,000 is farmed out to ten subworkers.
  • An example that offloads all the crypto work onto subworkers.

Apparently, this support is part of Thunderbird as well. Active background JavaScript threading in my email reader! Zounds!

Sunday, July 12, 2009

Fakes, Mocks, and Stubs

The other day, I was listening to Roy Osherove on Scott Hanselman's podcast, Hanselminutes, and I really liked the way that Roy described the difference between Fakes, Mocks, and Stubs.

As I heard it (which of course, may not be the way Roy intended it to be heard), it is something like the following:

  • Fakes are the various bits of test-infrastructure that you end up piecing together in order to write decent unit tests. Any code that is written solely to support testing falls into this category. There are lots of kinds of fakes, among which are Stubs, and Mocks.
  • Stubs are fakes which have no behavior, or at best trivial behavior. Stubs are simple-minded: they have no logic and make no decisions, simply providing an interface to compile, load, and run against.
  • Mocks are fakes which are used to verify behavior. That is, you can assert against mocks; mocks take part in deciding whether the test passed or failed.
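
To make the distinction concrete, here's a minimal hand-rolled sketch in Java (all the names are hypothetical):

interface MailService {
    void send(String to, String body);
}

// A stub: no logic, no decisions; it exists only so the code under test runs.
class StubMailService implements MailService {
    public void send(String to, String body) { /* do nothing */ }
}

// A mock: records what happened, so the test can verify behavior afterwards.
class MockMailService implements MailService {
    int sendCount = 0;
    String lastRecipient;

    public void send(String to, String body) {
        sendCount++;
        lastRecipient = to;
    }

    void verifySentOnceTo(String expected) {
        if (sendCount != 1 || !expected.equals(lastRecipient)) {
            throw new AssertionError("expected exactly one message to " + expected);
        }
    }
}

A test that merely needs a MailService to exist passes the stub; a test that is about whether mail got sent passes the mock and calls verifySentOnceTo at the end.
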
Here's a link to Osherove's blog posting discussing this in more detail, with a bit of a diagram. In his post, he refers to Martin Fowler's longer and more detailed essay about Mocks and Stubs. Fowler, in turn, describes the various types of testing objects as follows, and cites Gerard Meszaros for the origin of this terminology:

The vocabulary for talking about this soon gets messy - all sorts of words are used: stub, mock, fake, dummy. For this article I'm going to follow the vocabulary of Gerard Meszaros's book. It's not what everyone uses, but I think it's a good vocabulary and since it's my essay I get to pick which words to use.

Meszaros uses the term Test Double as the generic term for any kind of pretend object used in place of a real object for testing purposes. The name comes from the notion of a Stunt Double in movies. (One of his aims was to avoid using any name that was already widely used.) Meszaros then defined four particular kinds of double:

  • Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
  • Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
  • Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it 'sent', or maybe only how many messages it 'sent'.
  • Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
Of these kinds of doubles, only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification. Mocks actually do behave like other doubles during the exercise phase, as they need to make the SUT believe it's talking with its real collaborators - but mocks differ in the setup and the verification phases.
There is an enormous amount of additional information available at mockobjects.com, including several quite readable papers about the Mock Objects philosophy of testing.