
Thursday, 6 October 2022

Java on-ramp - Fully defined Entrypoints

How do you start a Java program? With a main method of course. But the ceremony around writing such a method is perhaps not the nicest for newcomers to Java.

There has been a bit of discussion recently about how the "on-ramp" for Java could be made easier. This is the original proposal. Here are follow-ups - OpenJDK, Reddit, Hacker News.

Starting point

This is the classic Java Hello World:

  public class HelloWorld { 
    public static void main(String[] args) { 
      System.out.println("Hello World");
    }
  }

Lots of stuff going on - public, class, a class name, void, arrays, a method, method call. And one of the weirdest things in Java - System.out - a public static field in lower case. Something that is pretty much never seen in normal Java code. (I still remember System.out.println being the most confusing part about getting started in Java 1.0 - why are there two dots and why isn't it out()?)

The official proposal continues to discuss:

  • A more tolerant launch protocol
  • Unnamed classes
  • Predefined static imports for the most critical methods and fields

The ensuing discussion resulted in various suggestions. Having taken some time to reflect on the proposal and discussion, here is my contribution: what is really needed is something more comprehensive.

Entrypoints

When a Java program starts, some kind of class file needs to be run. It could be a normal class, but that isn't ideal, as we don't really want static/instance variables, subclasses, parent interfaces, access control etc. One suggestion was for it to be a normal interface, but that isn't ideal either, as we don't want to mark the methods as default or allow abstract methods.

I'd like to propose that what Java needs is a new kind of class declaration for entrypoints.

I don't think this is overly radical. We already have two alternate class declarations - record and enum. They have alternate syntax that compiles to a class file without being explicitly a class in source code. What we need here is a new kind - entrypoint - that compiles to a class file but has different syntax rules, just like record and enum do.

I believe this is a fundamentally better approach than the minor tweaks in the official proposal, because it will be useful to developers of all skill levels, including framework authors. ie. it has a much better "bang for buck".

The simplest entrypoint would be:

  // MyMain.java
  entrypoint {
    SystemOut.println("Hello World");
  }

In the source code we have various things:

  • Inferred class name from file name, the class file is MyMain$entrypoint
  • Top-level code, no need to discuss methods initially
  • No access to parameters
  • New classes SystemOut, SystemIn and SystemErr
  • No constructor, as a new kind of class declaration it doesn't need it

The classes like SystemOut may seem like a small change, but it would have been much simpler for the me of 25 years ago to understand. I don't favour more static imports for them (either here or more generally), as I think SystemOut.println("Hello World") is simple enough. More static imports would be too magical in my opinion.
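
A minimal sketch of what SystemOut might look like, assuming it is simply a beginner-friendly front for System.out (the class name is from the proposal above, the body is my guess):

  // hypothetical sketch - not part of any JDK
  public final class SystemOut {
    // not instantiable, just a holder for static methods
    private SystemOut() {
    }
    public static void println(String text) {
      System.out.println(text);
    }
  }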

The next steps when learning Java are for the instructor to expand the entrypoint.

  • Add a named method (always private, any name although main() would be common)
  • Add parameters to the method (maybe String[], maybe String...)
  • Add return type to the method (void is the default return type)
  • Group code into a block
  • Add additional methods (always private)

Here are some valid examples. Note that instructors can choose the order to explain each feature:

  entrypoint {
    SystemOut.println("Hello World");
  }
  entrypoint main() {
    SystemOut.println("Hello World");
  }
  entrypoint main(String[] args) {
    SystemOut.println("Hello World");
  }
  entrypoint {
    main() {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    void main(String[] args) {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    main(String[] args) {
      output("Hello World");
    }
    output(String text) {
      SystemOut.println(text);
    }
  }

Note that there are never any static methods, static variables, instance variables or access control. If you need any of that you need a class. Thus we have proper separation of concerns for the entrypoint of systems, which would be Best Practice even for experienced developers.

Progressing to classes

During initial learning, the entrypoint class declaration and normal class declaration would be kept in separate files:

  // MyMain.java
  entrypoint {
    SystemOut.println(new Person().name());
  }
  // Person.java
  public class Person {
    String name() {
      return "Bob";
    }
  }

However, at some point the instructor would embed an entrypoint (of any valid syntax) in a normal class.

  public class Person {
    entrypoint {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

We discover that an entrypoint is normally wrapped in a class which then offers the ability to add static/instance variables and access control.

Note that since all methods on the entrypoint are private and the entrypoint is anonymous, there is no way for the rest of the code to invoke it without hackery. Note also that the entrypoint does not get any special favours like an instance of the outer class, thus there is no issue with no-arg constructors - if you want an instance you have to use new (the alternative is unhelpful magic that harms learnability IMO).

Finally, we see that our old-style static main method is revealed to be just a normal entrypoint:

  public class Person {
    entrypoint public static void main(String[] args) {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

ie. when a method is declared as public static void main(String[]) the keyword entrypoint is implicitly added.

What experienced developers gain from this is a clearer way to express what the entrypoint actually is, and more power in expressing whether they want the command line arguments or not.

Full-featured entrypoints

Everything above is what most Java developers would need to know. But an entrypoint would actually be a whole lot more powerful.

The basic entrypoint would compile to a class something like this:

  // MyMain.java
  entrypoint startHere(String[] args) {
    SystemOut.println("Hello World");
  }
  // MyMain$entrypoint.class
  public final class MyMain$entrypoint implements java.lang.Entrypoint {
    @Override
    public void main(Runtime runtime) {
      runtime.execute(() -> startHere(runtime.args()));
    }
    private void startHere(String[] args) {
      SystemOut.println("Hello World");
    }
  }

Note that it is final and its methods are private.

The Entrypoint interface would be:

  public interface java.lang.Entrypoint {
    /**
     * Invoked by the JVM to launch the program.
     * When the method completes, the JVM terminates.
     */
    public abstract void main(Runtime runtime);
  }

The Runtime.execute method would be something like:

  public void execute(ThrowableRunnable runnable) {
    try {
      runnable.run();
      System.exit(0);
    } catch (Throwable ex) {
      ex.printStackTrace();
      System.exit(1);
    }
  }

The JVM would do the following:

  • Load the class file specified on the command line
  • If it implements java.lang.Entrypoint, call the no-args constructor and invoke main(Runtime)
  • Else look for a legacy public static void main(String[]), and invoke that
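
As a hedged sketch only (Entrypoint is the proposed interface above, and runtimeFor is a hypothetical factory wrapping the command-line arguments), the launch logic might look like this:

  // illustrative sketch of the proposed launch logic - not real JDK code
  static void launch(String mainClassName, String[] args) throws Exception {
    Class<?> cls = Class.forName(mainClassName);
    if (Entrypoint.class.isAssignableFrom(cls)) {
      // new path: no-args constructor, then hand control to main(Runtime)
      Entrypoint entry = (Entrypoint) cls.getDeclaredConstructor().newInstance();
      entry.main(runtimeFor(args));  // runtimeFor is hypothetical
    } else {
      // legacy path: public static void main(String[])
      cls.getMethod("main", String[].class).invoke(null, (Object) args);
    }
  }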

Note that java.lang.Entrypoint is a normal interface that can be implemented by anyone and do anything!

This last point is critical to enhancing the bang-for-buck. I was intrigued by things like Azul CRaC, which wants to own the whole lifecycle of the JVM run. Wouldn't that be more powerful if they could control the whole lifecycle through Entrypoint? Another possible use is to reset the state when an application has finished, allowing the same JVM to be reused - a bit like Function-as-a-Service providers or build system daemons do. (I suspect it may be possible to enhance the entrypoint concept to control the shutdown hooks and to catch things like System.exit but that is beyond the scope of this blog.) For example, here is a theoretical application framework entrypoint:

  // FrameworkApplication.java - an Open Source library
  public interface FrameworkApplication extends Entrypoint {
    public default void main(Runtime runtime) {
      // do framework things
      start();
      // do framework things
    }
    public abstract void start();
  }

Applications just implement this interface, and they can run it by specifying their own class name on the command line, yet it is a full-featured framework application!
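
For example, a hypothetical application built on that framework could be as small as this, and java MyApp would launch it via the Entrypoint mechanism:

  // MyApp.java - hypothetical application code
  public class MyApp implements FrameworkApplication {
    @Override
    public void start() {
      SystemOut.println("Application running");
    }
  }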

Summary

I argue that the proposal above is more powerful and more useful to experienced developers than the official proposal, while still meeting the core goal of step-by-step on-ramp learning. Do let me know what you think. (Reddit discussion).

Monday, 8 December 2014

What might a Beans v2.0 spec contain?

What features should be in scope and out of scope for a Beans specification v2.0?

Note: I am an independent blogger, not an Oracle employee, and as such this blog is my opinion. Oracle owns the JavaBeans trademark and controls what actually happens to the specification.

Update 2015-01-03: New home page for the Beans v2.0 effort now available!

JavaBeans v1.0

My last blog examined the contents of the current JavaBeans (TM) specification. The key feature elements can be summarized as follows:

  • Properties - accessed using getter and setter methods
  • Events - change events for properties
  • Methods - all public methods
  • BeanInfo - permits customization of properties, events and methods
  • Beans - a utility class to access the features
  • Property editor - a mechanism to edit properties, including conversion to and from a string format
  • GUI support - allows a bean to be manipulated visually

These features were a good fit for 1997, given that the original goal was to be a component system, interoperating with other similar systems like COM/DCOM. These features are no longer what we need, so any Beans v2.0 is naturally going to look a lot different.

Beans v2.0 - Data abstraction

The key goal of a Beans v2.0 spec is to define an abstraction over the data in an object.

When such an abstraction exists, frameworks and libraries become far simpler to write and interact with. For example, a serialization framework at its most basic level simply needs to iterate over all the beans and properties in an object graph in order to perform its task.

In addition to frameworks, many ad-hoc coding tasks become simpler because it is much easier to walk over the object graph of beans and properties. For example, to find all instances of a particular type within an object graph starting from an arbitrary bean.

The data abstraction needs to support a full round-trip. The spec must allow for data to be read from a bean into another format. But it must also provide the tools to allow the original bean to be recreated from the data previously read.
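
To make this concrete, here is a minimal sketch of what such an abstraction might look like - the names are illustrative, loosely inspired by Joda-Beans, and not a proposed API:

  import java.util.Set;

  // illustrative sketch only, not a proposed API
  interface MetaBean {
    Set<String> propertyNames();
    MetaProperty property(String name);
    BeanBuilder builder();  // supports recreating a bean from previously read data
  }
  interface MetaProperty {
    String name();
    Class<?> type();
    Object get(Object bean);              // read the property value
    void set(Object bean, Object value);  // write, where the bean is mutable
  }
  interface BeanBuilder {
    BeanBuilder set(String propertyName, Object value);
    Object build();
  }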

Beans v2.0 - Mutable vs Immutable

The JavaBeans v1.0 specification implicitly defines mutable objects. It requires all beans to have a no-args constructor, thus the only way for the properties to get populated is through the use of setters. This mutability is one of the key complaints that people have about JavaBeans, and often Java itself by association.

Any Beans v2.0 specification must support immutable objects.

Immutable classes are now a common design approach to systems and are increasingly popular. Supporting them is thus vital to encompass the range of data objects in use today.

Any Beans v2.0 specification must also still support mutable objects. While there are those who turn their noses up at any mutability, it is still a perfectly valid choice for an application. The decision between mutable and immutable data objects is a trade-off. The Beans v2.0 spec needs to support both options.

The implication of supporting immutable objects is that an alternative to a no-args constructor and public setters will be needed to support construction. This is likely to be a "bean builder" of some kind.
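
In terms of the sketch above, recreating an immutable bean might look like this (hypothetical code):

  // hypothetical usage of the sketched BeanBuilder
  Object recreate(MetaBean meta) {
    return meta.builder()
        .set("forename", "John")
        .set("surname", "Smith")
        .build();
  }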

Beans v2.0 - Beans and Properties

The term "bean" is often associated with a specific Java class, such as Person. The term "property" is often associated with a single field on the class, such as surname or forename.

Any Beans v2.0 spec must not have a reflection-based view of beans and properties.

Firstly, a property is not the same as a field (instance variable). Fields hold the state of an object, but not necessarily in the form that the object wishes to expose (it might be stored in an optimised format for example). As such, it must be possible for a bean to control the set of properties that it publishes.

Secondly, it can be desirable to have a bean with a dynamic set of properties. This might consist of a bean with a fixed set of base properties and an extensible map of additional ones, or a bean which is entirely dynamic, much closer to the data model of many dynamic languages.

The implication of this is that there is not a one-to-one mapping between a bean and a Class, nor is there a one-to-one mapping between a property and a java.lang.reflect.Field.

The JavaBeans v1.0 spec treats indexed properties in a special way. I can see no reason to do that in Beans v2.0, particularly given the range of collection types now in use.

Beans v2.0 - Getters and Setters

The current JavaBeans v1.0 spec requires there to be a physical java.lang.reflect.Method instance for both the getter and setter. What Beans v2.0 needs to provide is an abstraction away from this, which opens up many more options.

Any Beans v2.0 spec must not require data access via a java.lang.reflect.Method.

Moving away from direct exposure of reflection raises the level of abstraction. It permits alternative object designs to be exposed as beans.

For example, consider a bean that has no public getters and setters. With the Beans v2.0 abstraction, this could still expose its data as properties, providing interoperation with frameworks, but encapsulating the data for most programmatic uses. This tackles another big criticism of JavaBeans, which is that they cause state to be exposed rather than hidden.

Another example is the HashMap based bean. In this case a Method does not exist for each property, just a single get by name. Being able to expose this as a bean provides the desired dynamic behaviour.
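
A minimal sketch of such a bean, where all the state lives in a map and there is no Method per property:

  import java.util.HashMap;
  import java.util.Map;

  // hypothetical dynamic bean - properties are map entries, not methods
  public class DynamicBean {
    private final Map<String, Object> data = new HashMap<>();

    public Object get(String propertyName) {
      return data.get(propertyName);
    }
    public void set(String propertyName, Object value) {
      data.put(propertyName, value);
    }
  }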

The implication of this is that the Beans v2.0 abstraction will simply be along the lines of the Map interface's get and put.

Beans v2.0 - Annotations

The current JavaBeans v1.0 spec predates annotations in Java. But annotations are now key to many programming practices in Java.

Any Beans v2.0 spec must provide access to property and bean annotations.

Annotations are represented by implementations of the Annotation interface. It so happens that the Annotation interface is not part of reflection. It is possible to implement an annotation yourself (apart from a compiler warning).

What is needed is for the Beans v2.0 spec abstraction to provide access to the set of annotations on the bean, and the set of annotations on each property without going via reflection. With this step, and the ones above, many direct uses of reflection will completely disappear, with the Beans v2.0 abstraction being good enough.
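
In terms of the earlier sketch, this might amount to little more than an extra method or two exposed by both the bean and each property (illustrative only):

  import java.lang.annotation.Annotation;
  import java.util.List;

  // illustrative only - would be implemented by both MetaBean and MetaProperty
  interface Annotated {
    List<Annotation> annotations();
    <A extends Annotation> A annotation(Class<A> annotationType);  // null if absent
  }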

Beans v2.0 - Events

The current JavaBeans v1.0 spec considers events to be first class parts of the abstraction.

There is no good reason to include events in the Beans v2.0 spec.

The current JavaBeans v1.0 spec is GUI focussed, and as such events were an essential element. But the Beans v2.0 spec outlined here focuses on a different goal, abstracting over the data in the object. Round-tripping data from a bean to another format and back to a bean does not require events or a GUI.

There will be those reaching for the comments box right now, demanding that events must be included. But think carefully and you should realise that you don't need them.

None of what is proposed here for Beans v2.0 takes away what works today with JavaBeans v1.0. GUI developers can still carry on using the same getters, setters and events that they do today, including bound and constrained properties. The java.beans package will continue to exist and continue to be available. In addition, JavaFX is the way forward for Java GUIs, and it has its own approach to properties.

Beans v2.0 - Conversion to/from String

The current JavaBeans v1.0 spec includes PropertyEditor. One of the use cases of the class, notably in the Spring framework, is to convert a simple object (typically a VALJO) to and from a String. For example, this includes Integer, Enum, Class and Currency.

Any Beans v2.0 spec must tackle conversion to/from a String for simple types.

The most common use case for the Beans v2.0 spec will be to iterate over the object graph and process it in some way, such as to write out a JSON or XML message. In order for this to be practical, all objects in the object graph need to be either a bean, or a simple type convertible to a string. As such, simple type string conversion is very much needed to round out the spec.

The Joda-Convert project contains a simple set of interfaces and classes to tackle this problem, and I imagine any Beans v2.0 spec solution would be similar.
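
As a rough sketch, simplified from the spirit of Joda-Convert rather than copying its exact API:

  // simplified sketch of a string converter for simple types
  interface StringConverter<T> {
    String convertToString(T object);
    T convertFromString(Class<? extends T> cls, String str);
  }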

Beans v2.0 - But what does a bean actually look like?

Part of the goal of Beans v2.0 is to allow for different kinds of bean. Immutable as well as mutable. Private getters and setters as well as public. Dynamic map-based structures as well as field-based ones.

Any Beans v2.0 spec should not overly restrict how beans are implemented.

We know that Oracle has no current plans for properties at the language level. As such, it also makes little sense for Beans v2.0 spec to specify a single approach for how classes should be designed. Instead, the spec aims to allow classes that are designed in different ways to all agree on a common way to interact with frameworks and libraries that does not require getters, setters and a no-args constructor.

The Joda-Beans, Lombok, Immutables and Auto projects (and many others) all provide different ways to create beans without writing manual code. All could be adapted to work with the proposed Beans v2.0 spec, as could frameworks like Jackson, Hibernate, Dozer and Groovy (to name but a few).

The simple answer is that the bean could look just like it does today with getters and setters. The real answer is that the bean may look any way it likes so long as it can expose its data via the Beans v2.0 API.

Summary

The current JavaBeans v1.0 spec is very old and not that useful for the kinds of applications we build today. This blog has outlined my opinion on what direction a Beans v2.0 spec should take:

  • Focussed on data abstraction
  • Support for Mutable and Immutable beans
  • Fully abstracted from reflection
  • No requirement for getters and setters
  • No requirement for a no-args constructor
  • Support for dynamically extensible properties
  • Access to bean and property annotations
  • Conversion of simple types to and from a String
  • No support for events

Feel free to comment on the high-level scope outlined above. In particular, if you develop a project that uses the JavaBeans spec (directly or indirectly), please comment to say whether the proposed spec is likely to be sufficient to cover your use cases.

I've also started work on a prototype of the code necessary to support the spec. So also feel free to raise pull requests or join the mailing list / group.

Monday, 10 November 2014

One more library for JDK 9?

On Tuesday night in Belgium at Devoxx I'm asking the question "Is there room for one more library in JDK 9?"

With JDK 9 full of modules, and the long wait until JDK 10 for more interesting things like value types, I pose this question to the community to ask what would you add to JDK 9 if you could?

But, I'm asking more than that. Any additional work for JDK 9 would have to be led by someone outside Oracle capable of drawing together a community. And whatever it is it would need to be tightly focussed - the timescale available is limited.

Think of something that could be expressed in 20 classes or fewer, involving only code and no library or JVM changes. Something that the JDK doesn't have, but you think it should. Think about whether you'd put in your time to try and make it happen.

But also think about whether your idea should be in the JDK at all. Is it better as an external library? What makes it necessary to be in the JDK, especially a modularized JDK? Will Oracle's Java stewards support the idea?

As an example, I'll put forward Joda-Convert, a very small and simple library that provides a mechanism to convert and round-trip a simple object (like a VALJO) to a formatted string, ideal for use in serialization, such as JSON or XML (and noting that JSON is coming to JDK 9...). I'm sure that if you're reading this, you've got some other ideas!

If you've got any thoughts, ideas, or want to suggest something, please add a comment. But remember the rules above - there is no free lunch here. Are you willing to take a lead on your idea?

Thursday, 6 November 2014

Better nulls in Java 10?

Rethinking null when Java has value types.

Null in Java

Was null really the billion dollar mistake? Well, that's an interesting question, but what is certain is that null is widely used in Java, and frequently a very useful concept. But, just because it is useful doesn't mean that there isn't something better.

Optional everywhere!

There are those who would point you to adopt the following strategy - avoid using null everywhere, and use Optional if you need to express the possible absence of something. This has a certain attraction, but I firmly believe that Java is not yet ready for this (in Java SE 8), as per my previous post on Optional.

Specifically, all the additional object boxes that Optional creates will have an effect on memory and garbage collection unless you are very lucky. In addition, it's relatively verbose in syntax terms.

Nullable annotations

A second approach to handling null is to use the @Nullable and @NonNull annotations. Except that it is only fair to point out that from an engineering perspective, annotations are a mess.

Firstly, there is no formal specification for nullable annotations. As such, there are multiple competing implementations - one from FindBugs, one from JSR-305, one from IntelliJ, one from Eclipse, one from Lombok and no doubt others. And the semantics of these can differ in subtle ways.

Secondly, they don't actually do anything:

  @javax.annotation.ParametersAreNonnullByDefault
  public class Foo {
    public StringBuilder format(String foo) {
      return new StringBuilder().append(foo);
    }
    public StringBuilder formatAllowNull(@Nullable String foo) {
      return new StringBuilder().append(foo);
    }
  }

Here, I've used @javax.annotation.ParametersAreNonnullByDefault to declare that all parameters to methods in the class are non-null unless otherwise annotated, that is, they only accept non-null values. Except that there is nothing to enforce this.

If you run an additional static checker, like FindBugs or your IDE, then the annotation can warn you if you call the "format" method with a null. But there is nothing in the language or compiler that really enforces it.

What I want is something in the compiler or JVM that prevents the method from being called with null, or at least throws an exception if it is. And that cannot be switched off or ignored!

The nice part about the annotations is that they can flip the default approach to null in Java, making non-null the default and nullability the unusual special case.

Null with value types in JDK 10

The ongoing work on value types (target JDK 10 !!) may provide a new avenue to explore. That's because Optional may be changed in JDK 10 to be a value type. What this might mean is that an Optional property of a Java-Bean would no longer be an object in its own right; instead it would be a value, with its data embedded in the parent bean. This would take away the memory/gc reasons preventing use in beans, but there is still the syntactic overhead.

Now for the hand-waving part.

What if we added a new pseudo-keyword "nonnull" that could be used as a modifier to a class. It would be similar to @javax.annotation.ParametersAreNonnullByDefault but effectively make all parameters, return types and variables in the class non-null unless modified.

Then, what if we introduce the use of "?" as a type suffix to indicate nullability. This is a common syntax, adopted by Fantom, Ceylon and Kotlin, and very easy to understand. Basically, wherever you see String you know the variable is non-null and wherever you see String? you know that it might be "null".

Then, what if we took advantage of the low overhead value type nature of Optional (or perhaps something similar but different) and used it to represent data of type String?. Given this, a "nonnull" class would never be able to see a null at all. This step is the most controversial, and perhaps not necessary - it may be possible for the compiler and JVM to track String and String? without the Optional box.

But the box might come in useful when dealing with older code not written with the "nonnull" keyword. Specifically, it might be possible to teach the JVM to be able to auto box and un-box the Optional wrapper. It might even be possible to release a standard annotation jar that projects could use on codebases that need to remain compatible with JDK 8 or 9, where the annotation could be interpreted by a JDK 10 JVM to provide similar nullability information.

  public nonnull class Foo {
    public StringBuilder format(String foo) {
      return new StringBuilder().append(foo);
    }
    public StringBuilder formatAllowNull(String? foo) {
      return new StringBuilder().append(foo);
    }
  }

The example above has been rewritten from annotations to the new style. It is clearly a shorter syntax, but it would also benefit from proper integration into the compiler and JVM resulting in it being very difficult to call "format" with a null value. Behind the scenes, the second method would be compiled in bytecode to take a parameter of Optional<String>, not String.

The hard part is the question of what methods can be called on an instance of String?. The simple encoding suggests that you can only call the methods of Optional<String>, but this might be a little surprising given the syntax. The alternative is to retain the use of null, but with an enforced check before use.

The even harder part is what to do with things like Map.get(key) where null currently has two meanings - not found and found null.
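
For example, with today's HashMap:

  Map<String, String> map = new HashMap<>();
  map.put("a", null);
  map.get("a");   // null - key present, value is null
  map.get("b");   // null - key absent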

This concept also appears to dovetail well into the value type work. This is because value types are likely to behave like primitives, and thus cannot be null (the value type has to be boxed to be able to be null). By providing proper support for null in the language and JVM, this aspect of value types, which might otherwise be a negative, becomes properly handled.

It should also be pointed out that one of the difficulties that current tools have in this space is that the JDK itself is not annotated in any way to indicate whether null is or is not allowed as an input or return type. Using this approach, the JDK would be updated with nullability, solving that big hurdle.

(As a side note, I wrote up some related ideas in 2007 - null-safe types and null-safe invocation.)

Just a reminder. This was a hand-wavy thought experiment, not a detailed proposal. But since I've not seen it before, I thought it was worth writing.

Just be careful not to overuse Optional in the meantime. It's not what it was intended for, despite what the loud (and occasionally zealot-like) fans of using it everywhere would have you believe.

Summary

A quick walk through null in Java, concluding with some hand-waving of a possible approach to swapping the default for types with respect to null.

Feel free to comment on the concept or the detail!

Tuesday, 4 December 2012

Annotating JDK default data

A common issue in Java development is use of the default Locale, default TimeZone and default CharSet.

JDK defaults

The JDK has a number of defaults that apply to a running JVM. The most well known are the default Locale, default TimeZone and default CharSet.

These defaults are very useful for getting systems up and running quickly. When a developer new to Java writes some code, they should get the localized answer they expect. However, in any larger environment, especially code intended to run on a server, the defaults cause a problem.

The classic example of this is the String.toLowerCase() method. Many developers use str1.toLowerCase().equals(str2.toLowerCase()) to check if two strings are equal ignoring case. But this code is not valid in all Locales! It turns out that in Turkey, there are two different upper case versions of the letter I and two different lower case versions. This causes lots of problems.
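
For example, in a Turkish default Locale the upper case 'I' lower-cases to a dotless 'ı', so passing an explicit Locale is the only reliable fix:

  "TITLE".toLowerCase();             // "tıtle" when the default Locale is Turkish
  "TITLE".toLowerCase(Locale.ROOT);  // "title" in every Locale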

The problem applies more generally to server-side applications. Any server-side code that relies on a JDK method using one of the defaults will alter its behaviour based on where and how the server is set up. This is clearly undesirable.

As is often the case, there are two competing forces - ease of use for newcomers, and bugs in larger applications. There is also the issue of backwards compatibility, meaning that the JDK methods depending on the defaults cannot be removed.

One approach has been taken by the Lucene project, scanning code using ASM. The link/tool also provides a useful list of JDK methods affected by this problem:

  java.lang.String#<init>(byte[])
  java.lang.String#<init>(byte[],int)
  java.lang.String#<init>(byte[],int,int)
  java.lang.String#<init>(byte[],int,int,int)
  java.lang.String#getBytes()
  java.lang.String#getBytes(int,int,byte[],int) 
  java.lang.String#toLowerCase()
  java.lang.String#toUpperCase()
  java.lang.String#format(java.lang.String,java.lang.Object[])

  java.io.FileReader
  java.io.FileWriter
  java.io.ByteArrayOutputStream#toString()
  java.io.InputStreamReader#(java.io.InputStream)
  java.io.OutputStreamWriter#(java.io.OutputStream)
  java.io.PrintStream#(java.io.File)
  java.io.PrintStream#(java.io.OutputStream)
  java.io.PrintStream#(java.io.OutputStream,boolean)
  java.io.PrintStream#(java.lang.String)
  java.io.PrintWriter#(java.io.File)
  java.io.PrintWriter#(java.io.OutputStream)
  java.io.PrintWriter#(java.io.OutputStream,boolean)
  java.io.PrintWriter#(java.lang.String)
  java.io.PrintWriter#format(java.lang.String,java.lang.Object[])
  java.io.PrintWriter#printf(java.lang.String,java.lang.Object[])

  java.nio.charset.Charset#displayName()

  java.text.BreakIterator#getCharacterInstance()
  java.text.BreakIterator#getLineInstance()
  java.text.BreakIterator#getSentenceInstance()
  java.text.BreakIterator#getWordInstance()
  java.text.Collator#getInstance()
  java.text.DateFormat#getTimeInstance()
  java.text.DateFormat#getTimeInstance(int)
  java.text.DateFormat#getDateInstance()
  java.text.DateFormat#getDateInstance(int)
  java.text.DateFormat#getDateTimeInstance()
  java.text.DateFormat#getDateTimeInstance(int,int)
  java.text.DateFormat#getInstance()
  java.text.DateFormatSymbols#()
  java.text.DateFormatSymbols#getInstance()

  java.text.DecimalFormat#()
  java.text.DecimalFormat#(java.lang.String)
  java.text.DecimalFormatSymbols#()
  java.text.DecimalFormatSymbols#getInstance()
  java.text.MessageFormat#(java.lang.String)
  java.text.NumberFormat#getInstance()
  java.text.NumberFormat#getNumberInstance()
  java.text.NumberFormat#getIntegerInstance()
  java.text.NumberFormat#getCurrencyInstance()
  java.text.NumberFormat#getPercentInstance()
  java.text.SimpleDateFormat#()
  java.text.SimpleDateFormat#(java.lang.String)

  java.util.Calendar#()
  java.util.Calendar#getInstance()
  java.util.Calendar#getInstance(java.util.Locale)
  java.util.Calendar#getInstance(java.util.TimeZone)
  java.util.Currency#getSymbol()
  java.util.GregorianCalendar#()
  java.util.GregorianCalendar#(int,int,int)
  java.util.GregorianCalendar#(int,int,int,int,int)
  java.util.GregorianCalendar#(int,int,int,int,int,int)
  java.util.GregorianCalendar#(java.util.Locale)
  java.util.GregorianCalendar#(java.util.TimeZone)

  java.util.Scanner#(java.io.InputStream)
  java.util.Scanner#(java.io.File)
  java.util.Scanner#(java.nio.channels.ReadableByteChannel)
  java.util.Formatter#()
  java.util.Formatter#(java.lang.Appendable)
  java.util.Formatter#(java.io.File)
  java.util.Formatter#(java.io.File,java.lang.String)
  java.util.Formatter#(java.io.OutputStream)
  java.util.Formatter#(java.io.OutputStream,java.lang.String)
  java.util.Formatter#(java.io.PrintStream)
  java.util.Formatter#(java.lang.String)
  java.util.Formatter#(java.lang.String,java.lang.String)

Thinking about the two competing forces, it seems to me that the best option would be a new annotation.

Imagine the JDK adds an annotation @DependsOnJdkDefaults. This would be used to annotate all methods in the JDK that directly or indirectly rely on a default, including all of those above. Developers outside the JDK could of course also use the annotation to mark their methods relying on defaults.

  @DependsOnJdkDefaults
  public String toLowerCase() {
    return toLowerCase(Locale.getDefault());
  }

Tooling like Checkstyle, IDEs and perhaps even the JDK compiler could then use the annotation to warn developers that they should not use the method. It would be possible to even envisage an IDE setting to hide the methods from auto-complete.

This would seem to balance the competing forces by providing information that could be widely used and very valuable.

Summary

I think marking methods that rely on the JDK default Locale/TimeZone/CharSet with an annotation would be a powerful tool. What do you think?

Thursday, 17 June 2010

Exception transparency and Lone-Throws

The Project Lambda mailing list has been considering exception transparency recently. My fear is that the current proposal in this area goes beyond what Java's complexity budget will allow. So, I proposed an alternative.

Exception transparency

Exception transparency is all about checked exceptions and how to handle them around a closure/lambda.

Firstly, it's important to note that closures are a common feature in other programming languages. As such, it would be a standard approach to look elsewhere to see how this is handled. However, checked exceptions are a uniquely Java feature, so this approach doesn't help.

The concept of, and solution for, exception transparency appears in Neal Gafter's BGGA and CFJ proposals, and is referenced by the original FCM proposal. First let's look at the problem:

Consider a method that takes a closure and a list, and processes each item in the list using the closure. For our example, we have a conversion library method (often called map) that transforms an input list to an output list:

  // library method
  public static <I, O> List<O> convert(List<I> list, #O(I) block) {
    List<O> out = new ArrayList<O>();
    for (I in : list) {
      O converted = block.(in);
      out.add(converted);
    }
    return out;
  }
  // user code
  List<File> files = ...
  #String(File) block = #(File file) {
    return file.getCanonicalPath();
  };
  List<String> paths = convert(files, block);

However, this code won't work as expected unless checked exceptions are specially handled in closures. This is because the method getCanonicalPath can throw an IOException.

The problem of exception transparency is how to transparently pass the exception, thrown by the user supplied closure, back to the surrounding user code. In other words, we don't want the library method to absorb the IOException, or wrap it in a RuntimeException.

Project Lambda approach

The approach of Project Lambda is modelled on Neal Gafter's work. This approach adds additional type information to the closure to specify which checked exceptions can be thrown:

  // library method
  public static <I, O, throws E> List<O> convert(List<I> list, #O(I)(throws E) block) throws E {
    List<O> out = new ArrayList<O>();
    for (I in : list) {
      O converted = block.(in);
      out.add(converted);
    }
    return out;
  }
  // user code
  List<File> files = ...
  #String(File)(throws IOException) block = #(File file) {
    return file.getCanonicalPath();
  };
  List<String> paths = convert(files, block);

Notice how more generic type information was added - throws E. In the library method, this is specified at least three times - once in the generic declaration, once in the function type of the block and once on the method itself. In short, throws E says "throws zero-to-many exceptions where checked exceptions must follow standard rules".

However, the user code also changed. We had to add the (throws IOException) clause to the function type. This actually locks in the exception that will be thrown, and allows checked exceptions to continue to work. This creates the mouthful #String(File)(throws IOException).

It has recently been noted that syntax doesn't matter yet in Project Lambda. However, here is a case where there is effectively a minimum syntax pain. No matter how you rearrange the elements, and what symbols you use, the IOException element needs to be present.

On the Project Lambda mailing list I have argued that the syntax pain here is inevitable and unavoidable with this approach to exception transparency. And I've gone further to argue that this syntax goes beyond what Java can handle. (Imagine some of these declarations with more than one block passed to the library method, or with wildcards!!!)

Lone throws approach

As a result of the difficulties above, I have proposed an alternative - lone-throws.

The lone-throws approach has three elements:

  1. Any method may have a throws keyword without specifying the types that are thrown ("lone-throws"). This indicates that any exception, checked or unchecked, may be thrown. Once thrown in this manner, any checked exception flows up the stack in an unchecked manner.
  2. Any catch clause may have a throws keyword after the catch. This indicates that any exception may be caught, even if the exception isn't known to be thrown by the try block.
  3. All closures are implicitly declared with lone-throws. Thus, all closures can throw checked and unchecked exceptions without declaring the checked ones.

Here is the same example from above:

  // library method
  public static <I, O> List<O> convert(List<I> list, #O(I) block) {
    List<O> out = new ArrayList<O>();
    for (I in : list) {
      O converted = block.(in);
      out.add(converted);
    }
    return out;
  }
  // user code
  List<File> files = ...
  #String(File) block = #(File file) {
    return file.getCanonicalPath();
  };
  List<String> paths = convert(files, block);

If you compare this example to the very first one, it can be seen that it is identical. Personally, I'd describe that as true exception transparency (as opposed to the multiple declarations of generics required in the Project Lambda approach).

It works, because the closure block automatically declares the lone-throws. This allows all exceptions, checked or unchecked to escape. These flow freely through the library method and back to the user code. (Checked exceptions only exist in the compiler, so this has no impact on the JVM)

The user may choose to catch the IOException, however they won't be forced to. In this sense, the IOException has become equivalent to a runtime exception because it was wrapped in a closure. The code to catch it is as follows:

  try {
    paths = convert(files, block);  // might throw IOException via lone-throws
  } catch throws (IOException ex) {
    // handle as normal - if you throw it, it is checked again
  }

The simplicity of the approach in syntax terms should be clear - it just works. However, the downside is the impact on checked exceptions.

Checked exceptions have both supporters and detractors in the Java community. However, all must accept that, given that projects like Spring avoid checked exceptions, their role has been reduced. It is also widely known that other newer programming languages are not adopting the concept of checked exceptions.

In essence, this proposal provides a means to accept the new reality in which checked exceptions are less important. Any developer may use the lone-throws concept to convert checked exceptions to unchecked ones. They may also use the catch-throws concept to catch the exceptions that would otherwise be uncatchable.

This may seem radical, however with the growing integration of non-Java JVM languages, the problem of being unable to catch checked exceptions is fast approaching. (Many of those languages throw Java checked exceptions in an unchecked manner.) As such, the catch-throws clause is a useful language change on its own.

Finally, I spent a couple of hours tonight implementing the lone-throws and catch-throws parts. It took less than 2 hours - this is an easy change to specify and implement.

Summary

Overall, this is a tale of two approaches to a common problem - passing checked exceptions transparently from inside to outside a closure. The Project Lambda approach preserves full type-information and safeguards checked exceptions at the cost of horribly verbose and complex syntax. The lone-throws approach side-steps the problem by converting checked exceptions to unchecked, with less type-information as a result, but far simpler syntax. (The mailing list has discussed other possible alternatives, however these two are the best developed options.)

Can Java really stand the excess syntax of the Project Lambda approach?
Or is the lone-throws approach too radical around checked exceptions?
Which is the lesser evil?

Feedback welcome!

Sunday, 14 March 2010

Java language design by use case

In a blog post in 2006, Neal Gafter wrote about how language design was fundamentally different to API design and how use cases were a bad approach to language design. This blog questions some of those conclusions in the context of the Java language.

Java language design by use case

Firstly, Neal doesn't say that use cases should be avoided in language design:

In a programming language, on the other hand, the elements that are used to assemble programs are ideally orthogonal and independent. ...
To be sure, use cases also play a very important role in language design, but that role is a completely different one than the kind of role that they play in API design. In API design, satisfying the requirements of the use cases is a sufficient condition for completeness. In language design, it is a necessary condition.

So, Neal's position seems very sound. Language features should be orthogonal, and designed to interact in new ways that the language designer hadn't thought of. This is one element of why language design is a different skill to API design - and why armchair language designers should be careful.

The problem, and the point of this blog, is that it would appear that the development of the Java language has never been overly concerned with following this approach. (I'm not trying to cast aspersions here on those involved - just trying to provide some background on the language).

Consider inner classes - added in v1.1. These target a specific need - the requirements of the Swing API. While they have been used for other things (poor man's closures), they weren't overly designed as such.

Consider enums - added in v1.5. These target a single specific use case, that of a typesafe set of values. They don't extend to cover additional edge cases (shared code in an abstract superclass or extensibility, for example) because these weren't part of the key use case. JSR-310 has been significantly compromised by the lack of shared code.

Consider the foreach loop - added in v1.5. This meets a single basic use case - looping over an array or iterable. The use case didn't allow for indexed looping, finding out if it's the first or last time around the loop, looping around two lists pairwise, and so on. The feature is driven by a specific use case.

And the var-args added in v1.5? I have a memory that suggests the use case for its addition was to enable String printf.

Finally, by accounts I've heard, even James Gosling tended to add items to the original Java builds on the basis of what he needed at that moment (a specific use case) rather than to a great overarching plan for a great language.

To be fair, some features are definitely more orthogonal and open - annotations for example.

Looking forward, Project Lambda repeats this approach. It has a clear focus on the Fork-Join/ParallelArray use case - other use cases like filtering/sorting/manipulating collections are considered second class (apparently - it's a bit hard to pin down the requirements). Thus, once again the Java language will add a use case driven feature rather than a language designer's orthogonal feature.

But is that necessarily a Bad Thing?

Well, firstly we have to consider that the Java language has 9 million developers and is probably still the world's most widely used language. So, being use case driven in the past hasn't overly hurt adoption.

Now, most in the community and blogosphere would accept that in many ways Java is actually not a very good programming language. And somewhere deep down, some of that is due to the use case/feature driven approach to change. Yet, down in the trenches most Java developers don't seem especially fussed about the quality of the language. Understanding that should be key for the leaders of the Java community.

I see two ways to view this dichotomy. One is to say that it is simply because people haven't been exposed to better languages with a more thought through and unified language design. In other words - once they do see a "better designed language" they'll laugh at Java. While I think that is true of the blogosphere, I'm rather unconvinced as to how true that is of the mainstream.

The alternative is to say that actually most developers handle discrete use-case focussed language features more easily than abstracted, independent, orthogonal features. In other words - "use feature X to achieve goal Y". I have a suspicion that is how many developers actually like to think.

Looked at in this way, the design of the Java language suddenly seems a lot more clever. The use case driven features map more closely onto the discrete mental models of working developers than the abstract super-powerful ones of more advanced languages. Thus this is another key difference that marks out a blue collar language (pdf) (cache) from an academic experiment.

Project Lambda

I'm writing this blog because of Project Lambda, which is adding closures to the Java language. Various options have been suggested to solve the problem of referring to local variables and whether those reference should be safe across multiple threads or not. The trouble is that there are two use cases - immediate invocation, where local variables can be used immediately and safely, and deferred asynchronous invocation where local variables would be published to another thread and be subject to data races.

What this blog suggests is that maybe these two use cases need to be representable as two language features or two clear variations of the same feature (as in C++11).

Summary

Many of the changes to the Java language, and some of the original features, owe as much to a use case driven approach as to an overarching language design with orthogonal features. Yet despite this supposed "flaw" developers still use the Java language in droves.

Maybe it's time to question whether use case focus without orthogonality in language features isn't such a Bad Thing after all?

Feedback welcome!

Tuesday, 6 January 2009

Java 7 - Null-default and Null-safe operators

The most popular small language change request is better handling of nulls. As a result, I've updated my null-handling proposal.

Enhanced null-handling

The votes from Devoxx and JavaEdge were clear. Ordinary developers find handling nulls to be a pain and they would like to see language change to address it. At JavaEdge in particular almost a third of the first preferences and two thirds of the top four preferences went to null-handling.

I blogged my original thoughts two years ago. The updated null-default and null-safe invocation proposal is now available (v0.2).

The proposal covers two new operators, which follow Groovy for syntax (and thus consistency).

The null-default operator ?: returns the LHS unless that is null in which case it returns the RHS:

  // today
  String str = getStringMayBeNull();
  str = (str != null ? str : "");

  // with null-default operator
  String str = getStringMayBeNull() ?: "";

The null-default operator also works particularly well with auto-unboxing.

  Integer value = getIntegerMayBeNull();
  int val = value ?: -1;

In fact, I hope that tools (IDEs and static analysis) would combine to effectively make the null-default operator mandatory when unsafe unboxing is occurring.

The null-safe operator ?. is an alternative form of method/field invocation. If the LHS is null, then the result of the whole method/field invocation is null:

  // today
  String result = null;
  Foo foo = getFooMayBeNull();
  if (foo != null) {
    Bar bar = foo.getBarMayBeNull();
    if (bar != null) {
      result = bar.getResult();
    }
  }

  // with null-safe operator
  String result = getFooMayBeNull()?.getBarMayBeNull()?.getResult();

There is an interesting case if the last segment of the field/method expression returns a primitive. However, this can be handled by combining the two new operators (hence why it's one proposal):

  // today
  int dotIndex = -1;
  if (str != null) {
    dotIndex = str.indexOf(".");
  }

  // with null-safe and null-default operators
  int dotIndex = str?.indexOf(".") ?: -1;

I considered mandating this when a primitive occurs, but it would have undesirable side effects, so this is one best left to the tools to handle.

More details on all of this in the proposal.

Summary

Nulls remain a pain in Java. There simply isn't enough language support for this most common task. Every day, developers write logic to check and handle nulls and this obscures the meaning of the real code. And every day production systems go down due to NullPointerExceptions that weren't caught. This proposal provides a simple enhancement to Java that tackles the heart of the null issue.

Opinions welcome.

Sunday, 9 November 2008

Java language change - Unique identifier strings

This last week in my day job I've been refactoring some very crusty old code. One part of the refactor has prompted me to write up why language features often affect more than is immediately obvious.

The case of the Unique Identifier String

The specific code I've been working on isn't that significant - it's a set of pools that manage resources. The important feature for this discussion is that there are several pool instances, each of which has a unique identifier.

When the code started out many years ago, the identifier was simple - the unique name of the pool, defined as a String:

// example code - hugely simplified from the real thing...
public class PoolManager {
  public static Pool getPool(String poolName) { ... }
  ...
}
public class Pool {
  private String poolName;
  ...
}

Internally, the manager consists of maybe 25 classes (it's over-complex, no IoC, and needs refactoring, remember...). Most of the 25 classes have some kind of reference to the pool name, whether to access configuration, for logging, or for some other reason.

At some point in the past, a new development was commissioned that affected the whole system. The new development - maintenance code - was to allow multiple sets of configuration throughout the system.

To achieve this, everywhere that accessed configuration needed a unique key for the configuration it needed to access. Again, as this was a simple lookup, a String was used. And, since the pooling component was affected, a second unique key was added:

// example code - still hugely simplified from the real thing...
public class PoolManager {
  public static Pool getPool(String poolName, String configName) { ... }
  ...
}
public class Pool {
  private String poolName;
  private String configName;
  ...
}

Now, in order to complete the change, the config name was rolled out to most of the 25 classes alongside the pool name. In effect, the true 'unique id' for the pool became the combination of the two separate keys of poolName and configName.

Now, we could debate lots about this design, but that's not the point. The point, in case you missed it, is that we now have up to 25 classes with two 'unique ids' that are really one. In addition, this creates confusion in what things mean. After all, with two keys we now need a map within a map to look up the actual pool, right? (Again, I know the alternatives - this is a blog about what maintenance code does over time, and how to tackle it...)

OK, so how might we improve this using Java?

A better design

If the original developer had coded a PoolId class then the overall design would have been a lot better:

// pre-maintenance:
public class PoolId {
  private String poolName;
  ...
}
public class PoolManager {
  public static Pool getPool(PoolId poolId) { ... }
  ...
}
public class Pool {
  private PoolId poolId;
  ...
}

Now, the maintenance coder would have had a much easier task:

// post-maintenance:
public class PoolId {
  private final String poolName;
  private final String configName;   // NEW CODE ADDED
  ...
}
// NOTHING ELSE CHANGES! PoolManager and Pool stay the same!

Wow! That's a lot clearer. We've properly encapsulated the concept of the unique pool identifier. This allowed us to change the definition to add the configName during the later maintenance. This isn't rocket science of course, and there isn't anything new in this blog so far...

Now, what I want to do is ask the awkward question - Why wasn't the PoolId class written originally?

It's vital that we understand that question. It's the root cause of why the code now needs refactoring, and why it is hard to understand and change. (And bear in mind this is just an example scenario - you should be able to think of many similar examples in your own code.)

Well, let's look at the PoolId class in more detail. In particular, let's look at the code I omitted above with some '...'.

// real version of PoolId in Java - pretty boring...
public final class PoolId {
  private final String poolName;
  
  public PoolId(String poolName) {
    if (poolName == null) {
      throw new IllegalArgumentException();
    }
    this.poolName = poolName;
  }
  public String getPoolName() {
    return poolName;
  }
  public boolean equals(Object obj) {
    if (obj == this) {
      return true;
    }
    if (obj instanceof PoolId == false) {
      return false;
    }
    PoolId other = (PoolId) obj;
    return poolName.equals(other.poolName);
  }
  public int hashCode() {
    return poolName.hashCode();
  }
  public String toString() {
    return poolName;
  }
}

Now we know why the original developer didn't write the class PoolId. Very, very few of us would - the effort required is simply too great. It's verbose, boring, and probably has enough potential for bugs that it might need its own test.

But the way we write this class - what it actually looks like - is a language design issue!

Quick composites

It is perfectly possible to design a language that makes such classes really easy to write. For example, here is a pseudo-syntax of an imaginary language a bit like Java:

// made up language, based on Java
public class PoolId {
  property state public final String! poolName;
}

The 'property' keyword adds the get/set methods (no need for set in this case, as the field is final). The 'state' keyword indicates that this is part of the main state of the class. Adding the keyword generates the constructor, equals(), hashCode() and toString() methods. And finally, the '!' character means that the string cannot be null.

Adding another item of state is really simple:

public class PoolId {
  property state public final String! poolName;
  property state public final String! configName;
}

Suddenly, adding a new class for things like PoolId doesn't seem a hardship. In fact, we've turned implementing the right design into the easy thing to do. Basically, it's about as easy as it's ever going to get.
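To see exactly what is being saved, here is a sketch of the plain Java that the two-property class above would correspond to, following the hand-written pattern from earlier (the exact generated form, such as the toString() format, is my assumption):

  // a sketch of what the compiler might generate from the two-line class
  public final class PoolId {
    private final String poolName;
    private final String configName;

    public PoolId(String poolName, String configName) {
      if (poolName == null || configName == null) {
        throw new IllegalArgumentException();
      }
      this.poolName = poolName;
      this.configName = configName;
    }
    public String getPoolName() {
      return poolName;
    }
    public String getConfigName() {
      return configName;
    }
    public boolean equals(Object obj) {
      if (obj == this) {
        return true;
      }
      if (obj instanceof PoolId == false) {
        return false;
      }
      PoolId other = (PoolId) obj;
      return poolName.equals(other.poolName) && configName.equals(other.configName);
    }
    public int hashCode() {
      return poolName.hashCode() * 31 + configName.hashCode();
    }
    public String toString() {
      return poolName + '-' + configName;
    }
  }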

My real point is that if Java had a language feature like this, then there would be a much greater chance of the better design being written. After all, most developers will always take the lazy option - and in Java that is way too many String identifiers.

So, does this imaginary language exist? Well, some languages get a lot closer than Java, but I don't think any achieves quite this kind of brevity (prove me wrong!).

In addition, I'm arguing for Fan to encompass 'quick composites' like this. After all, I'd argue that most (80%?) of the classes we write could have auto-generated equals() and hashCode() based on a 'state' keyword.

Summary

As a community of Java developers, we need sometimes to realise that the language we develop in can actually hold us back. A language design feature like this is not just about saving a few keystrokes. It can fundamentally change the way lots of code gets developed simply by changing the better design from very hard/verbose to really easy. And the knock-on effects in maintenance could be huge.

Finally, I want to be clear though that I am NOT advocating a change like this in Java. Java is probably too mature now to handle big changes like this. But new languages should definitely be thinking about it.

Opinions

What languages come close to this design?
What percentage of classes in your codebase could have their equals()/hashCode() methods generated by a 'state' keyword?
Opinions welcome!

Friday, 2 May 2008

Enhancing Java - Multi-lingual blocks

The reality for Java is that there are many other programming languages, and many of those have features that Java developers sometimes wish they could access. But it's simply impossible to add all those features. Is there a possible alternative if we think 'outside the box'?

Multi-lingual

What I'm thinking about in this blog is the possibility of embedding Groovy, Ruby, Jython or Scala code directly within Java code.

Why might that be useful?

Well, each language has its own benefits, whether Scala's functional style or Groovy's GStrings. Including a small part of another language within the main code body could be useful, although obviously this would be a technique to be used with care.

And it doesn't have to stop at known languages. What about a dedicated 'SQL language'? Or a dedicated 'XML language'? These would be more than just DSLs - actual languages with whatever syntax rules are most applicable.

So, what might a syntax look like:

 public String fetchRow(int id) {
   :groovy: {
     println "Row id: $id!"
   }
   :sql: {
     SELECT %text% FROM my_table WHERE row_id = %id%;
   }
   return text;
 }

The idea is that a block of code, surrounded by curly brackets, can be identified as belonging to a different language. In this case the syntax I've used is the name of the language (which would have to be imported) surrounded by colons. Note that there is nothing special about the syntax within the block. Bear in mind that the syntax isn't that important - it's the concept that matters.

The Groovy example - just normal Groovy code - outputs the row id using an embedded string. The SQL example is an invented 'language' where a column is read by id, and then returned to the Java code as the variable text.

So, what about the detail? Well, the approach requires two parts.

Firstly, there needs to be a parser for each language that understands the relevant syntax. This will typically be a variation of the normal parser for a 'real' language like Scala or Ruby. For a new language like SQL or XML, it would be written from scratch. The parser also needs to be able to recognise when the block of code in that language is complete.

Secondly, the parser needs to be able to share variables with the surrounding code. As a basic principle, this can be thought of as a map, where the other language code can both read and write to the map. Of course this requires there to be a mapping between the various type systems - for Groovy this should be easy, other languages might find that more tricky.
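In fact, the variable-sharing part can already be sketched today using the standard javax.script API (JSR-223) - a minimal sketch, assuming the Groovy script engine is on the classpath:

  import javax.script.ScriptEngine;
  import javax.script.ScriptEngineManager;
  import javax.script.ScriptException;

  public class EmbeddedGroovy {
    public static void main(String[] args) throws ScriptException {
      // the shared-variable map is exactly what JSR-223 calls 'bindings'
      ScriptEngine engine = new ScriptEngineManager().getEngineByName("groovy");
      engine.put("id", 42);                      // expose a Java value to the script
      engine.eval("println \"Row id: $id!\"");   // run the embedded Groovy code
    }
  }

The difference, of course, is that the script here is an opaque string checked only at runtime, whereas the proposal is for the block to be parsed at compile time.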

So how hard is this to implement? Probably pretty hard. But it does open up lots of possibilities - whether for embedded DSLs or larger blocks of code in another language.

Summary

This is an outline of an idea to allow other languages, whether existing or new, to be easily embedded directly in existing code. Any thoughts?

Saturday, 26 April 2008

Java 7 - For-each loop control access

I've gathered together a few more thoughts on improving the enhanced for-each loops. The basic idea is to take this very popular Java 5 feature and provide the missing parts.

Control access

One of the more frustrating parts of the Java 5 for-each loop is when you are 80% through writing a loop, and you discover you need to remove an item, or require the loop index. At that point, you have to go back and manually change the loop to one of the old formats (in Eclipse at least). This is a hassle.

Perhaps more importantly, the older for loops simply aren't as clear in their intentions, aren't very DRY, and are definitely more error-prone. As a result, I've documented my proposal to improve the for-each loop with control access. For example, to access the loop index:

 Collection<String> coll = new ArrayList<String>();

 for (String str : coll : it) {
   System.out.println("Item: " + str + ", Index: " + it.index());
 }

And here is an example of removing an item:

 List<String> list  = new ArrayList<String>();

 for (String str : list : it) {
   if (str == null || it.isFirst()) {
     it.remove();
   }
 }

As can be seen, the syntax simply involves adding another colon and a 'variable' name. The 'variable' can be used to access loop control and manipulation functions. Note that the additional colon and 'variable' are of course optional for full backwards compatibility.

The document discusses two strategies for implementing the syntax - either via real Java types or as a language level feature. Please read the document for more information.
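As a flavour of the 'real Java types' strategy, here is a sketch of what such a control object might look like if approximated today with a ListIterator - the class is hypothetical, and the method names merely mirror the examples above:

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import java.util.ListIterator;

  // hypothetical helper approximating the proposed 'it' control variable
  final class LoopControl<E> {
    private final ListIterator<E> iterator;
    LoopControl(List<E> list) { this.iterator = list.listIterator(); }
    boolean hasNext() { return iterator.hasNext(); }
    E next() { return iterator.next(); }
    int index() { return iterator.previousIndex(); }   // index of the item just returned
    boolean isFirst() { return iterator.previousIndex() == 0; }
    void remove() { iterator.remove(); }               // removes the item just returned
  }

  // usage roughly equivalent to the index example above
  List<String> coll = new ArrayList<String>(Arrays.asList("x", "y"));
  LoopControl<String> it = new LoopControl<String>(coll);
  while (it.hasNext()) {
    String str = it.next();
    System.out.println("Item: " + str + ", Index: " + it.index());
  }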

Maps

I have updated my previous document about extending for-each to maps. The download of the javac implementation remains available from Kijaro.

Summary

It seems increasingly unlikely that there is time for closures to make it into Java 7. There are also many developers expressing real doubts as to whether the complexity of control invocation is just too much for the venerable Java language.

The alternative is smaller improvements like these two. They provide an easy-to-grasp extension to the popular Java 5 for-each loop that might still be possible to deliver in Java 7. Opinions welcome, as always.

Saturday, 19 April 2008

Java 7 - For-each loops for Maps

Have you ever been frustrated by the new Java 5 for-each loop because it didn't operate directly on maps?

For-each loop for Maps

I have documented a proposal to change Java to allow for-each loops over maps. I have also used the Kijaro project to implement the enhanced for-each loops!

  Map<String, Integer> map = new HashMap<String, Integer>();
 
  for (String str, Integer val : map) {
    System.out.println("Entry" + str + "=" + val);
  }

The altered version of javac can be downloaded, with the normal caveats of 'no warranty' and 'not intended for production'.
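For comparison, achieving the same result in today's Java means going via the entry set - given the same map as above:

  for (Map.Entry<String, Integer> entry : map.entrySet()) {
    System.out.println("Entry " + entry.getKey() + "=" + entry.getValue());
  }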

Closures

The real question here is whether we should use closures to obtain this functionality, or just code a specific language feature. Since it is far from certain that closures will appear in Java 7 due to timescale and resourcing questions, maybe we should be considering the alternatives?

Extending for-each loops to cover maps is a simple extension to the Java language that introduces no radically new concepts. Existing developers should be able to pick up the feature without any difficulty. In addition, developers that have never been exposed to the feature (but have seen a Java 5 for-each loop) should be able to read the code and grasp the meaning without tuition.

The truth is that sometimes the simple solution is the right one. Perhaps closures are overkill for many of the proposed uses in Java? Hopefully this document and prototype will allow people to kick the tyres on implementing this concept as a language feature, allowing a fair comparison with closures.

Summary

I've released a document and prototype of For-each loop for Maps, a language change to build on the Java 5 for-each loop. All feedback welcomed!

Monday, 28 January 2008

Java 7 - Multi-line String literals

One of the most common features in other programming languages is the multi-line String literal. Would it be possible to add this to Java?

Update, 2011-10-31: Just wanted to note that multi-line strings are not in Java 7, nor are they likely to be in Java 8. This blog post is still useful to understand some of the difficulties that would have to be tackled if they were to be included in future.

Update, 2018-01-28: This is now being considered for addition to Java, read more here.

Multi-line String literals

In Java today there is only one form of string literal, supplied in double quotes. Within those double quotes, certain characters have to be escaped.

 String basic = "Hello";
 String three = "This string\nspans three\nlines";
 String welcome = "Hello, My name is \"Stephen\", Hi!";

The first of these three examples is not complex, and would not make use of a multi-line String literal. The other two might be more readable with such a literal.

The standard for defining a multi-line String literal in both Scala and Groovy is three double quotes. This also seems like a sensible choice for Java:

  String three = """This string
spans three
lines""";

This is potentially much more readable, especially with large blocks of text. This form of literal would also avoid the need for escaping:

 String welcome = """Hello, My name is "Stephen", Hi!""";

Note that we no longer need to escape the double quotes. This would be especially useful for regular expressions.

Bear in mind that multi-line String literals are fundamentally no different to normal String literals on the key point of the object created. Both would create java.lang.String objects.

Issues

The first issue is the multi-line arrangement. Since all text within the multi-line literal is included, all lines except the first must begin from column zero. This will look odd in a piece of well-formatted Java code:

  // what a naive multi-line literal forces us to write
  public class MyClass {
    public void doStuff() {
      String three = """This string
spans three
lines""";
      System.out.println(three);
    }
  }
  
  // what we'd like to write
  public class MyClass {
    public void doStuff() {
      String three = """This string
                        spans three
                        lines""";
      System.out.println(three);
    }
  }

One possible solution is to provide a method on String that strips all whitespace after each newline. This could be called directly after the literal. Unfortunately this approach loses some efficiency as the string must be trimmed each time:

  // option with trimNewline()
  public class MyClass {
    public void doStuff() {
      String three = """This string
                        spans three
                        lines""".trimNewLine();
      System.out.println(three);
    }
  }
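
The stripping itself would be simple. As a sketch, the behaviour described could be written today as a static utility (String itself cannot be extended by a library, so the name trimNewline() is just borrowed from above):

  // strips spaces and tabs that follow each newline - a sketch only
  public static String trimNewline(String str) {
    return str.replaceAll("\n[ \t]+", "\n");
  }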

Another, perhaps better, solution might be to have a syntax variation. If the opening triple quote is followed immediately by a newline, then the position of the first non-space character on the next line represents the column to begin the literal at. (The first newline would not be included in this form of the literal.) Only the space character would be permitted in earlier columns until the end of the literal. This would allow for natural formatting of this kind of string:

  // option with columns determined by first line
  public class MyClass {
    public void doStuff() {
      String three = """
              This string
              spans three
              lines""";
      System.out.println(three);
    }
  }

One final tricky issue is handling a string containing the triple double quote. The answer is probably to ignore this situation (Scala does this). It is going to be very rare, and it can be worked around using string concatenation.

Summary

Multi-line String literals should be a relatively easy addition to Java (anyone fancy adding it to Kijaro?). The main benefits would be avoiding escaping in regular expressions, and pasting in large blocks of text from other sources.

Overall, I think they would be a valuable addition to Java. But have I missed any obvious issues? Are there any other syntax options that should be considered? Opinions welcome as always :-)

Monday, 7 January 2008

Java 7 - Checked exception library change

Personally, I don't like checked exceptions. But why? And can we do anything about it?

Checked exceptions

It's fairly well known that checked exceptions were a kind of experiment in Java. It's also fairly well known that newer languages are choosing not to follow. The basic reason is that, as a language feature, they haven't met their goal.

The expectation was that checked exceptions would improve exception handling quality. The reality is that they haven't:

  public void process() {
    try {
      callSomeExceptionThrowingMethod();
    } catch (Exception ex) {
      // ignore
    }
  }

I hope that most of us recognise this as an anti-pattern (in most cases). Rather than discuss it at length, I'll point everyone at Joshua Gertzen's blog.

His blog reminded me of a possible library change (yes library, not language!) I'd thought of that might help. At least it is something I'd like opinions on:

  public void process() {
    try {
      callSomeExceptionThrowingMethod();
    } catch (Exception ex) {
      ex.rethrowUnchecked();
    }
  }

So, what does 'rethrowUnchecked()' on Exception do? Well, it would rethrow the exception, ignoring its checked status.

  public void rethrowUnchecked() {
    // sun.misc.Unsafe rethrows without the compiler's checked-exception rules
    Unsafe.getUnsafe().throwException(this);
  }

This all works with Java today. There is no language change, just an alternative view of an existing capability.
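
As an aside, a similar effect can be achieved without sun.misc.Unsafe using a generics trick - a sketch of a well-known workaround, not part of the proposal itself:

  public final class Exceptions {
    private Exceptions() {
    }
    public static void rethrowUnchecked(Throwable t) {
      Exceptions.<RuntimeException>sneakyThrow(t);
    }
    @SuppressWarnings("unchecked")
    private static <T extends Throwable> void sneakyThrow(Throwable t) throws T {
      throw (T) t;   // erasure means no actual cast occurs at runtime
    }
  }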

So, why would this be useful? Well, I suggest that this could be a better way to handle annoying checked exceptions that we don't really care about. In particular, I believe that it would be easier to understand the exception trace, as it wouldn't consist of lots of nested exceptions - no more 'throw new RuntimeException(ex)'.

Summary

I'm not convinced of the benefits of this one yet. It seems neat, but does it really benefit us, as we still have to write the catch block? Perhaps it is more suited to a language change? Opinions welcome as always :-)

Tuesday, 27 November 2007

Java 7 - Extension methods

A recent document revealed some possible language changes that are being proposed. One of these is extension methods.

Extension methods

Extension methods allow a user to 'add' a method to an interface or class that you don't control. The original document is linked with BGGA closures, and was followed up by Peter Ahe (link now broken).

The classic example of the proposal is as follows (known as use-site extension methods):

// in the application
import static java.util.Collections.sort;
List list = ...
list.sort();

Thus, the sort method appears to be a method on the list, even though we haven't actually changed the List interface. When it is compiled, the extension method is removed, and replaced with the static method call.
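
In other words, the compiled form of the example above would be nothing more than the familiar static invocation - a sketch:

  // what the compiler would emit for the use-site example
  import static java.util.Collections.sort;
  List list = ...
  sort(list);   // list.sort() becomes Collections.sort(list)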

Peter Ahe pointed out some flaws with this design. He also proposes declaration-site extension methods:

// in the JDK
public interface List {
  void sort() import static java.util.Collections.sort;
  ...
}

// in the application
list.sort();

In this case, it is easier for the user to find the implementation of the sort method, as it genuinely is a method on the interface, just one implemented elsewhere. Unfortunately, the downside is that now only the JDK authors can add methods to the List interface in this way.

My proposal is that the freedom given to the user by the use-site approach is far more desirable, but the possible side-effects are nasty. My preference would be to add a visible marker to show that this isn't a regular method:

// in the application
import static java.util.Collections.sort;
List list = ...
list.do.sort();

Note the '.do.'. This gives the user a clear visual signal that something 'out of the ordinary' is going on.

In addition, I would argue that the static method should be marked to indicate that it can be used as an extension method, probably using an annotation:

// in the JDK
public class Collections {
  @ExtensionMethod
  public static void sort(List list) { ... }
}

This provides the final piece of the puzzle, preventing inappropriate methods from being used as extension methods. While it does suffer from the same issue of pushing control back to the library author (the JDK), it avoids the issue of which methods get added to the List interface.

The point is that as many methods as appropriate can be tagged with @ExtensionMethod. There is no conceptual overhead in doing so, and it doesn't expand the conceptual weight or complexity of the List interface itself.

Summary

I've outlined an alternative proposal for extension methods, that tries to focus on freedom for users, without compromising readability or allowing confusing options.

Opinions welcome, as always :-)

Sunday, 11 November 2007

Kijaro - for Java language changes

I'd like to announce the creation of a new project - kijaro. Kijaro is designed as a place where ideas for changes to the Java language can be implemented.

Kijaro

The kijaro project has been set up following various discussions on blogs, mailing lists and email. Its aim is to provide a very open place for those interested in implementing a change to javac to gather and code.

Kijaro is similar in its scope to the Kitchen Sink Language. The KSL project has been open for quite a while, but has yet to see any new features. KSL also aims to have code reviews and experienced compiler writers involved, which can be seen as quite formal.

Kijaro aims to be lightweight in rules:

  • Documentation. Each new language feature must have some form of associated document, even if it's just a blog. It doesn't have to be much, but should have an outline of why the feature is needed and the syntax implications.
  • Backwards compatibility. On svn TRUNK all existing Java code must compile.
  • Comments. Each change must have a comment so we can find it later, such as 'FCM-MREF'.

So far, the proponents of three language enhancements have expressed an interest in working at kijaro. In fact, the FCM and Properties code is already checked in.

So, do you have a favourite language change that you want to see implemented in Java 7? Or, would you like to download and try out one of these changes? Then, please join us at kijaro! The more ideas we get implemented the better!

After all, real working prototypes tend to produce good feedback and really encourage the decision makers that any change is practical.

Friday, 5 October 2007

JSR-310 and Java 7 language changes

Part of the difficulty I'm finding with designing JSR-310 (Dates and Times) is that I constantly come across gaps in the language of Java. My concern is that these gaps will shape the API to be less than it should be. Let me give some examples:

BigDecimal

JSR-310 is considering adding classes representing durations. (So is JSR-275, but that's another story.)

The aim of the duration classes is to meet the use case of representing "6 days", or "7 minutes". As a result, the first set of code uses int to represent the amount.

 public class Days {     // code is for blog purposes only!
   private int amount;
   ...
 }

However, what happens when you get down to seconds and milliseconds? Do we have one class that represents seconds, and a separate class that represents milliseconds? That seems rather naff. What would be better is to have a class to represent seconds with decimal places:

 public class Seconds {
   private double amount;
   ...
 }

But double is a no-no. Like float, it is unreliable - unable to represent some decimal values, and with sometimes unexpected answers from the maths. Of course, the answer is BigDecimal:

 public class Seconds {
   private BigDecimal amount;
 }

So, why am I, and others, reticent to use this 'correct' solution? It's because we don't have BigDecimal literals and operators. This is a clear case where language-level choices are affecting library-level design for the worse.
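
To see the problem, compare the same sum written with double literals and operators against the java.math.BigDecimal equivalent - a minimal sketch:

  // with language support - short and readable
  double totalSeconds = 7.5 * 60;

  // without it - correct, but verbose and much harder to read
  BigDecimal totalSeconds2 = new BigDecimal("7.5").multiply(new BigDecimal("60"));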

Immutables

JSR-310 is based around immutable classes, which are now a well recognised best practice for classes like dates and times. Unfortunately, the Java language is not well setup for immutable classes.

Ideally, there should be language level support for immutable classes. This would involve a single keyword to declare a class as immutable, and could allow certain runtime optimisations (so I'm told...).

 public immutable class DateTime {
   ...
 }

Unfortunately, it is probably too late for Java to do much on this one.

Self types

Another missing Java feature affecting JSR-310 is self-types. A self-type lets a superclass declare a method whose return type is automatically the concrete type of the subclass it is called on:

 public abstract class AbstractDateTime {
   public <this> plusYears(Duration duration) {
     return factory.create(this.amount + duration);   // pseudo-code
   }
 }
 public final class DateTime extends AbstractDateTime {
 }
 // usage
 DateTime result = dateTime.plusYears(duration);

The thing to note is the 'this' generic-style syntax. The effect is that the user of the subclass can call the method and get the correct return type - a DateTime rather than an AbstractDateTime.

This can be achieved today by manually overriding the method in each and every subclass. However that doesn't work if you want to add a new method to the abstract superclass in the future and have it picked up by every subclass (which is what you want in a JSR, as a JSR doesn't have a perfect crystal ball for its first release).
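
Here is a sketch of that manual workaround, using covariant return types (Java 5+) - the simplified classes are my own, not JSR-310 code:

  public abstract class AbstractDateTime {
    protected final int year;
    protected AbstractDateTime(int year) { this.year = year; }
    public abstract AbstractDateTime plusYears(int years);
  }
  public final class DateTime extends AbstractDateTime {
    public DateTime(int year) { super(year); }
    @Override
    public DateTime plusYears(int years) {   // return type narrowed by hand
      return new DateTime(year + years);
    }
  }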

Again, a language-level missing feature severely compromises a library-level JSR.

Operator overloading

Another area where JSR-310 may choose the 'wrong' approach is int-wrapper classes. For JSR-310 this would be a class representing a duration in years.

If I were to give you a problem description that said you need to model 'the number of apples in a shop', and to be able to add, subtract and multiply that number, you'd have a couple of design options.

The first option is to hold an int and perform regular maths using +, - and *. The downside is that only javadoc tells you that the int is a number of apples, instead of a number of oranges. This is exactly the situation before generics.

The second option is to create an Apples class wrapping the int. Now, you cannot confuse apples with oranges. Unfortunately, you also cannot use +, - and *. This is despite the fact that they are obviously valid in this scenario. Using plus/minus/multipliedBy/dividedBy methods just doesn't have the same clarity.
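
As a sketch, the wrapper option might look like this (the class body is my illustration):

  public final class Apples {
    private final int amount;
    public Apples(int amount) { this.amount = amount; }
    public Apples plus(Apples other) { return new Apples(amount + other.amount); }
    public Apples minus(Apples other) { return new Apples(amount - other.amount); }
    public Apples multipliedBy(int scalar) { return new Apples(amount * scalar); }
    public int amount() { return amount; }
  }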

Given this choice today, most architects/designers seem to be choosing the first option of just using an int. Language designers should really be looking at that decision and shaking their head. The whole point of the generics change was to move away from reliance on javadoc. Yet here, in another design corner, people still prefer a lack of real safety and javadoc to 'doing the right thing'. Why? Primarily because of the lack of operator overloading.

Ironically, I think this is an area that really might change in the future. But if JSR-310 has chosen to use int rather than proper classes by that point, a great opportunity will have passed.

Summary

My assertion that language-level features (or rather missing features) affect library design isn't really news. The interesting thing with JSR-310 is just how constraining the missing features are proving to be.

The trouble is that with no clear idea on where the future of the Java language lies, a JSR like 310 cannot make good choices which will fit well with future language change. The danger is that we simply create another date and time library that doesn't fit well with the Java language of 2010 or later.

Opinions welcome on whether JSR-310 should completely ignore potential language changes, or try to make a best guess for the future.

Monday, 1 October 2007

Java 7 - Properties terminology

The debate around properties is hotting up. I'm going to try and contribute by defining a common terminology for everyone to communicate in.

Properties terminology

This is a classification of the three basic property types being debated. Names are given for each type of property. My hope is that the names will become widely used to allow everyone to explain their opinions more rapidly.

Note that although none of the examples show get/set methods, all of the options can support them. It should also be noted that the interface/class definitions are reduced and don't include all the possible methods.

Type 1 - Bean-independent property

(aka per-class, property-proxy, property adaptor)

This type of property consists of a type-safe wrapper for a named property on a specific type of bean. No reference is held to a bean instance, so the property is a lightweight singleton, exactly as per Field or Method. As such, it could be held as a static constant on the bean.

 public interface BeanIndependentProperty<B, V> {
   /**
    * Gets the value of the property from the specified bean.
    */
   V get(B bean);
   /**
    * Sets the value of the property on the specified bean.
    */
   void set(B bean, V newValue);
   /**
    * Gets the name of the property.
    */
   String propertyName();
 }

This type of property is useful for meta-level programming by frameworks. The get and set methods are not intended to be used day in day out by most application developers. Instead, the singleton bean-independent property instance is passed as a parameter to a framework which will store it for later use, typically against a list of actual beans.

A typical use case would be defining the properties on a bean to a comparator, for example comparing surnames, and if they are equal then forenames. Clearly, the comparator needs to be defined completely independently to any actual bean instance.

 Comparator<Person> comp = new MultiPropertyComparator<Person>(
   Person.SURNAME, Person.FORENAME   // bean-independent property defined as static constant
 );
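
For concreteness, here is a sketch of what such a MultiPropertyComparator might look like internally - the class name comes from the example above, but this implementation is my assumption, not an actual framework class:

  import java.util.Comparator;

  public class MultiPropertyComparator<B> implements Comparator<B> {
    private final BeanIndependentProperty<B, ?>[] properties;

    public MultiPropertyComparator(BeanIndependentProperty<B, ?>... properties) {
      this.properties = properties;
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(B bean1, B bean2) {
      for (BeanIndependentProperty<B, ?> property : properties) {
        Comparable value1 = (Comparable) property.get(bean1);
        Comparable value2 = (Comparable) property.get(bean2);
        int result = value1.compareTo(value2);
        if (result != 0) {
          return result;   // the first property that differs decides the order
        }
      }
      return 0;   // all compared properties are equal (assumes non-null values)
    }
  }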

Bean-independent properties can be implemented in two main ways. The first option uses reflection to access the field:

 public class Person {
  public static final BeanIndependentProperty<Person, String> SURNAME =
      ReflectionIndependentProperty.create(Person.class, "surname");
  private String surname;
 }
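
As a sketch of how the reflection-based approach might work - ReflectionIndependentProperty is named above, but this body is an assumption:

  import java.lang.reflect.Field;

  public final class ReflectionIndependentProperty<B, V> implements BeanIndependentProperty<B, V> {
    private final Field field;

    public static <B, V> ReflectionIndependentProperty<B, V> create(Class<B> beanType, String propertyName) {
      try {
        Field field = beanType.getDeclaredField(propertyName);
        field.setAccessible(true);   // allow access to the private field
        return new ReflectionIndependentProperty<B, V>(field);
      } catch (NoSuchFieldException ex) {
        throw new IllegalArgumentException(ex);
      }
    }

    private ReflectionIndependentProperty(Field field) {
      this.field = field;
    }

    @SuppressWarnings("unchecked")
    public V get(B bean) {
      try {
        return (V) field.get(bean);
      } catch (IllegalAccessException ex) {
        throw new IllegalStateException(ex);
      }
    }

    public void set(B bean, V newValue) {
      try {
        field.set(bean, newValue);
      } catch (IllegalAccessException ex) {
        throw new IllegalStateException(ex);
      }
    }

    public String propertyName() {
      return field.getName();
    }
  }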

The second option uses an inner class to access the field:

 public class Person {
  public static final BeanIndependentProperty<Person, String> SURNAME =
      new AbstractIndependentProperty<Person, String>("surname") {
        public String get(Person person) { return person.surname; }
        public void set(Person person, String newValue) { person.surname = newValue; }
      };
  private String surname;
 }

As bean-independent properties are static singletons, they should be stateless and immutable.

Type 2 - Bean-attached property

(aka per-instance)

This type of property consists of a type-safe wrapper for a named property on a specific instance of a bean. As the bean-attached property is connected to a bean, it must be accessed via an instance method.

 public interface BeanAttachedProperty<B, V> {
   /**
    * Gets the value of the property from the stored bean.
    */
   V get();
   /**
    * Sets the value of the property on the stored bean.
    */
   void set(V newValue);
   /**
    * Gets the name of the property.
    */
   String propertyName();
 }

This type of property is useful for frameworks that need to tie a property on a specific bean to a piece of internal logic. The most common example of this is many parts of the Beans Binding JSR, such as textfield bindings.

The get and set methods are not intended to be used day in day out by most application developers. Instead, the bean-attached property instance is passed as a parameter to a framework which will store it for later use.

A typical use case would be defining the binding from a person's surname to a textfield. In this scenario we are binding the surname on a specific bean to the textfield on a specific form.

 bind( myPerson.surnameProperty(), myTextField.textProperty() );

Bean-attached properties can be implemented in two main ways. The first option uses reflection to access the field. This example shows the attached property being created anew each time (on demand); however, it could also be cached or created when the bean is created:

 public class Person {
  public BeanAttachedProperty<Person, String> surnameProperty() {
    return ReflectionAttachedProperty.create(this, "surname");
  }
  private String surname;
 }

The second option uses an inner class to access the field. Again, this example shows the attached property being created anew each time (on demand); however, it could also be cached or created when the bean is created:

 public class Person {
  public BeanAttachedProperty<Person, String> surnameProperty() {
    return new AbstractAttachedProperty<Person, String>("surname") {
        public String get() { return surname; }
        public void set(String newValue) { surname = newValue; }
      };
  }
  private String surname;
 }

Bean-attached properties are merely pointers to the data on the bean. As such, they should be stateless and immutable.

Type 3 - Stateful property

(aka bean-properties, property objects, Beans 2.0, Eclipse IObservableValue)

Note that stateful properties have been implemented in the bean-properties project, however that project goes well beyond the description below.

This type of property consists of a stateful property on a specific instance of a bean. The property is a fully fledged object, linked to a specific bean, that holds the entire state of the property, including its value. This approach is based around an alternative to the standard beans specification; however, get/set methods can be added if desired.

 public class StatefulProperty<B, V> {
   private final B bean;
   private final String propertyName;
   private V value;
   /**
    * Constructor taking the owning bean and the property name.
    */
   StatefulProperty(B bean, String propertyName) {
     this.bean = bean;
     this.propertyName = propertyName;
   }
   /**
    * Gets the value of the property.
    */
   public V get() { return value; }
   /**
    * Sets the value of the property.
    */
   public void set(V newValue) { value = newValue; }
   /**
    * Gets the name of the property.
    */
   public String propertyName() { return propertyName; }
 }

This type of property is intended for application developers to use day in day out. However it is not intended to be used in the same way as normal get/set methods. Instead, developers are expected to code a bean and use it as follows:

 // bean implementation
 public class Person {
   public final StatefulProperty<Person, String> surname =
       new StatefulProperty<Person, String>(this, "surname");
 }
 // usage (instead of get/set)
 String s = myPerson.surname.get();
 myPerson.surname.set("Colebourne");

The important point to note is that the surname field on the bean is final and a reference to the stateful property. The actual state of the property is within the stateful property, not directly within the bean.

A stateful property fulfils the same basic API as a bean-attached property (type 2). As such, it can be used in the same use cases. However, because a stateful property is an object in its own right, it can have additional state and metadata added as required.

The key difference between bean-attached (type 2) and stateful (type 3) is the location of the bean's state. This impacts further in that bean-attached properties are intended primarily for interaction with frameworks, whereas stateful properties are intended for everyday use.

Language syntax changes

Both bean-independent (type 1) and bean-attached (type 2) properties benefit from language syntax change. This would be used to access the property (not the value) to pass to the framework in a compile-safe, type-safe, refactorable way:

 // bean-independent - accessed via the classname
 BeanIndependentProperty<Person, String> property = Person#surname;
 
 // bean-attached - accessed via the instance
 BeanAttachedProperty<Person, String> property = myPerson#surname;

Stateful properties (type 3) do not need a language change to access the property as they are designed around it, making it directly available and fully safe.

The second area of language change is definition of properties. This is a complicated area, which I won't cover here, where all three types of property would benefit from language change.

The third area of language change is access to the value of a property. Again, there are many possible syntaxes and implications which I won't cover here.

Combination

The full benefit of bean-independent (type 1) and bean-attached (type 2) properties occurs when they are combined. Additional methods can be added to each interface to aid the integration, and any language syntax change becomes much more powerful.

Summary

I hope that these definitions will prove useful, and provide a common terminology for the discussion of properties in Java.

Opinions are welcome, and I will make tweaks if necessary!