Thursday, April 18, 2013

Pattern-Aware Formatting

During the last year I had the pleasure of implementing a formatter for a programming language. You know, this little but incredibly useful thing that nicely arranges indentation, line-wraps and whitespace in your code. If you know me, you may have guessed that the programming language in question is Xtend. Version 2.4.0 is the first version to ship with this formatter.

In case you haven't heard of Xtend, it is a language that compiles down to human-readable Java code and integrates seamlessly with any Java project.

We like to advertise Xtend as a language that is more concise and expressive than Java. And indeed it is, thanks to powerful type inference, higher-order functions, and less syntactical noise. Furthermore, Xtend supports concepts that give the developer a great degree of syntactical freedom. Among them are extension methods, operator overloading, and an implicit variable called "it".

This conciseness, expressiveness and syntactical freedom give a great deal of power to the developer. It is a power a developer can use to increase his/her efficiency and at the same time write code in exactly the way he/she considers the most readable to a human. It makes Xtend a perfect match for internal DSLs.

This has interesting consequences for formatting:

  • People who care about the readability of their code do have a strong opinion about formatting. This is easy to explain: Unformatted code just isn't readable.
  • But the strong opinion is about more than just having the code formatted. It's about having the code formatted in exactly the way they consider the most readable. Suddenly the formatting strategy doesn't depend on which syntactical element is being formatted, but on how a syntactical element is being used. This is an interesting thing I had to learn, as it is completely different from how the Eclipse Java formatter does its job.

The question is, how can a formatter decide how a syntactical element is being used? The number of APIs, internal DSLs and programming styles is practically unbounded, so there is no way of considering all of them. However, there is a small number of recurring formatting patterns.

Example 1: The If-Expression

This is probably how you would express it if you come from Java: imperative programming style, formatted as one line per statement:

var String z
if(variable == 1)
  z = "it is one"
else
  z = "it differs from one"

In Xtend, however, the if-statement is actually an if-expression, i.e. it has a result value. Java has an if-expression as well: the ternary operator, as in (variable == 1) ? "then" : "else". Instead of assigning values inside the bodies of the if-statement, we can directly assign the result of the if-expression. Besides being more concise, this has the tremendous advantage that the variable can be immutable and will never exist in an uninitialized state. Here is the snippet, keeping the formatting style from the example above:

val y = if(variable == 1)
    "it is one"
  else
    "it differs from one"

We might also want to write it on a single line, just as you would with the ternary operator in Java:

val x = if(variable == 1) "it is one" else "it differs from one"

What we see here are two different formatting patterns that are alternatives to each other. An if-expression can be formatted single-line as well as multi-line. You might argue that it should be formatted multi-line when it is used as a statement and single-line when it is used as an expression. While this might sound like a smart strategy at first, and it will probably work for most scenarios, it will frustrate the developer in the remaining cases. Frustration is something we should prevent at all costs. Therefore, the Xtend formatter leaves the choice to the developer: it recognizes which formatting pattern the developer had in mind. The decision strategy is as follows:

  1. If the then-body or the else-body contains at least one line-wrap, format the if-expression multi-line.
  2. If the length of the full if-expression exceeds the maximum length of a line, format it multi-line.
  3. Apply single-line formatting in all other cases.
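
As a rough sketch, and not the actual Xtend formatter code, the three rules above could be written down in Java like this. The class IfExpressionStyle, the method decide and all parameter names are made up for illustration. The sketch also assumes that the caller already knows how long the if-expression would be on a single line, and that the text of each branch includes the whitespace around it, so a line-wrap right before a body counts as well.

```java
// Hypothetical sketch of the decision rules; not taken from the Xtend code base.
final class IfExpressionStyle {

    enum Style { SINGLE_LINE, MULTI_LINE }

    /**
     * @param thenBody         source text of the then-branch as typed, including surrounding whitespace
     * @param elseBody         source text of the else-branch, or null if there is none
     * @param singleLineLength length the whole if-expression would have on one line (assumed to be known)
     * @param maxLineLength    configured maximum line length
     */
    static Style decide(String thenBody, String elseBody, int singleLineLength, int maxLineLength) {
        // Rule 1: a line-wrap in either branch means the developer wants multi-line.
        if (containsLineWrap(thenBody) || containsLineWrap(elseBody)) {
            return Style.MULTI_LINE;
        }
        // Rule 2: the expression does not fit on one line.
        if (singleLineLength > maxLineLength) {
            return Style.MULTI_LINE;
        }
        // Rule 3: everything else stays on a single line.
        return Style.SINGLE_LINE;
    }

    private static boolean containsLineWrap(String text) {
        return text != null && text.indexOf('\n') >= 0;
    }

    public static void main(String[] args) {
        // Both branches typed on the same line: prints SINGLE_LINE
        System.out.println(decide(" \"it is one\" ", " \"it differs from one\"", 64, 120));
        // Branches typed on their own lines: prints MULTI_LINE
        System.out.println(decide("\n    \"it is one\"\n  ", "\n    \"it differs from one\"", 64, 120));
    }
}
```

Note that the order of the checks does not change the result here; both the first and the second rule lead to multi-line style.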

This empowers the developer to choose, for every individual if-expression, whether he/she prefers the single-line or the multi-line style.

Example 2: Lambda-Expressions

Thanks to lambda-expressions you can choose a functional programming style when you work with Xtend. You can think of lambda-expressions as executable snippets of code that can be passed around in parameters and variables.

The following example calculates the total size of all files within the same directory. You can recognize the lambda-expressions by their square brackets "[" "]".

val file = new java.io.File(".")
val totalLength = file.listFiles.filter[isFile].map[length].reduce[x, y|x + y]
println("The size of all files in " + file + " is " + totalLength + " bytes.")

  • filter[isFile] removes all items from the list which are not files, such as directories.
  • map[length] converts a list of files into a list of numbers, each representing the file size (length).
  • reduce[x, y|x + y] applies x + y to all list items until all are summed up to a single value.

I think there is some beauty in this code since you can understand it just by reading it from left to right. Furthermore, it is much more concise than anything you can do with loops in Java.

For this code to be readable, surely we want the lambda-expressions to be formatted without line-wraps, similar to parameters of a method call.
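
For comparison with the chained Xtend call, here is roughly the same computation written with a plain loop in Java. This is only a sketch: the class name TotalFileSize and the helper totalLength are made up for illustration, and unlike the Xtend version the helper returns 0 when there is nothing to sum up.

```java
import java.io.File;

// Hypothetical loop-based counterpart to the Xtend one-liner.
final class TotalFileSize {

    static long totalLength(File[] files) {
        long total = 0;
        if (files == null) {
            // File.listFiles() returns null if the path is not a directory.
            return total;
        }
        for (File f : files) {
            if (f.isFile()) {          // corresponds to filter[isFile]
                total += f.length();   // corresponds to map[length] and reduce[x, y|x + y]
            }
        }
        return total;
    }

    public static void main(String[] args) {
        File file = new File(".");
        long size = totalLength(file.listFiles());
        System.out.println("The size of all files in " + file + " is " + size + " bytes.");
    }
}
```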

Another use case for lambdas is to provide handlers for events and asynchronous communication. The following example creates an instance of java.lang.Runnable that prints to stdout when run() is called.

val Runnable runnable = [ |
  println("Hello from " + Thread::currentThread.name)
]

Here, a multi-line style of the lambda-expression may be preferable if the implementation of the handler follows an imperative programming style. If the lambda-expression does nothing but delegate to another method, a single-line style may be preferable.

A third use case for lambda-expressions is named parameter values. In this example the assignments to name and priority are compiled to thread.setName(...) and thread.setPriority(...).

val thread = new Thread(runnable) => [
  name = "my thread"
  priority = Thread::MIN_PRIORITY
]

Here as well, a multi-line style is usually preferable.
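
To make the named-parameter pattern more tangible, this is roughly what the same setup looks like in hand-written Java. It is a sketch for illustration only and not the output of the Xtend compiler; the class name NamedParameterStyle and the method configuredThread are made up.

```java
// Hypothetical hand-written Java counterpart to the Xtend snippet.
final class NamedParameterStyle {

    static Thread configuredThread(Runnable runnable) {
        Thread thread = new Thread(runnable);
        thread.setName("my thread");              // name = "my thread"
        thread.setPriority(Thread.MIN_PRIORITY);  // priority = Thread::MIN_PRIORITY
        return thread;
    }

    public static void main(String[] args) {
        Runnable runnable = () -> System.out.println("Hello from " + Thread.currentThread().getName());
        configuredThread(runnable).start();
    }
}
```

Each assignment in the Xtend lambda becomes one setter call, which is why one assignment per line reads so naturally.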

As with if-expressions, there is no automated way for the formatter to decide between single-line and multi-line formatting. Therefore, the decision strategy is as follows:

  1. If there is a line-wrap before the closing bracket "]", apply multi-line style.
  2. If the length of the lambda-expression exceeds the maximum line length, apply multi-line style.
  3. Apply single-line style in all other cases.
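
The same rules as a small Java sketch. Again, this is not the real formatter implementation: the class LambdaStyle and its methods are hypothetical, the lambda is handed in as plain source text, and the single-line length is assumed to be computed elsewhere.

```java
// Hypothetical sketch of the lambda decision rules; not taken from the Xtend code base.
final class LambdaStyle {

    enum Style { SINGLE_LINE, MULTI_LINE }

    /** True if only whitespace containing a line-wrap sits directly before the last "]". */
    static boolean hasLineWrapBeforeClosingBracket(String lambdaSource) {
        // lastIndexOf yields -1 if there is no "]"; the loop is skipped in that case.
        int i = lambdaSource.lastIndexOf(']') - 1;
        while (i >= 0 && Character.isWhitespace(lambdaSource.charAt(i))) {
            if (lambdaSource.charAt(i) == '\n') {
                return true;
            }
            i--;
        }
        return false;
    }

    static Style decide(String lambdaSource, int singleLineLength, int maxLineLength) {
        // Rule 1: the line-wrap before "]" asks for multi-line style.
        if (hasLineWrapBeforeClosingBracket(lambdaSource)) {
            return Style.MULTI_LINE;
        }
        // Rule 2: the lambda does not fit on one line.
        if (singleLineLength > maxLineLength) {
            return Style.MULTI_LINE;
        }
        // Rule 3: single-line in all other cases.
        return Style.SINGLE_LINE;
    }

    public static void main(String[] args) {
        // prints SINGLE_LINE
        System.out.println(decide("[x, y|x + y]", 12, 120));
        // prints MULTI_LINE
        System.out.println(decide("[ |\n  println(\"Hello\")\n]", 22, 120));
    }
}
```

A line-wrap that appears only after the opening bracket, as in "[\n  println(1) ]", does not trigger the first rule in this sketch.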

I decided that the "magic line-wrap" that triggers multi-line formatting of a lambda-expression should only be the line-wrap before the closing bracket "]" to make it easier to convert a multi-line lambda back to its single-line style. In this scenario, you'll only need to remove one single line-wrap and re-run the formatter instead of removing multiple line-wraps.

Other Examples You Can Try

As of Xtend 2.4.0, the following formatting patterns are supported besides the ones listed above:

  • single-line/multi-line style for method parameter declarations, triggered by the line-wrap before the closing parenthesis ")".
  • single-line/multi-line style for method call parameters, triggered by the line-wrap before the closing parenthesis ")".
  • switch-expressions can be formatted single-line or multi-line.
  • case-blocks inside switch-expressions can be formatted single-line or multi-line.
