Json Jerk: a flexible JSON parser

9 12 2011

I just pushed Json Jerk to Github. Json Jerk is a flexible and fast JSON parser written in Java. It consists of several composable parts for tokenizing, (un)escaping, parsing, and handling of semantic actions. Furthermore it provides a light weight and type safe object model for JSON documents.

Details and examples are in the readme and in the test cases.

Update:I renamed the parser from Flex Json to Json Jerk due to a name clash with an existing project.





Union types

12 06 2011

In his recent blog post Miles Sabin came up with an ingenious way of expressing union types in Scala. A union type is the union of some types: its values are the union of the values of each of the individual types.

In a nutshell he first defines the negation of types as

type ¬[A] = A => Nothing

and then the union of two types via De Morgan’s law

type ∨[T, U] = ¬[¬[T] with ¬[U]]

With the following auxiliary constructs

type ¬¬[A] = ¬[¬[A]]
type |∨|[T, U] = { type λ[X] = ¬¬[X] <:< (T ∨ U) }

union types can be used in a very intuitive way

def size[T: (Int |∨| String)#λ](t: T) = t match {
    case i: Int => i
    case s: String => s.length
}

scala> size(3)
res0: Int = 3

scala> size("three")
res1: Int = 5

scala> size(4.2)
:13: error: Cannot prove that ((Double) => Nothing) => Nothing >: Nothing with (java.lang.String) => Nothing) => Nothing.
       size(4.2)
           ^

… and beyond

With type negation and disjunction from above, it becomes possible to express all types whose set of values can be expressed by a term in propositional calculus. But can we do better? That is, is it possible to express types which don’t have a corresponding term in propositional calculus?

Generalizing the type constructor ∨[T, U] to some arbitrary acceptor

type Acceptor[T, U] = { type λ[X] = ... }

it becomes apparent, that all types for which there is a corresponding type level acceptor function are expressible. Since type level calculations in Scala are Turing complete, it should be possible to find an acceptor for any recursive function. This means that – in theory at least – Scala’s type system is powerful enough to express any type whose set of values is recursive.





Regular expression matching in <100 lines of code

6 12 2010

The recent discussion about the Yacc is dead paper on Lambda the Ultimate sparked my interest in regular expression derivatives. The original idea goes back to the paper Derivatives of Regular Expressions published in 1964 (!) by Janusz A. Brzozowski. For a more modern treatment of the topic see Regular-expression derivatives reexamined.

The derivative of a set of strings with respect to a character, is the set of strings which results from removing the first character from all the strings in the set which start with that character. Let for example S = \{foo, bar, baz\} then \partial_b S = \{ar, az\} . It turns out that regular languages are closed under derivatives. That is, any derivative of a regular language is again a regular language. Furthermore, it is possible to extend the notion of derivatives to regular expression such that given a regular expression r which generates the language \mathcal{L}(r) and a character c , one can derive a regular expression \partial_c r such that \mathcal{L}(\partial_c r) = \partial_c(\mathcal{L}(r)).

This is a key ingredient for a very elegant regular expression matching algorithm: to match a string against a regular expression repeatedly calculate the derivative of the regular expression for each characters in the string. When no character is left, check whether the last derivative accepts the empty string. If so we have a match and otherwise not.

The exact algorithm for finding whether a regular expression is nullable (i.e. accepts the empty string) is given in Regular-expression derivatives reexamined as is the algorithm for calculating derivatives of regular expressions. Below is a direct implementation of that algorithm in Scala (with a slight modification to allow for strings instead of individual characters).

trait RegExp {
  def nullable: Boolean
  def derive(c: Char): RegExp
}

case object Empty extends RegExp {
  def nullable = false
  def derive(c: Char) = Empty
}

case object Eps extends RegExp {
  def nullable = true
  def derive(c: Char) = Empty
}

case class Str(s: String) extends RegExp {
  def nullable = s.isEmpty
  def derive(c: Char) =
    if (s.isEmpty || s.head != c) Empty
    else Str(s.tail)
}

case class Cat(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable && s.nullable
  def derive(c: Char) =
    if (r.nullable) Or(Cat(r.derive(c), s), s.derive(c)) 
    else Cat(r.derive(c), s)
}

case class Star(r: RegExp) extends RegExp {
  def nullable = true
  def derive(c: Char) = Cat(r.derive(c), this)
}

case class Or(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable || s.nullable
  def derive(c: Char) = Or(r.derive(c), s.derive(c))
}

case class And(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable && s.nullable
  def derive(c: Char) = And(r.derive(c), s.derive(c))
}

case class Not(r: RegExp) extends RegExp {
  def nullable = !r.nullable
  def derive(c: Char) = Not(r.derive(c))
}

Having these constructors we need a way to match strings against regular expressions.

object Matcher {
  def matches(r: RegExp, s: String): Boolean = {
    if (s.isEmpty) r.nullable
    else matches(r.derive(s.head), s.tail)
  }
}

Here are some pimps to make usage of the regular expression constructors more convenient.

object Pimps {
  implicit def string2RegExp(s: String) = Str(s)

  implicit def regExpOps(r: RegExp) = new {
    def | (s: RegExp) = Or(r, s)
    def & (s: RegExp) = And(r, s)
    def % = Star(r)
    def %(n: Int) = rep(r, n)
    def ? = Or(Eps, r)
    def ! = Not(r)
    def ++ (s: RegExp) = Cat(r, s)
    def ~ (s: String) = Matcher.matches(r, s)
  }

  implicit def stringOps(s: String) = new {
    def | (r: RegExp) = Or(s, r)
    def | (r: String) = Or(s, r)
    def & (r: RegExp) = And(s, r)
    def & (r: String) = And(s, r)
    def % = Star(s)
    def % (n: Int) = rep(Str(s), n)
    def ? = Or(Eps, s)
    def ! = Not(s)
    def ++ (r: RegExp) = Cat(s, r)
    def ++ (r: String) = Cat(s, r)
    def ~ (t: String) = Matcher.matches(s, t)
  }

  def rep(r: RegExp, n: Int): RegExp =
    if (n <= 0) Star(r)
    else Cat(r, rep(r, n - 1))
}

And finally here is how to use it:

object Test {
  import Pimps._

  val digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  val int = ("+" | "-").? ++ digit.%(1)
  val real = ("+" | "-").? ++ digit.%(1) ++ ("." ++ digit.%(1)).? ++ (("e" | "E") ++ ("+" | "-").? ++ digit.%(1)).?

  def main(args: Array[String]) {
    val ints = List("0", "-4534", "+049", "99")
    val reals = List("0.9", "-12.8", "+91.0", "9e12", "+9.21E-12", "-512E+01")
    val errs = List("", "-", "+", "+-1", "-+2", "2-")

    ints.foreach(s => assert(int ~ s))
    reals.foreach(s => assert(!(int ~ s)))
    errs.foreach(s => assert(!(int ~ s)))

    ints.foreach(s => assert(real ~ s))
    reals.foreach(s => assert(real ~ s))
    errs.foreach(s => assert(!(real ~ s)))
}

Now that’s 48 + 6 + 32 = 86 lines of code for a regular expression matching library!





Generic array factory in Java: receipt for disaster

4 11 2010

Let’s implement a generic factory method for arrays in Java like this:

static <T> T[] createArray(T... t) {
    return t;
}

We can use this method to create any array. For example an array of strings:

String[] strings = createArray("some", "thing");

Now let’s add another twist:

static <T> T[] crash(T t) {
    return createArray(t);
}

String[] outch = crash("crash", "me"); 

Running this code will result in a ClassCastException on the last line:

Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String;

At first this seems strange. There is no cast anywhere here. So what is going on? Basically the Java compiler is lying at us: calling the crash method with string arguments, it tells us that we get back an array of strings. Now looking at the exception we see that this is not true. What we really get back is an array of objects!

Actually the Java compiler issues a warning on the createArray call in the crash method:

Type safety : A generic array of T is created for a varargs parameter

This is how it tells us about its lying: “Since I don’t know the actual type of T, I’ll just return an array of Object instead.” I thinks this is wrong. And others seem to think along the same lines.





So Scala is too complex?

24 08 2010

There is currently lots of talk about Scala being to complex. Instead of more arguing I implemented the same bit of functionality in Scala and in Java and let everyone decide for themselves.

There is some nice example code in the manual to the The Scala 2.8 Collections API which partitions a list of persons into two lists of minors and majors. Below are the fleshed out implementations in Scala and Java.

First Scala:

object ScalaMain {
  case class Person(name: String, age: Int)
    
  val persons = List(
    Person("Boris", 40),
    Person("Betty", 32),
    Person("Bambi", 17))

  val (minors, majors) = persons.partition(_.age <= 18) 
   
  def main(args: Array[String]) = {
    println (minors.mkString(", "))
    println (majors.mkString(", "))
  }
}

And now Java:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class Person {
    private final String name;
    private final int age;

    public Person(String name, int age) {
        super();
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        else if (other instanceof Person) {
            Person p = (Person) other;
            return name == null ? p.name == null : name.equals(p.name)
                    && age == p.age;

        }
        else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        int h = name == null ? 0 : name.hashCode();
        return 39*h + age;
    }

    @Override
    public String toString() {
        return new StringBuilder("Person(")
            .append(name).append(",")
            .append(age).append(")").toString();
    }
}

public class JavaMain {

    private final static List<Person> persons = Arrays.asList(
        new Person("Boris", 40),
        new Person("Betty", 32),
        new Person("Bamby", 17));

    private static List<Person> minors = new ArrayList<Person>();
    private static List<Person> majors = new ArrayList<Person>();

    public static void main(String[] args) {
        partition(persons, minors, majors);
        System.out.println(mkString(minors, ","));
        System.out.println(mkString(majors, ","));
    }

    private static void partition(List<? extends Person> persons,
            List<? super Person> minors, List<? super Person> majors) {

        for (Person p : persons) {
            if (p.getAge() <= 18) minors.add(p);
            else majors.add(p);
        }
    }

    private static <T> String mkString(List<T> list, String separator) {
        StringBuilder s = new StringBuilder();
        Iterator<T> it = list.iterator();
        if (it.hasNext()) {
            s.append(it.next());
        }
        while (it.hasNext()) {
            s.append(separator).append(it.next());
        }
        return s.toString();
    }

}

Impressive huh? And the Java version is not even entirely correct since its equals() method might not cope correctly with super classes of Person.





Type Level Programming: Equality

18 06 2010

Apocalisp has a great series on Type Level Programming with Scala. At some point the question came up whether it is possible to determine equality of types at run time by having the compiler generate types representing true and false respectively. Here is what I came up with.

trait True { type t = True }
trait False { type t = False }

case class Equality[A] {
  def check(x: A)(implicit t: True) = t
  def check[B](x: B)(implicit f: False) = f
}
object Equality {
  def witness[T] = null.asInstanceOf[T]
  implicit val t: True = null
  implicit val f: False = null
}

// Usage:
import Equality._
    
val test1 = Equality[List[Boolean]] check witness[List[Boolean]]
implicitly[test1.t =:= True]
// Does not compile since tt is True
// implicitly[test1.t =:= False]  

val test2 = Equality[Nothing] check witness[AnyRef]
// Does not compile since ft is False
// implicitly[test2.t =:= True]  
implicitly[test2.t =:= False]

Admittedly this is very hacky. For the time being I don’t see how to further clean this up. Anyone?





Working around type erasure ambiguities (Scala)

14 06 2010

In my previous post I showed a workaround for the type erasure ambiguity problem in Java. The solution uses vararg parameters for disambiguation. As Paul Phillips points out in his comment, this solution doesn’t directly port over to Scala. Java uses Array to pass varargs, Scala uses Seq. Unlike Array, Seq is not reified so Seq[String] and Seq[Int] again erase to the same type putting us back to square one.

However, there is another way to add disambiguation parameters to the methods: implicits! Here is how:

implicit val x: Int = 0
def foo(a: List[Int])(implicit ignore: Int) { }
  
implicit val y = ""
def foo(a: List[String])(implicit ignore: String) { }

foo(1::2::Nil)
foo("a"::"b"::Nil)







Follow

Get every new post delivered to your Inbox.