Union types

12 06 2011

In his recent blog post Miles Sabin came up with an ingenious way of expressing union types in Scala. A union type is the union of some types: its values are the union of the values of each of the individual types.

In a nutshell he first defines the negation of types as

type ¬[A] = A => Nothing

and then the union of two types via De Morgan’s law

type ∨[T, U] = ¬[¬[T] with ¬[U]]

With the following auxiliary constructs

type ¬¬[A] = ¬[¬[A]]
type |∨|[T, U] = { type λ[X] = ¬¬[X] <:< (T ∨ U) }

union types can be used in a very intuitive way

def size[T: (Int |∨| String)#λ](t: T) = t match {
    case i: Int => i
    case s: String => s.length
}

scala> size(3)
res0: Int = 3

scala> size("three")
res1: Int = 5

scala> size(4.2)
:13: error: Cannot prove that ((Double) => Nothing) => Nothing >: Nothing with (java.lang.String) => Nothing) => Nothing.
       size(4.2)
           ^

… and beyond

With type negation and disjunction from above, it becomes possible to express all types whose set of values can be expressed by a term in propositional calculus. But can we do better? That is, is it possible to express types which don’t have a corresponding term in propositional calculus?

Generalizing the type constructor ∨[T, U] to some arbitrary acceptor

type Acceptor[T, U] = { type λ[X] = ... }

it becomes apparent, that all types for which there is a corresponding type level acceptor function are expressible. Since type level calculations in Scala are Turing complete, it should be possible to find an acceptor for any recursive function. This means that – in theory at least – Scala’s type system is powerful enough to express any type whose set of values is recursive.

Advertisements




Regular expression matching in <100 lines of code

6 12 2010

The recent discussion about the Yacc is dead paper on Lambda the Ultimate sparked my interest in regular expression derivatives. The original idea goes back to the paper Derivatives of Regular Expressions published in 1964 (!) by Janusz A. Brzozowski. For a more modern treatment of the topic see Regular-expression derivatives reexamined.

The derivative of a set of strings with respect to a character, is the set of strings which results from removing the first character from all the strings in the set which start with that character. Let for example S = \{foo, bar, baz\} then \partial_b S = \{ar, az\} . It turns out that regular languages are closed under derivatives. That is, any derivative of a regular language is again a regular language. Furthermore, it is possible to extend the notion of derivatives to regular expression such that given a regular expression r which generates the language \mathcal{L}(r) and a character c , one can derive a regular expression \partial_c r such that \mathcal{L}(\partial_c r) = \partial_c(\mathcal{L}(r)).

This is a key ingredient for a very elegant regular expression matching algorithm: to match a string against a regular expression repeatedly calculate the derivative of the regular expression for each characters in the string. When no character is left, check whether the last derivative accepts the empty string. If so we have a match and otherwise not.

The exact algorithm for finding whether a regular expression is nullable (i.e. accepts the empty string) is given in Regular-expression derivatives reexamined as is the algorithm for calculating derivatives of regular expressions. Below is a direct implementation of that algorithm in Scala (with a slight modification to allow for strings instead of individual characters).

trait RegExp {
  def nullable: Boolean
  def derive(c: Char): RegExp
}

case object Empty extends RegExp {
  def nullable = false
  def derive(c: Char) = Empty
}

case object Eps extends RegExp {
  def nullable = true
  def derive(c: Char) = Empty
}

case class Str(s: String) extends RegExp {
  def nullable = s.isEmpty
  def derive(c: Char) =
    if (s.isEmpty || s.head != c) Empty
    else Str(s.tail)
}

case class Cat(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable && s.nullable
  def derive(c: Char) =
    if (r.nullable) Or(Cat(r.derive(c), s), s.derive(c)) 
    else Cat(r.derive(c), s)
}

case class Star(r: RegExp) extends RegExp {
  def nullable = true
  def derive(c: Char) = Cat(r.derive(c), this)
}

case class Or(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable || s.nullable
  def derive(c: Char) = Or(r.derive(c), s.derive(c))
}

case class And(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable && s.nullable
  def derive(c: Char) = And(r.derive(c), s.derive(c))
}

case class Not(r: RegExp) extends RegExp {
  def nullable = !r.nullable
  def derive(c: Char) = Not(r.derive(c))
}

Having these constructors we need a way to match strings against regular expressions.

object Matcher {
  def matches(r: RegExp, s: String): Boolean = {
    if (s.isEmpty) r.nullable
    else matches(r.derive(s.head), s.tail)
  }
}

Here are some pimps to make usage of the regular expression constructors more convenient.

object Pimps {
  implicit def string2RegExp(s: String) = Str(s)

  implicit def regExpOps(r: RegExp) = new {
    def | (s: RegExp) = Or(r, s)
    def & (s: RegExp) = And(r, s)
    def % = Star(r)
    def %(n: Int) = rep(r, n)
    def ? = Or(Eps, r)
    def ! = Not(r)
    def ++ (s: RegExp) = Cat(r, s)
    def ~ (s: String) = Matcher.matches(r, s)
  }

  implicit def stringOps(s: String) = new {
    def | (r: RegExp) = Or(s, r)
    def | (r: String) = Or(s, r)
    def & (r: RegExp) = And(s, r)
    def & (r: String) = And(s, r)
    def % = Star(s)
    def % (n: Int) = rep(Str(s), n)
    def ? = Or(Eps, s)
    def ! = Not(s)
    def ++ (r: RegExp) = Cat(s, r)
    def ++ (r: String) = Cat(s, r)
    def ~ (t: String) = Matcher.matches(s, t)
  }

  def rep(r: RegExp, n: Int): RegExp =
    if (n <= 0) Star(r)
    else Cat(r, rep(r, n - 1))
}

And finally here is how to use it:

object Test {
  import Pimps._

  val digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  val int = ("+" | "-").? ++ digit.%(1)
  val real = ("+" | "-").? ++ digit.%(1) ++ ("." ++ digit.%(1)).? ++ (("e" | "E") ++ ("+" | "-").? ++ digit.%(1)).?

  def main(args: Array[String]) {
    val ints = List("0", "-4534", "+049", "99")
    val reals = List("0.9", "-12.8", "+91.0", "9e12", "+9.21E-12", "-512E+01")
    val errs = List("", "-", "+", "+-1", "-+2", "2-")

    ints.foreach(s => assert(int ~ s))
    reals.foreach(s => assert(!(int ~ s)))
    errs.foreach(s => assert(!(int ~ s)))

    ints.foreach(s => assert(real ~ s))
    reals.foreach(s => assert(real ~ s))
    errs.foreach(s => assert(!(real ~ s)))
}

Now that’s 48 + 6 + 32 = 86 lines of code for a regular expression matching library!





So Scala is too complex?

24 08 2010

There is currently lots of talk about Scala being to complex. Instead of more arguing I implemented the same bit of functionality in Scala and in Java and let everyone decide for themselves.

There is some nice example code in the manual to the The Scala 2.8 Collections API which partitions a list of persons into two lists of minors and majors. Below are the fleshed out implementations in Scala and Java.

First Scala:

object ScalaMain {
  case class Person(name: String, age: Int)
    
  val persons = List(
    Person("Boris", 40),
    Person("Betty", 32),
    Person("Bambi", 17))

  val (minors, majors) = persons.partition(_.age <= 18) 
   
  def main(args: Array[String]) = {
    println (minors.mkString(", "))
    println (majors.mkString(", "))
  }
}

And now Java:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class Person {
    private final String name;
    private final int age;

    public Person(String name, int age) {
        super();
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        else if (other instanceof Person) {
            Person p = (Person) other;
            return name == null ? p.name == null : name.equals(p.name)
                    && age == p.age;

        }
        else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        int h = name == null ? 0 : name.hashCode();
        return 39*h + age;
    }

    @Override
    public String toString() {
        return new StringBuilder("Person(")
            .append(name).append(",")
            .append(age).append(")").toString();
    }
}

public class JavaMain {

    private final static List<Person> persons = Arrays.asList(
        new Person("Boris", 40),
        new Person("Betty", 32),
        new Person("Bamby", 17));

    private static List<Person> minors = new ArrayList<Person>();
    private static List<Person> majors = new ArrayList<Person>();

    public static void main(String[] args) {
        partition(persons, minors, majors);
        System.out.println(mkString(minors, ","));
        System.out.println(mkString(majors, ","));
    }

    private static void partition(List<? extends Person> persons,
            List<? super Person> minors, List<? super Person> majors) {

        for (Person p : persons) {
            if (p.getAge() <= 18) minors.add(p);
            else majors.add(p);
        }
    }

    private static <T> String mkString(List<T> list, String separator) {
        StringBuilder s = new StringBuilder();
        Iterator<T> it = list.iterator();
        if (it.hasNext()) {
            s.append(it.next());
        }
        while (it.hasNext()) {
            s.append(separator).append(it.next());
        }
        return s.toString();
    }

}

Impressive huh? And the Java version is not even entirely correct since its equals() method might not cope correctly with super classes of Person.





Type Level Programming: Equality

18 06 2010

Apocalisp has a great series on Type Level Programming with Scala. At some point the question came up whether it is possible to determine equality of types at run time by having the compiler generate types representing true and false respectively. Here is what I came up with.

trait True { type t = True }
trait False { type t = False }

case class Equality[A] {
  def check(x: A)(implicit t: True) = t
  def check[B](x: B)(implicit f: False) = f
}
object Equality {
  def witness[T] = null.asInstanceOf[T]
  implicit val t: True = null
  implicit val f: False = null
}

// Usage:
import Equality._
    
val test1 = Equality[List[Boolean]] check witness[List[Boolean]]
implicitly[test1.t =:= True]
// Does not compile since tt is True
// implicitly[test1.t =:= False]  

val test2 = Equality[Nothing] check witness[AnyRef]
// Does not compile since ft is False
// implicitly[test2.t =:= True]  
implicitly[test2.t =:= False]

Admittedly this is very hacky. For the time being I don’t see how to further clean this up. Anyone?





Working around type erasure ambiguities (Scala)

14 06 2010

In my previous post I showed a workaround for the type erasure ambiguity problem in Java. The solution uses vararg parameters for disambiguation. As Paul Phillips points out in his comment, this solution doesn’t directly port over to Scala. Java uses Array to pass varargs, Scala uses Seq. Unlike Array, Seq is not reified so Seq[String] and Seq[Int] again erase to the same type putting us back to square one.

However, there is another way to add disambiguation parameters to the methods: implicits! Here is how:

implicit val x: Int = 0
def foo(a: List[Int])(implicit ignore: Int) { }
  
implicit val y = ""
def foo(a: List[String])(implicit ignore: String) { }

foo(1::2::Nil)
foo("a"::"b"::Nil)




Working around type erasure ambiguities

30 05 2010

In an earlier post I already showed how to work around ambiguous method overloads resulting from type erasure. In a nut shell the following code wont compile since both overloaded methods foo erase to the same type.

Scala:

def foo(ints: List[Int]) {}
def foo(strings: List[String]) {}

Java:

void foo(List<Integer> ints) {}
void foo(List<String> strings) {}

It turns out that there is a simple though somewhat hacky way to work around this limitations: in order to make the ambiguity go away, we need to change the signature of foo in such a way that 1) the erasure of the foo methods are different and 2) the call site is not affected.

Here is a solution for Java:

void foo(List<Integer> ints, Integer... ignore) {}
void foo(List<String> strings, String... ignore) {}

We can now call foo passing either a list of ints or a list of strings without ambiguity:

foo(new ArrayList<Integer>());
foo(new ArrayList<String>());

This doesn’t directly port over to Scala (why?). However, there is a similar hack for Scala. I leave this as a puzzle for a couple of days before I post my solution.





[ANN] Talking at Scala Days 2010 in Lausanne next Thursday

11 04 2010

I’ll be talking at Scala Days 2010 in Lausanne on April 15th about the Scala scripting engine for Apache Sling. While my talk at Jazoon 09 was mainly about using Scala from Sling, this session will be more focused on internals of the Scala scripting engine.

Unfortunately (or fortunately depending on the point of view) the conference is sold out already. Watch my Scala for scripting page for the session slides and other upcoming support material.