DeeJay has just written an article showing how to implement a binomial heap using F#, adopting a purely functional style. One of his questions concerns structural comparison in F#. In this article I'll give a brief description of what structural comparison is and how you can define the meaning of structural comparison on your new type definitions.
Firstly, every time you're using operators such as <, >, <=, >=, =, <>, 'compare', 'min' and 'max' in F# code you may be invoking structural comparison. You may also be using structural comparison and structural hashing when you use the default configurations of F# data structures such as Set, Map, HashSet and HashTable (see also Hashtbl module) or when you instantiate .NET data structures using the values produced by Microsoft.FSharp.Collections.HashIdentity.Structural (new in F# 1.1.11.4).
On ordinary simple types like integers and strings structural comparison does exactly what you expect. It also works just as you'd expect for .NET types that implement the IComparable interface, e.g. System.DateTime values. However, they great thng is that you can also use these operators on structured types. For example, you can use them on F# tuple values, where a lexicographic left-to-right comparison is used:
> ("abc","def") < ("abc","xyz");;
val it : bool = true
> compare (10,30) (10,20);;
val it : int = 1
As you can see, the purpose of structural comparison is to compositionally extend the orderings on primitive nodes to give a total ordering to concrete tree or DAG structured data. For example, you can also use structural comparison with list and array values:
> compare [10;30] [10;20];;
val it : int = 1
> compare [| 10;30 |] [| 10;20 |];;
val it : int = 1
> compare [| 10;20 |] [| 10;30 |];;
val it : int = -1
The F# informal language specification has a section on structural comparison and hashing. In F# 1.1.10 the definition of structural comparison on some other types such as arrays is built into the F# compiler. In the soon-to-be-releaed F# 1.1.11.x these will be more transparent as this code will be implemented in F# code in the F# library fslib.dll itself. It's important to note tht structural comparison is not implemented using reflection - instead code is auto-generated for each type definition that implements comparison efficiently and where possible uses "fast path" comparison techniques, e.g. the generated code will use primitive IL/native instructions for integer comparisons. This means that in practice structural comparison is typically very very fast when used with appropriately sized keys.
By default, every new F# type definition is given an implementation of structural comparison. DeeJay's particular question is whether the semantics of structural comparison can be modified for new type definitions. The answer is yes! It's done simply by implementing the interface Systemm.IComparable. Here's an example, taken from the F# library. Firstly, the type definition includes a promise that the type will implement the the two relevant interfaces:
type BigInt
= { sign : int; v : BigNatOps.n }
with
interface System.IComparable
interface IStructuralHash
end
If you like you can define the implementations of these interfaces immediately at the point of the type definition. However you can also use a module of F# code that implements the functionality using a value-oriented style, which can be very useful for controlling the complexity of an implementation. Here's a fragment of the BigInt implementation, defining lessThan in terms of operations defined for BigNat:
module BigIntOps = begin
module N = BigNatOps
...
let lessThan x y =
match x.sign,y.sign with
| 1, 1 -> N.lessThan x.v y.v // 1.xv <= 1.yv iff xv <= yv
| -1,-1 -> N.lessThan y.v x.v // -1.xv <= -1.yv iff yv <= xv
| 1, -1 -> false
| -1, 1 -> not (N.isZero x.v) || not (N.isZero y.v)
| _ -> invalid_arg "z signs should be +/- 1"
...
let compare n nn = if lessThan n nn then -1 else if equal n nn then 0 else 1
end
You then later in the file augment the type definition with the implementations of these interfaces:
type BigInt
with
static member ( + )(n1,n2) = RawBigIntOps.add n1 n2
static member ( * )(n1,n2) = RawBigIntOps.mul n1 n2
static member ( - )(n1,n2) = RawBigIntOps.sub n1 n2
static member ( / )(n1,n2) = RawBigIntOps.div n1 n2
static member ( ~- )(n1) = RawBigIntOps.neg n1
static member ( ~+ )(n1:BigInt) = n1
override x.ToString() = RawBigIntOps.to_string x
override x.GetHashCode() = RawBigIntOps.hash(x)
interface System.IComparable with
member x.CompareTo(y:obj) = RawBigIntOps.compare x (y :?> BigInt)
end
interface IStructuralHash with
member x.GetStructuralHashCode(_) = RawBigIntOps.hash(x)
end
end
Some notes:
- The above sample also shows the syntax to override the overloaded operators +,- etc. and the method ToString and GetHashCode.
- It also shows how to implement the interface IStructuralHash. Structural hashing is very simlar to structural comparison, but with the additional twist that hashing can be depth-limited, because you typically want to avoid hashing all of a large term (this is why we don't simply override GetHashCode). I'll save the details for another blog entry. The notes in the language specification also give more details on this.
- A cast is currently necessary when implementing IComparable. In a future version of F# you will only need to implement IComparable<T>.
Some general additional notes on structural comparison and hashing:
- Performance: Structural comparison and hashing are incredibly useful and are also efficient over small key terms. However, there are naturally performance issues when using structural comparison over large structured terms. In high-performance situations you should be careful to use small keys, and where necessary define your own comparison and hashing functions.
- Non-tree structured data: You should not use structural comparison and hashing on recursive data structures (including ones using pointers) or data that you mutate.
- NaN: when used immediately on floating point numbers, the comparison and equality operators "<" etc. implement IEEE NaN semantics, i.e. returning "false" if either value is a NaN (this is correctly implemented as of version 1.1.11.x)
Please feel free to ask for clarifications on any points regarding the specification of structural comparison and hashing
Cheers!
Don
Addendum:
DeeJay asks some great questions:
So because of this... is there any reason at all to be able to pass in custom comparators? If there is any gain, is it worth the hassle of more complex implementation over just declaring your comparator when you declare your type?
...
Comparing to Haskell with it's type classes... .
Comparing to SML and OCaml with it's functors...
With F# I'm very interested in both the theoretical and pragmatic concerns, and ultimately there are some tradeoffs here.
My advice for accessible, reusable data structures is "sensible defaults, ultimate control". That is, provide default constructors based on structural versions of compare/hash, and then provide additional constructors that permit explicit configuration of compare/hash. This is the style followed by the Microsoft.FSharp.Collections.HashTable implementation and the other F# collections (sets, maps, hash sets) will go the same way over time. It is also roughly the style followed by the .NET collections, with the exception that the default comparison used by the .NET collections is not exactly F# structural comparison, but an approximation of it - e.g. it won't structurally compare arrays, though will structurally compare tuples and all other F# data types.
Re functors and type classes:
-
I know this gives slightly weaker guarantees than using functors. This is because instances of the data structure that use different compare/hash are not different types. However it does give the reusability and flexibility of functors, while making the data structures easier for beginners to use.
-
I know this makes the data structures slightly longer to write than when using type classes (since you have to pass the compare/hash explicitly). However, while I love type classes and am considering them for F#, they can lead to problems with flexibility - if you want to use a different comparison function you're a bit stuck.
So I advocate the tasteful application of the "sensible defaults, ultimate control" approach to give a good balance: a simple way to get going with using the data structure (beginners can just start using the data structure on simple types without problems), and it gives experienced users the flexibility they need (e.g. case-invariant comparison on strings is often required).