hubFS: THE place for F#

. . . are you on The Hub?
Welcome to hubFS: THE place for F# Sign in | Join | Help
in Search

Heart of Sharpness (The MSR F# Team's blog at The Hub)

Some notes on Structural Comparison and Hashing

DeeJay has just written an article showing how to implement a binomial heap using F#, adopting a purely functional style.  One of his questions concerns structural comparison in F#. In this article I'll give a brief description of what structural comparison is and how you can define the meaning of structural comparison on your new type definitions.

Firstly, every time you're using operators such as <, >, <=, >=, =, <>, 'compare', 'min' and 'max' in F# code you may be invoking structural comparison.  You may also be using structural comparison and structural hashing when you use the default configurations of F# data structures such as Set, Map, HashSet and HashTable (see also Hashtbl module) or when you instantiate .NET data structures using the values produced by Microsoft.FSharp.Collections.HashIdentity.Structural (new in F# 1.1.11.4).

On ordinary simple types like integers and strings structural comparison does exactly what you expect.  It also works just as you'd expect for .NET types that implement the IComparable interface, e.g. System.DateTime values.  However, they great thng is that you can also use these operators on structured types. For example, you can use them on F# tuple values, where a lexicographic left-to-right comparison is used:

> ("abc","def") < ("abc","xyz");;
val it : bool = true

> compare (10,30) (10,20);;
val it : int = 1

As you can see, the purpose of structural comparison is to compositionally extend the orderings on primitive nodes to give a total ordering to concrete tree or DAG structured data.  For example, you can also use structural comparison with list and array values:

> compare [10;30] [10;20];;
val it : int = 1
> compare [| 10;30 |] [| 10;20 |];;
val it : int = 1
> compare [| 10;20 |] [| 10;30 |];;
val it : int = -1

The F# informal language specification has a section on structural comparison and hashing. In F# 1.1.10 the definition of structural comparison on some other types such as arrays is built into the F# compiler.  In the soon-to-be-releaed F# 1.1.11.x these will be more transparent as this code will be implemented in F# code in the F# library fslib.dll itself. It's important to note tht structural comparison is not implemented using reflection - instead code is auto-generated for each type definition that implements comparison efficiently and where possible uses "fast path" comparison techniques, e.g. the generated code will use primitive IL/native instructions for integer comparisons.  This means that in practice structural comparison is typically very very fast when used with appropriately sized keys.

By default, every new F# type definition is given an implementation of structural comparison.  DeeJay's particular question is whether the semantics of structural comparison can be modified for new type definitions.  The answer is yes!  It's done simply by implementing the interface Systemm.IComparable. Here's an example, taken from the F# library.  Firstly, the type definition includes a promise that the type will implement the the two relevant interfaces:

type BigInt
  = { sign : int; v : BigNatOps.n }
  with
       interface System.IComparable
       interface IStructuralHash
  end

If you like you can define the implementations of these interfaces immediately at the point of the type definition.  However you can also use a module of F# code that implements the functionality using a value-oriented style, which can be very useful for controlling the complexity of an implementation.  Here's a fragment of the BigInt implementation, defining lessThan in terms of operations defined for BigNat:



module BigIntOps = begin 
  module N = BigNatOps  

   ...
  let lessThan x y =
    match x.sign,y.sign with
    |  1, 1 -> N.lessThan x.v y.v                      //  1.xv <=  1.yv iff xv <= yv 
    | -1,-1 -> N.lessThan y.v x.v                      // -1.xv <= -1.yv iff yv <= xv 
    |  1, -1 -> false
    | -1, 1 -> not (N.isZero x.v) || not (N.isZero y.v)
    | _ -> invalid_arg "z signs should be +/- 1"
...
  let compare n nn = if lessThan n nn then -1 else if equal n nn then 0 else 1

end

You then later in the file augment the type definition with the implementations of these interfaces:



type BigInt
  with
    static member ( + )(n1,n2) = RawBigIntOps.add n1 n2
    static member ( * )(n1,n2) = RawBigIntOps.mul n1 n2
    static member ( - )(n1,n2) = RawBigIntOps.sub n1 n2
    static member ( / )(n1,n2) = RawBigIntOps.div n1 n2
    static member ( ~- )(n1)  = RawBigIntOps.neg n1
    static member ( ~+ )(n1:BigInt) = n1
    override x.ToString() = RawBigIntOps.to_string x
    override x.GetHashCode() = RawBigIntOps.hash(x)
    interface System.IComparable with
       member x.CompareTo(y:obj) = RawBigIntOps.compare x (y :?> BigInt)
    end
    interface IStructuralHash with
       member x.GetStructuralHashCode(_) = RawBigIntOps.hash(x) 
    end

  end

Some notes:

  • The above sample also shows the syntax to override the overloaded operators +,- etc. and the method ToString and GetHashCode.
  • It also shows how to implement the interface IStructuralHash.  Structural hashing is very simlar to structural comparison, but with the additional twist that hashing can be depth-limited, because you typically want to avoid hashing all of a large term (this is why we don't simply override GetHashCode). I'll save the details for another blog entry.  The notes in the language specification also give more details on this.
  • A cast is currently necessary when implementing IComparable.  In a future version of F# you will only need to implement IComparable<T>.

Some general additional notes on structural comparison and hashing: 

  • Performance: Structural comparison and hashing are incredibly useful and are also efficient over small key terms.  However, there are naturally performance issues when using structural comparison over large structured terms.  In high-performance situations you should be careful to use small keys, and where necessary define your own comparison and hashing functions.
  • Non-tree structured data: You should not use structural comparison and hashing on recursive data structures (including ones using pointers) or data that you mutate.
  • NaN: when used immediately on floating point numbers, the comparison and equality operators "<" etc. implement IEEE NaN semantics, i.e. returning "false" if either value is a NaN (this is correctly implemented as of version 1.1.11.x)

Please feel free to ask for clarifications on any points regarding the specification of structural comparison and hashing

Cheers!

Don

Addendum:

DeeJay asks some great questions:

So because of this... is there any reason at all to be able to pass in custom comparators? If there is any gain, is it worth the hassle of more complex implementation over just declaring your comparator when you declare your type?

...

Comparing to Haskell with it's type classes... .

Comparing to SML and OCaml with it's functors...

With F# I'm very interested in both the theoretical and pragmatic concerns, and ultimately there are some tradeoffs here.

My advice for accessible, reusable data structures is "sensible defaults, ultimate control".  That is, provide default constructors based on structural versions of compare/hash, and then provide additional constructors that permit explicit configuration of compare/hash. This is the style followed by the Microsoft.FSharp.Collections.HashTable implementation and the other F# collections (sets, maps, hash sets) will go the same way over time.  It is also roughly the style followed by the .NET collections, with the exception that the default comparison used by the .NET collections is not exactly F# structural comparison, but an approximation of it - e.g. it won't structurally compare arrays, though will structurally compare tuples and all other F# data types.

Re functors and type classes:

  • I know this gives slightly weaker guarantees than using functors.  This is because instances of the data structure that use different compare/hash are not different types.  However it does give the reusability and flexibility of functors, while making the data structures easier for beginners to use.
  • I know this makes the data structures slightly longer to write than when using type classes (since you have to pass the compare/hash explicitly).  However, while I love type classes and am considering them for F#, they can lead to problems with flexibility - if you want to use a different comparison function you're a bit stuck. 

So I advocate the tasteful application of the "sensible defaults, ultimate control" approach to give a good balance: a simple way to get going with using the data structure (beginners can just start using the data structure on simple types without problems), and it gives experienced users the flexibility they need (e.g. case-invariant comparison on strings is often required). 

 

Published Thursday, May 11, 2006 2:57 PM by dsyme

Comments

 

DeeJay said:

This is fanastic,

type mypoint = {x:int ; y:int}
with
 interface System.IComparable with
   member p1.CompareTo(p2:obj) = compare p1.y (p2 :?> mypoint).y
 end
end;;

compare {x=3; y=2} {x=2; y=3};;
=> -1

{x=3; y=2} < {x=2; y=3};;
=> true

without the custom implementation of IComparable... those results would be +1 and false

So because of this... is there any reason at all to be able to pass in custom comparators?
If there is any gain, is it worth the hassle of more complex implementation over just declaring your comparator when you declare your type?

And in the case where you may want to change the comparision of an existing type... can you not just wrap the type? or inherit from it...

although before I get too carried away here's an example i'm not sure quite how to solve yet.
if i want to do a custom comparator for int's. Say compare them when they are rounded down to the nearest 10...

i can't do
type myint = int
with
 interface ....

as you can't augment a type abbreviation
nor can i do
type myint =
class
 inherit int ....

as int (Int32) is a sealed type
so maybe in this case i really only can wrap the type or delegate to it in a class.

Comparing to Haskell with it's type classes... it doesn't seem like you'd be much better off in this case, as you can't declare instances of a type class with type abbreviations so you'd have to wrap it in a newtype declaration.

Comparing to SML and OCaml with it's functors... this is where the functors win big. Parameterising code (structures) seems to be an incredibly powerful tool for writing datastructures.
May 11, 2006 9:34 AM
 

dsyme said:

I've responded to DeeJay's excellent comments in the addendum to the article above.
May 11, 2006 12:24 PM
Anonymous comments are disabled

This Blog

Post Calendar

<May 2006>
SuMoTuWeThFrSa
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910

Syndication

Powered by Community Server, by Telligent Systems