Welcome to hubFS: THE place for F# Sign in | Join | Help

F# at the ICFP programming contest

Some weeks ago the International Conference on Functional Programming ran its annual programming contest. In this competition teams of any size from all over the Internet compete with each other to solve programming puzzles set by the contest organizers. Two small groups from MSR Cambridge entered on the spur of the moment, one of researchers and one of interns (and yes, the interns pipped the researchers in the final analysis!)

This year's task initially concerned writing an interpreter for a virtual machine. The specification of the virtual machine was very simple and was given as part of the problem definition. The interpreter implementation needed to be very fast as it was used to run a mini operating system which served as the basis for the rest of the contest. Many teams initially wrote their interpreters in their favourite functional or scripting language but found that they had to re-write it in C to gain the required performance.

F# is a high-level and functional language, so it's not surprising that it was excellent for the rapid design and prototyping of the interpreter. However, it also ran with excellent performance, sufficient that it is was not necessary to re-write the code in a "faster" language, and indeed the fully managed and verifiable F# implementation has a speed comparable to C implementations. The C implementations can perform faster pinned memory allocation, use native pointers in the interpreter’s state and avoid indirect and bounds-checked array access. Despite this, the F# implementation is only 10-15% slower than the fastest C implementations. The code for the F# interpreter is attached.

We looked at two performance optimizations using "unsafe" C-like techniques.

  • A bottleneck in a managed CLR implementation is memory access via managed arrays. To interpret one memory access instruction in the interpreter requires two bounds-checked array lookups, whereas an unmanaged implementation can get away with an addition and a pointer dereference. To avoid the array bounds-checking we tried pinning allocated arrays and using unverifiable code to access the memory without bounds checks. This can be done very elegantly in F# simply by writing low-level code that directly uses CLR byte code via inline assembly blocks. Unverifiable byte-code will give the same memory access performance as C code. However the cost of allocating a pinned array on the CLR is so large that this approach halved the performance of the interpreter in the benchmark. So although the memory access performance improved, the memory allocation performance (which is another bottleneck for a managed CLR implementation) declined to such an extent as to produce a net decrease in performance.
  • However, unverifiable memory access to a pinned array can still be used to improve memory access to the "registers" array in the interpreter as this array only needs to be pinned once at start-up. Also aligning elements in the registers array to 8-byte boundaries improves performance again. The code for an "unsafe" version of the F# implementation uses these optimizations through inline assembly block is also attached. It gives a performance gain of about 5%.

The whole issue of performance-impairing bounds checks could be completely and elegantly solved if the Microsoft .NET Framework 2.0 CLR implemented the no.rangecheck instruction prefix as documented in the ECMA specification, and this instruction could be immediately utilized from F# (see below). This prefix tells the JIT compiler to skip the bounds checking on an array access. So regular byte-code instructions for accessing an array can simply be marked with this prefix to eliminate the bounds-checking overhead. Of course the resulting code is still unverifiable but compared to the current approach: the array does not have to be pinned (avoiding bounds checking was the sole reason we needed to pin the arrays above) and array access code does not need to be re-factored to use a different instruction sequence. If this prefix were in the CLR then fast (albeit unsafe) array access could be implemented in F# simply by defining different array brackets, e.g.

let (.[]) (arr: 'a[]) (idx:int) = (# "no.rangecheck ldelem" arr idx : 'a #)

On the topic of memory allocation, the attached F# implementations use one CLR memory allocation for every memory allocation in the interpreted code. A more efficient scheme might be to allocate a single CLR array for the interpreter’s heap and "manage" that heap yourself. This would reduce the memory allocation performance overhead and remove one layer of indirection from memory access.

Finally, the very best C implementations reported so far used runtime code generation.  We believe these results could be replicated very nicely using .NET Reflection.Emit or Lightweight code generation, though haven’t tried that.

Published Tuesday, August 15, 2006 1:41 PM by gneverov
Attachment(s): icfp programming contest.zip

Comments

# re: F# at the ICFP programming contest

Nice article!
I was just curious about the memory-pinning-problem: If you just want to access a pinned memory block - why didn't you allocate it using System.Marshal.AllocHGlobal or AllocCoTaskMem? That way you'd get a pinned block of memory for raw access without disturbing the GC. You could even pack these memory blocks into managed classes with finalizers and actually _use_ the GC for these memory blocks (thanks to GC::AddMemoryPressure).

Niki
Tuesday, August 15, 2006 1:18 PM by nikie

# re: F# at the ICFP programming contest

Good point - we'll try that technique and get back to you
Tuesday, August 15, 2006 4:01 PM by dsyme

# Interesting Finds: August 15, 2006

Tuesday, August 15, 2006 7:01 PM by Jason Haley

# re: F# at the ICFP programming contest

I find it odd that the code under your C implementations link is 112 lines long and the code for your umrun-unsafe.fs is exactly 112 lines long also.  
Wednesday, August 16, 2006 12:59 PM by dday51

# re: F# at the ICFP programming contest

As can be seen from the icfp 2007 organizers report (http://www.cs.uu.nl/research/techreps/repo/CS-2007/2007-029.pdf) record, there was the only one team who used F#. I have registered my entry with preferred lang. F#, so i suppose, that team is myself :)
My entry was about 450-500 LOC, and i have used lists as basic structure to represent DNA. Sadly, but i had just friday evening and saturday to work with task. My program was too slow, but it works and i got the first picture.
That entry was my first F# program > 100 LOC, and i have impressed by one thing: i found just 4 logical errors when debug my code,
i.e. 1 error per 100 LOC, and that metric is stable for all my F# code (for now my largest program is about 2.5 KLOC). For my C++ code i do about 2.5-3 errors per 100 LOC, for python/groovy there is 2-2.5 errors per 100 LOC and for java/perl/LISP code it's about 5. This metric is very stable and doesn't change for years.

P.S. my icfp 2006 entry was written in C++ and was about 250 LOC, but i used this code (thank to boost serialisation) and shell script to paralell brute force passwords on 4 servers instead of code it in UM BASIC :)
Tuesday, October 23, 2007 10:02 PM by ssp

# C# Trivia - What? No Overflow?

C# Trivia - What? No Overflow?
Sunday, January 11, 2009 3:01 PM by CodeThinked
Anonymous comments are disabled