I hope I'm doing something wrong, but I'm seeing some poor performance in comprehensions that's a little unexpected. I'm a long time Scheme and C# programmer, so I feel like I should be happy as a pig in slop in the F# world! My get-started example is the implementation of
Norvig's toy spelling corrector. I figure it's a nice simple example perfect for
seqs and comprehensions. Here is two of the basic routines: edits is given a word, it determines all the words which are one spelling error away from that word. edits2 determines all the words two spelling errors away
let (+..) = Seq.append
let edits (word: string) =
{ for i in {0 .. word.Length-1} -> word.Substring(0,i) + word.Substring(i+1) } +.. (* delete *)
{ for i in {0 .. word.Length-2} -> word.Substring(0,i) + word.Substring(i+1,1) + word.Substring(i,1) + word.Substring(i+2) } +.. (* transpose *)
{ for i in {0 .. word.Length-1} for c in {'a' .. 'z'} -> word.Substring(0,i) + (String.make 1 c) + word.Substring(i+1) } +.. (* alter *)
{ for i in {0 .. word.Length} for c in {'a' .. 'z'} -> word.Substring(0,i) + (String.make 1 c) + word.Substring(i) } (* insert *)
|> Set.of_seq
|> Set.to_seq
let edits2 word = Set.of_seq (Seq.fold Seq.append Seq.empty (edits word |..> edits))
Works like a charm, reads gracefully, fun to program ,etc. It takes about 60 seconds to produce all the edit2 results for "something"... about 140k words. It thought it seemed a bit pokey, though. So I coded up what appears to be a nearly equivalent implementation in C# and it produces the answer in somewhere on the order of half a second. I am not clear on what the big performance drain is.
private static IEnumerable EditsRaw(string word)
{
for(int i=0; i yield return word.Substring(0, i) + word.Substring(i + 1);
for(int i=0; i yield return word.Substring(0, i) + word.Substring(i + 1, 1) + word.Substring(i, 1) + word.Substring(i + 2);
for(int i=0; i for(char c='a'; c yield return word.Substring(0,i) + c.ToString() + word.Substring(i + 1);
for(int i=0; i for(char c='a'; c yield return word.Substring(0, i) + c.ToString() + word.Substring(i);
}
private static IEnumerable Edits(string word)
{
return F.Distinct(EditsRaw(word));
}
private static IEnumerable Edits2Raw(string word)
{
foreach(string edit in Edits(word))
foreach(string edit2 in Edits(edit))
yield return edit2;
}
private static IEnumerable Edits2(string word)
{
return F.Distinct(Edits2Raw(word));
}
static void Main(string[] args)
{
Console.Out.WriteLine(F.Count(Edits2("something")));
}
(The F class contains some generic IEnumerable based idioms I've had in my back pocket the last couple of years. Their implementation looks pretty much exactly like the stuff in the FSharp.Collections library.) I know that earlier versions of F# discussed fast comprehensions on integer ranges, but I don't see much evidence of that in the generated IL. I wonder if there is some subtlety in the implementation that I am missing, because I keep hearing stories about excellent F# performance stats. Thanks for any advice you can give me!