hubFS: THE place for F#

. . . are you on The Hub?
Welcome to hubFS: THE place for F# Sign in | Join | Help
in Search

Data structure for autocomplete textbox

Last post 06-15-2007, 8:48 by Matt. 4 replies.
Sort Posts: Previous Next
  •  06-14-2007, 19:49 3262

    Data structure for autocomplete textbox

    Hello.

    I need to implement one of those fancy autocomplete text boxes that Google Suggest has made popular. The ajax stuff is pretty trivial but more interesting is the server-side datastructure. The data is a managable ( < 10k) number of Western (i.e. US-ASCII chars) person names. My feeling is that something along the lines of a Patricia Tree will suffice as I can fit the data in memory. Prefix lookups need to be superfast. Everything else can be dog slow (i'll rebuild the structure once per day from an Oracle instance)

    I've zero experience with F# but a working familiarity with Ocaml. My gut says that an F# implementation will be clearer, more concise, possible more performant (?) than a C# version. So my questions are:
    1. Is doing this in F# a good idea?
    2. Do I need to worry about significant changes to GC patterns on the server?
    3. Is using an F# data structure from C# painful?
    4. Anyone got an implementation they'd like to share?
    Thanks,
    Matt.
  •  06-14-2007, 22:43 3263 in reply to 3262

    Re: Data structure for autocomplete textbox

    Hi Matt,

    Interesting questions.

    1. Is doing this in F# a good idea?

    My gut reaction says no, put the data in a relational database slap some indexes on the apporiate columns and let the db take the strain. However if you really think a hand code solution would be better F# is generally good at tree mainipulations and that sort of thing.

    2. Is doing this in F# a good idea?

    Wouldn't have thought so. I suppose you may see a few more allocations and therefore objects that require collecting if you stick to a very pure functional apporach and use lots of immutable data structures, but the GC in .NET is self tuning and you should see that it handles this quite efficently. The preformance counters in .NET are very good so you can easily profile your app and using perfmon to see if there is too much time spent in GC and adjust the app accordingly. But I wouldn't think you'd need to. 

    3. Is using an F# data structure from C# painful?

    It can be but doesn't have to be. The answer in a nut shell is, union types and tuples look a bit strange form C# records and basic types like strings look fine. I cover this in detail "Foundations of F#".

    4. Anyone got an implementation they'd like to share?

    No sorry.

    Cheers,
    Rob


    Robert Pickering, MVP
    http://strangelights.com
  •  06-15-2007, 6:44 3264 in reply to 3263

    Re: Data structure for autocomplete textbox

    Rob,

    The running the query in the database isn't really an option because of the usage model of auto-complete boxes. I'm going to need response times under 1/5th of a second for the textbox to seem "live" to the end user (excluding whatver the overhead of going over the wire is). Even if the database could query that quickly (and I don't think it can..read on for why), the overhead of a remote call with query hashing/parsing and marshalling the return results takes a fair bit of my time window.

    I'll be firing a request for prefix matches ever 1/3 or a second or so so there will be a high volume of queries in short bursts. Additionally, I'm doing prefix matches (i.e. typing "M" will suggest "Matt", "Mike", "Mark", etc.), so my database indices won't be used (I'm not aware of a database that supports non-deterministic function indices - I'd be surprised if it were possible). Although, as I write this, it occurs to me that I should look at Oracle's full text indexing capabilities.

    Thanks for the response, Rob. By the way, both of the APress F# books are on my Amazon wish list

    Regards,
    Matt.

  •  06-15-2007, 7:43 3265 in reply to 3264

    Re: Data structure for autocomplete textbox

    I'm pretty sure indices are taken into account when query char and varchar colomns using the like clause. Crucially your only ever searching form the begining of the string which is the way the index is order so the index can be helpful. The "example 1" shown on this page seems to agree with what I'm saying: http://www.mssqltips.com/tip.asp?tip1236

    However your right that all those queries to the database would not be a good thing. I might still be temped to query the database for the first or two characters then go on to sort though the returned list using F# for the rest. However this does run the risk of the same set of data being dragged into memory many times, say if lots of people start searching for a name that begins with M. You could use some sort of results caching mechanism, maybe using the caching block in enterprise library (http://www.microsoft.com/downloads/details.aspx?familyid=5a14e870-406b-4f2a-b723-97ba84ae80b5&displaylang=en), but that can quickly become complex engough so that it is of no benefit. So one big readonly data structure might be the way to go, though I think above a certain size of data structure it would probably become more practical to use a database.

    It's an interesting problem so it would be interesting to code it up a few different ways and see which one worked best.

    Cheers,
    Rob


    Robert Pickering, MVP
    http://strangelights.com
  •  06-15-2007, 8:48 3266 in reply to 3265

    Re: Data structure for autocomplete textbox

    Rob,

    I apologize - it appears that I simplified my problem statement a bit too much. Your statement about index usage is correct but for several reasons (and by the way, this is Oracle), indices cannot be structured so as to avoid a full scan in all cases, even when taking multiple function indices into consideration. Suffice it to say that I need to match on strings from several columns and the characters are not always at the beginning of the string.

    If I'm going to be caching on my app server and the volume of data is sufficiently small, I figured I might as well load the whole dataset into memory and rebuild it once in  a while in the background. Additionally, the javascript in the browser uses an LRU cache so that pressing backspace doesn't trigger another search request.

    Regards,
    Matt.

View as RSS news feed in XML
Powered by Community Server, by Telligent Systems