hubFS: THE place for F#


Parsing an ascii file

Last post 07-09-2008, 7:54 by Stephan. 2 replies.
  •  07-09-2008, 5:56 6337

    Parsing an ascii file

    I am trying to load an ASCII data file whose fields are either strings or floats. Strings are always wrapped in tildes, as in ~foo~, while floats may appear bare, with no delimiters around them. Fields are separated by the ^ character. For example: ~203~^~g~^~PROCNT~^~Protein~^~2~^~600~

    I have used the following to extract each field from each line, but it seems a little slow, so I was wondering whether there is a better or simpler way.


    #light

    open System
    open System.IO
    open System.Collections.Generic

    let reader filename =
      seq
        { use reader = new StreamReader(File.OpenRead(filename))
          while not reader.EndOfStream do
            // Replace twice so that ^^^^ becomes ^0.0^0.0^0.0^ and String.split accounts for these missing values
            let line = reader.ReadLine().Replace("^^", "^0.0^").Replace("^^", "^0.0^")
            yield if line.[line.Length - 1] = '^' then line + "0.0" else line
        }

    let extract (s : string) =
      if s.Length >= 2 && s.[0] = '~' then s.[1 .. s.Length - 2]  else s

    let to_float (s : string) =
      if s.Length > 0 then float s else 0.0 //so that ~~ is indeed transformed to 0.0

    let line_to_strings line =
      String.split ['^'] line
      |> List.map (fun xs -> xs |> extract)
      |> Array.of_list //then for each line one can use line.[# of field]

    let path = "C:\\"

    let file f = Filename.concat path f

    let _ =
      reader (file "NUTR_DEF.txt")
      |> Seq.map line_to_strings
      |> Seq.iter (fun x -> print_endline (any_to_string (Array.length x, x)))

    Thanks for your help

  •  07-09-2008, 7:13 6338 in reply to 6337

    Re: Parsing an ascii file

    Your code seems a little too complicated to me for what you're trying to do.


    /// reader that directly generates a sequence of floats and strings
    let reader filename =
      seq
        { use reader = new StreamReader(File.OpenRead(filename))
          while not reader.EndOfStream do
            let next = reader.nextByte()
            if numeric next then
              let num = read_till_^or~ reader
              yield (float_of_string num)
            elif next = '^' or next = '~' then
              // skip
            else
              let string = read_till_^or~ reader
              yield string
        }


    Note: I did not define the functions numeric, or read_till_^or~, but I hope you get the point ;)
    Also, I'm sure there are a couple of type errors in there, since the sequence yields both floats and strings (but you can easily return all strings or box the values).
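
For anyone who wants to try the character-by-character idea, here is a minimal runnable sketch of it for a single line. It is not the poster's code: the Field type and the scanLine function are hypothetical names introduced for illustration, it uses current F# and .NET calls rather than the 2008 library, and it assumes well-formed input (every ~ is closed and a quoted string contains neither ~ nor ^). Missing values become Number 0.0, mirroring the original question, while ~~ is kept as an empty Text.

```fsharp
open System
open System.Globalization

/// A field is either text or a number (hypothetical type, not from the thread).
type Field =
    | Text of string
    | Number of float

/// Scan one line left to right, splitting on '^' and treating ~...~ as a string.
let scanLine (line: string) : Field list =
    let fields = ResizeArray<Field>()
    let mutable pos = 0
    let mutable finished = false
    while not finished do
        if pos < line.Length && line.[pos] = '~' then
            // Quoted string: read up to the closing '~' (or end of line).
            let close = line.IndexOf('~', pos + 1)
            let stop = if close < 0 then line.Length else close
            fields.Add(Text(line.Substring(pos + 1, stop - pos - 1)))
            pos <- min line.Length (stop + 1)
        else
            // Bare field: read up to the next '^'; empty means a missing value.
            let next = line.IndexOf('^', pos)
            let stop = if next < 0 then line.Length else next
            let raw = line.Substring(pos, stop - pos)
            let value =
                if raw = "" then 0.0
                else Double.Parse(raw, CultureInfo.InvariantCulture)
            fields.Add(Number value)
            pos <- stop
        // After a field, either a '^' separator follows or the line is done.
        if pos < line.Length && line.[pos] = '^' then pos <- pos + 1
        else finished <- true
    List.ofSeq fields
```

Calling scanLine on "~203~^~g~^1.5^^" should give five fields: two Text values, Number 1.5, and two Number 0.0 entries for the missing trailing values. Mapping scanLine over the lines of a file gives the lazy sequence the pseudocode was aiming at.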
  •  07-09-2008, 7:54 6339 in reply to 6338

    Re: Parsing an ascii file

    You could use a regular expression like in the sample below. Whether that is easier to understand than a simple imperative nested loop is debatable...

    #light

    open System
    open System.IO
    open System.Text.RegularExpressions
    type Value = VString of string
               | VFloat of float

    let regex = new Regex("^(?:(?:\^|(?<=^))([^^\r\n]*))*$", RegexOptions.Multiline)

    let reader filename =
        let isBeginOfFloat c = (c >= '0' && c <= '9') || c = '.' || c = '-'
        
        use reader = new StreamReader(File.OpenRead(filename))
        let str = reader.ReadToEnd()   
        let matches = regex.Matches(str)
               
        [| for m in matches do
            -> [| for c in m.Groups.[1].Captures do
                      -> let i, n = c.Index, c.Length
                         if n = 0 then VFloat 0.0
                         else
                             let i, n = if str.[i] = '~' then i + 1, n - 2 else i, n
                             if n = 0 then VFloat 0.0
                             elif isBeginOfFloat str.[i] then
                                 VFloat (Float.of_string (str.Substring(i, n)))
                             else
                                 VString (str.Substring(i, n))
               |]
        |]
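
A note for readers on a current F# toolchain: as far as I know, helpers used in this thread such as String.split, any_to_string, print_endline, Filename.concat and Float.of_string belonged to the OCaml-compatibility library of the 2008 releases and are not part of today's FSharp.Core. The sketch below shows one way the original split-based approach might look with plain .NET calls. The names unquote, splitLine, toFloat and readRecords are hypothetical, and the sketch makes no claim about speed. It relies on String.Split keeping empty entries by default, so the double Replace pass from the question is not needed.

```fsharp
open System
open System.Globalization
open System.IO

/// Strip the surrounding ~...~ from a field, if present.
let unquote (s: string) =
    if s.Length >= 2 && s.[0] = '~' && s.[s.Length - 1] = '~' then
        s.Substring(1, s.Length - 2)
    else
        s

/// Split a line on '^'. Empty entries are kept, so "a^^b" yields
/// three fields and a trailing '^' yields a final empty field.
let splitLine (line: string) : string[] =
    line.Split('^') |> Array.map unquote

/// Convert a field to float, treating empty text as a missing value (0.0).
let toFloat (s: string) =
    if s = "" then 0.0 else Double.Parse(s, CultureInfo.InvariantCulture)

/// Lazily read and split every line of a file.
let readRecords (path: string) : seq<string[]> =
    File.ReadLines path |> Seq.map splitLine
```

For the sample line from the question, splitLine returns the six strings 203, g, PROCNT, Protein, 2 and 600. Fields are left as strings here, so the caller decides per column whether to apply toFloat.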
