File::Sip a perl module to read huge text files with limited memory

6 thoughts on “File::Sip a perl module to read huge text files with limited memory”

  1. Nice idea. Another module for reading only part of a file into memory is Tie::File; it would be interesting to compare the two modules.

  2. What’s the benefit of File::Sip over regular use of the diamond operator, particularly if you just want to iterate over the file line by line? I.e.

    open(my $fh, '<', '02packages.details.txt') or die $!;
    while (my $line = <$fh>) {
        # ... do something with $line
    }

    Using File::Sip is significantly slower, and doesn’t appear to use less memory than the above. Or am I missing something? (Entirely likely :-)

  3. @Neil: the advantage over the diamond operator is that you can directly access the line of the file you want, without reading through all the lines before it every time. If you need to access many lines of a zillion-line file multiple times, in random order, then I can assure you this approach is better than while (<$fh>) { }.

    File::Sip is slower if you access the lines only once (because it needs one complete run to build its index), but if you need to access many lines of the file, many times, at specific points, then iterating over the file handle over and over to get where you want will be slower.

  4. At first I was wondering why you were remembering the first character of each line; that seemed like a really narrow use case (“Give me the first character of line 238189”?!). Then I realized that I had misunderstood what “builds an index of each line’s first character, accessible with the corresponding line number” meant: you index the position of the first character of each line, of course, so it is easy to jump to an arbitrary line.

    Chalk that one up to me not being a native English speaker/reader.

    I am wondering if there would be any benefit in building up the index lazily, so if it turns out you only need to jump around in the first couple of terabytes of a hundred-terabyte file, you save reading the rest.

    Probably also a narrow use case, though :-)
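
The two ideas discussed in this thread, indexing the byte position where each line starts and building that index lazily, can be sketched in plain Perl as below. This is only an illustration under stated assumptions, not File::Sip's actual implementation or API: the package name LazyLineIndex and its methods read_line and indexed_count are hypothetical, and line numbers are 0-based in this sketch.

```perl
use strict;
use warnings;

package LazyLineIndex;    # hypothetical name, not part of File::Sip

sub new {
    my ($class, $path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    return bless {
        fh      => $fh,
        offsets => [0],    # offsets->[n] = byte position where line n starts
        eof     => 0,      # set once the whole file has been scanned
    }, $class;
}

# Extend the index only as far as needed to know where line $n starts.
sub _index_up_to {
    my ($self, $n) = @_;
    my ($fh, $offsets) = @{$self}{qw(fh offsets)};
    return if $self->{eof} || $n < @$offsets;

    # Resume scanning from the last known line start.
    seek($fh, $offsets->[-1], 0) or die "seek failed: $!";
    while (@$offsets <= $n) {
        my $line = <$fh>;
        if (!defined $line) { $self->{eof} = 1; last; }
        push @$offsets, tell($fh);    # start of the following line
    }
    return;
}

# Return line $n (0-based) including its newline, or undef if there is none.
sub read_line {
    my ($self, $n) = @_;
    $self->_index_up_to($n);
    my $offsets = $self->{offsets};
    return if $n >= @$offsets;
    seek($self->{fh}, $offsets->[$n], 0) or die "seek failed: $!";
    return scalar readline($self->{fh});
}

# Number of line-start positions recorded so far (shows how lazy we were).
sub indexed_count {
    my ($self) = @_;
    return scalar @{ $self->{offsets} };
}

package main;
```

With this sketch, LazyLineIndex->new('02packages.details.txt')->read_line(3) would scan only the first three lines before seeking, and later requests for earlier lines would reuse the recorded offsets. Memory use grows with the number of lines indexed, not with the size of the file.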

Comments are closed.