PerlMonks  

Help to slurp records - $/ maybe?

by Limbic~Region (Chancellor)
on May 02, 2003 at 13:29 UTC ( [id://254980]=perlquestion )

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All:
I have already thought of, and have working code for, an iterative solution for my problem. I am just wondering if there is a way to slurp my records in if for no other reason than to learn a neat new trick.

My data file looks something like the following:

    Field 1: abc
    Field 2: asdfasdf
    Field 2: asdfasdfase
    Field 2: aaa
    Field 3: ss
    Field 1: def
    Field 2: abc123
    Field 3: blah
    Field 1: asdfa
Where the record is from Field 1 through Field 3 and there can be a varying number of Field 2 lines in between. I know that $/ works off of a string and not a RE, so I am not sure this is possible. tye showed me a neat way to match a single character on a line by itself: $/ = "\nB\n"; In this case though, I am looking to terminate on the newline between the end of the Field 3 line and the start of the Field 1 line, without consuming any of the actual data in Field 3 or Field 1.

Anyone know of a way to do this or should I stick with my iterative solution that uses a flag to see when I am starting a new record?

Thanks in advance - L~R
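
For reference, a minimal sketch of the kind of flag-based loop described above. This is not L~R's actual code, which is not shown in the thread; the helper name each_record and the sample data are made up for illustration. The "flag" here is simply whether $rec is defined yet, and only one record is held in memory at a time.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per complete record.
# A record starts at a "Field 1:" line and runs up to the next one.
sub each_record {
    my ($fh, $callback) = @_;
    my $rec;                                    # undef until the first "Field 1:" line
    while (my $line = <$fh>) {
        if ($line =~ /^Field 1:/) {
            $callback->($rec) if defined $rec;  # flush the previous record
            $rec = '';
        }
        $rec .= $line if defined $rec;          # ignore anything before the first record
    }
    $callback->($rec) if defined $rec;          # flush the last record
    return;
}

# Usage, with an in-memory filehandle standing in for the real log file.
my $sample = join '', map { "$_\n" }
    'Field 1: abc', 'Field 2: asdfasdf', 'Field 2: aaa', 'Field 3: ss',
    'Field 1: def', 'Field 2: abc123', 'Field 3: blah';
open my $fh, '<', \$sample or die "open: $!";
my @records;
each_record($fh, sub { push @records, $_[0] });
close $fh;
print scalar(@records), " records\n";
```

Because the callback sees each record as soon as the next one starts, memory use stays flat no matter how large the file is.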

Replies are listed 'Best First'.
Re: Help to slurp records - $/ maybe?
by Joost (Canon) on May 02, 2003 at 13:38 UTC
    $/='Field 1:  ' should work, but it doesn't do what you want: $/ is the end-of-line string, which means that with your data the strings will contain:
    Update: first line was wrong:
    "Field 1:", "abc Field 2: asdfasdf Field 2: asdfasdfase Field 2: aaa Field 3: ss Field 1: ", "def Field 2: abc123 Field 3: blah Field 1: ", "asdfa"
    You probably want
    while (<FILE>) {
        next unless /^Field 1/;
        # ...
    }
    To skip everything but Field 1 lines
    -- Joost downtime n. The period during which a system is error-free and immune from user input.
      Joost,
      I can see how this would work - if you know what the data is you are losing, once you slurp it in you just prepend it back on before processing it.

      I am not sure I want to go this route, but I will keep it in my tool box. I am always looking for neat ways to do things.

      Cheers - L~R

        Well, you're not losing any data (with the $/ trick, that is). The value of the $/ variable is also put at the end of the read 'line'.

        If you want to lose that data, use chomp(), it will remove your current line-terminator.

        The only actual problem is matching the first line.
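
A rough sketch of the chomp idea, assuming every record starts with "Field 1:" at the beginning of a line and that the file itself begins with such a line. The helper name read_records and the sample data are invented for the example. Setting $/ to "\nField 1:" makes each read stop where the next record begins; chomp strips that terminator, and the marker is put back on every record after the first.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records read from $fh as a list of strings.
sub read_records {
    my ($fh) = @_;
    my $marker = 'Field 1:';
    local $/ = "\n$marker";     # a record ends where the next one begins
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;             # strips the trailing "\nField 1:", if present
        # Every record after the first lost its leading marker to the
        # previous read, so put it back.
        $rec = $marker . $rec if @records;
        $rec =~ s/\n\z//;       # the last record may still end in a plain newline
        push @records, $rec;
    }
    return @records;
}

# Usage, with an in-memory filehandle standing in for the real file.
my $sample = "Field 1: abc\nField 2: asdfasdf\nField 3: ss\n"
           . "Field 1: def\nField 3: blah\n";
open my $fh, '<', \$sample or die "open: $!";
my @records = read_records($fh);
close $fh;
print "[$_]\n" for @records;
```

This returns a list for brevity; for a gigabyte-sized log the same loop body could process each record in place instead of pushing it onto an array.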

        -- Joost downtime n. The period during which a system is error-free and immune from user input.
Re: Help to slurp records - $/ maybe?
by Aristotle (Chancellor) on May 02, 2003 at 14:38 UTC
    I'd stick with the conventional approach here. All else is bound to be a dirty/bad hack. You don't like redo? Fine, you can use a nested loop.
    local $_;
    do {
        my $rec = '';
        do {
            $rec .= $_ = <>;
        } until /^Field 3/ or !defined;
        # ...
    } while defined;

    Makeshifts last the longest.

Re: Help to slurp records - $/ maybe?
by JaWi (Hermit) on May 02, 2003 at 13:52 UTC
    You could use the following (coarse, not fully tested, yada yada yada) approach:
    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my $data;
    {
        local $/ = undef;
        $data = <DATA>;
    }
    print "Record: [$1]\n" while ( $data =~ /(Field 1:.*?Field 3:[^\n]+)/sg );

    __DATA__
    Field 1: abc
    Field 2: asdasdasdf
    Field 2: asdsaads
    Field 2: asdf
    Field 3: asfssadfsad
    Field 1: abc
    Field 2: asdf
    Field 3: asfssadfsad
    Field 1: abc
    Which outputs:
    Record: [Field 1: abc
    Field 2: asdasdasdf
    Field 2: asdsaads
    Field 2: asdf
    Field 3: asfssadfsad]
    Record: [Field 1: abc
    Field 2: asdf
    Field 3: asfssadfsad]
    Record: [Field 1: abc
    Field 2: asdasdasdf2
    Field 2: asdf3
    Field 3: asfssadfsad]

    HTH,

    -- JaWi

    "A chicken is an egg's way of producing more eggs."

      JaWi,
      Thanks - I already know that trick. The problem is that it isn't scalable: the larger the file you are slurping, the more memory you need. I am processing logs that are in the vicinity of a gigabyte and have millions of records.

      Cheers - L~R

        In that case I would go for the "flag option", which isn't as memory-consuming as slurping in the whole file.
        Interesting problem though...

        HTH,

        -- JaWi

        "A chicken is an egg's way of producing more eggs."

Re: Help to slurp records - $/ maybe?
by hmerrill (Friar) on May 02, 2003 at 13:38 UTC
    IMHO, simpler is better - stick with the iterative approach as it will be much easier for you and anyone else who looks at the code to understand and maintain it 6 months or a year from now when it needs to be changed.

    What you don't want to do is create some obfuscated code that takes someone a day to understand and another day to fix. Make it as simple as possible for *any* Perl person to understand.

    HTH.
      hmerrill,
      IMHO, it is much simpler to have $/ = ""; than it is to have labels, redo statements, and variable flags.

      Cheers - L~R

        If $/ = ""; works, great. But it won't for the problem as you stated.

        What happens when someone needs to insert a 'Field 2b' in the format six months down the road? You would have to fully rewrite a grok-at-once gimmick, but you'd only have to add the support for a new field type if it were a sensible loop.

        --
        [ e d @ h a l l e y . c c ]
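
To illustrate why $/ = ""; does not fit the data as posted: paragraph mode splits input on runs of blank lines, and the sample records have no blank line between them. A small sketch (the helper name count_paragraphs and both sample strings are made up for the demonstration):

```perl
use strict;
use warnings;

# Hypothetical helper: count how many chunks paragraph mode yields for $text.
sub count_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "open: $!";
    local $/ = '';              # paragraph mode: records end at blank lines
    my $count = 0;
    while (defined(my $para = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Same two records, with and without a blank line between them.
my $with_blanks    = "Field 1: abc\nField 3: ss\n\nField 1: def\nField 3: blah\n";
my $without_blanks = "Field 1: abc\nField 3: ss\nField 1: def\nField 3: blah\n";

printf "blank-line separated: %d\n", count_paragraphs($with_blanks);
printf "back-to-back:         %d\n", count_paragraphs($without_blanks);
```

With a blank line between records, each read returns one record; with back-to-back records the whole input comes back as a single chunk, which is the slurp-everything case again.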

        Debatable - I do concede that the iterative approach means more code, keeping track of flags, etc. - it's not pretty. But I'd have to see a finished slurp example setting $/ = "" before judging that to be the better option. The slurp example may well come out on top, but its ease of understanding and maintainability would be greatly enhanced by a generous comment spelling out exactly what the thing does. Whichever you decide on as the final winner, please post it in this thread.
Re: Help to slurp records - $/ maybe? (or flip-flop)
by broquaint (Abbot) on May 02, 2003 at 15:25 UTC
    Sounds like a great use of the flip-flop operator
    my($i, @data) = 0;
    until (eof FILE) {
        $data[$i] .= $_
            while ($_ = <FILE>) and /^Field 1/ .. /^Field 3/;
        $i++;
    }
    See perlop for more info.
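
One caveat worth a sketch: when records sit back to back, the range turns true again on the very next "Field 1" line, so testing the flip-flop for truth alone does not mark where one record ends. In scalar context the range operator returns a sequence number, with "E0" appended on the last line of the range (see perlop), and that suffix can serve as the end-of-record signal. The helper name flip_flop_records and the sample data below are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect records using the flip-flop's return value.
sub flip_flop_records {
    my ($fh) = @_;
    my (@records, $rec);
    while (my $line = <$fh>) {
        # Inside the range this is a sequence number (1, 2, ...), with
        # "E0" appended on the final line; outside it is the empty string.
        my $state = ($line =~ /^Field 1:/) .. ($line =~ /^Field 3:/);
        next unless $state;         # line falls outside any record
        $rec .= $line;
        if ($state =~ /E0\z/) {     # the "Field 3:" line closes the record
            push @records, $rec;
            $rec = '';
        }
    }
    return @records;
}

# Usage, with an in-memory filehandle standing in for the real file.
my $sample = "Field 1: abc\nField 2: asdfasdf\nField 3: ss\n"
           . "Field 1: def\nField 2: abc123\nField 3: blah\n";
open my $fh, '<', \$sample or die "open: $!";
my @records = flip_flop_records($fh);
close $fh;
print "[$_]", "\n" for @records;
```

Note that the flip-flop keeps its state between calls, so input that ends in the middle of a record would leave the range open for the next call; a trailing partial record is also silently dropped in this sketch.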
    HTH

    _________
    broquaint

Re: Help to slurp records - $/ maybe?
by jgallagher (Pilgrim) on May 02, 2003 at 14:55 UTC
    This is quite ugly, but I have to go to work and this is just the first thing that popped into my head. It uses the local $/ like you described.
    #!/usr/bin/perl -w
    use strict;

    local $/ = "\nField 1: ";

    my $data = <DATA>;
    $data =~ s/Field 1: $//;
    chop($data);
    print "[$data]\n";

    while (<DATA>) {
        s/Field 1: $//;
        $_ = "Field 1: $_";
        chop;
        print "[$_]\n";
    }

    __DATA__
    Field 1: abc
    Field 2: asdfasdf
    Field 2: asdfasdfase
    Field 2: aaa
    Field 3: ss
    Field 1: def
    Field 2: abc123
    Field 3: blah
    Field 1: asdfa
    Output is:
    [Field 1: abc
    Field 2: asdfasdf
    Field 2: asdfasdfase
    Field 2: aaa
    Field 3: ss]
    [Field 1: def
    Field 2: abc123
    Field 3: blah]
    [Field 1: asdf]
