PerlMonks  

Parsing a data field that is separated by date/time entries

by TASdvlper (Monk)
on Apr 14, 2005 at 17:45 UTC

TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

All,

I am parsing a database, and one of the fields is the description of a problem (basically, a big text field). When new comments are added to the problem, they are appended to the description with a timestamp of when the updates were made. So, for example, the description field could look like the following:

[Thu Dec 9 23:54:43 2004, <username>]: description of problems ... just some verbiage .....

[Thu Dec 10 23:54:43 2004, <username>]: just more verbiage .....

[Thu Dec 11 23:54:43 2004, <username>]: just more and more verbiage .....

[Thu Dec 12 23:54:43 2004, <username>]: just more and more and more verbiage .....

[Thu Dec 13 23:54:43 2004, <username>]: just more and more and more and more verbiage .....

The very last entry is the most recent update. What I would like to do is capture only the last entry made, including the date/time stamp. So, after parsing that description field, my output would contain only the following (from Dec 13):
[Thu Dec 13 23:54:43 2004, <username>]: just more and more and more and more verbiage .....
I'm not sure where to start. Any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: Parsing a data field that is separated by date/time entries
by Roy Johnson (Monsignor) on Apr 14, 2005 at 17:52 UTC
    Looks like you can effectively split it on double-newlines. Is that right? Does this give you what you want?
    my $whatIwant = substr($wholefield, rindex($wholefield, "\n\n")+2);
    rindex...there's an underused function.
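
A small sketch of the rindex idea with one guard added: when the separator does not occur at all, rindex returns -1, so adding 2 would silently drop the first character of the field. The sub name last_chunk and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the last occurrence of $sep,
# or the whole string when $sep does not occur at all.
sub last_chunk {
    my ($text, $sep) = @_;
    my $pos = rindex($text, $sep);
    return $text if $pos < 0;                  # no separator: a single entry
    return substr($text, $pos + length $sep);  # skip past the separator itself
}

# Invented sample field with three entries separated by blank lines
my $wholefield = "[stamp 1]: first\n\n[stamp 2]: second\n\n[stamp 3]: third";
print last_chunk($wholefield, "\n\n"), "\n";
```

Passing "\n[" as the separator would work the same way, except that the opening bracket is then cut off along with the newline, so it would have to be put back.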

    Caution: Contents may have been coded under pressure.
      Actually, I can't assume double lines. I just did it that way to make it easier to read. And sometimes there are double lines within the description (e.g. if people are writing a couple of paragraphs per update). I'm guessing I'm gonna have to do something like finding the very last time stamp and capturing all the following text.
        You can vary what you look for, based on what you know about the data. If newline followed by an open-square-bracket is reliable enough, you can do
        my $whatIwant = substr($wholefield, rindex($wholefield, "\n[")+1);
        If you need to match the timestamp format to reliably know that you've found a comment boundary, you'll need to use a regex, as frodo72 illustrated in his example 3. Spell out as much as you need to to get a reliable boundary.

        No matter what you choose, it will be possible that one of the comments includes the pattern you look for. That's only a problem if it happens to be the last comment, in which case you'll end up with just the last portion of that comment instead of the whole thing.
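
One way to sketch the regex variant: under /s a greedy .* runs to the end of the field and then backtracks, so the capture starts at the last place the stamp pattern matches. The stamp pattern below is an assumption modeled on the sample data in the question, not a definitive format, and the sub name and usernames are invented.

```perl
use strict;
use warnings;

# Assumed stamp shape, e.g. "[Thu Dec 13 23:54:43 2004, someuser]:"
my $stamp = qr/\[\w{3} \s+ \w{3} \s+ \d{1,2} \s+ \d{2}:\d{2}:\d{2} \s+ \d{4}, \s* [^\]\n]+ \]:/x;

# Hypothetical helper: return the last stamped entry, or undef if no stamp is found.
sub last_entry {
    my ($text) = @_;
    # Greedy .* swallows the whole field, then backs off to the LAST stamp.
    return $text =~ /\A .* ($stamp .*) \z/xs ? $1 : undef;
}

# Invented sample: the first comment contains a blank line and a second paragraph
my $field = join "\n",
    "[Thu Dec 9 23:54:43 2004, alice]: description of problems ...",
    "",
    "a second paragraph in the same comment",
    "[Mon Dec 13 23:54:43 2004, bob]: just more and more verbiage .....";

print last_entry($field), "\n";
```

The stricter the stamp pattern, the less likely it is that text inside a comment is mistaken for a boundary.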


        Caution: Contents may have been coded under pressure.
Re: Parsing a data field that is separated by date/time entries
by polettix (Vicar) on Apr 14, 2005 at 17:59 UTC
    You'll have to Benchmark these yourself.

    1. If each log entry is on one line (which I assume is true), you can split by newline and take the last element:

    my @lines = split /\n/, $logtext;
    my $last = $lines[-1];

    2. I suspect the previous approach could waste a lot of resources on big text; in particular, it fills an array, which is overkill. So you could try this (requires Perl 5.8 or later, for in-memory filehandles):

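    # in-memory filehandles need the three-argument form of open
    open my $log_fh, '<', \$logtext or die "Cannot open in-memory filehandle: $!";
    my $last;
    $last = $_ while <$log_fh>;   # after the loop, $last holds the final line
    close $log_fh;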

    3. A third solution could be a pattern match anchored to the very end of the string. With /m, ^ can match at the start of any line, and .* stops at a newline, so the capture ends up being the last line:

    # list-context match: $last stays undef if the pattern ever fails to match
    my ($last) = $logtext =~ /^(.*)\Z/m;
    Couldn't tell you what's the best for very large log texts, anyway.
    Update: of all these... the best is the one from Roy Johnson!
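
For anyone who wants to run the comparison, here is a rough sketch using the core Benchmark module. The test data (10_000 invented one-line entries) and the labels are assumptions, and the rindex variant splits on a single newline so that all four approaches answer the same question; real timings will depend on how your data is actually laid out.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 10_000 one-line entries, no trailing newline.
my $logtext = join "\n", map { "[entry $_]: some verbiage" } 1 .. 10_000;

my %approach = (
    split_last => sub { my @lines = split /\n/, $logtext; $lines[-1] },
    filehandle => sub {
        open my $fh, '<', \$logtext or die "in-memory open failed: $!";
        my $last;
        $last = $_ while <$fh>;
        close $fh;
        $last;
    },
    regex  => sub { my ($last) = $logtext =~ /^(.*)\Z/m; $last },
    rindex => sub { substr $logtext, rindex($logtext, "\n") + 1 },
);

# Sanity check before timing: every approach must return the same last line.
for my $name (sort keys %approach) {
    my $got = $approach{$name}->();
    die "$name returned '$got'" unless $got eq '[entry 10000]: some verbiage';
}

# Run each approach for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%approach);
```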

    Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

    Don't fool yourself.
Re: Parsing a data field that is separated by date/time entries
by graff (Chancellor) on Apr 15, 2005 at 02:57 UTC
    Given that the data source is a big text field from a database table, I wouldn't count on newlines as being a reliable guide for separating the entries. Better to use a capturing split with a regex that will match the date string -- especially if you know what "username" has to look like (e.g. all alphanumerics, between 2 and 8 characters, always starting with a letter, or whatever).
    my $datex = qr/\[                            # match open square bracket
        [FMSTW][a-u]{2} \s                       # match day of week
        [ADFJMNOS][a-y]{2} \s+                   # match month
        \d+ \s+ \d+:\d{2}:\d{2} \s \d{4} , \s+   # match date, time, year
        \w+                                      # match username (could be more explicit)
        \]:/x;                                   # match closing bracket, colon
    my @text_blocks = split /($datex)/, $textfield;
    # declare first: "my ... unless ..." in one statement has undefined behaviour
    my $initial_junk;
    $initial_junk = shift @text_blocks unless ( $text_blocks[0] =~ /$datex/ );
    my %entry = ( @text_blocks );
    print "DATE=> $_ STRING=> $entry{$_}\n" for (keys %entry);
    (update: simplified the part that matches the time field)
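
Since the original question only needs the most recent comment, a sketch along these lines could take the last stamp/text pair straight from the split result instead of going through a hash, which loses the order and would merge two comments that carry an identical stamp. The pattern here is a compact stand-in for $datex, and the sub name and usernames are invented for the example.

```perl
use strict;
use warnings;

# Compact stand-in for the $datex pattern (an assumption; adjust to your data).
my $datex = qr/\[[A-Z][a-z]{2}\s+[A-Z][a-z]{2}\s+\d+\s+\d+:\d{2}:\d{2}\s+\d{4},\s+\w+\]:/;

# Hypothetical helper: return (stamp, body) of the last entry, or an empty list.
sub last_pair {
    my ($textfield) = @_;
    # A capturing split returns: leading text, stamp, body, stamp, body, ...
    # The -1 limit keeps a trailing empty body, so the pairs stay aligned.
    my @blocks = split /($datex)/, $textfield, -1;
    return if @blocks < 3;        # no stamp found
    return @blocks[-2, -1];       # most recent stamp and its body
}

# Invented sample field with two stamped comments
my $textfield = "[Thu Dec 9 23:54:43 2004, alice]: first note\n"
              . "[Mon Dec 13 23:54:43 2004, bob]: latest note\n";

my ($stamp, $body) = last_pair($textfield);
print "DATE=> $stamp STRING=> $body";
```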