PerlMonks  

sorting a file by a date:"YYYY-MM-DD" field with cmp

by jason.printer (Initiate)
on Jun 16, 2015 at 17:31 UTC ( #1130647=perlquestion )

jason.printer has asked for the wisdom of the Perl Monks concerning the following question:

Hello-- I have solved my problem, but I don't know how. (I have kept it below for ...posterity?) What's going on here? :) When I change my sort function to the following it works as expected, sorting the dates. What is the difference between parentheses absent and parentheses present?

Working code:

    sub sortByDate {
        #get dates. will look like this: date:"2015-02-16", date:"YYYY-MM-DD",
        my ($aDate) = $a =~ /date:\"(\d{4}\-\d{2}\-\d{2})"/;
        my ($bDate) = $b =~ /date:\"(\d{4}\-\d{2}\-\d{2})"/;
        return ($aDate cmp $bDate);
    }

Original post:

=========

I am newish to Perl and trying to sort a file but it is behaving unexpectedly. I have a file which has several items in json format (not in date order and with some blank lines), for example:

    { date:"2015-03-01", content:"asdf" }
    { date:"2015-05-01", content:"erwa" }
    { date:"2015-01-02", content:"erts" }
    { date:"2014-04-02", content:"w34r" }

When I run my code, which is intended to sort by date, it sorts the file in a way I don't quite understand. It puts all the blank lines first, and then all the other lines stay in the order they were in the file. Here is my code:

    #!/user/bin/perl
    use strict;
    use warnings;

    # open file
    open (MYFILE, '<jsonfile.json') or print ("Can't open file.");
    # pull into list
    my @events = <MYFILE>;
    # close file
    close (MYFILE);

    # organize by date
    sub sortByDate {
        #get dates. will look like this: date:"2015-02-16", date:"YYYY-MM-DD",
        my $aDate = $a =~ /date:\"(\d{4}\-\d{2}\-\d{2})"/;
        my $bDate = $b =~ /date:\"(\d{4}\-\d{2}\-\d{2})"/;
        return ($aDate cmp $bDate);
    }

    @events = sort sortByDate @events;
    &printDates;

When run, I get

    2015-03-01
    2015-05-01
    2015-01-02
    2014-04-02

Any help would be appreciated. I am new to custom sorts and regex.

Replies are listed 'Best First'.
Re: sorting a file by a date:"YYYY-MM-DD" field with cmp
by davido (Cardinal) on Jun 16, 2015 at 17:44 UTC

        sub sortByDate {
            #get dates. will look like this: date:"2015-02-16", "YYYY-MM-DD",
            my $aDate = $a =~ /date:\"(\d{4}\-\d{2}\-\d{2})"/;
            my $bDate = $b =~ /date:\"(\d{4}\-\d{2}\-\d{2})"/;
            return ($aDate cmp $bDate);
        }

    ...should contain...

        my ($aDate) = $a =~ /date:"(\d{4}-\d{2}-\d{2})"/;
        my ($bDate) = $b =~ /date:"(\d{4}-\d{2}-\d{2})"/;

    The regexp binding operator returns true/false (1 or the empty string) in scalar context, and the contents of the capture groups in list context. The parens around your variable names turn the assignment into a list assignment, which places the return value of the =~ operator into list context.
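
    The difference can be seen by running the same match both ways. This is a minimal sketch using one line of the sample data from the question; the variable names ($line, $scalar_result, $list_result) are made up for illustration.

```perl
use strict;
use warnings;

my $line = '{ date:"2015-03-01", content:"asdf" }';

# Scalar context: the match only reports success or failure.
my $scalar_result = $line =~ /date:"(\d{4}-\d{2}-\d{2})"/;

# List context (note the parentheses): the match returns its captures.
my ($list_result) = $line =~ /date:"(\d{4}-\d{2}-\d{2})"/;

print "scalar context: $scalar_result\n";    # prints "scalar context: 1"
print "list context:   $list_result\n";      # prints "list context:   2015-03-01"
```

    With the scalar form, every line that contains a date compares as "1", so cmp has nothing useful to order by.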

    By the way, even for small bits of JSON, I'd be inclined to convert them to a datastructure as the first thing I do, rather than treating the json as text.

    Update: An example of treating the data as JSON:

        use strict;
        use warnings;
        use JSON;
        use Data::Dump;

        my @data;
        while (<DATA>) {
            next unless /\N/;
            push @data, decode_json($_);
        }

        print "Unsorted:\n";
        dd \@data;

        my @sorted = sort { $a->{date} cmp $b->{date} } @data;

        print "\nSorted:\n";
        dd \@sorted;

        __DATA__
        { "date":"2015-03-01", "content":"asdf" }
        { "date":"2015-05-01", "content":"erwa" }
        { "date":"2015-01-02", "content":"erts" }
        { "date":"2014-04-02", "content":"w34r" }

    ...produces the following output:

        Unsorted:
        [
          { content => "asdf", date => "2015-03-01" },
          { content => "erwa", date => "2015-05-01" },
          { content => "erts", date => "2015-01-02" },
          { content => "w34r", date => "2014-04-02" },
        ]

        Sorted:
        [
          { content => "w34r", date => "2014-04-02" },
          { content => "erts", date => "2015-01-02" },
          { content => "asdf", date => "2015-03-01" },
          { content => "erwa", date => "2015-05-01" },
        ]

    I did have to fix up your JSON, but I assume the real data you are working with is real JSON, not pseudo-JSON as the post seems to show. There are many reasons why I would prefer an approach that treats the input JSON as JSON rather than as strings, including:

    • Easy detection of malformed input.
    • As the input grows more complex, the solution scales up to encapsulate the complexity; you don't have to invent a new regexp every time the data takes a new turn.
    • It's more convenient to deal with data structures internally. We should favor approaches that convert to the most convenient format as early as possible, and convert away from that most convenient format as late as possible... if we value reduced code complexity and fewer bugs.
    • Someone has already written a JSON parser; you don't need to.
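
    The first point, detection of malformed input, can be sketched roughly as follows. This is only an illustration: it swaps in the core JSON::PP module (so it runs without installing anything) in place of the JSON module used above, and the @lines, @records and @rejected names are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

my @lines = (
    '{ "date":"2015-03-01", "content":"asdf" }',
    '{ date:"2015-05-01", content:"erwa" }',       # unquoted keys: not valid JSON
    '{ "date":"2014-04-02", "content":"w34r" }',
);

my ( @records, @rejected );
for my $line (@lines) {
    # decode_json dies on malformed input, so trap the error with eval.
    my $record = eval { decode_json($line) };
    if ( defined $record ) {
        push @records, $record;
    }
    else {
        push @rejected, $line;
    }
}

# Sort the decoded records on their date field and keep just the dates.
my @sorted_dates = map { $_->{date} } sort { $a->{date} cmp $b->{date} } @records;

print "sorted: @sorted_dates\n";
print "rejected: ", scalar(@rejected), " line(s)\n";
```

    A regexp-based extraction would silently accept or skip the second line; the parser reports it, and the caller decides what to do with it.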

    Dave

Re: sorting a file by a date:"YYYY-MM-DD" field with cmp
by ikegami (Patriarch) on Jun 16, 2015 at 17:45 UTC

    In scalar context, the match operator and thus the bind operator returns whether a match was found or not. You want to evaluate the match in list context to get it to return the list of matches.

    Fix:

        my ($aDate) = $a =~ /\bdate:\s*"(\d{4}-\d{2}-\d{2})"/;
        my ($bDate) = $b =~ /\bdate:\s*"(\d{4}-\d{2}-\d{2})"/;

    Also:

    • Added \b to avoid catching enddate.
    • Added \s* in case the JSON encoder decides to start adding whitespace.
    • Removed some superfluous backslashes.

    Note: Will break if you get

    • { date:"\u0032015-05-01", content:"erwa" }
    • { foo:{date:"2014-05-01"}, date:"2015-05-01", content:"erwa" }
    • etc
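
    Both failure modes can be reproduced with the regexp from the fix. A small sketch, where date_by_regex is a hypothetical helper introduced only for this demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regexp from the fix.
sub date_by_regex {
    my ($line) = @_;
    my ($date) = $line =~ /\bdate:\s*"(\d{4}-\d{2}-\d{2})"/;
    return $date;    # undef when nothing matched
}

# Single quotes keep the \u0032 escape as literal text, as it would appear in the file.
my $escaped = '{ date:"\u0032015-05-01", content:"erwa" }';
my $nested  = '{ foo:{date:"2014-05-01"}, date:"2015-05-01", content:"erwa" }';

my $from_escaped = date_by_regex($escaped);
my $from_nested  = date_by_regex($nested);

print "escaped: ", ( defined $from_escaped ? $from_escaped : "no match" ), "\n";
print "nested:  ", ( defined $from_nested  ? $from_nested  : "no match" ), "\n";
```

    The escaped date is not matched at all, and for the nested record the regexp picks up the inner date (2014-05-01) rather than the top-level one.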

    For a reliable but slower solution, you can use

        use JSON::XS qw( decode_json );

        my @sorted =
            map { substr($_, 10) }
            sort
            map { decode_json($_)->{date} . $_ }
            @events;
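
    The one-liner reads from bottom to top. Spelled out step by step it might look like the sketch below, which assumes valid JSON input (quoted keys) and uses the core JSON::PP module instead of JSON::XS so that it runs without extra installs; @decorated and @ordered are names invented here.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

my @events = (
    '{ "date":"2015-03-01", "content":"asdf" }',
    '{ "date":"2015-05-01", "content":"erwa" }',
    '{ "date":"2015-01-02", "content":"erts" }',
    '{ "date":"2014-04-02", "content":"w34r" }',
);

# 1. Decorate: prepend the 10-character date to each original line.
my @decorated = map { decode_json($_)->{date} . $_ } @events;

# 2. Sort: a plain string sort now orders by the date prefix.
my @ordered = sort @decorated;

# 3. Undecorate: strip the 10-character prefix to recover the original line.
my @sorted = map { substr( $_, 10 ) } @ordered;

print "$_\n" for @sorted;
```

    Each line is decoded once rather than once per comparison. The trick relies on every date being exactly 10 characters long.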

    As for the blank lines, just remove them first using

    @events = grep /\S/, @events;

    Update: Added to my answer.

Re: sorting a file by a date:"YYYY-MM-DD" field with cmp
by MidLifeXis (Monsignor) on Jun 16, 2015 at 17:46 UTC

    You beat me to your solution.

    The why is context. Specifically, the original is in scalar context, which will return true or false. A line containing a date yields 1 and a blank line yields the empty string, so the blank lines sort first; the remaining values are all 1, and the sort, now (iirc) being stable, returns those lines in their original order. The update takes the return value in list context, causing the results of the matches to be assigned to the provided variables.

    Update: tested my understanding of the return value from a pattern match, and corrected the statement.
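
    To see the keys the original comparator was actually working with, something like the following sketch can be run. broken_key is a hypothetical helper that reproduces the scalar-context match from the original post, and the sample data is a shortened version of the file from the question.

```perl
use strict;
use warnings;

my @events = (
    qq({ date:"2015-03-01", content:"asdf" }\n),
    "\n",
    qq({ date:"2014-04-02", content:"w34r" }\n),
);

# Hypothetical helper: the scalar-context match from the original post.
sub broken_key {
    my ($line) = @_;
    my $flag = $line =~ /date:"(\d{4}-\d{2}-\d{2})"/;
    return $flag;    # 1 on a match, the empty string otherwise
}

my @keys = map { broken_key($_) } @events;
printf "line %d has key '%s'\n", $_, $keys[$_] for 0 .. $#events;

# The empty string sorts before "1", so the blank line moves to the front.
my @sorted = sort { broken_key($a) cmp broken_key($b) } @events;
print @sorted;
```

    The dated lines all tie on "1", so their relative order is whatever the sort implementation happens to preserve; perl's default sort has been stable in practice, but that is not something to rely on without checking the documentation for the perl in use.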

    --MidLifeXis

Re: sorting a file by a date:"YYYY-MM-DD" field with cmp
by AnomalousMonk (Bishop) on Jun 16, 2015 at 22:28 UTC
    What's going on here? ... What is the difference between parentheses absent and parentheses present?

    As others have said: context. Please see Tutorials->Context in Perl->Context tutorial right here on the premises.


    Give a man a fish:  <%-(-(-(-<

Re: sorting a file by a date:"YYYY-MM-DD" field with cmp
by GotToBTru (Prior) on Jun 16, 2015 at 17:53 UTC

    Pull out only the date lines using grep.

    @events = sort sortByDate grep { /date:/ } @events;
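
    Put together with the list-context fix, the whole thing might look like this. A sketch only: the data is inlined in place of reading jsonfile.json, and the lines are the pseudo-JSON from the question.

```perl
use strict;
use warnings;

my @events = (
    qq({ date:"2015-03-01", content:"asdf" }\n),
    "\n",
    qq({ date:"2015-05-01", content:"erwa" }\n),
    qq({ date:"2015-01-02", content:"erts" }\n),
    "\n",
    qq({ date:"2014-04-02", content:"w34r" }\n),
);

sub sortByDate {
    # Parentheses on the left give list context, so the captures are assigned.
    my ($aDate) = $a =~ /date:"(\d{4}-\d{2}-\d{2})"/;
    my ($bDate) = $b =~ /date:"(\d{4}-\d{2}-\d{2})"/;
    return $aDate cmp $bDate;
}

# grep drops every line without a date before the comparator ever sees it.
my @sorted = sort sortByDate grep { /date:/ } @events;
print @sorted;
```

    Filtering first also means the comparator never receives a line with no date, which would otherwise leave $aDate or $bDate undefined and trigger warnings.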
    Dum Spiro Spero
