Re: parse a file TXT similar to XML

by kcott (Archbishop)
on Mar 31, 2022 at 05:42 UTC


in reply to RESOLVED - parse a file TXT similar to XML

G'day x-lours,

You seem very impressed with 'parse the file in "oneline"'. It is not impressive at all. It's exceptionally difficult to read, a maintenance nightmare, and extremely error-prone. I strongly recommend you avoid code like this.

You said you wanted to use the result in 'the rest of the script'; for that, you'd want to put the code in a subroutine; possibly in a module for reuse in other scripts. Below, I present a technique for getting the exact result you want: it's a standalone solution which you can adapt for a subroutine or module; it is very straightforward code that's easy to read and maintain; it has a basic sanity check and reports I/O errors.

Here's pm_11142528_parse_file.pl:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    die "Usage: $0 input_file\n" unless @ARGV;

    my $in_file = $ARGV[0];
    my $result = [];

    my $block_start = 'objet => debut';
    my $block_end = 'objet => fin';
    my $rec_skip = '...';

    {
        open my $fh, '<', $in_file;    # autodie reports any failure

        while (<$fh>) {
            chomp;
            next if $_ eq $rec_skip or $_ eq $block_end;

            if ($_ eq $block_start) {
                push @$result, {};     # start a new record
                next;
            }

            my ($key, $value) = split /\s*=>\s*/;
            $value =~ s/^"?(.*?)"?$/$1/;    # strip optional surrounding quotes
            $result->[-1]{$key} = $value;
        }
    }

    use Data::Dump;
    dd $result;

Sanity check:

    $ ./pm_11142528_parse_file.pl
    Usage: ./pm_11142528_parse_file.pl input_file

I/O exception handling:

    $ ./pm_11142528_parse_file.pl not_a_file
    Can't open 'not_a_file' for reading: 'No such file or directory' at ./pm_11142528_parse_file.pl line 17

Here's the input data you provided:

    $ cat pm_11142528_parse_file.txt
    objet => debut
    index => 1
    a => "premiere valeur"
    ...
    z => "dernier mot"
    objet => fin
    ...
    objet => debut
    index => 77
    a => "autre valeur"
    ...
    z => "aurai-je le dernier mot ?"
    objet => fin

A sample run with expected results:

    $ ./pm_11142528_parse_file.pl pm_11142528_parse_file.txt
    [
      { a => "premiere valeur", index => 1, z => "dernier mot" },
      { a => "autre valeur", index => 77, z => "aurai-je le dernier mot ?" },
    ]

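To use the result in 'the rest of the script', the same logic could be wrapped in a subroutine. The following is only a minimal sketch: the name parse_blocks and its single filename argument are my own assumptions for illustration, and the parsing itself mirrors the standalone script.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

# parse_blocks() is a hypothetical name chosen for this sketch.
# It takes a filename and returns a reference to an array of hashrefs,
# one hashref per "objet => debut" ... "objet => fin" block.
sub parse_blocks {
    my ($in_file) = @_;

    my $block_start = 'objet => debut';
    my $block_end   = 'objet => fin';
    my $rec_skip    = '...';

    my @result;

    open my $fh, '<', $in_file;    # autodie throws on failure

    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq $rec_skip or $line eq $block_end;

        if ($line eq $block_start) {
            push @result, {};      # start a new record
            next;
        }

        my ($key, $value) = split /\s*=>\s*/, $line;
        $value =~ s/^"?(.*?)"?$/$1/;    # strip optional surrounding quotes
        $result[-1]{$key} = $value;
    }

    close $fh;

    return \@result;
}

# Example call: only runs when a filename is given on the command line.
if (@ARGV) {
    my $records = parse_blocks($ARGV[0]);
    print scalar(@$records), " record(s) parsed\n";
}
```

Like the standalone script, this assumes every key/value line appears after an 'objet => debut' line; input that breaks that assumption would need extra checks.
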
'could you help me to proove him Perl is as efficient as Ruby ? (even if it is not a "oneline")'

It is a common misconception that it is somehow more efficient to write single lines of code that are hundreds of characters long. Writing code without whitespace is also not more efficient; itjustreducesthereadabilityofthecode.

Use Benchmark to measure the efficiency of your Perl code. I imagine Ruby has something similar which you could use for a comparison (but I've no idea what that might be).
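As a rough sketch of what such a measurement could look like, here is the core Benchmark module comparing two ways of stripping the optional quotes from a value. The sample string and the second candidate (strip_anchored) are assumptions made up for this illustration; the first candidate is the substitution from the script.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark qw(cmpthese);

# Illustrative sample value only; use data representative of your real input.
my $sample = '"aurai-je le dernier mot ?"';

# Candidate 1: the single substitution used in the script.
sub strip_regex {
    my ($value) = @_;
    $value =~ s/^"?(.*?)"?$/$1/;
    return $value;
}

# Candidate 2: a hypothetical alternative using two anchored substitutions.
sub strip_anchored {
    my ($value) = @_;
    $value =~ s/^"//;
    $value =~ s/"$//;
    return $value;
}

# A negative count means "run each candidate for at least that many
# CPU seconds"; cmpthese then prints a table of rates and relative speeds.
cmpthese(-1, {
    regex    => sub { strip_regex($sample) },
    anchored => sub { strip_anchored($sample) },
});
```

The numbers will vary by machine and Perl version, so treat any difference as meaningful only if it holds up over repeated runs on realistic data.
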

— Ken

Replies are listed 'Best First'.
Re^2: parse a file TXT similar to XML
by eyepopslikeamosquito (Archbishop) on Mar 31, 2022 at 07:02 UTC

    You seem very impressed with 'parse the file in "oneline"'. It is not impressive at all. It's exceptionally difficult to read, a maintenance nightmare, and extremely error-prone. I strongly recommend you avoid code like this.

    Kudos to kcott!

    From On Coding Standards and Code Reviews, three sobering guidelines to keep in mind before trying to outdo an office colleague in a clever one-liner contest:

    • Correctness, simplicity and clarity come first. Avoid unnecessary cleverness.
    • Favour readability over brevity.
    • Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.

      Thanks for advice. I will keep this in mind ;-)
Re^2: parse a file TXT similar to XML
by x-lours (Sexton) on Mar 31, 2022 at 11:01 UTC

    Thank you for the code and the advice.

    You showed me that ".." was useless, and I appreciate it ;-)
