PerlMonks  

Reading in data that is

by basicdez (Pilgrim)
on Oct 10, 2001 at 19:22 UTC ( [id://118009]=perlquestion )

basicdez has asked for the wisdom of the Perl Monks concerning the following question:

I have somewhat of a dilemma here. If I read in the following file... named basicdez.dat
"Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222" "EOS"
with the following perl logic...
#!/usr/bin/perl
use strict;

my $i;
my $data;

open(DATA," basicdez.dat") || die "Cannot open datfile: $!\n";
while(<DATA>){
    chomp;
    unless ($_=~ /"EOS"/){
        $data .= "$_";
    } elsif ($i ne 1) {
        $data .="\n";
    }
    if ($_=~ /"EOS"/){
        $i=1;
    }
}
close(DATA);

my @data=split(/\n/, $data);
for(@data){
    my($first_name,$last_name,$address,$city,$state,$phone)=split;
    print "$first_name,$last_name\n";
    print "$address\n";
    print "$city, $state\n";
    print "$phone \n";
}
exit;
What I would expect would be something as follows...
Jane, Doe
123 W Beverly Ave
Talahasee,Maine
222-22-2222
However, what I am getting is the following...
"Jane""Doe","123
W
Beverly, Ave"
"Talahasee""Maine"
Please help me out on this one as I am most entirely confused. I thank you in advance for any assistance on this issue and apologize for my ignorance in what should be a simple matter. peace dez L

Replies are listed 'Best First'.
Re: Reading in data that is
by projekt21 (Friar) on Oct 10, 2001 at 19:38 UTC

    Without commenting on anything else you wrote, perldoc -f split says:

    If EXPR is omitted, splits the `$_' string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)

    So your line will be split there:

    "Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222"
               ^    ^ ^       ^    ^                  ^

    And yes, those quoting signs are regular chars and will be printed. Sort them out.

    alex pleiner <alex@zeitform.de>
    zeitform Internet Dienste
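
    The whitespace split described in this reply is easy to reproduce. The following is a minimal sketch, assuming the record sits on one line as shown above; the variable names are only for illustration.

```perl
use strict;
use warnings;

# The record from the question, as a single line.
my $line = '"Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222"';

# split ' ' is what a bare split does: break on runs of whitespace.
my @pieces = split ' ', $line;

# Print each piece in brackets so the stray quotes are visible.
print "[$_]\n" for @pieces;
```

    This yields seven pieces rather than six fields, and every quote character is still present in the output.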

Re: Reading in data that is
by rchiav (Deacon) on Oct 10, 2001 at 19:39 UTC
    There are a few problems here, but the biggest one is your split:
    my($first_name,$last_name,$address,$city,$state,$phone)=split;
    Look at the docs for split and you'll see that if you don't specify a delimiter, it will split on whitespace. That's what's happening in your case. What you're going to want to do is split on a quote character. Actually, you're going to want to split on one or more quote characters. Instead of giving you the exact answer, I think you're going to be better off looking at perlre to see how to specify "one or more double-quotes".

    Hope this helps,
    Rich

    update - you're actually going to have to get a little more tricky if you're going to use split, because you have instances where there are spaces between the quotes and others where there are none.

Re: Reading in data that is
by davorg (Chancellor) on Oct 10, 2001 at 19:43 UTC

    Your problem is that split with no arguments will split on whitespace - and your data record contains lots of white space within the fields.

    You probably want a split regular expression something like this:

    split /"\s*"/;

    But that will still leave you with extra quotes in the first and last fields that you'd have to remove.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you don't talk about Perl club."
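
    One way to put this suggestion together with the cleanup it mentions is sketched below. The sub name parse_record is hypothetical, and the code assumes one record per line with every field wrapped in double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on quote/optional-space/quote,
# then strip the quote left at the start of the first field and at the
# end of the last field.
sub parse_record {
    my ($line) = @_;
    my @fields = split /"\s*"/, $line;
    return unless @fields;          # nothing to clean up on an empty line
    $fields[0]  =~ s/^"//;
    $fields[-1] =~ s/"$//;
    return @fields;
}

my $line = '"Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222"';
my ($first_name, $last_name, $address, $city, $state, $phone) = parse_record($line);

print "$first_name, $last_name\n";
print "$address\n";
print "$city, $state\n";
print "$phone\n";
```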

Re: Reading in data that is
by lestrrat (Deacon) on Oct 10, 2001 at 19:38 UTC

    split() would not get rid of the quotes and stuff. split() with the first argument omitted splits on whitespaces.

    You might want to look at Text::ParseWords

Re: Reading in data that is
by jryan (Vicar) on Oct 10, 2001 at 19:49 UTC

    Your code has a number of errors in it. Consider the following:

    #!/usr/bin/perl
    use strict;

    my $i;
    my $data;

    open(DATA," basicdez.dat") || die "Cannot open datfile: $!\n";

    # this entire loop is pointless. it's almost never a good
    # idea to read an entire file into a single string, unless
    # you are going to be spitting it right back out.
    while(<DATA>){
        chomp;
        unless ($_=~ /"EOS"/){
            $data .= "$_";
        } elsif ($i ne 1) {
            $data .="\n";
        }
        if ($_=~ /"EOS"/){
            $i=1;
        }
    }
    close(DATA);

    my @data=split(/\n/, $data);

    # i find it odd that although you name every element of your
    # list, you don't name the loop variable. Either do one, or
    # both. Be consistent!
    for(@data){
        my($first_name,$last_name,$address,$city,$state,$phone)=split;
        print "$first_name,$last_name\n";
        print "$address\n";
        print "$city, $state\n";
        print "$phone \n";
    }

    # not needed
    exit;

    Heres how I would have done what you wanted (with explanations)

    #!/usr/bin/perl -w
    # use warnings!
    use strict;

    # suck the entire file into an array, rather than just
    # putting it into a string.
    open(DATA," basicdez.dat") || die "Cannot open datfile: $!\n";
    my @data = <DATA>;
    close(DATA);

    # remove your "EOS" element. If you can help it, take it out of the
    # dat file completely, so you can remove this line of code.
    # (pop drops the last element; the slice @data[0..@data-1] would
    # have kept every line, "EOS" included.)
    pop @data;

    # name and scope your loop variable
    for my $data (@data) {
        # drop the trailing newline so it doesn't end up in $phone
        chomp $data;

        # remove those pesky quotes. If you can help it, take those out
        # of the dat file too! If i remember correctly, they are
        # needed in vb; however, Perl isn't a weak language like
        # vb is. It can handle raw text ;) You should delimit
        # your text with a single obscure character (the standard
        # for flat dbs is the pipe | symbol.)
        # However, given your data as it is now, this will work:
        $data =~ s/" "/""/g;
        $data =~ s/""/|/g;
        $data =~ tr/"//d;

        # split on the newly created pipe delimited entry.
        my($first_name,$last_name,$address,$city,$state,$phone) = split /\|/, $data;
        print "$first_name,$last_name\n";
        print "$address\n";
        print "$city, $state\n";
        print "$phone \n";
    }

    There you have it. I hope I was clear enough; if I wasn't, please reply. Good luck!

    Update: fixed tr and split, thanks to projekt21

      Your split won't work correctly: some fields contain whitespace (e.g. address).

      alex pleiner <alex@zeitform.de>
      zeitform Internet Dienste

Re: Reading in data that is
by dragonchild (Archbishop) on Oct 10, 2001 at 19:44 UTC
    I'm rather confused by your logic. I'm also confused by your problem domain.

    You have a file. It will have "EOS" at the last line. (Or, is it between each line? I'll assume the former.) You want to read its data.

    Now, what you're doing is creating a string, delimited in a certain fashion, then splitting on that delimiter immediately into a list. Why not just create a list?

    Also, your logic in your parser is, well, sketchy.

    my @data;
    while (<DATA>) {
        chomp;
        # the marker line is "EOS" with its quotes, so match them too;
        # /EOS$/ alone never matches because the line ends in a quote
        next if /^"EOS"$/;
        s/"\s?"/\t/g;
        s/"//g;
        push @data, [ split /\t/, $_ ];
    }

    foreach (@data) {
        my %hash;
        @hash{qw(First Last Addr City State Phone)} = @$_;
        print "$hash{First}, $hash{Last}\n";
        print "$hash{Addr}\n";
        print "$hash{City},$hash{State}\n";
        print "$hash{Phone}\n";
        print "\n";
    }
    Basically, read a line. Skip it if we're reading an EOS. (We, essentially, don't care about them as all our data is on one line.) Then, convert the "" or " " into tabs, then remove the extra ", then split on tab and push a listref with our data onto our master data list.

    Then, when we read, we put the data for that line into a hashslice, then access the hash. :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      I'm curious why you used

      s/"//g;

      rather than

      tr/"//d;

      I don't advocate speed optimization at the expense of clarity, but in this case both seem clear to me, and I'd expect tr/// to not use the regex engine and thus be faster.
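
      For comparison, here is a small sketch of the two forms side by side. Both leave the same string behind; tr additionally returns how many characters it deleted. The variable names are illustrative only, and no timing claim is made here.

```perl
use strict;
use warnings;

my $with_s  = '"Jane"';
my $with_tr = '"Jane"';

# Substitution: goes through the regex engine, replaces each quote with nothing.
$with_s =~ s/"//g;

# Transliteration with /d: deletes each listed character and returns the
# number of characters deleted.
my $removed = ($with_tr =~ tr/"//d);

print "$with_s $with_tr $removed\n";
```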

        *laughs* You know, I've never once used tr in a script that didn't involve cut'n'paste from someone else. TMTOWTDI at its best (worst?)! Yes, tr is faster in this specific instance. No, I'll probably never use it because I very rarely (if ever!) do straight character substitutions. If I'm stripping out characters, I'm doing that as part of a series of substitutions, usually involving s/\s//g and not some specific character(s).

        If you can read either, use the faster one. But, please note that speed of execution is not Perl's strong suit. It's good at that, but speed of development is why most people use Perl. In that vein, s/"//g is just as good.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Reading in data that is
by gbarr (Monk) on Oct 10, 2001 at 20:15 UTC
    You seem to be going to great lengths to read the file line by line, but then reconstruct a single string only to split it back up again. Why not keep the lines separate to start with? For example:

    while(<DATA>) { chomp; last if /^"EOS"$/; push @data, $_ }

    Also, in your output code you call split with no arguments. That will split $_ on whitespace. But from what you are expecting as output that is not what you want. It would seem that your fields are contained within pairs of "'s so you could just do

    my($first_name,$last_name,$address,$city,$state,$phone) = /"([^"]*)"/g

    Which will pick out each of the strings between " pairs.
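
    Both suggestions in this reply can be combined into one short script. This is a sketch only: it reads from an in-memory filehandle instead of basicdez.dat so that it runs on its own, and it assumes one record per line followed by the "EOS" marker.

```perl
use strict;
use warnings;

# Stand-in for basicdez.dat (assumed layout: one record per line, then "EOS").
my $text = <<'END';
"Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222"
"EOS"
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    last if $line =~ /^"EOS"$/;
    # In list context, a /g match returns every captured group,
    # i.e. the text between each pair of double quotes.
    push @records, [ $line =~ /"([^"]*)"/g ];
}
close $fh;

for my $rec (@records) {
    my ($first_name, $last_name, $address, $city, $state, $phone) = @$rec;
    print "$first_name, $last_name\n";
    print "$address\n";
    print "$city, $state\n";
    print "$phone\n";
}
```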

      Thank you for your help. Can you possibly offer me a suggestion as to what I could do if this were the following file.
      "Jane" "Doe" "123 Lover's Lane" 09/22/2001 "Beverly" "CA" "EOS"
      The 4GL that creates these packets, which are passed back to the interface code, uses an export tool that puts "s around character fields and nothing around numeric, decimal or integer fields. I do not mean to be so confused. I am learning a ton and am so grateful for any and all help you could offer to me. peace dez l
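
      One possible approach to this mixed format, offered as a sketch rather than a definitive answer, is to match either a quoted field or a bare run of non-whitespace characters. The sub name parse_mixed is hypothetical, and the code assumes unquoted fields never contain spaces and that quoted fields never contain a double quote.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fields of one record, whether or not
# each field is wrapped in double quotes.
sub parse_mixed {
    my ($line) = @_;
    my @fields;
    # First alternative: a quoted field, capturing the text between the quotes.
    # Second alternative: a bare run of non-whitespace (dates, numbers).
    while ($line =~ /"([^"]*)"|(\S+)/g) {
        push @fields, defined $1 ? $1 : $2;
    }
    return @fields;
}

my $line = q{"Jane" "Doe" "123 Lover's Lane" 09/22/2001 "Beverly" "CA"};
print join('|', parse_mixed($line)), "\n";
```

      The quoted alternative is tried first at each position, so a field such as "123 Lover's Lane" stays in one piece even though it contains spaces and an apostrophe.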
Modified code...
by Rhose (Priest) on Oct 10, 2001 at 19:53 UTC
    I simplified some of the logic, and threw together this example. I hope it helps.

    #!/usr/bin/perl -w
    use strict;

    my @mField;

    #open(DATA," basicdez.dat") || die "Cannot open datfile: $!\n";
    while(<DATA>) {
        unless (/^\s*\"EOS\"/) {
            @mField = (split(/\"\s*\"*/));
            print $mField[1],', ',$mField[2],"\n";
            print $mField[3],"\n";
            print $mField[4],', ',$mField[5],"\n";
            print $mField[6],"\n";
            print "\n";
        }
    }
    #close(DATA);

    __DATA__
    "Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222"
    "John""Doe" "456 W Beverly Ave" "Seattle""Washington" "333-33-3333"
    "EOS"
Re: Reading in data that is
by pitbull3000 (Beadle) on Oct 10, 2001 at 19:51 UTC
    Why don't you just leave the quoting signs out of your .dat file, if that is possible, and put something else in between as a delimiter, such as ":" or a tab? Then you could split on the delimiter used in the .dat file. For example, if you use ":" as the delimiter, your split will look like this: my($first_name,$last_name,$address,$city,$state,$phone)=split(/:/);
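
    A brief sketch of the delimiter approach follows, assuming a hypothetical colon-delimited record in which the last field is empty. It shows one detail worth knowing: by default split drops trailing empty fields, and a negative limit keeps them.

```perl
use strict;
use warnings;

# Hypothetical colon-delimited record with an empty phone field at the end.
my $record = 'Jane:Doe:123 W Beverly Ave:Talahasee:Maine:';

# Default behaviour: trailing empty fields are discarded.
my @short = split /:/, $record;

# A negative limit keeps trailing empty fields.
my @full = split /:/, $record, -1;

print scalar(@short), " fields without a limit\n";
print scalar(@full),  " fields with a limit of -1\n";
```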
Re: Reading in data that is
by nardo (Friar) on Oct 10, 2001 at 19:45 UTC
    As someone has already pointed out, you could try Text::ParseWords, but that only ignores delimiters that are inside quotes. You don't have a delimiter between "Jane""Doe", so I do not believe Text::ParseWords will work for you. You could use:
    my($first_name,$last_name,$address,$city,$state,$phone)=/"(.*?)"/g;
    which will grab everything between quotes. It does not attempt to handle escaped quotes; that exercise is left to the reader.
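
    The concern about Text::ParseWords can be checked with a short sketch using its quotewords function, with whitespace as the delimiter. As far as I can tell from the module's behaviour, adjacent quoted strings with no delimiter between them are joined into a single word, which is why the name and the city/state pairs come back merged.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = '"Jane""Doe" "123 W Beverly Ave" "Talahasee""Maine" "222-22-2222"';

# Whitespace is the delimiter; the 0 asks for the quotes to be stripped
# from the returned words.
my @words = quotewords('\s+', 0, $line);

print scalar(@words), " fields\n";
print "[$_]\n" for @words;
```

    With this data the result should be four words instead of six fields, so the regex above is the better fit unless the file format can be changed to put a delimiter between every field.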
Re: Reading in data that is
by Anonymous Monk on Oct 11, 2001 at 19:32 UTC
    Okay, other people seem to have answered the actual question. Now let me say what others seem to have refrained from: why not use Sprite or JSprite (in DBD::Sprite)? It seems you are using a flatfile database, and that is exactly what these are for... Just a thought.
