Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

converting txt to csv and formatting

by csorrentini (Acolyte)
on Jul 05, 2014 at 23:29 UTC ( [id://1092407]=perlquestion: print w/replies, xml ) Need Help??

csorrentini has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am trying to convert a txt file to csv which isn't really too difficult, however I'm not too sure how I would go about it with a text file that isn't pre formatted to having all of a clients info on one line, second client on second line, etc. This is how the txt file currently reads:

Jacobs, April 750.467.9582 quam.quis@sedhendrerit.org Mays, Martena 870.348.1974 sollicitudin@nonummyFusce.org McNeil, Brennan 289.527.6151 lobortis@nisl.com Sexton, Melvin 599.927.5337 in.felis@varius.com Blackburn, Prescott 661.589.1228 sed@egetlaoreetposuere.edu

I need to basically format the info in a csv file to read as a table with headers of Name, Phone and email. Each different client on a new row.

This is what I currently have as far as reading in the text file and writing the csv file. My problem is I have no idea how I would go about grabbing data into specific fields with the way the txt file is currently formatted.

use strict; use warnings; my $infile="clients.txt"; open my $in, "<", $infile or die $!; open my $out, ">" , "clients.csv" or die $!; my $client = <$in>; chomp($client);

Any help greatly appreciated.

Replies are listed 'Best First'.
Re: converting txt to csv and formatting
by AppleFritter (Vicar) on Jul 06, 2014 at 00:11 UTC

    You're guaranteed to have three lines of information per client entry, right? You could do this (not the prettiest, perhaps):

    while(my $client = <>) { chomp $client; chomp(my $phone = <>); chomp(my $email = <>); ... }

    For writing your CSV file, I'd suggest using a module from CPAN (e.g. Text::CSV) rather than rolling your own.

      Thanks for the quick reply, Yes guaranteed three lines of information always.

      As far as writing to the csv this is a class assignment and the instructor said "You really don’t need any external modules (like the CSV module we used in an earlier script)." so I don't believe he wants us to use it for some strange reason. I input your piece of coding into the script, now without using Text::CSV is it just a simple print to the csv file?

        Generating valid CSV records is more trivial than parsing CSV records. In this case, all of your data are character strings, not numbers or timestamps, so it's appropriate to quote all three fields all the time.

        "Jacobs, April","750.467.9582","quam.quis@sedhendrerit.org" "Mays, Martena","870.348.1974","sollicitudin@nonummyFusce.org" "McNeil, Brennan","289.527.6151","lobortis@nisl.com" "Sexton, Melvin ""The Copymeister""","599.927.5337","in.felis@vari +us.com" "Blackburn, Prescott","661.589.1228","sed@egetlaoreetposuere.edu"

        The most important thing to anticipate when generating CSV records in which every field is quoted is the possibility of the presence of the quote character in the data. The most common convention nowadays for escaping literal occurrences of the quote character is to use the same character as an escape character ("").

        for (@client_values) { s/(?=")/"/g; s/^/"/; s/$/"/; } my $client_record = join ',', @client_values;

        Don't print the record piecemeal, one field at a time. Doing this is a worst practice, IMHO. Instead, generate a valid, whole CSV record and then print it.

        print "$client_record\n";

        If I might paraphrase someone else, there are no guarantees except that there are no guarantees when it comes to data formats. Just remember that the RS232 standard isn't. :)

        You must always remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
Re: converting txt to csv and formatting
by 2teez (Vicar) on Jul 06, 2014 at 01:23 UTC

    Hi,

    ..My problem is I have no idea how I would go about grabbing data into specific fields with the way the txt file is currently formatted...

    I really don't think that is such a problem. With the data set you provided, one can do like this:

    use warnings; use strict; print join ("\t\t" => qw(Name Phone E-mail)),$/; while (<DATA>) { chomp; print $_, "\t"; print $/ if $. % 3 == 0; # note here } __DATA__ Jacobs, April 750.467.9582 quam.quis@sedhendrerit.org Mays, Martena 870.348.1974 sollicitudin@nonummyFusce.org McNeil, Brennan 289.527.6151 lobortis@nisl.com Sexton, Melvin 599.927.5337 in.felis@varius.com Blackburn, Prescott 661.589.1228 sed@egetlaoreetposuere.edu
    Output
    Name Phone E-mail Jacobs, April 750.467.9582 quam.quis@sedhendrerit.org Mays, Martena 870.348.1974 sollicitudin@nonummyFusce.org McNeil, Brennan 289.527.6151 lobortis@nisl.com Sexton, Melvin 599.927.5337 in.felis@varius.com Blackburn, Prescott 661.589.1228 sed@egetlaoreetposuere.edu
    The CSV part ( since you have mentioned that you can't use an external module ) and format to taste of the question is left for the OP.

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      That would be perfect formatting just need to get it onto a csv file and read from the txt vs writing it. Changed it to read from the text file now I just need to have it print to the csv.
      use warnings; use strict; my $infile="clients.txt"; open my $in, "<", $infile or die $!; open my $out, ">" , "clients.csv" or die $!; my $clients = print join ("\t\t" => qw(Name Phone E-mail)),$/; while (<$in>) { chomp; print $_, "\t"; print $/ if $. % 3 == 0; # note here }

        ..just need to get it onto a csv file and read from the txt vs writing it. Changed it to read from the text file now I just need to have it print to the csv...

        That is exactly what you will have to do, at least I have shown you that it's not impossible to get each client data on a single row.

        And what do you hope to achieve with this line in your code?: my $clients = print join ("\t\t" => qw(Name Phone E-mail)),$/;

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
Re: converting txt to csv and formatting
by kcott (Archbishop) on Jul 06, 2014 at 10:24 UTC

    G'day csorrentini,

    "This is what I currently have as far as reading in the text file and writing the csv file."

    I concur with earlier comments regarding the use of Text::CSV.

    "My problem is I have no idea how I would go about grabbing data into specific fields with the way the txt file is currently formatted."

    I would probably just read the lines in groups of three. Here's my take on what (the guts of) your code might look like:

    #!/usr/bin/env perl -l use strict; use warnings; use autodie; use constant INFO_LINES => 3; use Text::CSV; my $txt_in = 'pm_1092407_txt_in.txt'; my $csv_out = 'pm_1092407_csv_out.csv'; cat_file($txt_in, 'INPUT'); my $csv = Text::CSV::->new; open my $txt_in_fh, '<', $txt_in; open my $csv_out_fh, '>', $csv_out; my @captures; while (<$txt_in_fh>) { chomp; push @captures, $_; unless ($. % INFO_LINES) { $csv->print($csv_out_fh, [@captures]); @captures = (); } } close $txt_in_fh; close $csv_out_fh; cat_file($csv_out, 'OUTPUT'); sub cat_file { my ($file, $prompt) = @_; $prompt = 'FILE' unless defined $prompt; print '=' x 64; print "$prompt ($file):"; print '-' x 64; system cat => $file; }

    Output:

    ================================================================ INPUT (pm_1092407_txt_in.txt): ---------------------------------------------------------------- Jacobs, April 750.467.9582 quam.quis@sedhendrerit.org Mays, Martena 870.348.1974 sollicitudin@nonummyFusce.org McNeil, Brennan 289.527.6151 lobortis@nisl.com Sexton, Melvin 599.927.5337 in.felis@varius.com Blackburn, Prescott 661.589.1228 sed@egetlaoreetposuere.edu ================================================================ OUTPUT (pm_1092407_csv_out.csv): ---------------------------------------------------------------- "Jacobs, April",750.467.9582,quam.quis@sedhendrerit.org "Mays, Martena",870.348.1974,sollicitudin@nonummyFusce.org "McNeil, Brennan",289.527.6151,lobortis@nisl.com "Sexton, Melvin",599.927.5337,in.felis@varius.com "Blackburn, Prescott",661.589.1228,sed@egetlaoreetposuere.edu

    -- Ken

Re: converting txt to csv and formatting
by 1s44c (Scribe) on Jul 06, 2014 at 00:15 UTC

    You need to make some assumptions about the way your data is formatted. You have to check each line with something like the following:

    if ( $line =~ /\A[0-9]/ ) { $phone = $line; # It's a phone number } elsif ( $line =~ /.*@.*/ ) { $email = $line; # It's an email address print "\"$name\",\"$phone\",\"$email\""; #Assumption - input data is + ordered name, phone, email. } elsif ( $line =~ /[\A[A-Z][a-zA-Z]+,/ ) { $name = $line; # It's a name } else { print "WARNING: I could not understand input line: $line\n"; }

    Of course assumptions tend to be wrong the first time. You may have to adjust them later after checking the output. Also it's better to use a module to generate CSV, the print line above is for demonstrating the idea, not real production usage.

Re: converting txt to csv and formatting
by gurpreetsingh13 (Scribe) on Jul 07, 2014 at 06:03 UTC
    Something out of the way though, but I would say you can try this if you have Vim.

    1.Simply make a copy of the file and open it in Vim
    2.Use this search-and-replace to surround all names with quotes.
    %s/\(.*\),\(.*\)/"\1,\2"/g
    3. Again a search and replace to put comma at the end of each line.
    %s/\n/,\r/g
    4.Then record a simple Macro or make it manually as:
    :let @q="JJj"
    5. Replay the macro N number of times where N >= records or you can say N = (No. of lines/3)
    6. Finally remove the commas at the end of each line with the command:
    %s/,$//g
    Save the file and you are done. Just rename it to <Filename>.csv and the file is ready.
    You can probably emit some steps above if you are good at regex by putting comma in first search-and-replace only at the end of names and phone numbers.
    Or making a comma in macro itself.
    But better forget all that and this simple one would work I believe.
    Hope that solves the problem.
      Update:
      This simple regex would also work under vim instead of using macro.
      s/\(.*\),\(.*\)\n\(.*\)\n\(.*\)\n/"\1,\2",\3,\4\r/g
      Without using vim, under Unix, you can do:
      perl -e 'my $data =`cat test.pl`; $data=~s/(.*),(.*)\n(.*)\n(.*)\n/"\1 +,\2",\3,\4\n/g;print $data;'>newfile.csv
      Under windows, you can do:
      perl -e 'my $data =`type test.pl`; $data=~s/(.*),(.*)\n(.*)\n(.*)\n/"\ +1,\2",\3,\4\n/g;print $data;'>newfile.csv

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1092407]
Approved by boftx
Front-paged by Jim
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-04-26 02:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found