csorrentini has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks,
I am trying to convert a txt file to csv which isn't really too difficult, however I'm not too sure how I would go about it with a text file that isn't pre formatted to having all of a clients info on one line, second client on second line, etc. This is how the txt file currently reads:
Jacobs, April
750.467.9582
quam.quis@sedhendrerit.org
Mays, Martena
870.348.1974
sollicitudin@nonummyFusce.org
McNeil, Brennan
289.527.6151
lobortis@nisl.com
Sexton, Melvin
599.927.5337
in.felis@varius.com
Blackburn, Prescott
661.589.1228
sed@egetlaoreetposuere.edu
I need to basically format the info in a csv file to read as a table with headers of Name, Phone and email. Each different client on a new row.
This is what I currently have as far as reading in the text file and writing the csv file. My problem is I have no idea how I would go about grabbing data into specific fields with the way the txt file is currently formatted.
use strict;
use warnings;
my $infile="clients.txt";
open my $in, "<", $infile or die $!;
open my $out, ">" , "clients.csv" or die $!;
my $client = <$in>;
chomp($client);
Any help greatly appreciated.
Re: converting txt to csv and formatting
by AppleFritter (Vicar) on Jul 06, 2014 at 00:11 UTC
|
You're guaranteed to have three lines of information per client entry, right? You could do this (not the prettiest, perhaps):
while(my $client = <>) {
chomp $client;
chomp(my $phone = <>);
chomp(my $email = <>);
...
}
For writing your CSV file, I'd suggest using a module from CPAN (e.g. Text::CSV) rather than rolling your own.
| [reply] [d/l] |
|
Thanks for the quick reply, Yes guaranteed three lines of information always.
As far as writing to the csv this is a class assignment and the instructor said "You really don’t need any external modules (like the CSV module we used in an earlier script)." so I don't believe he wants us to use it for some strange reason. I input your piece of coding into the script, now without using Text::CSV is it just a simple print to the csv file?
| [reply] |
|
Generating valid CSV records is more trivial than parsing CSV records. In this case, all of your data are character strings, not numbers or timestamps, so it's appropriate to quote all three fields all the time.
"Jacobs, April","750.467.9582","quam.quis@sedhendrerit.org"
"Mays, Martena","870.348.1974","sollicitudin@nonummyFusce.org"
"McNeil, Brennan","289.527.6151","lobortis@nisl.com"
"Sexton, Melvin ""The Copymeister""","599.927.5337","in.felis@vari
+us.com"
"Blackburn, Prescott","661.589.1228","sed@egetlaoreetposuere.edu"
The most important thing to anticipate when generating CSV records in which every field is quoted is the possibility of the presence of the quote character in the data. The most common convention nowadays for escaping literal occurrences of the quote character is to use the same character as an escape character ("").
for (@client_values) {
s/(?=")/"/g;
s/^/"/;
s/$/"/;
}
my $client_record = join ',', @client_values;
Don't print the record piecemeal, one field at a time. Doing this is a worst practice, IMHO. Instead, generate a valid, whole CSV record and then print it.
print "$client_record\n";
| [reply] [d/l] [select] |
|
|
|
|
| [reply] |
Re: converting txt to csv and formatting
by 2teez (Vicar) on Jul 06, 2014 at 01:23 UTC
|
Hi,
..My problem is I have no idea how I would go about grabbing data into specific fields with the way the txt file is currently formatted...
I really don't think that is such a problem. With the data set you provided, one can do like this:
use warnings;
use strict;
print join ("\t\t" => qw(Name Phone E-mail)),$/;
while (<DATA>) {
chomp;
print $_, "\t";
print $/ if $. % 3 == 0; # note here
}
__DATA__
Jacobs, April
750.467.9582
quam.quis@sedhendrerit.org
Mays, Martena
870.348.1974
sollicitudin@nonummyFusce.org
McNeil, Brennan
289.527.6151
lobortis@nisl.com
Sexton, Melvin
599.927.5337
in.felis@varius.com
Blackburn, Prescott
661.589.1228
sed@egetlaoreetposuere.edu
Output
Name Phone E-mail
Jacobs, April 750.467.9582 quam.quis@sedhendrerit.org
Mays, Martena 870.348.1974 sollicitudin@nonummyFusce.org
McNeil, Brennan 289.527.6151 lobortis@nisl.com
Sexton, Melvin 599.927.5337 in.felis@varius.com
Blackburn, Prescott 661.589.1228 sed@egetlaoreetposuere.edu
The CSV part ( since you have mentioned that you can't use an external module ) and format to taste of the question is left for the OP.
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
| [reply] [d/l] [select] |
|
That would be perfect formatting just need to get it onto a csv file and read from the txt vs writing it. Changed it to read from the text file now I just need to have it print to the csv.
use warnings;
use strict;
my $infile="clients.txt";
open my $in, "<", $infile or die $!;
open my $out, ">" , "clients.csv" or die $!;
my $clients = print join ("\t\t" => qw(Name Phone E-mail)),$/;
while (<$in>) {
chomp;
print $_, "\t";
print $/ if $. % 3 == 0; # note here
}
| [reply] [d/l] |
|
..just need to get it onto a csv file and read from the txt vs writing it. Changed it to read from the text file now I just need to have it print to the csv...
That is exactly what you will have to do, at least I have shown you that it's not impossible to get each client data on a single row.
And what do you hope to achieve with this line in your code?:
my $clients = print join ("\t\t" => qw(Name Phone E-mail)),$/;
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
| [reply] [d/l] |
|
|
Re: converting txt to csv and formatting
by kcott (Archbishop) on Jul 06, 2014 at 10:24 UTC
|
G'day csorrentini,
"This is what I currently have as far as reading in the text file and writing the csv file."
I concur with earlier comments regarding the use of Text::CSV.
"My problem is I have no idea how I would go about grabbing data into specific fields with the way the txt file is currently formatted."
I would probably just read the lines in groups of three.
Here's my take on what (the guts of) your code might look like:
#!/usr/bin/env perl -l
use strict;
use warnings;
use autodie;
use constant INFO_LINES => 3;
use Text::CSV;
my $txt_in = 'pm_1092407_txt_in.txt';
my $csv_out = 'pm_1092407_csv_out.csv';
cat_file($txt_in, 'INPUT');
my $csv = Text::CSV::->new;
open my $txt_in_fh, '<', $txt_in;
open my $csv_out_fh, '>', $csv_out;
my @captures;
while (<$txt_in_fh>) {
chomp;
push @captures, $_;
unless ($. % INFO_LINES) {
$csv->print($csv_out_fh, [@captures]);
@captures = ();
}
}
close $txt_in_fh;
close $csv_out_fh;
cat_file($csv_out, 'OUTPUT');
sub cat_file {
my ($file, $prompt) = @_;
$prompt = 'FILE' unless defined $prompt;
print '=' x 64;
print "$prompt ($file):";
print '-' x 64;
system cat => $file;
}
Output:
================================================================
INPUT (pm_1092407_txt_in.txt):
----------------------------------------------------------------
Jacobs, April
750.467.9582
quam.quis@sedhendrerit.org
Mays, Martena
870.348.1974
sollicitudin@nonummyFusce.org
McNeil, Brennan
289.527.6151
lobortis@nisl.com
Sexton, Melvin
599.927.5337
in.felis@varius.com
Blackburn, Prescott
661.589.1228
sed@egetlaoreetposuere.edu
================================================================
OUTPUT (pm_1092407_csv_out.csv):
----------------------------------------------------------------
"Jacobs, April",750.467.9582,quam.quis@sedhendrerit.org
"Mays, Martena",870.348.1974,sollicitudin@nonummyFusce.org
"McNeil, Brennan",289.527.6151,lobortis@nisl.com
"Sexton, Melvin",599.927.5337,in.felis@varius.com
"Blackburn, Prescott",661.589.1228,sed@egetlaoreetposuere.edu
| [reply] [d/l] [select] |
Re: converting txt to csv and formatting
by 1s44c (Scribe) on Jul 06, 2014 at 00:15 UTC
|
You need to make some assumptions about the way your data is formatted. You have to check each line with something like the following:
if ( $line =~ /\A[0-9]/ ) {
$phone = $line; # It's a phone number
} elsif ( $line =~ /.*@.*/ ) {
$email = $line; # It's an email address
print "\"$name\",\"$phone\",\"$email\""; #Assumption - input data is
+ ordered name, phone, email.
} elsif ( $line =~ /[\A[A-Z][a-zA-Z]+,/ ) {
$name = $line; # It's a name
} else {
print "WARNING: I could not understand input line: $line\n";
}
Of course assumptions tend to be wrong the first time. You may have to adjust them later after checking the output. Also it's better to use a module to generate CSV, the print line above is for demonstrating the idea, not real production usage. | [reply] [d/l] |
Re: converting txt to csv and formatting
by gurpreetsingh13 (Scribe) on Jul 07, 2014 at 06:03 UTC
|
Something out of the way though, but I would say you can try this if you have Vim.
1.Simply make a copy of the file and open it in Vim
2.Use this search-and-replace to surround all names with quotes.
%s/\(.*\),\(.*\)/"\1,\2"/g
3. Again a search and replace to put comma at the end of each line.
%s/\n/,\r/g
4.Then record a simple Macro or make it manually as:
:let @q="JJj"
5. Replay the macro N number of times where N >= records or you can say N = (No. of lines/3)
6. Finally remove the commas at the end of each line with the command:
%s/,$//g
Save the file and you are done. Just rename it to <Filename>.csv and the file is ready.
You can probably emit some steps above if you are good at regex by putting comma in first search-and-replace only at the end of names and phone numbers. Or making a comma in macro itself.
But better forget all that and this simple one would work I believe.
Hope that solves the problem.
| [reply] [d/l] [select] |
|
Update:
This simple regex would also work under vim instead of using macro.
s/\(.*\),\(.*\)\n\(.*\)\n\(.*\)\n/"\1,\2",\3,\4\r/g
Without using vim, under Unix, you can do:
perl -e 'my $data =`cat test.pl`; $data=~s/(.*),(.*)\n(.*)\n(.*)\n/"\1
+,\2",\3,\4\n/g;print $data;'>newfile.csv
Under windows, you can do:
perl -e 'my $data =`type test.pl`; $data=~s/(.*),(.*)\n(.*)\n(.*)\n/"\
+1,\2",\3,\4\n/g;print $data;'>newfile.csv
| [reply] [d/l] [select] |
|
|