I am having a bit of an issue with the Text::CSV_XS module. I've only ever used it to read CSV files, never to create them. When I print the output to a file, everything gets dumped onto one line, and each row is wrapped in quotes, so the whole row is treated as a single column instead of as individual values. Here is the code I am using. I am sure this is something simple to resolve, but it's been driving me crazy for a few hours and I need to move on. Thanks in advance for your help.
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use HTML::Strip;
use Text::CSV_XS qw(csv);

my $csv = Text::CSV_XS->new({ sep_char => "\t" });
my $hs = HTML::Strip->new();
my $file = 'test.html';
my $out = 'out.csv';

open my $fh ,"<", $file or die "Failed to open $file!: $!\n";
open my $io ,">", $out or die "Failed to open $out!: $!\n";

my $flag = 0;
while(my $line = <$fh>) {
    chomp $line;
    if ($line =~ /\<\/Table\>\<Pre\>/){ $flag = 1;}
    elsif ($line =~ /\<A Name\=Footnotes\>\<\/A\>/){ $flag = 0;}
    next if $line =~ /→/;
    if ($flag) {
        my @data = &cleanThis($line);
        $csv->print($io, \@data);
    }
}
close($fh);
close($io);

sub cleanThis {
    my $string = shift;
    my $clean_text = $hs->parse($string);
    if ($clean_text =~ /^.+(\d+\:.+?[M|K])$/){
        $clean_text = "$clean_text\tNone listed";
    }
    my ($asOf, $filer, $filing, $forOn, $docsSize, $agent) =
        $clean_text =~ m/(\d+\/\d+\/\d+)\s+(.+)\s+?(\w.+)\s+(\d+\/\d+\/\d+).+?(\d+\:\d.+?)\s+(.+)/;
    my @formated = ($asOf,$filer,$filing,$agent);
    foreach my $trim(@formated){
        $trim =~ s/^\s+|\s+$//g;
    }
    return join("\t",@formated)
}
Here is a sample of the output:
"1/27/16 Advanced Series Trust 497K Prudential Moneymar..Inc""1/27/16 Advisors Series Trust 497K US Bancorp Fund Svcs LLC""1/27/16 Advisors Series Trust 497K US Bancorp Fund Svcs LLC""1/27/16 Advisors Series Trust 497K US Bancorp Fund Svcs LLC""1/27/16 Ark ETF Trust 497K Vintage/FA""1/27/16 Ark ETF Trust 497K Vintage/FA""1/27/16 Delaware Group Cash Reserve 485BPOS DG3/FA""1/27/16 Federated Equity Income Fund Inc N-CSR Federated Admin..Svcs/FA""1/27/16 Federated Inv Series Funds Inc N-CSR Federated Admin..Svcs/FA""1/27/16 Fidelity Advisor Series I N-CSR Publishing Data...Inc/FA""1/27/16 Fidelity Commonwealth Trust N-CSR Publishing Data...Inc/FA""1/27/16 Fidelity Court Street Trust N-CSR Publishing Data...Inc/FA""1/27/16 Fidelity Court Street Trust II N-CSR Publishing Data...Inc/FA""1/27/16 Fidelity Financial Trust N-CSR Publishing Data...Inc/FA""1/27/16 Fidelity MT Vernon Street Trust N-CSR None listed""1/27/16 Fidelity Phillips Street Trust N-CSR Fidelity Aberdeen St..Tr""1/27/16 Fidelity Rutland Square Trust II 497K Fidelity Aberdeen St..Tr""1/27/16 Fidelity Rutland Square Trust II 497K Fidelity MT Vernon S..Tr""1/27/16 Fidelity Salem Street Trust N-CSR Publishing Data...Inc/FA""1/27/16 John Hancock ETF Trust 497K Data Communique Inc./FA"
As you can see, each row is tab-separated, but the entire row gets treated as a single value.
Update
Changing my subroutine to return (@formated), the list of fields itself rather than the tab-joined string, did the trick. Thanks for the help.
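For anyone who lands here with the same symptom, here is a minimal sketch of the difference, not the exact script above. It uses combine and string, which build the same line that print writes, so it runs without any input file. The sample row is made up, and eol => "\n" is my own addition (the original constructor does not set it) so that each record ends with a newline.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# eol => "\n" is an assumption added here so every record ends with a newline.
my $csv = Text::CSV_XS->new({ sep_char => "\t", eol => "\n" })
    or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag();

# Made-up sample row, shaped like the fields cleanThis extracts.
my @fields = ('1/27/16', 'Ark ETF Trust', '497K', 'Vintage/FA');

# What the original sub returned: one string with embedded tabs.
# Text::CSV_XS sees a single field and quotes it as a whole.
$csv->combine(join("\t", @fields)) or die "combine failed";
my $joined_line = $csv->string;

# What the fixed sub returns: a list, so each value is its own field.
$csv->combine(@fields) or die "combine failed";
my $list_line = $csv->string;

print "joined: $joined_line";
print "list:   $list_line";
```

In the first case the module receives one field that happens to contain the separator, so it wraps the whole thing in quotes; in the second it receives four fields and puts the separator between them. Individual fields may still be quoted, depending on the module's quoting options.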