Count Quoted Words

by flash4syth (Initiate)
on Jun 07, 2013 at 01:37 UTC (#1037544)

flash4syth has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Long-time Perl student, first-time Perl Monks poster.

My program reads in a text file, and I want to count the whitespace-separated words that appear between double quotes. The quote marks are already delimited by spaces. For example:

'Here is " a quoted string " for you'

The quotes often extend beyond one line, so as I read in the file, I split each line and append the result onto an array of words. Here's an example of the resulting array content:

( '"', 'quoted', 'words', '"', )

How can I count the words between the quotes for every instance of open/close double quotes in this array?

Thanks in advance

Replies are listed 'Best First'.
Re: Count Quoted Words
by Cristoforo (Curate) on Jun 07, 2013 at 02:10 UTC
    Here is an example, adapted partly from the docs for Text::ParseWords (part of core since Perl 5). Note that it works on the whole body of text rather than on an array of words.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    my $text;
    do { local $/; $text = <DATA> };
    my @words = quotewords('\s+', 1, $text);
    my $i = 0;
    foreach (@words) {
        if (/^".+"$/s) {
            printf "<%s> : COUNT %d\n", $_, scalar split;
        }
    }
    __DATA__
    "Yes, yes, how was it now?" he thought, going over his dream. "Now, how
    was it? To be sure! Alabin was giving a dinner at Darmstadt; no, not
    Darmstadt, but something American. Yes, but then, Darmstadt was in
    America. Yes, Alabin was giving a dinner on glass tables, and the
    tables sang, _Il mio tesoro_--not _Il mio tesoro_ though, but something
    better, and there were some sort of little decanters on the table, and
    they were women, too," he remembered.

    The output is:

    C:\Old_Data\perlp>perl t33.pl
    <"Yes, yes, how was it now?"> : COUNT 6
    <"Now, how
    was it? To be sure! Alabin was giving a dinner at Darmstadt; no, not
    Darmstadt, but something American. Yes, but then, Darmstadt was in
    America. Yes, Alabin was giving a dinner on glass tables, and the
    tables sang, _Il mio tesoro_--not _Il mio tesoro_ though, but something
    better, and there were some sort of little decanters on the table, and
    they were women, too,"> : COUNT 66
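For readers unfamiliar with the second argument to quotewords, here is a minimal sketch of what the "keep" flag changes. The sample string is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Made-up sample line, only for illustration.
my $line = q{He said "hello there" twice};

# keep = 0: the quote characters are stripped from the returned tokens.
my @stripped = quotewords('\s+', 0, $line);

# keep = 1: the quote characters stay on the token, which is what a
# test like /^".+"$/ needs in order to recognise quoted tokens.
my @kept = quotewords('\s+', 1, $line);

print join('|', @stripped), "\n";
print join('|', @kept), "\n";
```

In both cases the quoted phrase comes back as a single token, so the words inside it still have to be counted separately, for example with split.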
Re: Count Quoted Words
by smls (Friar) on Jun 07, 2013 at 03:51 UTC
    Given an array @words like the one shown in the question, you could do:

    my $count;
    foreach (@words) {
        my $quote = ($_ eq '"');
        if ($quote ... $quote) {
            if (!$quote) { $count++ }
            elsif ($count) { print "$count quoted words\n"; $count = 0 }
        }
    }

    Although I admit that the ... operator is slightly obscure... :)
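If the flip-flop operator feels too obscure, the same idea can be sketched with an explicit state flag. This is only an illustrative alternative, not the poster's code; the helper name count_quoted_runs is hypothetical. Unlike the version above, it also reports a count of 0 for an empty pair of quotes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given a flat list of tokens
# in which each '"' is its own token, return one word count per quoted run.
sub count_quoted_runs {
    my @tokens = @_;
    my @counts;
    my $in_quotes = 0;
    my $count     = 0;
    for my $token (@tokens) {
        if ($token eq '"') {
            # A closing quote ends the current run, so record its size.
            push @counts, $count if $in_quotes;
            $in_quotes = !$in_quotes;
            $count     = 0;
        }
        elsif ($in_quotes) {
            $count++;
        }
    }
    # Words after an unterminated opening quote are deliberately not reported.
    return @counts;
}

my @words = ('Here', 'is', '"', 'a', 'quoted', 'string', '"', 'for', 'you');
print "$_ quoted words\n" for count_quoted_runs(@words);
```

Returning a list rather than printing inside the loop makes it easy to sum the counts or report them per quote later.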
Re: Count Quoted Words
by smls (Friar) on Jun 07, 2013 at 04:14 UTC
    Unless the words array is also needed for something else, my preferred solution would be to skip it entirely and just use a regex on the whole file content (what can I say, I like regexes):
    use File::Slurp;

    my $text = read_file('input.txt');

    while ($text =~ /" (.*?) "/sg) {
        print "Found quoted string with ".split(' ', $1)." words: $1\n";
    }
    The regex might need to be adjusted depending on the exact definition of what should be counted as a quoted string within the input data.
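As one example of such an adjustment, here is a rough sketch for input where the quote marks touch the words directly instead of being set off by spaces. It assumes quotes always come in pairs and are never escaped; the helper name quoted_word_counts and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word count of each "..." span in $text.
# Assumes balanced, unescaped double quotes.
sub quoted_word_counts {
    my ($text) = @_;
    my @counts;
    # [^"]* also matches newlines, so no /s modifier is needed here.
    while ($text =~ /"([^"]*)"/g) {
        # split ' ' splits on runs of whitespace and ignores leading blanks.
        my @quoted = split ' ', $1;
        push @counts, scalar @quoted;
    }
    return @counts;
}

# Made-up sample with a quote that spans two lines.
my $sample = qq{He said "yes, yes" and then\n"no more\nthan that" to us.};
print join(', ', quoted_word_counts($sample)), "\n";
```

Assigning the split result to an array first keeps the count independent of how split behaves in scalar context on older Perls.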

      Thanks! I ended up going with this solution, as it is easy to read and allows me to get other data about the entire text using regexes.

Re: Count Quoted Words
by jaredor (Priest) on Jun 07, 2013 at 06:33 UTC

    If you just want the total count of whitespace-separated words within double quotes in a text file, you don't need to keep much information hanging around. Try filtering:

    perl -E 'say 0+map{split}grep{$i++%2}split/"/,do{undef$/;<>};' fil.txt

    (Please pardon the golfing, I have this thing about keeping one-liners on one line ;-)
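For anyone who finds the golfed version hard to follow, here is a hedged, ungolfed sketch of the same pipeline: split the text on double quotes, keep the odd-numbered pieces (the ones inside quotes), and count the words in each. It takes a string instead of reading a file, and the helper name total_quoted_words is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper spelling out the one-liner's steps.
sub total_quoted_words {
    my ($text) = @_;

    # Pieces at odd indexes sit between an opening and a closing quote.
    my @pieces = split /"/, $text;

    my $total = 0;
    for my $i (0 .. $#pieces) {
        next unless $i % 2;                      # skip text outside quotes
        my @inside = split ' ', $pieces[$i];     # whitespace-separated words
        $total += @inside;
    }
    return $total;
}

print total_quoted_words(q{Here is " a quoted string " for you}), "\n";
```

Like the one-liner, this treats everything after an unmatched opening quote as quoted text, so it is only reliable when the quotes are balanced.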

Re: Count Quoted Words
by Anonymous Monk on Jun 07, 2013 at 02:31 UTC

    Mwahahahaha

    #!/usr/bin/perl --
    #~
    #~
    #~
    #~
    # perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END if " -otr -op +r -ce -nibc -i=4 -pt=0 "-nsak=*"
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use autodie; # error checking for open/close...

    Main( @ARGV );
    exit( 0 );

    sub Main {
        my( @files ) = @_;
        if( not @files ) {
            my $lines = q{ And "then" something "happened" on "the first and second and third lines" and then it was over};
            @files = ( \'"the boat"', \$lines, \"$lines $lines $lines" );
        }
        for my $file ( @files ) {
            print "## { $file }{ wordcount= ", Vote( $file ), " }\n";
        }
    } ## end sub Main

    sub Vote {
        my( $fyle ) = @_;
        open my( $wyld ), '<:raw', $fyle;
        my $in_quotes = 0;
        my $words     = 0;
        while( my $line = readline $wyld ) {
            pos( $line ) = 0;
          WORDCOUNTER:
            while( length( $line ) > pos( $line ) ) {
                $line =~ m{
                    \G\s*\x22 # quote after optional whitespace
                }gxcs and do {
                    $in_quotes = !$in_quotes; ## flip it
                    next WORDCOUNTER;
                };
                $in_quotes and $line =~ m{
                    [^\x22\s]+ ## not quote or whitespace
                }gxcs and do {
                    $words++;
                    next WORDCOUNTER;
                };
                $line =~ m{
                    \G[^\x22]+ ## not quote
                }gxcs and do {
                    next WORDCOUNTER;
                };
            } ## end of WORDCOUNTER
        } ## end of readline
        return $words;
    } ## end sub Vote
    __END__
    ## { SCALAR(0xbb713c) }{ wordcount= 2 }
    ## { SCALAR(0xad05a4) }{ wordcount= 9 }
    ## { SCALAR(0x3f8fec) }{ wordcount= 27 }
