DigitalKitty has asked for the wisdom of the Perl Monks concerning the following question:
Hi all.
With help from: parv, dhoss, Fairy_Nuff, and planetscape, I started writing a chatterbox history tool for educational reasons.
use warnings;
use strict;
use LWP::Simple;
use DBI;
my $data = '';
my $dbh = '';
my $url = 'http://www.perlmonks.org/?node_id=207304';
my $pat = qr{ .*<author>(.*)<\/author>.*<text>(.*)<\/text }xs;
$data = get( $url );
$dbh = DBI->connect( "dbi:SQLite:dbname=C:\\testdb", "", "" );
while ( ( my($auth, $text) = ( $data =~ m/$pat/gc ) ) ) {
for( $text ) {
s/[ ]+/ /g;
s/^\s+//;
s/\s+$//;
}
printf "%s: %s\n\n" , $auth , $text;
$dbh->do('insert into monks values(?,?)', undef, $auth, $text );
}
I was hoping some of you could offer suggestions regarding how I might improve the design/functionality of the (currently beta quality) program. At the present time, it only displays the most recent author/comment as opposed to several speakers and their respective comments.
I took the liberty of including my (simple) table design as well:
SQLite 3.5.6
CREATE TABLE monks(
monk varchar(25),
comment varchar(255)
);
Thanks,
~Katie
Re: Perl, SQLite3, and Parsing the Chatterbox Feed.
by McDarren (Abbot) on Feb 14, 2008 at 06:27 UTC
um, two comments..
- You're parsing XML with a regex. Tsk! Tsk!. You should know better than that :p
Use a proper XML parser such as XML::Twig or XML::Simple.
- Given that you're creating a CB history, wouldn't you think it a good idea to include a date/time field in your database? ;)
Cheers,
Darren :)
I'd like to second both of these suggestions, as well as add a couple of my own...
Have you considered parsing the posts for links in the CB? I imagine it would lend itself to some very interesting correlations down the road: "What percentage of posts link to cpan? Which Monk links to his/her scratchpad most often? etc..."
To make this really work well, you would definitely need at least a time value as suggested already (and if you plan to keep more than 24 hours' worth of data, a date value will be necessary as well).
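One possible shape for a table with the suggested time field, as a minimal sketch. It assumes DBD::SQLite is installed and uses an in-memory database instead of the C:\testdb file; the `epoch` column name and the sample row are illustrative assumptions, not part of the original design. Storing a Unix epoch covers both date and time in a single column.

```perl
use strict;
use warnings;
use DBI;

# In-memory database so the sketch leaves no file behind (assumption;
# the original script connects to C:\testdb).
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 } );

# Same two columns as the original table, plus a hypothetical epoch column.
$dbh->do(<<'SQL');
CREATE TABLE monks(
    epoch   integer,
    monk    varchar(25),
    comment varchar(255)
)
SQL

# Illustrative sample row.
$dbh->do( 'insert into monks values(?,?,?)',
    undef, 1202970679, 'hipowls', 'testing' );

# SQLite's datetime() converts the stored epoch to a UTC timestamp string.
my ( $when, $monk, $comment ) = $dbh->selectrow_array(
    q{select datetime(epoch, 'unixepoch'), monk, comment from monks} );

print "$when $monk: $comment\n";
```

With the time stored as an integer, range queries such as "everything in the last 24 hours" become simple numeric comparisons on `epoch`.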
Re: Perl, SQLite3, and Parsing the Chatterbox Feed.
by holli (Abbot) on Feb 14, 2008 at 10:27 UTC
#!/usr/bin/perl
use lib qw( /mnt/web4/10/47/51683347/htdocs/lib/site_perl/5.8.5 );
use warnings;
use strict;
use DBI;
use WWW::Mechanize;
use XML::Simple;
my ($sth, $dbh, $xml);
my $messages = [];
my $mech = WWW::Mechanize->new();
while (1)
{
my $resp = $mech->get( 'http://www.perlmonks.org/index.pl?node_id=207304' );
if ( $resp->is_success )
{
my $xml = $resp->content;
my $jatter = XMLin( $xml, ForceArray => ['message'] );
if ( $jatter->{info}->{count} > 0 )
{
print STDERR "adding ", scalar @{$jatter->{message}}, "\n";
unless ( $dbh )
{
$dbh = DBI->connect("DBI:mysql:database=DB354211;host=rdbms.strato.de", 'U354211', 'pw354211');
$sth = $dbh->prepare('INSERT INTO pmf_jatterboxx (user_id, author, epoch, message_id, message) VALUES (?, ?, ?, ?, ?)');
}
for ( @{$jatter->{message}} )
{
$sth->execute( $_->{user_id}, $_->{author}, $_->{epoch}, $_->{message_id}, $_->{text} );
}
}
else
{
print STDERR "snooze\n";
}
}
sleep(5);
}
Note: it is not obvious, but the chatterbox feed somehow recognizes the caller and returns only the chat lines that are new, even without a date flag or similar parameter being passed. I am curious how that works.
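If a polling loop should not rely on the feed returning only new lines, the messages can be filtered by `message_id` before they are stored. The following is a minimal sketch under that assumption; `new_messages` and `%seen` are hypothetical names, and the sample messages are hand-written stand-ins for the parsed feed.

```perl
use strict;
use warnings;

# message_id => count, for every message already handed on for storage.
my %seen;

# Hypothetical helper: returns only the messages whose message_id
# has not been seen in an earlier call.
sub new_messages {
    my ($messages) = @_;
    return grep { !$seen{ $_->{message_id} }++ } @{ $messages || [] };
}

# Two simulated polls; the second one repeats a message from the first.
my @first  = new_messages( [ { message_id => 703986, text => 'testing' } ] );
my @second = new_messages(
    [
        { message_id => 703986, text => 'testing' },
        { message_id => 703987, text => 'just ignore it' },
    ]
);

print scalar(@first), " new, then ", scalar(@second), " new\n";
```

The hash only lives as long as the process, so a long-running collector would probably also want a uniqueness constraint on the message ID in the database.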
Re: Perl, SQLite3, and Parsing the Chatterbox Feed.
by hipowls (Curate) on Feb 14, 2008 at 06:36 UTC
use XML::Simple;
use LWP::Simple;
use Data::Dumper;
my $url = 'http://www.perlmonks.org/?node_id=207304';
my $text = get($url);
my $ref = XMLin( $text, ForceArray => ['message'], );
print Dumper $ref;
__END__
$VAR1 = {
'info' => {
'sitename' => 'PerlMonks',
'count' => '2',
'gentimeGMT' => '2008-02-14 06:32:17',
'lastid' => '703987',
'content' => 'Rendered by the New Chatterbox XML Ticker',
'xmlmaker' => 'XML::Fling 1.001',
'site' => 'http://perlmonks.org/',
'xmlstyle' => 'clean,new',
'fromid' => '00703985',
'ticker_id' => '207304'
},
'message' => [
{
'message_id' => '703986',
'epoch' => '1202970679',
'text' => 'testing',
'time' => '01:31:19',
'date' => '2008-02-14',
'user_id' => '660179',
'author' => 'hipowls'
},
{
'message_id' => '703987',
'epoch' => '1202970708',
'text' => 'just ignore it',
'time' => '01:31:48',
'date' => '2008-02-14',
'user_id' => '660179',
'author' => 'hipowls'
}
]
};
Update: Added ForceArray => ['message'] so that messages are always in a list even when there is only one.
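A structure like the one dumped above could be walked as follows. This is only a sketch: `$ref` is a hand-written copy of part of the dump rather than live feed data, and `format_messages` is a hypothetical helper name. It formats each `epoch` value as a UTC timestamp with POSIX `strftime`.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hand-written sample shaped like the XMLin() result shown above.
my $ref = {
    message => [
        {
            message_id => '703986',
            epoch      => '1202970679',
            text       => 'testing',
            author     => 'hipowls',
        },
        {
            message_id => '703987',
            epoch      => '1202970708',
            text       => 'just ignore it',
            author     => 'hipowls',
        },
    ],
};

# Hypothetical helper: one "timestamp author: text" line per message.
sub format_messages {
    my ($ref) = @_;
    my @lines;
    for my $msg ( @{ $ref->{message} || [] } ) {
        # gmtime() turns the epoch into UTC date and time parts.
        my $stamp = strftime( '%Y-%m-%d %H:%M:%S', gmtime( $msg->{epoch} ) );
        push @lines, "$stamp $msg->{author}: $msg->{text}";
    }
    return @lines;
}

print "$_\n" for format_messages($ref);
```

Because ForceArray guarantees that `message` is an array reference, the same loop works whether the feed returned one message or many.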
Re: Perl, SQLite3, and Parsing the Chatterbox Feed.
by pc88mxer (Vicar) on Feb 14, 2008 at 06:15 UTC
You definitely need to use less greedy regexes. Instead of:
my $pat = qr{ .*<author>(.*)<\/author>.*<text>(.*)<\/text }xs;
use:
my $pat = qr{ .*?<author>(.*?)<\/author>.*?<text>(.*?)<\/text }xs;
Also, I'm not sure you are using the /g option correctly. I've had better luck with:
while ($data =~ m/$pat/gc) {
my ($auth, $text) = ($1, $2);
for( $text ) {
s/[ ]+/ /g;
s/^\s+//;
s/\s+$//;
}
printf "%s: %s\n\n" , $auth , $text;
}
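To see the effect of the non-greedy quantifiers without fetching the feed, the loop can be run against a small string. This is a sketch only: the sample XML is hand-written and merely assumed to resemble the feed, and `extract_pairs` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hand-written sample input; the real feed layout may differ.
my $data = <<'XML';
<message><author>alice</author><text>  hello   there </text></message>
<message><author>bob</author><text>hi</text></message>
XML

# Hypothetical helper: returns one [author, text] pair per message.
sub extract_pairs {
    my ($data) = @_;
    my $pat = qr{<author>(.*?)</author>.*?<text>(.*?)</text>}s;
    my @pairs;
    # Scalar-context /g: each iteration resumes where the last match ended.
    while ( $data =~ /$pat/g ) {
        my ( $auth, $text ) = ( $1, $2 );
        $text =~ s/\s+/ /g;          # collapse runs of whitespace
        $text =~ s/^\s+|\s+$//g;     # trim both ends
        push @pairs, [ $auth, $text ];
    }
    return @pairs;
}

printf "%s: %s\n", @$_ for extract_pairs($data);
```

With greedy `.*` in place of `.*?`, the first capture would run on to the last `</author>` in the string, so only a single pair would come back.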
I don't see why either of you are using /c. It's definitely not useful, and I suspect it's harmful.
# Without /g, it would be an endless loop for match will
# always start at the start of $data.
while ( $data =~ m/$parse/g )
{
my ( $auth , $text ) = ( "$1" , "$2" );
...
}
(Circa 2001-2005, there are some examples of XML::(Twig|Simple) use to parse the chatterbox XML around here somewhere.)
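The difference between /g and /gc can be checked directly: after a failed match, /g resets the string's match position, while /gc keeps it. The snippet below is a small sketch of that behaviour; `pos_after_failed_match` is a hypothetical helper name and the test string is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: match once successfully, then fail once,
# and report where pos() ends up.
sub pos_after_failed_match {
    my ($use_c) = @_;
    my $s = 'aXbXc';

    my $found = $s =~ /X/g;    # succeeds; pos($s) is now 2
    $found = $use_c ? ( $s =~ /Z/gc ) : ( $s =~ /Z/g );    # fails either way

    return pos($s);
}

for my $use_c ( 0, 1 ) {
    my $p = pos_after_failed_match($use_c);
    printf "%-3s leaves pos at %s\n", $use_c ? '/gc' : '/g',
        defined $p ? $p : 'undef';
}
```

In a plain `while ( $data =~ /$pat/g )` loop the reset on failure is harmless, since the loop ends at that point anyway; /c matters mainly when another match is expected to continue from the same position.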