Re^3: Parsing Gutenberg Catalog Index


PerlMonks


by tachyon (Chancellor)

on Aug 30, 2004 at 07:58 UTC ( [id://386862] )


in reply to Re^2: Parsing Gutenberg Catalog Index
in thread Parsing Gutenberg Catalog Index

I would second that, given that the RDF looks like:

<rdf:Description rdf:ID="etext13218">
  <dc:publisher>&pg;</dc:publisher>
  <dc:title rdf:parseType="Literal">Don Orsino</dc:title>
  <dc:creator>Crawford, F. Marion (Francis Marion) (1854-1909)</dc:creator>
  <dc:language>en</dc:language>
  <dc:created>2004-08-19</dc:created>
  <dc:rights rdf:resource="&lic;" />
</rdf:Description>

Then all you need is something trivial like this to create a tab-separated file (id, title, author) ready for a MySQL 'load data local infile .....':

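#!/usr/bin/perl
use strict;
use warnings;

# Read one record at a time; assumes records are separated by blank lines
local $/ = "\n\n";
open my $rdf, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
while (<$rdf>) {
    # Skip any record that lacks an id, a title or an author
    next unless m/<rdf:Description rdf:ID="etext(\d+)"/;
    my $id = $1;
    next unless m/<dc:title[^>]+>([^<\n]+)</;
    my $title = $1;
    next unless m/<dc:creator>([^<\n]+)</;
    my $author = $1;
    # Collapse whitespace (tabs included) so the fields stay tab-safe
    $title  =~ s/\s+/ /g;
    $author =~ s/\s+/ /g;
    print "$id\t$title\t$author\n";
}
close $rdf;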
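If you want to check the regexes without the full catalog file, here is a rough sketch that runs the same extraction on the sample record from above. The parse_record helper is my own hypothetical name, not part of the script. It also makes two small assumptions that differ from the loop above: it uses [^>]* so a dc:title tag with no attributes still matches, and [^<]+ so a title or author wrapped over several lines is still captured before the whitespace is collapsed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): pull id, title and
# author out of one RDF record. Returns an empty list if a field is missing.
sub parse_record {
    my ($record) = @_;
    my ($id)     = $record =~ m/<rdf:Description rdf:ID="etext(\d+)"/ or return;
    my ($title)  = $record =~ m/<dc:title[^>]*>([^<]+)</              or return;
    my ($author) = $record =~ m/<dc:creator>([^<]+)</                 or return;
    for ($title, $author) {
        s/\s+/ /g;     # collapse runs of whitespace, newlines and tabs included
        s/^ | $//g;    # trim a single leading or trailing space
    }
    return ($id, $title, $author);
}

# The sample record shown earlier in this post
my $sample = <<'END_RDF';
<rdf:Description rdf:ID="etext13218">
  <dc:publisher>&pg;</dc:publisher>
  <dc:title rdf:parseType="Literal">Don Orsino</dc:title>
  <dc:creator>Crawford, F. Marion (Francis Marion) (1854-1909)</dc:creator>
  <dc:language>en</dc:language>
  <dc:created>2004-08-19</dc:created>
  <dc:rights rdf:resource="&lic;" />
</rdf:Description>
END_RDF

my @fields = parse_record($sample);
print join("\t", @fields), "\n" if @fields;
```

Entities such as &pg; are left untouched here, as in the script above; if titles can contain things like &amp; you would probably want to decode them before loading.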

cheers

tachyon
