Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Scratch DB

by Fox (Pilgrim)
on Sep 29, 2011 at 14:07 UTC ( [id://928582]=CUFP: print w/replies, xml ) Need Help??

Have you ever spent too much time rolling your own script to take some statistics from tables stored in files ?

Well, I go through this in a weekly basis, it always starts with something simple to do in a perl one-liner and quickly (d)evolves into a massive ball of nearly all tools in your $PATH.

The usual way to solve that for me is loading the files into a database and making my queries from there, it works great but importing and exporting the files takes too much time and finding a place on the server to load the database is another problem.

Scratch DB, is a thin layer around DBD::SQLite, that gives you a temporary database to load your text files as tables and query them with all SQL have, without the need for a server.

# creates a new db on db_file, or stores the database on RAM $ perl sdb.pl [db_file] # loads a file into a table, defaults to tab delimited fields sdb> load file.txt [tablename [field_separator]] sdb> *any SQL statement* # writes the contents of table_x to filename sdb> dump table_x [filename [field_separator]]
Also, thanks to DBD::SQLite, the database is stored all in RAM so it is quite fast, and no temporary files to mess with.
use strict; use warnings; use String::Escape qw( unprintable ); use Text::ParseWords; use Term::ReadLine; use File::Basename; use Text::Table; use Text::CSV; use DBI; my $db; run(shift @ARGV || ':memory:'); exit(0); sub run { my $dbfile = shift; $db = DBI->connect("dbi:SQLite:dbname=$dbfile", undef, undef, { sqlite_see_if_its_a_number => 1, }); my $term = Term::ReadLine->new($0); $term->ornaments(0); # no fancy underline while(defined ($_ = $term->readline("sdb> "))) { next unless /\S/; s/^\s*//; if(/^quit/i) { last; } elsif(/^(load|dump)/i) { no strict 'refs'; my ($method, @args) = shellwords($_); my $output = eval { &$method(@args) }; warn $@ if $@; print "$output\n" if $output; } else { my $output = eval { sql_command($_) }; warn $@ if $@; print "$output\n" if $output; } } } sub load { my $file = shift || die "usage: load <file> [<table> [<field_separ +ator>]]\n"; my ($table_name, undef, undef) = shift || fileparse($file, qr/\.[^ +.]*/); my $FS = unprintable(shift) || "\t"; my $asv = Text::CSV->new({ binary => 1, sep_char => $FS}); open my $fh, '<', $file or die "open [$file]: $!"; my $table; push @$table, $asv->getline($fh) while(!eof $fh); my $n_col = scalar @{$table->[0]}; $db->prepare("CREATE TABLE $table_name (".join(', ',map { "col".$_ + }0..$n_col - 1).");") ->execute() or die $!; my $i = $db->prepare("INSERT INTO $table_name VALUES(".join(", ",( +'?')x$n_col).");"); for(@$table) { $i->execute(@$_) or die $!; } return @$table." row(s) loaded on table $table_name."; } sub dump { my $table = shift || die "usage: dump <table> [<file> [<field_sepa +rator>]]\n"; my $file = shift || $table; my $FS = unprintable(shift) || "\t"; my $asv = Text::CSV->new({ binary => 1, sep_char => $FS, eol => $/ +}); my $s = $db->prepare("SELECT * FROM $table") or die $!; open my $fh, '>', $file or die "open [$file]: $!"; $s->execute() or die $!; while(my $row = $s->fetch) { $asv->print($fh, $row); } return $s->rows." row(s) written to file $file."; } sub sql_command { my ($str) = @_; my $s = $db->prepare($str) or die $!; $s->execute() or die $!; my $table; push @$table, [ map { defined $_ ? $_ : "undef" } @{$s->{'NAME'}} +]; while(my $row = $s->fetch) { push @$table, [ map{ defined $_ ? $_ : "undef" }@$row ]; } box_format($table); return $s->rows.' row(s) affected.'; } sub box_format { my $table = shift; my $n_cols = scalar @{$table->[0]}; my $tb = Text::Table->new(\'| ', '', (\' | ','')x($n_cols-1), \' | +'); $tb->load(@$table); my $rule = $tb->rule(qw/- +/); my @rows = $tb->body(); print $rule, shift @rows, $rule, @rows, $rule if @rows; }

Replies are listed 'Best First'.
Re: Scratch DB
by Tux (Canon) on Sep 29, 2011 at 19:49 UTC
      You know, actually I started this script as a extension of DBD::File, the problem is that the SQL supported by SQL::Statement( used by DBD::File) is not as flexible as I'm used to, things like:

      SELECT *, 'extra', 'columns' FROM table
      or
      select * from (select 1)
      to name a few, aren't supported,
      so I ended up using DBD::SQLite because you can be sure that it at least support the official syntax.
Re: Scratch DB
by jdporter (Paladin) on Sep 29, 2011 at 19:38 UTC

    Thanks! There have been many times I could have used something like this. I always ended up with — as you say — a massive ball of tools. ;-)

    What about having it use a temporary file as an option? Many times, in my experience, the data involved are actually way too large to fit in memory.

    I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
      What about having it use a temporary file as an option? Many times, in my experience, the data involved are actually way too large to fit in memory.
      Nice idea !, also allows you to store your dbfile with you in case you need to use it again later, I updated the script.
Re: Scratch DB
by mje (Curate) on Sep 30, 2011 at 08:18 UTC

    Nice one and thanks. I'm sure I'll find this useful.

    BTW, on line 21 I had to add an argument to Term::Realine's constructor to make it work here - I just passed $0. According to http://search.cpan.org/~flora/perl-5.14.2/lib/Term/ReadLine.pm the constructor takes a string.

    You also need a newish Text::CSV or you won't have getline_all (I had 1.17 and that did not have it).

    Lastly, I notice the table name is optional and if I have a csv file called xxx.csv only containing something like

    1,fred 2,dave 3,john

    you seem to have to name the table or you get:

    sdb> load xxx.csv DBD::SQLite::db prepare failed: unknown database xxx at /home/martin/b +in/scratch_db.pl line 56, <$fh> line 3. Can't call method "execute" on an undefined value at /home/martin/bin/ +scratch_db.pl line 56, <$fh> line 3.
      BTW, on line 21 I had to add an argument to Term::Realine's constructor to make it work here - I just passed $0. According to http://search.cpan.org/~flora/perl-5.14.2/lib/Term/ReadLine.pm the constructor takes a string.
      Doh !, That doesn't happen with me ! I guess it is because, when available, Term::ReadLine uses Term::ReadLine::Gnu, and this one doesn't require the argument
      if I have a csv file called xxx.csv
      Uh oh, SQLite seems to interpret the dot as database.table, I guess the best option here is to strip the suffix of the file.

      Thanks for the feedback, I updated the script.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://928582]
Approved by Corion
Front-paged by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (1)
As of 2024-04-25 00:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found