Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

DBD-SQLite Regexp

by welle (Beadle)
on Nov 26, 2012 at 17:41 UTC ( [id://1005725]=perlquestion: print w/replies, xml ) Need Help??

welle has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I am trying to extract from a column of a SQLITE-DB (containing text sentences) sentences that contain a string ($query). I normally use:

$results = $dbh->selectall_arrayref("SELECT ID, sentence FROM texts +WHERE sentence LIKE '%$query%'"

which is working pretty good (every sentence containing $query somewhere is extracted). What I'd like to implement is a matching only for entire words. I thought the following would do the game (note the white spaces between and after $query):

$results = $dbh->selectall_arrayref("SELECT ID, sentence FROM texts +WHERE sentence LIKE ' %$query% '" <\code> <p> Of course it doesn't work as I expected: <code> $query="tag" sentence1= "Put tags around your paragraphs" sentence2= "Put a tag around your paragraphs" sentence3= "I am tagging your paragraphs" sentence4= "There are many advantages"

I get rid only of sentence4 (no white spaces left and right of "tag"

What I am doing wrong?"

Replies are listed 'Best First'.
Re: DBD-SQLite Regexp
by kennethk (Abbot) on Nov 26, 2012 at 18:08 UTC
    Assuming you only want sentence1 returned, you need to put your whitespace inside your wild card character, like
    SELECT ID, sentence FROM texts WHERE sentence LIKE '% $query %'
    That way, you get your literal whitespace on either side of your tag. This suffers from limitation that it will miss for leading or trailing words and is unlikely to behave the way you intend around punctuation. You could do better with:
    SELECT ID, sentence FROM texts WHERE sentence LIKE '% $query %' OR sentence LIKE '$query %' OR sentence LIKE '% $query' OR sentence = '$query'
    But this still doesn't do a great job. If I were to do this, I would just do the simple query
    SELECT ID, sentence FROM texts WHERE sentence LIKE '%$query%'
    and then filter the results against an appropriate regular expression, like /\b\Q$query\E\b/.

    As a side note, you should consider using Placeholders_and_Bind_Values (see DBI) for database access. It protects you from all sorts of security exploits and handles messy escaping issues no trouble. I would actually do the above as

    my $sth = $dbh->prepare(<<'EOSQL'); SELECT ID, sentence FROM texts WHERE sentence LIKE ? ESCAPE ? EOSQL $sth->execute("%$query%", '\\'); my $results = []; while (my @row = $sth->fetchrow_array) { push @$results, \@row if $row[1] =~ /\b\Q$query\E\b/; }

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: DBD-SQLite Regexp
by clueless newbie (Curate) on Nov 26, 2012 at 18:42 UTC

    DBD::SQLite has a wonderful $dbh->sqlite_create_function( $name, $argc, $code_ref ) which allows you to write perl functions and call them from SQLite. It also has a pre-declared but not defined REGEXP. You can use the sqlite_create_function to define this REGEXP.

Re: DBD-SQLite Regexp
by snoopy (Curate) on Nov 26, 2012 at 20:07 UTC
    Another option is to use the PCRE (Perl Compatable Regular Expressions) loadable module for SQLite for familar pattern matching via the REGEXP function.
    #!/usr/bin/perl use warnings; use strict; use 5.10.1; use DBI; my $dbh = DBI->connect("dbi:SQLite:dbname=/tmp/text.db", "", "", {Rais +eError => 1}); # so we can use SQLite REGEXP operator $dbh->sqlite_enable_load_extension(my $_enabled = 1); $dbh->prepare("SELECT load_extension('/usr/lib/sqlite3/pcre.so')"); my $word = 'tag'; my $word_re = '\b'.quotemeta($word).'\b'; my $results = $dbh->selectall_arrayref("SELECT ID, sentence FROM tex +ts WHERE sentence REGEXP ?", {}, $word_re); foreach my $result (@$results) { my ($id, $sentence) = @$result; say "matched $id: $sentence"; }
    Depending on your system it may be available as a package (Ubuntu has sqlite3-pcre).

    It's also available from github https://github.com/oojah/sqlite3-pcre where you can build and install it yourself.

Re: DBD-SQLite Regexp
by Anonymous Monk on Nov 26, 2012 at 23:08 UTC
    ... and then again, you just might be satisfied to get close. Give an "unadorned" SQLite enough information to reduce the number of returned rows to a reasonable subset that does not omit what you must include, then use Perl regexes to filter further from there. If you know that the whole shebang is happening on the current machine, it just might be a wash.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1005725]
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (5)
As of 2024-04-19 20:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found