String Comparison & Equivalence Challenge

Polyglot has asked for the wisdom of the Perl Monks concerning the following question:

BACKGROUND

Suppose we start with a database of the Bible, which has ~31,102 verses. Some of the verses are similar to each other. Some are textually identical. Suppose we wish to alert readers of one verse to other verses which are very similar or are the same; or perhaps return search results ranked by similarity.

CHALLENGE

1) How would we find and rank the similarity of other verses, such as on a percentage of equivalence/similarity?

2) How would we index the rankings?

EXAMPLE VERSES

(Identical)

2 Kings 19:1 -- And it came to pass, when king Hezekiah heard it, that he rent his clothes, and covered himself with sackcloth, and went into the house of the LORD.

Isaiah 37:1 -- And it came to pass, when king Hezekiah heard it, that he rent his clothes, and covered himself with sackcloth, and went into the house of the LORD.

(Only punctuation differs)

Exodus 20:7 -- Thou shalt not take the name of the LORD thy God in vain; for the LORD will not hold him guiltless that taketh his name in vain.

Deuteronomy 5:11 -- Thou shalt not take the name of the LORD thy God in vain: for the LORD will not hold him guiltless that taketh his name in vain.

(Similar)

Psalm 14:1 -- The fool hath said in his heart, There is no God. They are corrupt, they have done abominable works, there is none that doeth good.

Psalm 53:1 -- The fool hath said in his heart, There is no God. Corrupt are they, and have done abominable iniquity: there is none that doeth good.

(Should rank as similar)

Exodus 20:15 -- Thou shalt not steal.

Deuteronomy 5:19 -- Neither shalt thou steal.

(Less similar)

Genesis 1:1 -- In the beginning God created the heaven and the earth.

John 1:1 -- In the beginning was the Word, and the Word was with God, and the Word was God.

For any given verse there will be a different scale of similarities to other verses. Some verses may be so unique that almost no other verse would approach them in similarity. Other verses may have one or more identical or virtually identical repeats. Finding the nearest equivalent, which may not be very equivalent, might be a challenge. For some verses, perhaps many similar verses could be found, whereas for others the list may be very short--or even none at all, depending on where the line is drawn for threshold of equivalency.

All of this depends on being able, first and foremost, to measure the equivalence of two different strings. Secondarily, how should these rankings be preserved?

As a bonus, finding the most efficient way of indexing the table might be interesting. Does one iterate over each of the 31,102 verses once for every verse?

Enjoy!

RESOURCES

(Claim to have free SQL-format Bibles, with sign-up)

http://biblehub.net

- OR -

(Free CSV download of KJV Bible, easily converted to database [script below])

http://my-bible-study.appspot.com/assets/KJV_fixed.csv

And here's a script to use to push the CSV file into a database:

CREATE BIBLE DATABASE

#!/usr/bin/perl

use DBI;
use strict;
use warnings;


# DEFAULT SCRIPT USER
our $db_user_name = 'root';

# DEFAULT PASSWORD FOR SCRIPT USER
our $db_password = '[mysql_root_password_here]';

our $database = "Bibles";
our $table = "KJV"; 
our $dbfilename = 'KJV_fixed.csv';

our $DEBUG=0;

# Treat all input and output as UTF-8 and set the flags correctly
# Because _everyone_ should be using UTF8 these days!
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
binmode STDIN,  ":utf8";


#################
#begin
    
    &initializeDB;
    my @source = &readsource($dbfilename);
    &filltable(@source);
    exit;
    
#end
#################

sub initializeDB {

    print "Creating database: $database\n";
    my $dsn = "DBI:mysql:host=localhost";    
    my $statement = qq| CREATE DATABASE IF NOT EXISTS $database CHARAC
+TER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci; |;
    &connectdb($statement,$dsn);
    
} #END SUB initializeDB

sub filltable {
my @source = @_;
my ($booknum, $chapternum, $verse, $text, $other) = ('', '', '', '', '
+');
my $line = '';
my $count = 0;
my $percent = 0;

    my $statement = qq|create table if not exists $table (RecordNum SM
+ALLINT NOT NULL PRIMARY KEY AUTO_INCREMENT, Book TINYINT, Chapter SMA
+LLINT, Verse SMALLINT, Text text) CHARACTER SET utf8mb4 COLLATE utf8m
+b4_unicode_ci ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;|;               
+              
    &connectdb($statement);        
            
    print "Filling the table with the data...\n\nThis could take awhil
+e.\n";
    print "Total lines: ".scalar @source."\n";
    
    foreach $line (@source) {
        $count++;

        if ($count/3110 == int($count/3110)) { 
            $percent++;
            print "${percent}0\% completed...\n";
        } 
        
        chomp $line;
        $line =~ s/'/\\'/g;
        $line =~ s/(\d+),(\d+),(\d+),"(.*?)"/$1\t$2\t$3\t$4/g;
            
        ($booknum, $chapternum, $verse, $text, $other) = split(/\t/, $
+line);
                
        $text =~ s/^\s+//;
        $text =~ s/\s+$//;
        
        $statement = qq|INSERT INTO $table (RecordNum, Book, Chapter, 
+Verse, Text) VALUES (null, '$booknum', '$chapternum', '$verse', '$tex
+t'); |; 
        &connectdb($statement);    
    }
} #END SUB filltable

sub readsource {
my $filename=shift @_;
my @data = ();
    print "\nReading source: $filename...";

    open SOURCE, "<$filename" or die "Cannot open source file $!\n";
    @data = <SOURCE>;
    close SOURCE;
        
    print "Done.\n\n";
    
    return @data;
} #END SUB readsource        


sub connectdb { if ($DEBUG==9) {print "sub connectdb: \n"}; 
my $statement = shift @_;
my $dsn = shift @_ || "DBI:mysql:$database:localhost";    
    
    my $dbh = DBI->connect($dsn, $db_user_name, $db_password, {
                         mysql_enable_utf8 => 1
                    }) or die "Can't connect to the DB: $DBI::errstr\n
+";  
    my $quest = $dbh->prepare($statement, { RaiseError => 1 }) or die 
+"Cannot prepare statement! $DBI::errstr\n";
    if ($DEBUG>0) {
      $quest->execute() or die qq|CONNECT DATABASE Statement:  \n $sta
+tement \n
        \n Error in first database statement! \n $DBI::errstr \n |;
    } else { $quest->execute() };
      
} # END SUB connectdb
[download]

READ BIBLE DATABASE

#!/usr/bin/perl

use DBI;
use strict;
use warnings;


# DEFAULT SCRIPT USER
our $db_user_name = 'root';

# DEFAULT PASSWORD FOR SCRIPT USER
our $db_password = '[mysql_root_password_here]';

our $database = "Bibles";
our $table = "KJV"; 

our $DEBUG=0;

# Treat all input and output as UTF-8 and set the flags correctly
# Because _everyone_ should be using UTF8 these days!
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";
binmode STDIN,  ":utf8";


#################
#begin
    
    my @results = &read_table_from_DB;
    &processResults(@results);
    exit;
    
#end
#################

sub processResults {
my @data = @_;
my ($record, $booknum, $chapternum, $versenum, $text) = ('', '', '', '
+', '');

#OPTIONAL FORM OF ITERATION: 
#  JUST KEEP IN MIND THAT ARE FIVE (5)
#  RECORDS FOR EACH "ROW" 

    while (@data) {
        $record = shift @data;
        $booknum = shift @data;
        $chapternum = shift @data;
        $versenum = shift @data;
        $text = shift @data;
        
        #CODE TO ANALYZE AND COMPARE AMONG TEXTS
        #print $record."\n"; #WILL GENERATE CONSIDERABLE TEXT TO SCREE
+N        
    
    }


} #END SUB processResults

sub read_table_from_DB {
my @results = ();
my ($record, $booknum, $chapternum, $verse, $text, $other) = ('', '', 
+'', '', '', '');
my $line = '';
my $count = 0;

    my $statement = qq|SELECT * FROM $table;|;                        
+     
    @results = &connectdb($statement);        

} #END SUB read_table_from_DB


sub connectdb { if ($DEBUG==9) {print "sub connectdb: \n"}; 
my $statement = shift @_;
my $dsn = shift @_ || "DBI:mysql:$database:localhost";    
my @results = ();
   
    my $dbh = DBI->connect($dsn, $db_user_name, $db_password, {
                         mysql_enable_utf8 => 1
                    }) or die "Can't connect to the DB: $DBI::errstr\n
+";  
    my $quest = $dbh->prepare($statement, { RaiseError => 1 }) or die 
+"Cannot prepare statement! $DBI::errstr\n";
    if ($DEBUG>0) {
      $quest->execute() or die qq|CONNECT DATABASE Statement:  \n $sta
+tement \n
        \n Error in first database statement! \n $DBI::errstr \n |;
    } else { $quest->execute() };
    
    
    while(my @row = $quest->fetchrow_array()) {
        foreach my $item (@row) {
            push @results, $item;
        }
    }

    return @results;
     
} # END SUB connectdb
[download]

Blessings,

~Polyglot~

Comment on String Comparison & Equivalence Challenge Select or Download Code

Replies are listed 'Best First'.
Re: String Comparison & Equivalence Challenge by erix (Prior) on Mar 14, 2021 at 08:51 UTC
PostgreSQL's trigram-comparing doesn't fare too badly, I think. The trigram functionality is inside module pg_trgm (a contrib module). I compared your example sentences and asked pg_trgm what similarity it thought they had. pg_trgm expresses that similarity as a number between 0 and 1 (='very different' to 'virtually the same'): prior \| strict_word_similarity \| initi +al part ----------------------------+------------------------+---------------- +-------------------------- (Identical) \| 1 \| And it came to +pass, when king Hezekia... (Only punctuation differs) \| 1 \| Thou shalt not +take the name of the LO... (Similar) \| 0.8333333 \| The fool hath s +aid in his heart, There... (Should rank as similar) \| 0.8 \| Thou shalt not +steal. (Less similar) \| 0.45614034 \| In the beginnin +g God created the heave... (5 rows) -- (column 3 truncated) [download] The SQL I used (the db must have pg_trgm installed, which you can do with CREATE EXTENSION pg_trgm;): select prior -- , similarity(txt1, txt2) , strict_word_similarity(txt1, txt2) , substring(txt1, 1, 40) -- \|\| chr(10) \|\| txt2 from (values ( '2 Kings 19:1' , 'And it came to pass, when king Hezekiah hea +rd it, that he rent his clothes, and covered himself with sackcloth, +and went into the house of the LORD.', 'Isaiah 37:1' , 'And it came to pass, when king Hezekiah hea +rd it, that he rent his clothes, and covered himself with sackcloth, +and went into the house of the LORD.' , '(Identical)') , ( 'Exodus 20:7', 'Thou shalt not take the name of the LORD thy God + in vain; for the LORD will not hold him guiltless that taketh his na +me in vain.', 'Deuteronomy 5:11', 'Thou shalt not take the name of the LORD th +y God in vain: for the LORD will not hold him guiltless that taketh h +is name in vain.', '(Only punctuation differs)' ) , ( 'Psalm 14:1' , 'The fool hath said in his heart, There is n +o God. They are corrupt, they have done abominable works, there is no +ne that doeth good.', 'Psalm 53:1' , 'The fool hath said in his heart, There is n +o God. Corrupt are they, and have done abominable iniquity: there is +none that doeth good.', '(Similar)' ) , ( 'Exodus 20:15' , 'Thou shalt not steal.', 'Deuteronomy 5:19', 'Neither shalt thou steal.', '(Should rank as similar)' ) , ( 'Genesis 1:1' , 'In the beginning God created the heaven and + the earth.', 'John 1:1' , 'In the beginning was the Word, and the Word + was with God, and the Word was God.', '(Less similar)' ) ) as f(verse1, txt1, verse2, txt2, prior) [download]	[reply] [d/l] [select]
Re^2: String Comparison & Equivalence Challenge by Polyglot (Chaplain) on Mar 14, 2021 at 09:36 UTC
That does appear like an interesting function with relevant output. How would one find a similar function for MariaDB? Blessings, ~Polyglot~	[reply]
Re^3: String Comparison & Equivalence Challenge by erix (Prior) on Mar 14, 2021 at 10:33 UTC
Sorry, no idea. I haven't tried. (Most likely it doesn't exist.) Loading your file into postgres wasn't hard, I just tried it. YMMV, of course, especially when you don't have postgresql installed yet. In case it helps, here is a quick-'n-dirty load into a (postgres!) table, of your file. `#!/bin/bash file=KJV_fixed.csv schema=public table=kjv t=$schema.$table echo " drop table if exists $t ; create table if not exists $t ( recordnum serial PRIMARY KEY , book int , chapter int , verse int , text text ) " \| psql -X < $file perl -ne 'chomp; my @arr = split(/[,]/, $_, 4); print join("\t", @arr), "\n"; ' \| psql -c " copy $t(book, chapter, verse, text) from stdin (format csv, header false, delimiter E'\t'); analyze $t; "` [download] Output from that is: `DROP TABLE CREATE TABLE Timing is on. ANALYZE Time: 326.648 ms` [download]	[reply] [d/l] [select]
Re^4: String Comparison & Equivalence Challenge by Polyglot (Chaplain) on Mar 14, 2021 at 10:51 UTC
Re^5: String Comparison & Equivalence Challenge by erix (Prior) on Mar 14, 2021 at 11:11 UTC
Re^5: String Comparison & Equivalence Challenge (tf-idf) by LanX (Saint) on Mar 14, 2021 at 16:10 UTC
Re^5: String Comparison & Equivalence Challenge by erix (Prior) on Mar 15, 2021 at 13:00 UTC
Some notes below your chosen depth have not been shown here
Re: String Comparison & Equivalence Challenge (tf-idf) by LanX (Saint) on Mar 14, 2021 at 07:34 UTC
Good morning :) > All of this depends on being able, first and foremost, to measure the equivalence of two different strings. I think you want to take a look at tf-idf (term frequency-inverse document frequency) in combination with a stemmer. And you might also want to rank partial word groups like sub-phrases to take word order into account. This should give you a start. HTH :) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} update See also Re^5: String Comparison & Equivalence Challenge (tf-idf)	[reply]
Re: String Comparison & Equivalence Challenge by bliako (Monsignor) on Mar 14, 2021 at 09:50 UTC
regarding preserving the similarities, a sparse (2D) matrix can be used. Either from a CPAN module e.g. Math::SparseMatrix and others or simply emulate one using a 2D hash. Similarities can be at different levels with different metrics: exact phrase, re-arranged phrase, similar words, similar sentiment. But why select one of these when you can use them all in a multi-dimensional similarity index. Something like this (totally untested): use List::Util qw(reduce); # store similarities as a sparse matrix as a 2-level hash my $S = {}; # metric weights, all 1's means not weighted, usually sum-of-weights=1 my $W = {'metric1' => 1, 'metric2' => 1, 'metric3' => 1]; # get a list of similarity values as a hash, keyed on metric names my $sims = similarity($phrase1, $phrase2); # get the most similar to phrase1 my $most = most_similar($phrase1); print "most similar to '$phrase1' is ".$most->{'phrase'}."\n"; # main entry to finding similarity between phrases A and B sub similarity { my ($A, $B) = @_; if( ! exists($S->{$A}) && ! exists($S->{$A}->{$B}) ){ # useless negation to satisfy certain monks' pet peeve $S->{$A}->{$B} = { 'metric-1' => metric1($A,$B), 'metric-2' => metric2($A,$B), 'metric-3' => metric3($A,$B), }; # this is a weighted similarity, it's a rough 1D metric based # on all other metrics. my $weighted = 0; $weighted += $W->{$_} * $S->{$A}->{$B}->{$_} for keys %$W; $S->{$A}->{$B}->{'weighted'} = $weighted; } return $S->{$A}->{$B} } # calculate similary between phrases A and B using metric1 sub metric1 { my ($A,$B) = @_; return ... # a real e.g. 3.5 } sub most_similar { my ($A, $metric_name) = @_; if( ! defined($metric_name) !! ! exists($W->{$metric_name}) ){ $metric_name = 'weighted' } my $w = $S->{$A}; my $max_sim_phrase = List::Util::reduce { $w->{$b}->{$metric_name} > + $w->{$a}->{$metric_name} ? $b : $a } keys %$w; my $max_sim_value = $w->{$max_sim_phrase}->{$metric_name}; return { 'phrase' => $max_sim_phrase, 'value' => $max_sim_value } } [download] Edit: P.S. Stemming this ancient form of english can be a challenge as stemming relies on pre-trained models. Using the ancient greek bible text could be even more challenging finding models. bw, bliako	[reply] [d/l]
Re^2: String Comparison & Equivalence Challenge by LanX (Saint) on Mar 14, 2021 at 17:52 UTC
> Using the ancient greek bible text could be even more challenging finding models. Only the New Testament is originally in Greek, the old one is in Hebrew and Aramaic AFAIK. Please correct me. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re^3: String Comparison & Equivalence Challenge by bliako (Monsignor) on Mar 14, 2021 at 20:49 UTC
I had no idea but it's reasomable (edit: what you say).	[reply]
Re: String Comparison & Equivalence Challenge by tybalt89 (Monsignor) on Mar 15, 2021 at 03:08 UTC
#!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11129602 use warnings; use Algorithm::Diff qw(traverse_sequences); use Tk; use Tk::ROText; my $file = 'KJV_fixed.csv'; # download from 11129602 (see above) my $verses = do { local (@ARGV, $/) = $file; <> }; my $theverse = '19,14,1'; my $thebody = ''; my $font = 'times 12'; my $bestcount = 20; my $mw = MainWindow->new; $mw->geometry( '+0+200' ); $mw->title('String Comparison & Equivalence Challenge'); $mw->Button(-text => 'Exit', -command => sub {$mw->destroy}, )->pack(-side => 'bottom', -fill => 'x'); my $ro = $mw->Scrolled(ROText => -scrollbars => 'osoe', -height => 20, -width => 100, -font =>$font, -spacing3 => 5, -wrap => 'word', )->pack(-side => 'bottom', -fill => 'both', -expand => 1); my $button = $mw->Button(-text => 'Find Closest', -command => \&closes +t, )->pack(-side => 'bottom', -fill => 'x'); my $thetext = $mw->Scrolled(ROText => -scrollbars => 'osoe', -height => 4, -wrap => 'word', -font =>$font, )->pack(-side => 'bottom', -fill => 'x', -expand => 0); $mw->Label(-text => 'Verse', -fg => 'blue', )->pack(-side => 'left', -fill => 'x'); my $entry = $mw->Entry( -textvariable => \$theverse, )->pack(-side => 'left', -fill => 'x'); $entry->focus; $mw->Button(-text => 'Find Verse', -command => \&findverse, )->pack(-side => 'left', -fill => 'x'); $entry->bind('<Return>' => \&findverse); $mw->Entry(-textvariable => \$bestcount, -width => 5, )->pack(-side => 'right'); $mw->Label(-text => 'How Many', -fg => 'blue', )->pack(-side => 'right'); MainLoop; exit; sub findverse { $thetext->delete('1.0' => 'end'); $thebody = ''; if( $verses =~ /^$theverse,(.)/m ) { $thebody = $1; $thetext->insert(end => $thebody); closest(); } else { $thetext->insert(end => ' Verse not found '); } } sub closest { $ro->delete('1.0' => 'end'); $ro->insert(end => "\nSearching ..." ); $mw->update; my @matches; while( $verses =~ /^((\d+,\d+,\d+),(.))/gm ) { $2 eq $theverse and next; my $whole = $1; my $text = $3; my @sortfrom = sort my @from = $thebody =~ /\w+/g; my @sortto = sort my @to = $text =~ /\w+/g; my $total = my $good = 0; traverse_sequences( \@from, \@to, my $actions = { MATCH => sub {$total += 10; $good += 10}, DISCARD_A => sub {$total++}, DISCARD_B => sub {$total++}, } ); traverse_sequences( \@sortfrom, \@sortto, $actions ); push @matches, sprintf "%06d %s", $good / ++$total * 1e6, $whole; } $ro->insert(end => ' Sorting ...' ); $mw->update; @matches = reverse sort @matches; $ro->delete('1.0' => 'end'); $ro->insert(end => join "", map s/^(\d\d)\d*/$1%/r . "\n", @matches[0 .. $bestcount - 1] ); } [download] It takes about 2.6 seconds to go through the entire file searching for matches on my machine.	[reply] [d/l]
Re^2: String Comparison & Equivalence Challenge by Polyglot (Chaplain) on Mar 15, 2021 at 04:06 UTC
I took several minutes to install Tk.pm so that I could run this. I was really curious what it actually accomplished in just 2.6 seconds--or if, perhaps, it would be much different on my computer. Alas, I have perl 5.12, which is not compatible with the `s///r` pragma. I tried a couple of things to work around that, but feel like I'm working in the dark here, never having entirely grasped `map` or what it does. ...at the risk of appearing foolish, here's what I tried for that line that didn't work. `$ro->insert(end => join "", map { do { (my $s = $_ ) =~ s/^(\d\d)\d* +/$1%/ . "\n"; $s }, @matches[0 .. $bestcount - 1] }); #RESULT: #Useless use of concatenation (.) or string in void context at KJV_Ver +seMatcher_PM_Script.pl line 88. #syntax error at KJV_VerseMatcher_PM_Script.pl line 89, near "})"` [download] Blessings, ~Polyglot~	[reply] [d/l] [select]
Re^3: String Comparison & Equivalence Challenge by AnomalousMonk (Archbishop) on Mar 15, 2021 at 04:28 UTC
`$ro->insert(end => join "", map s/^(\d\d)\d/$1%/r . "\n", @matches[...] );`* `$ro->insert(end => join "", map { do { (my $s = $_ ) =~ s/^(\d\d)\d/$1%/ . "\n"; $s }, @matches[...] });` Try: `$ro->insert(end => join "", map { (my $r = $_) =~ s/^(\d\d)\d/$1%/; "$r\n"; } @matches[0 .. $bestcount - 1]);` Update: ... the `s///r` pragma. NB: `s///r` is not a pragma. `/r` is a modifier (update: available from Perl version 5.14 on) of the `s///` operator. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^4: String Comparison & Equivalence Challenge by Polyglot (Chaplain) on Mar 15, 2021 at 04:54 UTC
Re^5: String Comparison & Equivalence Challenge by AnomalousMonk (Archbishop) on Mar 15, 2021 at 06:25 UTC
Re^2: String Comparison & Equivalence Challenge by Polyglot (Chaplain) on Mar 15, 2021 at 05:23 UTC
Thank you very much! With Anomalous Monk's help, I was able to run the script. It takes about 15.1 seconds on my computer--and that is checking for the similarities of one verse. But, wow! Tk made it like a GUI! I'd never used Tk before--so that was a new experience. Interesting. I might have to try experimenting more with that at some point when I have time. For now, I just need a simple text-based process, and, let's see, about five and half days to index the entire Bible at that level. Now I just need to set up a loop and redirect the output to a file. That shouldn't be too hard, should it? (I managed to do the latter already. Perhaps if I can isolate the logic from the GUI it might make it more efficient.) Blessings, ~Polyglot~	[reply]
Re^3: String Comparison & Equivalence Challenge by AnomalousMonk (Archbishop) on Mar 15, 2021 at 06:48 UTC
I found Perl/Tk to be a steep learning curve for all options of all widgets, but very convenient for everyday use of a few, familiar widgets. A couple of good books, both from O'Reilly: Mastering Perl/Tk, by Stephen Lidie and Nancy Walsh Learning Perl/Tk, by Nancy Walsh Give a man a fish: `<%-{-{-{-<`	[reply] [d/l]
Re^2: String Comparison & Equivalence Challenge by LanX (Saint) on Mar 15, 2021 at 12:56 UTC
How does this rank `They are corrupt` Vs `Corrupt are they` ? One or three matches? 10/3 or 30/3 ? Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re: String Comparison & Equivalence Challenge (plagiarism scanner) by LanX (Saint) on Mar 15, 2021 at 00:39 UTC
On a - slightly humoristic but serious - side note: Your use-case seems to match pretty much the niche called "plagiarism checker" (very popular in Germany since some politicians with cheat PhDs had to resign.) You might find more solutions there. And on another side note: A "challenge" is a programing competition where the author already has a solution and wants to entertain the gamers in a competition. Please don't call it a challenge if you don't know what to expect and just want your problem to be solved for free. :) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re^2: String Comparison & Equivalence Challenge (plagiarism scanner) by Polyglot (Chaplain) on Mar 15, 2021 at 01:20 UTC
Perhaps the lingo here is beyond me, but what kind of "challenge" already has a solution? I thought companies like Micro$oft and others actually challenged people to try to hack their software--for money if holes could be found. Was that not a "challenge" because they didn't already know of the holes' existence? And why would they have offered people a reward for finding what they already knew? Oh, well, as I've said before, I'm not a real programmer. Maybe that's why there's been so many down-votes on this one. I had thought maybe some people just didn't like seeing the Bible texts or something, or that I had posted too much code (trying to be helpful). Regardless of what one might call this--it seems that most have no good solution, only a few clues and ideas. As to the "plagiarism"--there was plenty of that in the Bible. There are places where a whole chapter is the same as another part. Sometimes the original writer is mentioned--often not. "Plagiarism" appears to be an unbiblical concept. According to the Bible, all wisdom is from God, and any person taking credit to himself/herself is the one truly "plagiarizing." Blessings, ~Polyglot~	[reply]
Re^3: String Comparison & Equivalence Challenge (plagiarism scanner) by LanX (Saint) on Mar 15, 2021 at 01:40 UTC
I can't see that you have been downvoted in this thread and I certainly never did. But IMHO is "challenge" indeed the wrong lingo here and in the open source world. See also https://perlweeklychallenge.org/ > it seems that most have no good solution, only a few clues and ideas. Well YMMV but in my view I've already given you excellent counseling and spend considerable time trying to help you. One of the mantras of this site is "not a code writing service!" We want you and other - current or future - readers of this thread to learn and make advances in programing Perl. That's why we share for free. But you keep repeating that you don't like the math or real programming. So probably you might be better served by hiring a "real" programmer? Since you already mentioned that M$ is paying for their "challenges", this seems to be your go-to model. Good luck! :) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re^4: String Comparison & Equivalence Challenge (plagiarism scanner) by Polyglot (Chaplain) on Mar 15, 2021 at 02:12 UTC
Re: String Comparison & Equivalence Challenge by vincent_veyron (Sexton) on Mar 15, 2021 at 13:50 UTC
Hi, I would second erix' suggestion to work with posgresql for its full text search capabilities; see https://www.postgresql.org/docs/13/textsearch.html You can do a lot of things with tsvector, see for instance https://www.postgresql.org/docs/13/textsearch-features.html#TEXTSEARCH-MANIPULATE-TSQUERY Very powerful, not so hard to use. Your timeframe to understand the documentation and implement something is a bit short, though, from what I read in another post of yours. https://compta.libremen.com Logiciel libre de comptabilit� g�n�rale en partie double	[reply]

Back to Seekers of Perl Wisdom

CREATE BIBLE DATABASE

READ BIBLE DATABASE

update