
Re^5: String Comparison & Equivalence Challenge

by erix (Prior)
on Mar 14, 2021 at 11:11 UTC (#11129610)

in reply to Re^4: String Comparison & Equivalence Challenge
in thread String Comparison & Equivalence Challenge

That ngram looks to be a similar thing (although details will differ), but it's for MySQL, not MariaDB, and I don't know whether it's available for your MariaDB. (I won't be able to help you with it, but perhaps some other monk will step up.)

In any case, it seems to me you cannot easily compare everything with everything: it would amount to around a billion comparisons, no? (As you said, 31000 squared, minus a few; about half that if each pair is only compared once.) So that's hardly feasible whichever route you take. You need a reduced plan (I think...).
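To put a number on that, here is a quick back-of-the-envelope check in Python. It only assumes the 31000 records mentioned in the thread; the variable names are mine.

```python
from math import comb

n = 31000  # number of records, as mentioned in the thread

# Every record against every record, including itself and both directions.
ordered = n * n

# Each distinct pair compared once (A vs B, but not B vs A or A vs A).
unordered = comb(n, 2)

print(f"{ordered:,} ordered comparisons")
print(f"{unordered:,} distinct pairs")
```

Either way it is hundreds of millions of comparisons, which is why some way of cutting down the candidate pairs is probably needed.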

As for the 'indexing': pg_trgm in PostgreSQL (and the MySQL ngram as well, I imagine) works by converting the words of a text into triples of characters (trigrams; in the case of ngrams, n may be some number other than 3), and then comparing the sets of triples that result from each line/verse/record. You can do that without an index, on the fly, or with an index, where all the triples are stored beforehand for later use. Of course, that generates large index files (but with this smallish table of 31000 records that's still OK).
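To make the idea concrete, here is a minimal sketch in Python of trigram comparison, both on the fly and with a precomputed lookup table. It is not pg_trgm's actual implementation: the padding (two spaces before each word, one after) and the shared/total ratio are my assumptions, loosely modelled on how pg_trgm is documented to behave, and the function names `trigrams`, `similarity`, `build_index` and `candidates` are made up for illustration.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of character trigrams for a line of text."""
    grams = set()
    # Lowercase, then take each run of letters/digits as a word.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Assumed padding: two spaces in front, one behind.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_index(records):
    """Store, for each trigram, the ids of the records that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for gram in trigrams(text):
            index[gram].add(rec_id)
    return index


def candidates(index, query):
    """Ids of records sharing at least one trigram with the query."""
    found = set()
    for gram in trigrams(query):
        found |= index.get(gram, set())
    return found


# Hypothetical sample records, just to exercise the functions.
records = {
    1: "In the beginning",
    2: "The beginning of time",
    3: "xyz",
}
index = build_index(records)
hits = candidates(index, "beginning")
```

The lookup table is what makes a reduced plan possible: instead of scoring every record against every other, you only run `similarity` on the records that `candidates` returns for a given line.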
