PerlMonks
Re: [OT] Finding similar program code
by erix (Prior) on Aug 27, 2021 at 15:34 UTC ( [id://11136135]=note )
n-gram comparison may be helpful.

You could keep the data as external files and expose them through a foreign table (via file_fdw). Or read them into a regular table: tens of thousands of lines doesn't sound too large, so you could slurp all the code into a postgres table (one line per record) and use the n-gram comparison machinery (see module pg_trgm [1] in the fine manual).

That module works with trigrams and provides (amongst others) a 'similarity' function that might be useful, for instance to compare the lines that you have already identified and 'extracted' against all the others, hopefully finding the still 'hidden' ones. (There's even n-gram indexing, i.e. fast search, although that doesn't seem really necessary here.)

(postgres also has a module called 'fuzzystrmatch' [2], which contains several string comparison functions, for instance Levenshtein. But I've always had more luck with the n-gram stuff.)

[1] pg_trgm module - postgresql manual
[2] fuzzystrmatch module - postgresql manual

Edit: A different/similar example with postgres n-gram comparison: Re: String Comparison & Equivalence Challenge
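
To make the trigram idea a bit more concrete, here is a rough pure-Python sketch of what a pg_trgm-style 'similarity' boils down to: lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, collect the three-character substrings, and divide the number of shared trigrams by the number of distinct trigrams overall. The function names (trigrams, similarity, rank_lines) are made up for illustration, and the 0.3 cutoff is only borrowed from pg_trgm's default similarity threshold. The real module may differ in details such as non-ASCII handling, so treat this as an approximation and not as the module's implementation.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, roughly the way pg_trgm builds them."""
    grams = set()
    # Non-alphanumeric characters act as word separators (ASCII-only approximation).
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces in front, one behind
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 .. 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_lines(known, lines, threshold=0.3):
    """Score every line against an already identified line, best match first.

    threshold is a hypothetical cutoff; 0.3 mirrors pg_trgm's default.
    """
    scored = [(similarity(known, line), line) for line in lines]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, line in rank_lines("foo bar", ["foo bar", "zzz qqq", "foo baz"]):
        print(f"{score:.2f}  {line}")
```

In postgres itself you would of course let the module do this work over the table of lines; the sketch is only meant to show why near-duplicate lines score high while unrelated ones drop out.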