PerlMonks
Re^2: general advice finding duplicate code
by Anonymous Monk on Jun 21, 2011 at 06:49 UTC ( [id://910698]=note )
looks like it will only identify duplicated but individual lines of code across the scripts
Every approach is this approach :) It's like a search engine.

You iterate over your files and index each one:

To index, you pick a unit (e.g. one word, or three adjacent lines of code).
Generate a list of all units for a file.
Normalize each unit. For words you would stem (remove prefixes/suffixes) to find the root; for lines you would remove insignificant whitespace and insignificant commas, normalize quoting characters, and so on.
Hash each unit (sha1), and associate all this in a database.
Then, to find duplication, query the database for duplicate hashes.

This is not unlike what git (git gc) does, so I wouldn't be surprised if git provided a tool to help you visualize these duplications, although I don't know of one.

It goes without saying that before making code changes, you need a comprehensive test suite :)
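
The indexing steps described above can be sketched roughly as follows. This is only an illustration in Python, not a finished tool: the function names (normalize, units, build_index, duplicates), the three-line window, the table layout, and the very crude normalizer are all assumptions made for the example. In particular, treating double and single quotes as equal ignores Perl's interpolation rules, so a real normalizer would need more care.

```python
import hashlib
import re
import sqlite3

WINDOW = 3  # unit size: three adjacent lines (an arbitrary choice for this sketch)


def normalize(line):
    """Collapse runs of whitespace and unify quote characters (deliberately crude)."""
    line = re.sub(r"\s+", " ", line.strip())
    return line.replace('"', "'")


def units(text, window=WINDOW):
    """Yield (start_line, unit) for every run of `window` adjacent non-blank lines."""
    lines = [(n, normalize(l)) for n, l in enumerate(text.splitlines(), 1)]
    lines = [(n, l) for n, l in lines if l]  # blank lines are skipped
    for i in range(len(lines) - window + 1):
        chunk = lines[i:i + window]
        yield chunk[0][0], "\n".join(l for _, l in chunk)


def build_index(files):
    """Index {name: source_text} into an in-memory SQLite table of unit hashes."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE unit (hash TEXT, file TEXT, line INTEGER)")
    for name, text in files.items():
        rows = [(hashlib.sha1(u.encode("utf-8")).hexdigest(), name, start)
                for start, u in units(text)]
        db.executemany("INSERT INTO unit VALUES (?, ?, ?)", rows)
    return db


def duplicates(db):
    """Return {hash: [(file, line), ...]} for hashes that occur more than once."""
    found = {}
    query = """SELECT hash, file, line FROM unit
               WHERE hash IN (SELECT hash FROM unit
                              GROUP BY hash HAVING COUNT(*) > 1)
               ORDER BY hash, file, line"""
    for h, name, line in db.execute(query):
        found.setdefault(h, []).append((name, line))
    return found


if __name__ == "__main__":
    # Two made-up scripts sharing one three-line run, modulo spacing and quoting.
    files = {
        "a.pl": 'my $x = 1;\nprint "hi";\nmy $y = 2;\nexit;\n',
        "b.pl": "use strict;\nmy $x  =  1;\nprint 'hi';\nmy $y = 2;\n",
    }
    for locations in duplicates(build_index(files)).values():
        print(locations)
```

With the two made-up scripts in the demo, the shared run is reported as starting at line 1 of a.pl and line 2 of b.pl. Longer duplicated stretches show up as several overlapping hits, which a real tool would want to merge.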