Hi Yulivee,

It depends on the nature of the duplication.

Do equally named subs have identical code?

Cut-and-paste programming involves mutations.

General approach for refactoring

a) identify all sub definitions in a file

Possible Tools
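
Even without a dedicated tool, step a) can be approximated with a regex. This is only a minimal sketch: `find_sub_names` is a made-up helper, and it assumes every definition looks like `sub name` at the start of a line. POD, heredocs and strings can fool it, which is where PPI would be the more robust choice.

```perl
use strict;
use warnings;

# Naive scan: return the names of all named subs defined in a chunk of source.
# Assumption: each definition starts a line (optionally indented).
# Anonymous subs ("my $x = sub {...}") are deliberately not matched.
sub find_sub_names {
    my ($source) = @_;
    my @names = $source =~ /^\s*sub\s+(\w+(?:::\w+)*)/mg;
    return @names;
}

# Hypothetical sample input standing in for one legacy file.
my $source = <<'END';
sub connect_db { return 1 }
  sub log_msg {
    print @_;
}
my $anon = sub { 42 };
END

print "$_\n" for find_sub_names($source);
```

Running the same scan over every script and counting how often each name turns up gives a first list of candidates for duplication.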

b) identify their dependencies
  • where are they called
  • which subs do they call
  • which global or closure variables do they use
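
The "which subs do they call" part of step b) could be sketched as follows. The helper names `extract_subs` and `call_graph` are invented for this example. It assumes that braces inside strings, regexes and comments are balanced and that calls are written with parentheses; it does not look at global or closure variables at all.

```perl
use strict;
use warnings;

# Split source into (name => body) pairs by counting braces.
# Assumption: braces inside strings, regexes or comments are balanced;
# otherwise the count goes wrong and a real parser is needed.
sub extract_subs {
    my ($source) = @_;
    my %body_of;
    while ( $source =~ /^\s*sub\s+(\w+)\s*\{/mg ) {
        my $name  = $1;
        my $start = pos($source);    # just past the opening brace
        my $depth = 1;
        my $i     = $start;
        while ( $depth > 0 && $i < length($source) ) {
            my $char = substr( $source, $i++, 1 );
            $depth++ if $char eq '{';
            $depth-- if $char eq '}';
        }
        $body_of{$name} = substr( $source, $start, $i - $start - 1 );
        pos($source) = $i;           # continue after the closing brace
    }
    return %body_of;
}

# For each sub, list the other known subs whose name is followed by "(".
# This also matches method calls of the same name, so expect false positives.
sub call_graph {
    my (%body_of) = @_;
    my %calls;
    for my $caller ( sort keys %body_of ) {
        $calls{$caller} = [
            grep { $_ ne $caller && $body_of{$caller} =~ /\b\Q$_\E\s*\(/ }
            sort keys %body_of
        ];
    }
    return %calls;
}

# Hypothetical sample input.
my $source = <<'END';
sub log_msg { print STDERR @_ }
sub connect_db {
    log_msg("connecting");
    return { ok => 1 };
}
sub run {
    my $db = connect_db();
    log_msg("done") if $db->{ok};
}
END

my %calls = call_graph( extract_subs($source) );
print "$_ calls: @{ $calls{$_} }\n" for sort keys %calls;
```

Inverting the resulting hash answers the other question, where each sub is called from.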
c) normalize sub code

Formatting can differ

d) diff potentially equal subs to measure similarity

What "potentially" means depends on the quality of your code.

Changes probably happened to

  • sub name
  • local variable names
  • ...
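
Steps c) and d) can be roughed out by normalizing each sub body and fingerprinting the result. This is a sketch under loud assumptions: `normalize` and `duplicate_groups` are made-up names, only full-line comments are stripped, and variables are renamed crudely in order of appearance. It only finds copies that are identical after normalization; near-duplicates still need a real diff.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Reduce a sub body to a canonical form so that cosmetic edits vanish.
# Assumptions: only full-line comments are removed, every sigil+name is
# treated as a variable, and whitespace inside strings is lost as well.
sub normalize {
    my ($code) = @_;
    $code =~ s/^\s*#.*$//mg;    # drop full-line comments
    my ( %alias, $count );
    # Rename variables to v1, v2, ... in order of first appearance.
    $code =~ s/([\$\@%])(\w+)/$1 . ( $alias{$2} ||= 'v' . ++$count )/ge;
    $code =~ s/\s+//g;          # drop all whitespace
    return $code;
}

# Group sub names whose normalized bodies have the same fingerprint.
sub duplicate_groups {
    my (%body_of) = @_;
    my %group;
    for my $name ( sort keys %body_of ) {
        push @{ $group{ md5_hex( normalize( $body_of{$name} ) ) } }, $name;
    }
    return sort { $a->[0] cmp $b->[0] } grep { @$_ > 1 } values %group;
}

# Hypothetical bodies; the sub names themselves are not compared.
my %body_of = (
    fetch_user  => 'my $id = shift; return $db->get( $id );',
    load_person => q{
        # copied from fetch_user
        my $key = shift;
        return $db->get($key);
    },
    save_user => 'my ($id, $data) = @_; $db->put($id, $data);',
);

print "probable duplicates: @$_\n" for duplicate_groups(%body_of);
```

Because only the bodies are fingerprinted, a renamed sub with renamed local variables still lands in the same group as its original.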

e) try to visualize dependencies to decide where best to start

for example with Graphviz or a tree structure
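
A minimal sketch of the Graphviz route: once the call dependencies are in a hash, writing DOT text is a few lines. The `to_dot` helper and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Turn a (caller => [callees]) map into Graphviz DOT text.
sub to_dot {
    my (%calls) = @_;
    my @lines = ('digraph calls {');
    for my $caller ( sort keys %calls ) {
        push @lines, qq{    "$caller";};    # keep isolated subs visible
        push @lines, qq{    "$caller" -> "$_";} for sort @{ $calls{$caller} };
    }
    return join "\n", @lines, '}', '';
}

# Hypothetical dependency data.
my %calls = (
    run        => [ 'connect_db', 'log_msg' ],
    connect_db => ['log_msg'],
    log_msg    => [],
);

my $dot = to_dot(%calls);
print $dot;
```

Saved to a file (say calls.dot, name assumed), the output can be rendered with `dot -Tpng calls.dot -o calls.png`. Subs with many incoming edges and no outgoing ones are usually the safest place to start.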

f) create a test suite to ensure refactoring quality
(The code might also show good inspection techniques)
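
One way to start such a suite is to pin down what a sub does today, before it moves into a shared module. The sketch below uses the core Test::More module; `format_name` is a hypothetical stand-in for a real legacy sub.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical legacy sub, standing in for code that is about to be moved
# into a shared module.
sub format_name {
    my ( $first, $last ) = @_;
    return join ' ', grep { defined($_) && length($_) } $first, $last;
}

# Pin down the current behaviour, including the odd cases.
is( format_name( 'Ada', 'Lovelace' ), 'Ada Lovelace', 'both parts given' );
is( format_name( 'Ada', undef ),      'Ada',          'missing last name' );
is( format_name( '',    'Lovelace' ), 'Lovelace',     'empty first name' );
is( format_name(),                    '',             'no arguments' );

done_testing();
```

After the refactoring, the same file should pass unchanged against the new module; only the line that provides `format_name` changes.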

g) start refactoring incrementally, while constantly testing the outcome

Depending on the quality of your tests, you might first start with only one daemon in production.

h) plan for a fallback scenario

Especially use version control!


Sorry, very general tips, because it really depends on the structure of your legacy code. Probably grep is already enough...

(Think about it: you might also need "nested refactoring", because new modules may still contain duplicated code and need to use other modules, and so on.)


I did some googling yesterday after our conversation for "refactoring" and "duplication" and the term "plagiarism detection" popped up.

like in these discussions:

I couldn't find a general refactoring project for Perl, but I also haven't spent much time on it yet.

I think that covering all edge cases of a worst-case scenario would certainly require PPI (at least), or even a patched B::Deparse to scan the op tree, together with PadWalker, to identify variable dependencies and side effects.
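
Even an unpatched B::Deparse can help with the normalization step: it regenerates source text from the compiled sub, so layout and comments drop out. This is a sketch with made-up sub names; it assumes both subs are compiled under the same pragmas and does nothing about renamed variables.

```perl
use strict;
use warnings;
use B::Deparse;

# Two hypothetical subs with the same logic but different layout and comments.
sub double_it { my ($n) = @_; return $n * 2 }

sub times_two {
    my ($n) = @_;    # pasted from elsewhere and reformatted
    return $n * 2;
}

# Return the source text Perl regenerates from a compiled sub.
sub canonical_text {
    my ($code_ref) = @_;
    return B::Deparse->new->coderef2text($code_ref);
}

my $same    = canonical_text( \&double_it ) eq canonical_text( \&times_two );
my $verdict = $same ? 'same' : 'different';
print "$verdict after deparsing\n";
```

The catch is that the legacy code has to be loaded, which can trigger side effects; PPI works on the source text alone.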

HTH! :)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

In reply to Re: Searching for duplication in legacy code (refactoring strategy) by LanX
in thread Searching for duplication in legacy code by yulivee07
