PerlMonks  

Finding a Block of Text in a Larger Block of Text

by Anonymous Monk
on Oct 13, 2011 at 15:56 UTC (#931292=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have multiple lines of text that together form a block, such as the two examples shown below. The second block begins with the second "ADD RULE" statement:

ADD RULE
DESCRIPTION = ABCD0009 XYZ Salary
RULESETNAME = ABCD0009
DESTINATIONNAME = /MAIN/HRP MAIN/PEC/
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
CONTAINSWILDCARD = Y
OPENPARENTHESISCOUNT = 1
LOGICALINDICATOR = AND
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
CONTAINSWILDCARD = Y
CLOSEPARENTHESISCOUNT = 1
ADD RULE
DESCRIPTION = EFGH0010 LMN Salary
RULESETNAME = EFGH0010
DESTINATIONNAME = /MAIN/HRP MAIN/PEC/
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
CONTAINSWILDCARD = Y
OPENPARENTHESISCOUNT = 1
LOGICALINDICATOR = AND
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
CONTAINSWILDCARD = Y
CLOSEPARENTHESISCOUNT = 1

My objective is to take a block of text that constitutes a single "ADD RULE" statement, with all its associated components, search a larger file containing literally thousands of similar ADD RULE statements, and identify duplicates of two types:
1) An exact duplicate: all components of the ADD RULE statement match exactly.
2) A functional duplicate: the RULESETNAME, VARIABLE, OPERATOR, and VALUE are an exact match.

I have written some code that can grab a block of text that constitutes one ADD RULE statement, but I can't figure out how to compare that block with the larger file and produce a list of exact or functional duplicates.

Any thoughts would be appreciated.

HawgDriver

Re: Finding a Block of Text in a Larger Block of Text
by wfsp (Abbot) on Oct 13, 2011 at 16:04 UTC
    Have a look at the input record separator in perlvar for a hint on how to read the "larger file" as records.
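    A minimal sketch of that hint, assuming every rule starts with a line that is exactly "ADD RULE" (as in the sample above). Setting $/ to "ADD RULE\n" makes each read return one rule. Because $/ marks the end of a record, chomp strips the separator and the very first read comes back empty. The in-memory filehandle and the shortened rules are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, illustrative rules; assumes "ADD RULE" is always on its own line.
my $data = <<'END';
ADD RULE
DESCRIPTION = ABCD0009 XYZ Salary
RULESETNAME = ABCD0009
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
ADD RULE
DESCRIPTION = EFGH0010 LMN Salary
RULESETNAME = EFGH0010
END

# An in-memory filehandle stands in for the "larger file".
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    # "ADD RULECOMPONENT\n" does not contain "ADD RULE\n", so component
    # headers are not mistaken for record separators.
    local $/ = "ADD RULE\n";
    while ( my $record = <$fh> ) {
        chomp $record;                  # chomp removes the current $/
        next unless length $record;     # first read is only the separator
        push @records, $record;         # body of one rule
    }
}
close $fh;

printf "Read %d rule(s)\n", scalar @records;
```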
Re: Finding a Block of Text in a Larger Block of Text
by kennethk (Abbot) on Oct 13, 2011 at 16:28 UTC
    I would suggest breaking this down as follows:
    1. Write a parser that transforms a generic ADD RULE statement into a useful data object. This could just be a hash/hashref or an object (see Moose) if you want more functionality. This will allow you to more easily determine if objects have similar content.
    2. Read in the full rule set and treat it one rule at a time. This is definitely a TIMTOWTDI moment. I would enable slurp mode (local $/;, see $/ in perlvar) and then split the input into blocks and iterate over them (for (split /\n(?=ADD RULE$)/m) {...}), stashing the parsed objects into an array for later comparison. You could easily accomplish this a number of different ways.
    This should give you plenty to get started.
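    One way the two steps might look, as a sketch rather than a finished solution. It assumes "ADD RULE" and "ADD RULECOMPONENT" each sit on their own line and every other line is KEY = VALUE. The helper names parse_rule, exact_key and functional_key are made up here, and rule 3 in the sample is an invented near-copy of rule 1 so that a functional duplicate appears. Grouping rules in a hash keyed on a signature avoids comparing every pair of rules, which matters with thousands of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In real use $text would come from slurping the file (local $/;).
# Rule 3 is a made-up copy of rule 1 with a different DESCRIPTION.
my $text = <<'END';
ADD RULE
DESCRIPTION = ABCD0009 XYZ Salary
RULESETNAME = ABCD0009
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
ADD RULE
DESCRIPTION = EFGH0010 LMN Salary
RULESETNAME = EFGH0010
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
ADD RULE
DESCRIPTION = ABCD0009 copy
RULESETNAME = ABCD0009
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
END

# Step 1: parse one rule block into a hashref. Rule-level KEY = VALUE pairs
# go into the hash itself; each ADD RULECOMPONENT starts a new hash in
# $rule->{components}. The "ADD RULE" line has no "=", so it is skipped.
sub parse_rule {
    my ($block) = @_;
    my %rule   = ( text => $block, components => [] );
    my $target = \%rule;
    for my $line ( split /\n/, $block ) {
        if ( $line eq 'ADD RULECOMPONENT' ) {
            push @{ $rule{components} }, {};
            $target = $rule{components}[-1];
        }
        elsif ( $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/ ) {
            $target->{$1} = $2;
        }
    }
    return \%rule;
}

# Exact duplicate: the whole block matches, ignoring trailing whitespace.
sub exact_key {
    my ($rule) = @_;
    ( my $key = $rule->{text} ) =~ s/\s+\z//;
    return $key;
}

# Functional duplicate: RULESETNAME plus VARIABLE/OPERATOR/VALUE of each
# component, in order, so component order still matters here.
sub functional_key {
    my ($rule) = @_;
    my @parts = ( $rule->{RULESETNAME} // '' );
    for my $c ( @{ $rule->{components} } ) {
        push @parts, join '|', map { $_ // '' } @{$c}{qw(VARIABLE OPERATOR VALUE)};
    }
    return join "\n", @parts;
}

# Step 2: split the slurped text into one block per rule and parse each one.
my @rules = map { parse_rule($_) } split /\n(?=ADD RULE$)/m, $text;

# Group 1-based rule numbers by each key; a key shared by more than one
# rule marks a set of duplicates.
my ( %by_exact, %by_functional );
for my $i ( 0 .. $#rules ) {
    push @{ $by_exact{ exact_key( $rules[$i] ) } },           $i + 1;
    push @{ $by_functional{ functional_key( $rules[$i] ) } }, $i + 1;
}

print "Exact duplicates: rules @$_\n"      for grep { @$_ > 1 } values %by_exact;
print "Functional duplicates: rules @$_\n" for grep { @$_ > 1 } values %by_functional;
```

    With this sample it should report rules 1 and 3 as functional duplicates and no exact duplicates.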
Re: Finding a Block of Text in a Larger Block of Text
by Anonymous Monk on Oct 14, 2011 at 00:42 UTC
Re: Finding a Block of Text in a Larger Block of Text
by pvaldes (Chaplain) on Oct 13, 2011 at 16:21 UTC

    1) An exact duplicate

    perldoc -f index
      Note that this will only work if 'exact duplicate' is order-sensitive. If the order of the modifiers is irrelevant, this comparison will fail, since index checks for an identical substring.
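    A short sketch of both points, on made-up strings: index called in a loop finds every offset where a rule's exact text occurs, while a reordered copy of the same lines is missed. Comparing sorted lines is one possible workaround (an assumption, not something suggested in the thread); it ignores which ADD RULECOMPONENT a line belongs to, so for multi-component rules a parsed comparison is safer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# index STR, SUBSTR, POSITION returns the offset of SUBSTR in STR at or after
# POSITION, or -1 if there is none. Both strings below are illustrative.
my $rule = "ADD RULE\nRULESETNAME = ABCD0009\nVALUE = *Salary*\n";
my $file = "ADD RULE\nRULESETNAME = EFGH0010\nVALUE = *Salary*\n"
         . "ADD RULE\nRULESETNAME = ABCD0009\nVALUE = *Salary*\n";

# Collect every offset at which the exact rule text occurs.
my @hits;
my $pos = index $file, $rule;
while ( $pos != -1 ) {
    push @hits, $pos;
    $pos = index $file, $rule, $pos + 1;
}
print "Exact matches at offsets: @hits\n";

# Hypothetical helper: sort the non-empty lines so that line order no
# longer affects the comparison.
sub canonical {
    my ($text) = @_;
    my @lines = grep { length } split /\n/, $text;
    return join "\n", sort @lines;
}

my $reordered = "ADD RULE\nVALUE = *Salary*\nRULESETNAME = ABCD0009\n";
printf "index finds reordered rule: %s\n",
    index( $file, $reordered ) == -1 ? 'no' : 'yes';
printf "sorted lines match: %s\n",
    canonical($reordered) eq canonical($rule) ? 'yes' : 'no';
```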

Node Type: perlquestion [id://931292]
Approved by Corion