Finding a Block of Text in a Larger Block of Text

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have multiple lines of text that together constitute a block, such as the two examples shown below. The second block begins with the second "ADD RULE" statement:

ADD RULE
DESCRIPTION = ABCD0009 XYZ Salary
RULESETNAME = ABCD0009
DESTINATIONNAME = /MAIN/HRP MAIN/PEC/
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
CONTAINSWILDCARD = Y
OPENPARENTHESISCOUNT = 1
LOGICALINDICATOR = AND
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
CONTAINSWILDCARD = Y
CLOSEPARENTHESISCOUNT = 1
ADD RULE
DESCRIPTION = EFGH0010 LMN Salary
RULESETNAME = EFGH0010
DESTINATIONNAME = /MAIN/HRP MAIN/PEC/
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Payroll_Type
OPERATOR = Equal
VALUE = *Salary*
CONTAINSWILDCARD = Y
OPENPARENTHESISCOUNT = 1
LOGICALINDICATOR = AND
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
CONTAINSWILDCARD = Y
CLOSEPARENTHESISCOUNT = 1

My objective is to take a block of text that constitutes a single "ADD RULE" statement, with all its associated components, search a larger file containing literally thousands of similar ADD RULE statements, and identify duplicates of two types: 1) an exact duplicate, where all components of the ADD RULE statement match exactly, or 2) a functional duplicate, where the RULESETNAME, VARIABLE, OPERATOR, and VALUE are an exact match.

I have written some code that can grab a block of text that constitutes one ADD RULE statement. I can't figure out how to compare that block of text with the larger file and produce a list of exact or functional duplicates.

Any thoughts would be appreciated.

HawgDriver


Replies are listed 'Best First'.
Re: Finding a Block of Text in a Larger Block of Text by wfsp (Abbot) on Oct 13, 2011 at 16:04 UTC
Have a look at the input record separator ($/) in perlvar for a hint on how to read the "larger file" as records.
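As a rough illustration of that hint, here is a minimal sketch that sets $/ to "ADD RULE\n" so each read returns one rule. The sample data and the in-memory filehandle are assumptions made for the example; a real script would open the actual rules file. It also assumes that no other line ends in the text "ADD RULE".

```perl
use strict;
use warnings;

# Hypothetical sample data; in practice, open the real rules file instead.
my $data = join "\n",
    'ADD RULE', 'DESCRIPTION = ABCD0009 XYZ Salary', 'RULESETNAME = ABCD0009',
    'ADD RULECOMPONENT', 'VARIABLE = &RPT_IX-Pay_Group',
    'ADD RULE', 'DESCRIPTION = EFGH0010 LMN Salary', 'RULESETNAME = EFGH0010', '';

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "ADD RULE\n";    # each read now stops after the next "ADD RULE" line
    while (my $chunk = <$fh>) {
        chomp $chunk;                  # chomp strips the current value of $/
        next unless $chunk =~ /\S/;    # skip the empty chunk before the first rule
        push @records, "ADD RULE\n$chunk";    # put the header line back
    }
}
close $fh;

printf "Read %d records\n", scalar @records;
```

"ADD RULECOMPONENT" lines do not end a record here, because the separator requires a newline immediately after "ADD RULE".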
Re: Finding a Block of Text in a Larger Block of Text by kennethk (Abbot) on Oct 13, 2011 at 16:28 UTC
I would suggest breaking this down as follows:
1. Write a parser that transforms a generic ADD RULE statement into a useful data object. This could just be a hash/hashref, or an object (see Moose) if you want more functionality. Either makes it easier to determine whether two rules have similar content.
2. Read in the full rule set and treat it one rule at a time. This is definitely a TIMTOWTDI moment. I would enable slurp mode (`local $/;`, see $/ in perlvar), split the input into blocks and iterate over them (`for (split /\n(?=ADD RULE$)/m) {...}`), stashing the parsed objects into an array for later comparison.
There are a number of other ways to accomplish this, but this should give you plenty to get started.
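One possible way to put those two steps together is sketched below. The helper names (parse_rule, fields, exact_key, functional_key) and the sample data are hypothetical, not something from the original post. The sketch assumes that field order within a rule or component does not matter but component order does, and that a functional duplicate means matching RULESETNAME plus matching VARIABLE, OPERATOR, and VALUE in every component.

```perl
use strict;
use warnings;

# Hypothetical sample: rules 1 and 3 are identical; rule 2 differs only in DESCRIPTION.
my $text = <<'END';
ADD RULE
DESCRIPTION = ABCD0009 XYZ Salary
RULESETNAME = ABCD0009
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
ADD RULE
DESCRIPTION = ABCD0009 renamed copy
RULESETNAME = ABCD0009
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
ADD RULE
DESCRIPTION = ABCD0009 XYZ Salary
RULESETNAME = ABCD0009
ADD RULECOMPONENT
VARIABLE = &RPT_IX-Pay_Group
OPERATOR = Equal
VALUE = *875*
END

# Turn one block into { FIELD => value, ..., components => [ {...}, ... ] }.
sub parse_rule {
    my ($block) = @_;
    my %rule   = (components => []);
    my $target = \%rule;    # hash that receives the next KEY = VALUE line
    for my $line (split /\n/, $block) {
        if ($line =~ /^\s*ADD RULECOMPONENT\s*$/) {
            $target = {};
            push @{ $rule{components} }, $target;
        }
        elsif ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            $target->{$1} = $2;
        }
    }
    return \%rule;
}

# Serialize the named fields of a hash (default: all scalar fields, sorted by name).
sub fields {
    my ($h, @names) = @_;
    @names = sort grep { !ref $h->{$_} } keys %$h unless @names;
    my @pairs;
    for my $name (@names) {
        my $value = defined $h->{$name} ? $h->{$name} : '';
        push @pairs, "$name=$value";
    }
    return join "\n", @pairs;
}

# Key built from every field of the rule and of each component.
sub exact_key {
    my ($rule) = @_;
    my @parts = (fields($rule));
    push @parts, fields($_) for @{ $rule->{components} };
    return join "\n\n", @parts;
}

# Key built only from the fields that define a functional duplicate.
sub functional_key {
    my ($rule) = @_;
    my @parts = (fields($rule, 'RULESETNAME'));
    push @parts, fields($_, qw(VARIABLE OPERATOR VALUE)) for @{ $rule->{components} };
    return join "\n\n", @parts;
}

# Group rule numbers (1-based position in the input) by each key.
my (%exact, %functional);
my $n = 0;
for my $block (split /\n(?=ADD RULE$)/m, $text) {
    $n++;
    my $rule = parse_rule($block);
    push @{ $exact{ exact_key($rule) } },           $n;
    push @{ $functional{ functional_key($rule) } }, $n;
}

print "Exact duplicates: rules @$_\n"      for grep { @$_ > 1 } values %exact;
print "Functional duplicates: rules @$_\n" for grep { @$_ > 1 } values %functional;
```

Grouping by a string key avoids comparing every rule against every other rule, which matters with thousands of rules. Every exact duplicate is also a functional duplicate, so the second report is a superset of the first.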
Re: Finding a Block of Text in a Larger Block of Text by Anonymous Monk on Oct 14, 2011 at 00:42 UTC
Parse::RecDescent.
Re: Finding a Block of Text in a Larger Block of Text by pvaldes (Chaplain) on Oct 13, 2011 at 16:21 UTC
1) An exact duplicate: see `perldoc -f index`.
Re^2: Finding a Block of Text in a Larger Block of Text by kennethk (Abbot) on Oct 13, 2011 at 18:11 UTC
Note this will only work if 'exact duplicate' is order sensitive. If the modifier order is irrelevant, this comparison will fail, since index checks for identical strings.
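To make the order-sensitivity point concrete, here is a small sketch. The strings are made-up fragments in the style of the original post, and canonical is a hypothetical helper. index finds the component only when its lines appear in the same order, whereas comparing sorted lines treats the reordered component as equal.

```perl
use strict;
use warnings;

# Made-up fragments in the style of the original post.
my $haystack  = "ADD RULE\nRULESETNAME = ABCD0009\nADD RULECOMPONENT\n"
              . "VARIABLE = &RPT_IX-Pay_Group\nOPERATOR = Equal\n";
my $same      = "ADD RULECOMPONENT\nVARIABLE = &RPT_IX-Pay_Group\nOPERATOR = Equal\n";
my $reordered = "ADD RULECOMPONENT\nOPERATOR = Equal\nVARIABLE = &RPT_IX-Pay_Group\n";

# index returns the offset of the first match, or -1 if there is none.
my $pos_same      = index $haystack, $same;
my $pos_reordered = index $haystack, $reordered;

# Order-insensitive comparison: sort the lines before comparing.
sub canonical {
    my ($block) = @_;
    my @lines = split /\n/, $block;
    return join "\n", sort @lines;
}
my $equivalent = canonical($same) eq canonical($reordered);

print "same order found: ",      ($pos_same >= 0      ? "yes" : "no"), "\n";
print "reordered found: ",       ($pos_reordered >= 0 ? "yes" : "no"), "\n";
print "equal ignoring order: ",  ($equivalent         ? "yes" : "no"), "\n";
```

Sorting lines is only safe within a single component. Applied to a whole rule, it would mix lines from different ADD RULECOMPONENT sections together.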

