
Re^2: Best method to eliminate substrings from array

by AnomalousMonk (Archbishop)
on Jun 28, 2019 at 18:28 UTC

in reply to Re: Best method to eliminate substrings from array
in thread Best method to eliminate substrings from array

Some comments:

  • In your code here, a part-number group in the input is treated as a subset of another group only if it is anchored at the left end of the larger group. E.g., if the items  7K3377|3H5788 8W1152 4P0489|2757803 are added to the list of test input data, they will not be excluded from the output, but, of course,  2N0472|6N8595 2N0472 will be.

    In the OPed code, the if-block
        if ($strChain ne $_ && index($_, $strChain) >= 0) { $found = true;  last; }
    implies that a part-number group is a subset if it is found anywhere (per the  >= 0 comparison) in the larger group (and is not identical to the larger group).

  • Additionally, the OPed code implies that duplicated items in the input appear unchanged in the output (if they are not part of any larger group), e.g.,  123 ... 123 in the input would appear as  123 ... 123 in the output. In your code, these items would be made unique.
  • Also, the OPed code would produce output in the same order as the input items (less subsets), although this implied requirement seems less imperative than the others. Because its output is taken directly from a hash, your code will produce it in effectively random order.
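A minimal sketch of the three differences above, using made-up data rather than the OP's real list. The sub names is_subset_anywhere and is_subset_anchored are hypothetical helpers of mine, not taken from either posted program; the only thing borrowed from the OPed code is the index() test.

```perl
use strict;
use warnings;

# Hypothetical input, just enough to show the differences.
my @input = (
    '2N0472|6N8595|9X5555',   # larger group
    '2N0472|6N8595',          # subset anchored at the left end of the larger group
    '6N8595|9X5555',          # subset, but NOT at the left end
    '123',                    # standalone item ...
    '123',                    # ... duplicated in the input
);

# Found anywhere in a different item, as in the OPed if-block: index(...) >= 0.
sub is_subset_anywhere {
    my ($item, $list) = @_;
    return scalar grep { $item ne $_ && index($_, $item) >= 0 } @$list;
}

# Left-anchored only: the item must start the larger item: index(...) == 0.
sub is_subset_anchored {
    my ($item, $list) = @_;
    return scalar grep { $item ne $_ && index($_, $item) == 0 } @$list;
}

# grep over the array keeps input order and keeps duplicates.
my @kept_anywhere = grep { !is_subset_anywhere($_, \@input) } @input;
my @kept_anchored = grep { !is_subset_anchored($_, \@input) } @input;

print "anywhere: @kept_anywhere\n";
print "anchored: @kept_anchored\n";

# Collecting results as hash keys collapses the duplicated '123'
# and loses the input order (keys come back in no fixed order).
my %seen = map { $_ => 1 } @kept_anywhere;
print "as hash keys: ", scalar(keys %seen), " items\n";
```

With this data the "anywhere" test drops both smaller groups, while the anchored test lets  6N8595|9X5555 through; the duplicated  123 survives in both arrays but becomes a single key in the hash.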

Give a man a fish:  <%-{-{-{-<

Re^3: Best method to eliminate substrings from array
by Paladin (Vicar) on Jun 28, 2019 at 21:39 UTC

    Here OP says that the list of part numbers is to be treated as sets/subsets, so while the original code matches substrings, OP says later that this is incorrect. My code treats the long strings as ordered sets, which seems to be what the OP wanted. If the OP really wants to treat the list of parts as an unordered set, it's easy enough to add a sort to the join line.

    OP also says here that they are sorting the original list anyway, so the input order seems to be irrelevant.

    I'm not quite sure what you mean by the duplicated items part. Essentially, my code breaks each line (set) into individual part numbers (elements), then, for each prefix of those elements, checks whether that prefix already exists in the final result; if it does, it is removed from the final result, since the current line supersedes it. So if the current line is "A|B|A|B|C", it first checks whether "A" is in the result and, if so, removes it. It then checks "A|B", then "A|B|A", etc., until finally adding the entire line "A|B|A|B|C" to the final result. If the line "A|B|A|B|C|N" is found later in the file, "A|B|A|B|C" gets removed at that point.
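A rough sketch of the prefix-removal idea described above. This is a reconstruction from the description, not the code that was actually posted; the variable names and the sample lines are hypothetical, and it assumes the input is already sorted so that a shorter group is seen before any longer group that extends it.

```perl
use strict;
use warnings;

# Hypothetical input lines, sorted so that prefixes come first.
my @lines = ('A', 'A|B|A|B|C', 'A|B|A|B|C|N', 'X|Y');

my %result;
for my $line (@lines) {
    my @parts = split /\|/, $line;

    # Remove every proper prefix of this line from the result:
    # "A", then "A|B", then "A|B|A", and so on.
    for my $n (1 .. $#parts) {
        delete $result{ join '|', @parts[0 .. $n - 1] };
    }

    # The full line supersedes any prefix removed above.
    $result{$line} = 1;
}

# Hash keys have no fixed order, so sort them for stable output.
my @kept = sort keys %result;
print "$_\n" for @kept;
```

Because the result is a hash keyed on the whole line, a line that occurs twice in the input ends up in the output once.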
