Remove duplicate from the same line..

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a basic question about removing a duplicated word from a line.

The word is inside a string; let's call it $string. The string holds a paragraph, and one line of it contains a duplicated phrase that I want to remove.

Example data

---------------------------

Allscripts LLC    -    Long Island City, NY


May 31, 2013



Job Summary



Company    Allscripts LLC Allscripts LLC

Location    Long Island City, NY

Job Type    Regular

Job Classification    Full Time

Experience    not provided

Education    not provided

Company Ref #    014625014625

AJE Ref #    561949312
-----------------------------

Here, right next to 'Company', there is 'Allscripts LLC Allscripts LLC'. I need it only once, so it should read 'Allscripts LLC' instead of 'Allscripts LLC Allscripts LLC'. The output I want looks like this:

------------------------------------
Allscripts LLC    -    Long Island City, NY


May 31, 2013



Job Summary



Company    Allscripts LLC          # (Changes in this line)

Location    Long Island City, NY

Job Type    Regular

Job Classification    Full Time

Experience    not provided

Education    not provided

Company Ref #    014625014625

AJE Ref #    561949312
------------------------------------

The name can be anything, not only "Allscripts LLC". It could be some other name such as "TechnoCats" or "GLOBEMASTERS", so I need a universal solution.

I can't work out how to do this properly. Can any Monks please suggest an effective way to do it?

Regards,

Galonet


Re: Remove duplicate from the same line.
by Athanasius (Archbishop) on Jun 01, 2013 at 15:05 UTC

The following regex will remove any word or phrase that duplicates its immediate predecessor:

$string =~ s/ \b (.+) \b \s* \1 /$1/gx;

But note that an address such as “Long Island City, NY NY” will be reduced to “Long Island City, NY”.
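As a rough illustration of the substitution and of the caveat above, here is a minimal sketch. The helper name `dedupe_adjacent` and the "NY NY" input line are hypothetical, made up for this example; only the regex itself comes from the reply.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse any phrase that is immediately repeated.
sub dedupe_adjacent {
    my ($string) = @_;
    # \b (.+) \b : a phrase that starts and ends on a word boundary
    # \s* \1     : optional whitespace, then the very same phrase again
    $string =~ s/ \b (.+) \b \s* \1 /$1/gx;
    return $string;
}

# The duplicated company name is collapsed to a single copy.
print dedupe_adjacent("Company    Allscripts LLC Allscripts LLC"), "\n";

# Side effect: a repeated word in an address is collapsed as well.
print dedupe_adjacent("Location    Long Island City, NY NY"), "\n";

# No word boundary falls inside the digit run, so this line is left alone.
print dedupe_adjacent("Company Ref #    014625014625"), "\n";
```

Because `\s*` also matches newlines, two identical phrases on consecutive lines could in principle be merged too, which is one more reason to apply the substitution only where duplicates are expected.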

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: Remove duplicate from the same line.

by Anonymous Monk on Jun 01, 2013 at 17:30 UTC

Thanks, Athanasius, but I have some more questions. Could you please check the reply I gave to rpnoble?


Re: Remove duplicate from the same line..
by rpnoble419 (Pilgrim) on Jun 01, 2013 at 15:14 UTC

Are you getting this duplication only on the Company Name line? If so, wrap the solution from Athanasius in an if test when you read the line from your file. Otherwise you can damage any address information, as Athanasius warned. Can you take a look at the system that is causing the problem in the first place? Fixing it there might be the better long-term solution.

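One way the suggested if test might look is sketched below. It assumes the record is held in a single string with one field per line, and that the company line starts with the label "Company" followed by whitespace and a value other than "Ref" (so "Company Ref #" is skipped). The helper name `dedupe_company_line` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: de-duplicate only the "Company" field of a record.
sub dedupe_company_line {
    my ($string) = @_;
    # Limit of -1 keeps trailing empty fields, so the final newline survives.
    my @lines = split /\n/, $string, -1;
    for my $line (@lines) {    # $line aliases the array element
        # Only touch the line whose label is "Company"; the lookahead
        # skips "Company Ref #", assumed here to be a separate field.
        next unless $line =~ /^Company\s+(?!Ref\b)\S/;
        $line =~ s/ \b (.+) \b \s* \1 /$1/gx;
    }
    return join "\n", @lines;
}

my $record = "Company    Allscripts LLC Allscripts LLC\n"
           . "Location    Long Island City, NY NY\n"
           . "Company Ref #    014625014625\n";

print dedupe_company_line($record);
```

With this guard the made-up "NY NY" address line passes through unchanged, since the substitution never runs on it.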

Re^2: Remove duplicate from the same line..

by Anonymous Monk on Jun 01, 2013 at 15:36 UTC

Thanks, Athanasius. That was good.

But now I have one more problem: I have a company name like "Goldman Sachs Group, ... Goldman Sachs Group, Inc." Here I want only 'Goldman Sachs Group'. Is there any option for that?

And thanks, rpnoble, I will do it like that... :)

Thanks.


Re^3: Remove duplicate from the same line..

by gam3 (Curate) on Jun 01, 2013 at 18:51 UTC

Smith Smith & Feeley LLP

-- gam3
A picture is worth a thousand words, but takes 200K.


Re^4: Remove duplicate from the same line..

by hdb (Monsignor) on Jun 02, 2013 at 06:12 UTC

Re^5: Remove duplicate from the same line..

by sundialsvc4 (Abbot) on Jun 02, 2013 at 17:05 UTC