Hi Monks,

This is probably a trivial question, but I could not find a good answer, most likely because I searched with the wrong keywords.

My problem: I have a list of unique ids. I want to shorten them to at most $length characters while keeping them unique. In addition, the new ids should be as similar as possible to the old ones.
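The two hard requirements (at most $length characters, no duplicates) can be checked mechanically, which is handy when comparing different shortening strategies. Here is a minimal sketch. ids_ok is a hypothetical helper name, not something from CPAN. It cannot judge the third, softer requirement, "as similar as possible".

```perl
use strict;
use warnings;

# Hypothetical helper: true if every id is at most $length characters
# long and no id occurs twice in the list.
sub ids_ok {
    my ( $length, @ids ) = @_;
    my %seen;
    for my $id (@ids) {
        return 0 if length($id) > $length;    # too long
        return 0 if $seen{$id}++;             # duplicate
    }
    return 1;
}

my $good = ids_ok( 6, qw(LXP_01 LXP_02) );    # true: both fit and differ
my $dup  = ids_ok( 6, qw(LXP_01 LXP_01) );    # false: duplicate id
```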

The following quick-and-dirty solution works for my cases, but it can fail (for example, set $length to 2; I was just too lazy to fix this). It could be improved in several ways. For example, instead of appending the numbers 1 to n, it could concatenate a prefix and a suffix of the original id (in the DATA example, Lenoc3_caA, Lenoc3_caB instead of Lenoc3_ca1, ...).
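The prefix-plus-suffix idea can be sketched as follows. The code keeps the head of each id and appends the id's own tail, growing the tail one character at a time until all shortened ids in the list are distinct. This is only a sketch under some assumptions. prefix_suffix is a hypothetical name. It applies one tail length to the whole list it receives, so in practice you would call it per group of colliding ids. It also does not check the results against ids outside that list.

```perl
use strict;
use warnings;

# Sketch of the "prefix + suffix" idea: keep the head of each id and append
# its tail, growing the tail until all shortened ids in the list differ.
# prefix_suffix() is a hypothetical name, not part of the original post.
sub prefix_suffix {
    my ( $length, @ids ) = @_;
    for my $tail ( 0 .. $length - 1 ) {
        my $head = $length - $tail;
        my ( %seen, @out );
        for my $id (@ids) {
            # ids that are already short enough are kept unchanged
            my $short =
                length($id) <= $length
                ? $id
                : substr( $id, 0, $head ) . substr( $id, length($id) - $tail );
            push @out, $short;
            $seen{$short}++;
        }
        # done as soon as no two shortened ids coincide
        return @out if scalar( keys %seen ) == scalar(@ids);
    }
    die "Length too small\n";
}

my @group = qw(Lenoc3_carina_A Lenoc3_carina_B Lenoc3_carina_C);
print join( ' ', prefix_suffix( 10, @group ) ), "\n";
# prints: Lenoc3_caA Lenoc3_caB Lenoc3_caC
```

Because the tail comes from the original id, the shortened names keep the distinguishing part (here A, B, C) instead of an arbitrary counter.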

My question now: is there already something (better) on CPAN? If not, do you think this is worth packaging, or do you have better solutions?

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

sub unique_ids {
    my ( $length, $ids_ref ) = @_;

    # first check if we need to do something:
    # count how many ids share each truncated prefix
    my %u_ids;
    $u_ids{ substr( $_, 0, $length ) }++ for @{$ids_ref};
    if ( scalar( keys %u_ids ) == scalar @{$ids_ref} ) {
        # the truncated ids are already unique, so return them truncated
        return map { substr( $_, 0, $length ) } @{$ids_ref};
    }

    # fix the non-unique ids by appending a per-prefix counter
    my @u_ids;
    my %incr;
    for my $id ( @{$ids_ref} ) {
        my $s = substr( $id, 0, $length );
        if ( $u_ids{$s} > 1 ) {
            die 'Length too small' if length( $u_ids{$s} ) > $length;
            # this string might still not be unique!!!
            $s = substr( $id, 0, $length - length( $u_ids{$s} ) ) . ++$incr{$s};
        }
        push @u_ids, $s;
    }
    return @u_ids;
}

my @ids = <DATA>;
chomp @ids;
my @u_ids = unique_ids( 10, \@ids );
warn Dumper \@u_ids;

__DATA__
A2990_duallayer_1
A2990_duallayer_2
A2990_duallayer_3
A2990_duallayer_4
A2990_duallayer_5
A2990_duallayer_6
A2990_duallayer_7
A2990_duallayer_8
A2990_duallayer_9
A2990_duallayer_10
LXP_01
LXP_02
LXP_03
LXP_04
LXP_05
LXP_06
LXP_07
LXP_08
LXP_09
LXP_10
LXP_11
LXP_12
LXP_13
LXP_14
LXP_15
LXP_16
LXP_17
LXP_18
Normal_1
Normal_2
Normal_3
Normal_4
Normal_5
Normal_6
Lenoc3_carina_A
Lenoc3_carina_B
Lenoc3_carina_C
Lenoc3_duallayer_1
Lenoc3_duallayer_2
Lenoc3_duallayer_3
Lenoc5_carina_1
Lenoc5_carina_2
Lenoc5_carina_3
Lenoc5_duallayer_1
Lenoc5_duallayer_2
Lenoc5_duallayer_3
```
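The "this string could be not unique" comment in unique_ids points at the real weakness: a generated name such as a1 can coincide with another id whose plain truncation is already a1. Here is a sketch of a collision-safe variant. It keeps the numeric suffixes of the original but reserves every truncation that is already unique before numbering anything. It then skips any candidate that has already been handed out. unique_ids_safe is a hypothetical name, and the input ids are assumed to be unique. The sketch dies only when the counter itself no longer fits into $length characters.

```perl
use strict;
use warnings;

# Sketch of a collision-safe variant of unique_ids(): every generated id is
# checked against all ids handed out so far. unique_ids_safe() is a
# hypothetical name; input ids are assumed to be unique.
sub unique_ids_safe {
    my ( $length, $ids_ref ) = @_;

    # count how many ids share each truncated prefix
    my %count;
    $count{ substr( $_, 0, $length ) }++ for @{$ids_ref};

    # reserve every truncation that is already unique
    my %taken = map { $_ => 1 } grep { $count{$_} == 1 } keys %count;

    my ( @out, %next );
    for my $id ( @{$ids_ref} ) {
        my $s = substr( $id, 0, $length );
        if ( $count{$s} > 1 ) {
            my $candidate;
            do {
                # next counter for this prefix group; stop if it cannot fit
                my $n = ++$next{$s};
                die "Length too small\n" if length($n) > $length;
                $candidate = substr( $id, 0, $length - length($n) ) . $n;
            } while ( $taken{$candidate} );
            $taken{$candidate} = 1;
            $s = $candidate;
        }
        push @out, $s;
    }
    return @out;
}

print join( ' ', unique_ids_safe( 2, [qw(ab1 ab2 a1)] ) ), "\n";
# prints: a2 a3 a1
```

For the same input, the numbering in unique_ids returns a1 a2 a1, handing out a1 twice. Reserving the unique truncations first avoids this.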

In reply to Generate unique ids of maximum length by lima1
