Hi Monks,
this is probably a trivial question, but I could not find a good answer, most likely because I searched with the wrong keywords.
My problem: I have a list of unique ids. I want to shorten the ids so that they are at most $length characters long while staying unique. As a further requirement, the new ids should be as similar as possible to the old ones.
The following quick-and-dirty solution works for my data, but it can fail: for example, with $length set to 2 the results are no longer unique (I was just too lazy to fix this). It could be improved in several ways. For example, instead of appending numbers from 1 to n, it could concatenate a prefix and a suffix of the original id (in the DATA example, Lenoc3_caA, Lenoc3_caB instead of Lenoc3_ca1, ...).
My question: is there already something (better) on CPAN? If not, do you think this is worth packaging, or do you have a better solution?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
sub unique_ids {
    my ( $length, $ids_ref ) = @_;
    # first check if we need to do something:
    # count how often each truncated id occurs
    my %u_ids;
    $u_ids{ substr( $_, 0, $length ) }++ for @{$ids_ref};
    if ( scalar( keys %u_ids ) == scalar @{$ids_ref} ) {
        # plain truncation is already unique
        return map { substr( $_, 0, $length ) } @{$ids_ref};
    }
    # fix the non-unique ids
    my @u_ids;
    my %incr;
    for my $id ( @{$ids_ref} ) {
        my $s = substr( $id, 0, $length );
        if ( $u_ids{$s} > 1 ) {
            die 'Length too small' if length( $u_ids{$s} ) > $length;
            # caveat: the new string is not checked against the other ids,
            # so it could still be non-unique!
            $s = substr( $id, 0, $length - length( $u_ids{$s} ) ) . ++$incr{$s};
        }
        push @u_ids, $s;
    }
    return @u_ids;
}
my @ids = <DATA>;
chomp @ids;
my @u_ids = unique_ids(10,\@ids);
warn Dumper \@u_ids;
__DATA__
A2990_duallayer_1
A2990_duallayer_2
A2990_duallayer_3
A2990_duallayer_4
A2990_duallayer_5
A2990_duallayer_6
A2990_duallayer_7
A2990_duallayer_8
A2990_duallayer_9
A2990_duallayer_10
LXP_01
LXP_02
LXP_03
LXP_04
LXP_05
LXP_06
LXP_07
LXP_08
LXP_09
LXP_10
LXP_11
LXP_12
LXP_13
LXP_14
LXP_15
LXP_16
LXP_17
LXP_18
Normal_1
Normal_2
Normal_3
Normal_4
Normal_5
Normal_6
Lenoc3_carina_A
Lenoc3_carina_B
Lenoc3_carina_C
Lenoc3_duallayer_1
Lenoc3_duallayer_2
Lenoc3_duallayer_3
Lenoc5_carina_1
Lenoc5_carina_2
Lenoc5_carina_3
Lenoc5_duallayer_1
Lenoc5_duallayer_2
Lenoc5_duallayer_3
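
For what it's worth, here is a minimal sketch of the prefix-plus-suffix idea mentioned above, with an explicit check that every new id is still unused. The name shorten_unique and the order of the fallbacks (first head plus tail of the original id, then a plain counter) are my own assumptions, not something taken from CPAN, so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;

# Hypothetical helper (not from CPAN): shorten each id to at most $length
# characters while keeping all results unique. Assumes the input ids are unique.
sub shorten_unique {
    my ( $length, $ids_ref ) = @_;

    # Ids that already fit keep their name, so reserve them up front.
    my %taken = map { $_ => 1 } grep { length($_) <= $length } @{$ids_ref};

    # Count how many over-long ids share each truncated prefix.
    my %prefix_count;
    $prefix_count{ substr( $_, 0, $length ) }++
        for grep { length($_) > $length } @{$ids_ref};

    my @short;
    for my $id ( @{$ids_ref} ) {
        if ( length($id) <= $length ) {
            push @short, $id;
            next;
        }

        # Plain truncation is fine if nobody else wants the same string.
        my $cand = substr( $id, 0, $length );
        my $ok   = $prefix_count{$cand} == 1 && !$taken{$cand};

        # First fallback: keep the head and a growing tail of the id, since
        # the distinguishing part is often at the end ("_A", "_10").
        for my $tail_len ( 1 .. $length - 1 ) {
            last if $ok;
            $cand = substr( $id, 0, $length - $tail_len )
                  . substr( $id, -$tail_len );
            $ok = !$taken{$cand};
        }

        # Second fallback: replace the end with a counter until it is unused.
        my $n = 0;
        until ($ok) {
            $n++;
            die "Length $length too small for '$id'\n"
                if length($n) > $length;
            $cand = substr( $id, 0, $length - length($n) ) . $n;
            $ok   = !$taken{$cand};
        }

        $taken{$cand} = 1;
        push @short, $cand;
    }
    return @short;
}

my @demo = qw(Lenoc3_carina_A Lenoc3_carina_B LXP_01);
print join( ' ', shorten_unique( 10, \@demo ) ), "\n";
# prints: Lenoc3_caA Lenoc3_caB LXP_01
```

Because every candidate is looked up in %taken before it is accepted, the result stays unique even for very small values of $length, as long as the counter still fits. The price is that the output depends on the order of the input list.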