Help with removing dupes from a string with perl

CG_man has asked for the wisdom of the Perl Monks concerning the following question:

I've tried many different things on this question that I found on different forums and nothing seems to work. My question is how can you remove duplicate entries from a SQL query in a perl script, where it keeps the newest entry? Right now the results are as follows from the query;

Jobname status Comment Reason
------- ------ ------- ------
job A HOLD 12:35 per test
job B HOLD 1:45 per user
job C HOLD 3:00 maint
job C HOLD 3:30 per Failure

I need the results to only give me the newest job C and not both.
Here is my perl script

#!/users/contrib/bin/perl

use Sybase::Sybperl;
use Env qw(BATCH_USER_NAME BATCH_PASSWORD);

##############
############## Subroutines
##############

# Send input string to the DB server and get the results
# $_[0] - sql string
sub DBEXEC {
&dbcmd($dbproc, $_[0]);
&dbsqlexec($dbproc);
&dbresults($dbproc);
}
# Turn UTC in to people time
sub GETTIME {
if ($_[0] eq "999999999") {
$time = ("******** **********");
} else {
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($_[0]);
$mon++;
$time = sprintf ("%.2d:%.2d:%.2d %.2d/%.2d/%.2d", $hour, $min, $sec, $mon, $mday, $year + 1900);
}
}
# set job status
sub JOBSTATUS {
if ($_[0] == 9) {$jstatus1 = ("ACTIVATED");}
elsif ($_[0] == 5) {$jstatus2 = ("FAILURE");}
elsif ($_[0] == 8) {$jstatus3 = ("INACTIVE");}
elsif ($_[0] == 11) {$jstatus4 = ("ON_HOLD");}
elsif ($_[0] == 7) {$jstatus5 = ("ON_ICE");}
elsif ($_[0] == 12) {$jstatus5 = ("QUE_WAIT");}
elsif ($_[0] == 10) {$jstatus6 = ("RESTART");}
elsif ($_[0] == 1) {$jstatus7 = ("RUNNING");}
elsif ($_[0] == 3) {$jstatus8 = ("STARTING");}
elsif ($_[0] == 4) {$jstatus9 = ("SUCCESS");}
elsif ($_[0] == 6) {$jstatus10 = ("TERMINATED");}
}

# set event status
sub EVENTSTCODE {
if ($_[0] == 110) {$jstatus1 = ("JOB_ON_ICE");}
elsif ($_[0] == 112) {$jstatus2 = ("JOB_ON_ICE");}
elsif ($_[0] == 106) {$jstatus3 = ("ALARM");}
elsif ($_[0] == 108) {$jstatus4 = ("FORCE_START");}
elsif ($_[0] == 105) {$jstatus5 = ("KILLJOB");}
elsif ($_[0] == 111) {$jstatus5 = ("JOB_OFF_ICE");}
elsif ($_[0] == 113) {$jstatus6 = ("JOB_OFF_HOLD");}
elsif ($_[0] == 117) {$jstatus7 = ("COMMENTS");}

}

format STDOUT6 =
@<<<< @<<<< @<<<< @<<<<
$name, $jstatus, $estamp, $etext
.

$reporttime = &GETTIME(time);

# login to the database
$dbproc = &dblogin(USER, PSSWD, 'SQL_DEV_SERVER');

############
############ Loop for job status check on HOLD.
############
foreach $statusnumber (11) {
foreach $eventnumber (117) {
$statusname = &JOBSTATUS($statusnumber);
$eventstatus = &EVENTSTCODE($eventnumber);
# Submit SQL Statement
$~ = STDOUT6;
print "Jobs on HOLD w/comments last Issued for job \n";
print " \n";
print "Jobname Status Comment reason\n";
print "------- ------ ------- ---------\n";
&DBEXEC("select proc_eventvu.job_name, jobst.status, proc_eventvu.stamp, proc_eventvu.text from proc_eventvu, jobst where proc_eventvu.joid = jobst.joid and proc_eventvu.event in (112,$eventnumber) and jobst.status = $statusnumber order by job_name");
while(@status = &dbnextrow($dbproc)){
$name = ($status[0]);
$jstatus = &JOBSTATUS($status1);
$estamp = ($status2);
$etext = ($status3);

write;}
print " \n";
print " \n";
}
}

I'd appreciate any help, code, and explination so I can understand what I am missing

Comment on Help with removing dupes from a string with perl

Replies are listed 'Best First'.
Re: Help with removing dupes from a string with perl by 1Nf3 (Pilgrim) on Jul 09, 2009 at 05:57 UTC
Please, use `<code> </code>` tags when posting your code. As to the question, a quick fix solution you could try is getting the data from the database sorted descending by time, so that the newest times come first. Then, you can push the first occurence of a job name onto a list of data rows, and when the while loop ends, you can print the rows. # I'm not sure if it's the correct statement # for getting the data sorted newest first, test it first &DBEXEC("select proc_eventvu.job_name, jobst.status, proc_eventvu.stam +p, proc_eventvu.text from proc_eventvu, jobst where proc_eventvu.joid = jobst.joid and proc_eventvu.event in (112,$eventnumber) and jobst.status = $statusnumber order by proc_eventvu.stamp desc"); my %already_processed; my @results_lol; while(@status = &dbnextrow($dbproc)) { $name = ($status[0]); $jstatus = &JOBSTATUS($status1); $estamp = ($status2); $etext = ($status3); # only the job data for a jobname # that wasn't processed before # make it to the result list unless (defined $already_processed{$name}) { push @results_lol, [($name, $jstatus, $estamp, $etext)]; $already_processed{$name} = 1; } } # here, if you want the rows sorted by name, you could # write some necessary code for my $row_ref (@results_lol) { my ($name, $jstatus, $estamp, $etext) = @$row_ref; write; } [download] It's untested, but try if it works. And I wholeheartedly second the suggestion about `use strict; use warnings;`, as well as the advice about the JOBSTATUS subroutine. Regards, Luke	[reply] [d/l] [select]
Re: Help with removing dupes from a string with perl by jrsimmon (Hermit) on Jul 09, 2009 at 12:00 UTC
JavaFan has the right idea. This is more easily done by a properly formatted SQL select statement. `use strict; use warnings; use DBI; my @alljobData = (); my $dbh = DBI->connect(...); my $stmt = "SELECT JOBNAME, STATUS, TIMESTAMP, COMMENT FROM YOUR_TABLE +S WHERE JOBNAME = ? AND STATUS = ? ORDER BY TIMESTAMP DESC"; my $sth = $dbh->prepare($stmt); foreach my $job (@jobname){ my $rc = $sth->execute($job, "HOLD"); #either check for an error or let failing on fetch do that for you if(my @thisjobData = $sth->fetchrow_array){ push(@allJobData, @thisjobData); } }` [download] You could actually write the sql in a way that it would pull only the most current row for every job name in a single statement, but I don't know the syntax off the top of my head.	[reply] [d/l]
Re^2: Help with removing dupes from a string with perl by tilly (Archbishop) on Jul 09, 2009 at 16:02 UTC
There are several ways to do it, depending on your database. The most portable is: `SELECT t.jobname , t.status , t.timestamp , t.comment FROM your_table t JOIN ( SELECT jobname, MAX(timestamp) as timestamp FROM your_table GROUP BY jobname ) max_timestamp ON max_timestamp.jobname = t.jobname AND max_timestamp.timestamp = t.timestamp` [download] Of course this assumes that you don't have 2 entries with the same name and the same timestamp. It is usually less efficient than selecting back and filtering outside of the database. Google tells me that Sybase doesn't yet support the better SQL 2003 solution of analytic queries. Also a note about your solution. I would strongly recommend ordering on jobname and then timestamp. This will put all of the records for each jobname together, making it easy to figure out which is first and which you should throw. Otherwise you need to keep a potentially large hash of which jobs you've seen. (You skipped that step in your code.)	[reply] [d/l]
Re^3: Help with removing dupes from a string with perl (last) by tye (Sage) on Jul 09, 2009 at 18:48 UTC
Heh, I just rolled my own first() and last() aggregators in Postgresql yesterday because I wanted something like this (googling for analytic queries found a somewhat related link that mentioned first() and last() aggregators as I was curious if my solution was in-line with SQL 2003's solution). The process was a bit more interesting and difficult than I expected. It convinced me that Postgresql doesn't have a general solution for user-defined polymorphic aggregators because it doesn't allow polymorphic user-defined data types (you can't use "anyelement" as part of a "create type" construct -- at least in the versions I was dealing with, 8.2 and 8.3). So I was able to create some polymorphic aggregators but can see cases where these approaches would not work (and the first route I tried failed). Read more... (1479 Bytes) The following test query: `select first(v), first_nonnull(v), last(v), last_nonnull(v) from ( select NULL::text as v union all select 'one' as v union all select 'two' as v ) as q union all select first(v), first_nonnull(v), last(v), last_nonnull(v) from ( select 'one'::text as v union all select 'two' as v union all select NULL as v ) as q;` [download] produces the following results: `first \| first_nonnull \| last \| last_nonnull -------+---------------+------+-------------- \| one \| two \| two one \| one \| \| two (2 rows)` [download] - tye	[reply] [d/l] [select]
Re^4: Help with removing dupes from a string with perl (last) by tilly (Archbishop) on Jul 10, 2009 at 00:35 UTC
Re^4: Help with removing dupes from a string with perl (last) by tye (Sage) on Jul 05, 2011 at 22:41 UTC
Re^3: Help with removing dupes from a string with perl by mpeppler (Vicar) on Jul 24, 2009 at 18:17 UTC
In Sybase you can use a a group by and a having clause, even without using any aggregates in the main query. So something like `select jobname, status, timestamp, comment from the_table group by jobname having timestamp = max(timestamp)` [download] should return something reasonable without the need for a sub-query. Michael	[reply] [d/l]
Re^4: Help with removing dupes from a string with perl by tilly (Archbishop) on Jul 27, 2009 at 07:20 UTC
Re^5: Help with removing dupes from a string with perl by mpeppler (Vicar) on Jul 30, 2009 at 06:02 UTC
Re^2: Help with removing dupes from a string with perl by Herkum (Parson) on Jul 09, 2009 at 13:11 UTC
The SQL for looking up a unique column will roughly look like this(depending upon your database), `SELECT DISTINCT(JOBNAME), STATUS, TIMESTAMP, COMMENT FROM YOUR_TABLES +WHERE STATUS = ? ORDER BY TIMESTAMP DESC` [download] Now you remove that loop for looking up individual job names.	[reply] [d/l]
Re^3: Help with removing dupes from a string with perl by jrsimmon (Hermit) on Jul 09, 2009 at 14:06 UTC
With DB2, using DISTINCT in that manner won't guarantee that you get the most recent result. DISTINCT will be evaluated before the ORDER BY clause and may not find the most recent record first. I'm not sure whether this behavior is standard among databases or not. From the db2 doc: When values match in the specified columns of two or more rows, the distinct operator retains only one of the duplicate rows in the result data set and discards the others. When duplicate rows are found, only the first row that is read during processing is retained. Because the order of rows is not guaranteed during processing, choose carefully when you select columns for the distinct operation. To ensure you get the most recent occurrence, you'd probably need to use a temp table in the statement (at least that's how I've seen it done).	[reply]
Re: Help with removing dupes from a string with perl by Anonymous Monk on Jul 09, 2009 at 05:13 UTC
This is better way to write JOBSTATUS `sub JOBSTATUS { my %foo = ( 1 => "RUNNING", 3 => "STARTING", 4 => "SUCCESS", 5 => "FAILURE", 6 => "TERMINATED", 7 => "ON_ICE", 8 => "INACTIVE", 9 => "ACTIVATED", 10 => "RESTART", 11 => "ON_HOLD", 12 => "QUE_WAIT", ); return $foo{ $_[0] }; }` [download] While you're changing EVENTSTCODE , see reasons to avoid $jstatus1,$jstatus2.... Why it's stupid to `use a variable as a variable name` - part 1 Why it's stupid to `use a variable as a variable name` - part 2 Why it's stupid to `use a variable as a variable name` - part 3 All your programs should start with Use strict warnings and diagnostics or die ( Alternate title: Read this if you want to cut your development time in half! )	[reply] [d/l]
Re: Help with removing dupes from a string with perl by JavaFan (Canon) on Jul 09, 2009 at 08:56 UTC
I would let the database do the work. Write your SQL so that it doesn't return rows you don't want.	[reply]