Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Perl solution for SQL problem?

by Win (Novice)
on Jan 05, 2004 at 12:25 UTC ( [id://318825]=perlquestion: print w/replies, xml ) Need Help??

Win has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have created the following table:
CREATE TABLE Geo_m_count ( Cause VARCHAR(30) NULL, OAcode VARCHAR(20) NOT NULL, Sex FLOAT(3) NOT NULL, Age_start INT NULL, Age_end INT NULL, Age_range VARCHAR(25) NULL, Person_count INT NULL )
I have filled in all the values of this table with the exception of Person_count.
Person_count requires cross reference to the table:
CREATE TABLE Cases ( Year_of_death varchar(14) NOT NULL, Ageyears float(8) NOT NULL, Sex float(12) NULL, Undcause varchar(10) NULL, OAcode varchar(20) NULL )
Foreach record of the table Geo_pop_count I wish to count the number of Cases that meet the SQL condition
FROM Cases C, Geo_m_count G WHERE G.OAcode LIKE C.OAcode AND C.Ageyea +rs >= G.Age_start AND C.Ageyears <= G.Age_end AND C.Sex = G.Sex)
Do I require a bit of Perl script to do this or can I do it with just T-SQL (for MS SQL Server)? I already run the T-SQL statements through a DBI connection. And I left my SQL book at home.

Replies are listed 'Best First'.
Re: Perl solution for SQL problem?
by Abigail-II (Bishop) on Jan 05, 2004 at 12:42 UTC
    I'm trying to make sense of your question, but I fail. First, Person_count is of type INT, the table Cases doesn't have a column of type INT, so it's totally unclear how Person_count is a cross reference to that table.

    Having said that, your question seems about a fragment of an SQL statement, and that fragment doesn't mention Person_count, so, why this information about said column is relevant remains unclear.

    And then the question Do I require a bit of Perl script to do this or can I do it with just T-SQL. Considering you give the query as an SQL fragment, what makes you think you can't do it in SQL? It's like pointing to a red bird and asking whether a bird can be red....

    Abigail

Re: Perl solution for SQL problem?
by dragonchild (Archbishop) on Jan 05, 2004 at 14:23 UTC
    This is a data problem. Data problems should be solved by the data layer and not the application layer. So, even if you could do this in Perl (which you can), you should be doing it in T-SQL (which, I hope, isn't much different from SQL.)

    What you're looking for is a trigger. In Oracle, I would write it something like:

    CREATE TRIGGER TRG_SUM_PERSONCOUNT AFTER INSERT OR UPDATE OR DELETE ON CASES BEGIN UPDATE geo_m_count g SET person_count = ( SELECT COUNT(*) FROM cases c WHERE g.oacode LIKE c.oacode AND g.sex = c.sex AND c.ageyears BETWEEN g.age_start and g.age_end ) END; /
    (The ending slash is important, for Oracle. YMMV)

    Basically, what that does is whenever any of the rows in CASES changes, person_count is updated in all rows in GEO_M_COUNT. Now, this can be very inefficient, if you are doing more changes than selects. You might want to consider changing how you are viewing your data. Without knowing much, I am going to hazard a guess that you're doing some sort of statistical analysis of death information. Based on that, I would recommend looking at things differently. I'd have the following tables:

    • CAUSE_OF_DEATH
      • ID
      • DESCRIPTION
    • OA_CODE
      • ID
      • DESCRIPTION
    • CASES
      • ID
      • GENDER
      • DATE_BIRTH
      • DATE_DEATH
      • OA_CODE - REFERENCES OA_CODE(ID)
      • CAUSE_OF_DEATH - REFERENCES CAUSE_OF_DEATH(ID)

    In every base table, there is an ID field. This is a database unique identifier, assigned when the row is inserted. It has nothing to do with the data, other than to identify a given row.

    Your other table (Geo_m_count) is a summary table. It doesn't hold data - it organizes it. So, you could do a view (materialized or not - your choice). Look up how views are done in MS-SQL. (I've never really used it, being an Oracle person.) But, your basic select statement could look like:

    SELECT cause_of_death ,oa_code ,gender ,MIN( date_death - date_birth ) AS age_start ,MAX( date_death - date_birth ) AS age_end ,COUNT( * ) FROM cases GROUP BY cause_of_death ,oa_code ,gender ;

    A few thoughts on your SQL in general.

    1. You should really consider making your naming convention a little easier to use. For example, OAcode isn't very person-friendly. I'm sure there's a better name. Others:
      • Gender is better than Sex
      • Age is better than Ageyears (Age is almost always in years)
    2. It's better to store DATEs instead of INTs, when dealing with things that span time, like Age_start and Age_end
    3. This is just convention, but I would strongly recommend keeping all names in SQL in uppercase. Many databases (such as Oracle) are case-insensitive, but I wouldn't depend on it. Also, it makes your names stand out.
    4. I hope you're specifying things like a PRIMARY KEY or a useful set of foreign key(s) for each table. Constraints are probably the most important feature of relational databases - they are the relational part. Without constraints, your data doesn't relate one table to the other. For example, you have OAcode in both tables. I would hope that OAcode is foreign-keyed to some other table that lists all the legal OAcode values. Otherwise, I 100% guarantee that you will eventually have bad data.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Perl solution for SQL problem?
by exussum0 (Vicar) on Jan 05, 2004 at 13:03 UTC
    Answering another problem you may run into, you are using a lot of data types that may or may not be kinda.. big.. for what they hold.

    sex is usually a boolean of sorts, evena bit, or a charachter, but certainly not a float. Unless you are counting half instances and large ones, since you are using a float of 12 digits or 3.

    I'm assuming ageyears, is supposed to be the age in years. It's usually smarter to use a date-like format and calculate it on the fly. (where age >= 01-01-1980 and age < 01-01-1981 for instance).

    A little oversizing isn't bad. But too much and you wind up with a db that takes up a LOT more space than it should. You can also get truncating errors if you think a float(12) sex in cases, would relate properly to float(3) in geo_m_count for instance, if it were a foreign key.


    Play that funky music white boy..
Re: Perl solution for SQL problem?
by rdfield (Priest) on Jan 05, 2004 at 13:44 UTC
    Use an "update" statement with a correlated sub-query, e.g.
    update geo_m_count g set person_count = ( select count(*) from cases c where <as per OP> )
    No Perl required.

    rdfield

      Unless denormalization is really needed, I wouldn't do this - I would either create a view or leave the query as a subselect, depending on the features of the underlying DB.

      Of course, it might be prudent to cache that information, as the database load will be not really small if queries hit person_count often, but I don't see Win hitting that part yet.

      perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
    A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://318825]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (4)
As of 2024-04-24 00:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found