Re: SQL Crosstab, a hell of a DBI idiom

by rdfield (Priest)
on Dec 11, 2003 at 16:38 UTC ( #314074=note )

in reply to SQL Crosstab, a hell of a DBI idiom

Given that only 9 records in your database can generate 600 lines of SQL, what happens when you're crosstabbing databases of millions or even tens of millions of rows? From experience with Oracle, ~8000 lines was about the limit for parsing. If you've got, say, 50-100 distinct values in each of 4 dimensions (arranged 2×2), you're looking at millions of lines of SQL, are you not?



Re: Re: SQL Crosstab, a hell of a DBI idiom
by gmax (Abbot) on Dec 11, 2003 at 17:05 UTC

    SQL crosstab complexity depends on the number of distinct values in the columns involved with the crosstab. The 600 lines I mentioned were due to a query asking for COUNT, SUM, AVG, MIN, MAX with row and column subtotals, thus requiring a UNION for each level of row header.
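    This scaling behavior can be sketched with a short generator. The sketch below is in Python rather than Perl/DBI, and the table and column names are invented for illustration; the point it demonstrates is the one above: the query text grows with the number of distinct column values (one CASE expression per value) and with the number of subtotal levels (one extra UNION each), not with the row count of the table.

    ```python
    # Illustrative sketch (hypothetical table/column names): crosstab SQL
    # length depends on the DISTINCT values of the column dimension, not on
    # how many rows the table holds.

    def crosstab_sql(table, row_col, col_col, col_values, measure):
        # One CASE expression per distinct value of the column dimension.
        cases = ",\n  ".join(
            f"SUM(CASE WHEN {col_col} = '{v}' THEN {measure} ELSE 0 END) AS \"{v}\""
            for v in col_values
        )
        detail = f"SELECT {row_col},\n  {cases}\nFROM {table}\nGROUP BY {row_col}"
        # One extra UNION per row-header level, here a single grand-total row.
        total = f"SELECT 'TOTAL',\n  {cases}\nFROM {table}"
        return f"{detail}\nUNION ALL\n{total}"

    sql = crosstab_sql("games", "opening", "result", ["1-0", "0-1", "1/2"], "1")
    print(sql)
    ```

    Whether the table behind `games` has nine rows or a million, this generated text is the same size; only adding distinct `result` values (more CASE expressions) or more subtotal levels (more UNIONs) makes it longer.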

    In a database with the same structure but with one million records, the query would not have been much longer, provided that the data is properly checked on input.

    Of course, if you try to do a crosstab by person's name in a table of one million records, you are likely to run out of space, but OTOH crossing data by names wouldn't let you in much better shape with any statistical tool.

    As for having 50-100 values in each of 4 dimensions: yes, it's true that you would get an unbearable number of combinations. But you'd get that complexity with any tool, and even if you managed to produce such a result, it would not be readable. Theoretical limits and practical limits both need to be considered here. The main purpose of crosstabs is to give an overview of a situation, mostly something that is useful for human consumption. Nobody in their right mind would consider reading a document with 50,000 columns and 100,000 rows (provided I could even find the paper to print it!)
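    The explosion is easy to check with quick arithmetic. Assuming the 4 dimensions are split 2 per axis, as in the scenario above, n distinct values per dimension gives n·n columns and n·n rows:

    ```python
    # Combination explosion for a 4-dimension crosstab, 2 dimensions per axis,
    # each with n distinct values: the grid is (n*n) columns by (n*n) rows.
    for n in (50, 100):
        cols = n * n
        rows = n * n
        print(f"n={n}: {cols} columns x {rows} rows = {cols * rows:,} cells")
    ```

    Even at the low end (n=50) that is 2,500 columns by 2,500 rows, already far past anything a human would read, which is the practical limit argued for above.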

    Databases with statistical needs and data warehouses are designed in such a way that data can be grouped by some meaningful element. If the designers allow such an element to reach thousands of distinct values, it becomes useless for this kind of overview.

    Anyway, consider that one side of the crosstab (the rows) can grow up to the limits of the system, so if one of your dimensions has a large set of distinct values you can always decide to move it from column header to row header, keeping in mind that if you generate too many rows the result may not be valuable as a statistical report.
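    The asymmetry behind that advice can be made concrete: in a CASE-based crosstab, each distinct value on the column axis adds an expression to the SELECT list (query text), while distinct values on the row axis only add result rows via GROUP BY (no extra text). A toy count, with a made-up dimension name:

    ```python
    # Each distinct COLUMN-axis value needs its own CASE expression in the
    # SELECT list; ROW-axis values cost nothing in query length, since they
    # come out of GROUP BY. So high-cardinality dimensions belong on rows.
    def select_list_size(col_values):
        exprs = [f"SUM(CASE WHEN dim = '{v}' THEN 1 ELSE 0 END)" for v in col_values]
        return len(exprs)

    print(select_list_size(range(20)))     # 20 column values -> 20 expressions
    print(select_list_size(range(10000)))  # 10000 -> unmanageable query text
    ```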

    I ran some tests on my database of chess games (2.3 million records) and I got meaningful results in decent times. I generated a few thousand columns, just for fun, but I would never want to be the one in charge of analyzing such a report!

     _  _ _  _  
    (_|| | |(_|><
