Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

SQL crosstab complexity depends on the number of distinct values in the columns involved with the crosstab. The 600 lines I mentioned were due to a query asking for COUNT, SUM, AVG, MIN, MAX with row and column subtotals, thus requiring a UNION for each level of row header.

In a database with the same structure but with one million records, the query would not have been much longer, provided that the data is properly checked on input.

Of course, if you try to do a crosstab by person's name in a table of one million records, you are likely to run out of space, but OTOH crossing data by names wouldn't let you in much better shape with any statistical tool.

About having 50-100 values in each of 4 dimensions, yes, it's true that you would get an unbearable number of combinations. But you'd get such complexity with any tool, and even if you manage to get such result, it is not readable. Theoretical limits and practical limits need to be considered here. The main purpose of crosstabs is to give an overview of a situation, mostly something that is useful for human consumption. Nobody in his right state of mind would consider reading a document with 50,000 columns and 100,000 rows (provided that I find the paper to print it!)

Databases with statistical needs and data warehouses are designed in such a way that data can be grouped by some meaningful element. If the designers allow such element to reach thousands of values, then it becomes useless for this kind of overview.

Anyway, consider that one side of the crosstab (rows) can grow up to the limits of the system, so if one of your values has a large set of distinct values you can always decide to move it from column to row header, keeping in mind that if you generate too many rows it may not be valuable as a statistical report.

I ran some tests on my database of chess games (2.3 million records) and I got meaningful results in decent times. I generated a few thousand columns, just for fun, but I would never want to be the one in charge of analyzing such a report!

 _  _ _  _  
(_|| | |(_|><

In reply to Re: Re: SQL Crosstab, a hell of a DBI idiom by gmax
in thread SQL Crosstab, a hell of a DBI idiom by gmax

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?

What's my password?
Create A New User
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (8)
As of 2023-02-07 15:58 GMT
Find Nodes?
    Voting Booth?
    I prefer not to run the latest version of Perl because:

    Results (40 votes). Check out past polls.