PerlMonks  

Storing Large Amounts of Trivial Data

by Anonymous Monk
on Jan 13, 2002 at 08:57 UTC ( [id://138402]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

A simple example of what I am asking about is Perlmonks voting. Somehow, it has saved every single vote I have ever made, and it can (quickly) calculate sums for nodes and display voting options for nodes I have not yet voted on.

It seems there would be literally hundreds of thousands of rows if a single MySQL table were used, with one row per vote. How does Perlmonks keep track? What are the speeds involved? Is there a place where I might find the schema of the database tables used here for Perlmonks? I feel I would learn a great deal from them.

Thanks!

Replies are listed 'Best First'.
(jcwren) Re: Storing Large Amounts of Trivial Data
by jcwren (Prior) on Jan 13, 2002 at 09:19 UTC

    You can download the source for the Everything engine from http://www.everydevel.com. The database schemas will be in there. You won't need to actually build and install the package, but I think the initial creation scripts are what contain the database schemas. In particular, check items related to the voting nodelet (which I think is a separate download).

    --Chris


Re: Storing Large Amounts of Trivial Data
by IlyaM (Parson) on Jan 13, 2002 at 09:20 UTC
    100,000 rows is not really a large amount for MySQL. I have no idea how Perlmonks does it, but a basic schema should be trivial. Something like:

        create table vote (
            /* primary key */
            vote_id int unsigned not null auto_increment,
            /* reference to the record in the table which stores users */
            user_id int unsigned not null,
            /* reference to the record in the table which stores nodes */
            node_id int unsigned not null,
            /* vote value: adjust the field type for your requirements */
            value int not null,
            primary key (vote_id)
        );

    Indexes on the user_id and node_id fields are very likely needed too, since those fields will probably appear in the WHERE part of queries.
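The two things the original question asks about (summing votes for a node, and finding nodes a user has not voted on) can be sketched against a table like the one above. This is only an illustration under stated assumptions: SQLite (via Python's standard sqlite3 module) stands in for MySQL so the example runs anywhere, the `node` table, the unique constraint, the index name and the helper functions `node_score` and `unvoted_nodes` are hypothetical, and nothing here is taken from the real Perlmonks schema.

```python
import sqlite3

# Sketch only: SQLite stands in for MySQL, and the table and column names
# follow the schema suggested above, not PerlMonks' actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (node_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE vote (
        vote_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        node_id INTEGER NOT NULL,
        value   INTEGER NOT NULL,
        UNIQUE (user_id, node_id)      -- assumed rule: one vote per user per node
    );
    CREATE INDEX vote_node_idx ON vote (node_id);
""")
conn.executemany("INSERT INTO node VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
conn.executemany("INSERT INTO vote (user_id, node_id, value) VALUES (?, ?, ?)",
                 [(10, 1, 1), (11, 1, 1), (12, 1, -1), (10, 2, 1)])

def node_score(node_id):
    """Sum of all votes cast on one node (0 if nobody has voted)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM vote WHERE node_id = ?",
        (node_id,)).fetchone()
    return row[0]

def unvoted_nodes(user_id):
    """Ids of the nodes the given user has not voted on yet."""
    rows = conn.execute(
        """SELECT n.node_id FROM node n
           LEFT JOIN vote v ON v.node_id = n.node_id AND v.user_id = ?
           WHERE v.vote_id IS NULL
           ORDER BY n.node_id""",
        (user_id,)).fetchall()
    return [r[0] for r in rows]

print(node_score(1))      # 1  (two upvotes, one downvote)
print(unvoted_nodes(10))  # [3]
```

The unique constraint on (user_id, node_id) also gives the database an index that starts with user_id, so in this sketch only node_id needs a separate index.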

    --
    Ilya Martynov (http://martynov.org/)

Re: Storing Large Amounts of Trivial Data
by xylus (Pilgrim) on Jan 13, 2002 at 10:22 UTC
    Much of the speed comes from the database system, which can do a lot of the data handling. As said above, you can review the source code for the Perlmonks engine/site and see what other processing it does; however, most of the work will be done in the database server.

    For sites that contain a lot of data, the thing that will slow them down is a slow database server (or a slow connection to it).
    
    ----------------------------------------------------------
    #!/usr/bin/perl
    @==qw/p e r l m o n k s/;$|*=1;@;=qw/8 15 7 9 -1 7 7 2 0/;
    foreach$-(@=){for(++$|..$;[$:++]){$-++}$..=$-}$.=~s/m/l/g;
    $*=$;[4]+1;for($;[9]..$;[0]/2){$,.=substr($.,$*++,1);$*++;
    }print$,;#http://www.perlmonks.org/index.pl?node_id=98506;
    
Re: Storing Large Amounts of Trivial Data
by n3dst4 (Scribe) on Jan 13, 2002 at 19:56 UTC
    Database performance comparisons are usually made in terms of millions of rows. MySQL, even running on a fairly regular PC, can easily cope with a few tens of millions of rows. Oracle can scale even higher. The reason is indices. When designing the database schema, you decide which columns you'll want to be able to search on, order by etc. and tell the DBMS to keep an index. Indices slow down INSERTs, because the item has to be indexed, but drastically improve SELECT speeds.

    In a sense you're right, because without an index every query pertaining to votes would have to loop across the entire table, which would be way slow. But Vroom or whoever designed the voting system wisely chose to index the 'monk' column, which makes each monk's votes very readily accessible.
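The difference between looping across the whole table and using an index can be seen by asking the database how it plans to run a query. The following is a rough sketch, assuming SQLite in place of MySQL; the `vote` table, the `user_id` column (playing the role of the 'monk' column), the index name `vote_user_idx` and the helper `plan` are all made up for illustration. The exact wording of the plan text differs between SQLite versions, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, "
             "user_id INTEGER NOT NULL, node_id INTEGER NOT NULL, "
             "value INTEGER NOT NULL)")
# 50,000 synthetic votes: 500 users x 100 nodes each.
conn.executemany(
    "INSERT INTO vote (user_id, node_id, value) VALUES (?, ?, ?)",
    ((u, n, 1) for u in range(500) for n in range(100)))

QUERY = "SELECT node_id, value FROM vote WHERE user_id = ?"

def plan(sql, params):
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

before = plan(QUERY, (42,))   # no index yet: the planner scans the table
conn.execute("CREATE INDEX vote_user_idx ON vote (user_id)")
after = plan(QUERY, (42,))    # with the index: a search via vote_user_idx

print("without index:", before)
print("with index:   ", after)
```

MySQL has its own EXPLAIN statement that serves the same purpose, although its output format is different.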

    Learn a bit about modern DBMSs and you'll find yourself doing a few things:

    • Realising that they're not just convenient alternatives to flat files - they're a whole world of data handling
    • Laughing at co-workers who say "I've got a really big table - nearly a thousand rows!"
    • Being immensely chuffed at how much work you can make the DBMS do really quickly - so you don't have to do it in your application (any time you find yourself SELECTing more rows from the DB than you need and then filtering them with perl, warning bells should go off).

    I have no idea whether Everything really does store every vote ever cast - but the point is, it could.
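The warning about SELECTing more rows than you need and then filtering them in the application can be illustrated with a small comparison. This is a sketch under assumptions, not anyone's real code: SQLite stands in for the DBMS, Python stands in for the application language, and the `vote` table and the functions `scores_in_app` and `scores_in_db` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote (user_id INTEGER, node_id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO vote VALUES (?, ?, ?)",
                 [(1, 100, 1), (2, 100, 1), (3, 100, -1),
                  (1, 200, -1), (2, 200, -1), (3, 300, 1)])

def scores_in_app():
    """Anti-pattern: pull every row, then sum and filter in the application."""
    totals = {}
    for _user, node_id, value in conn.execute("SELECT * FROM vote"):
        totals[node_id] = totals.get(node_id, 0) + value
    return {n: s for n, s in totals.items() if s > 0}

def scores_in_db():
    """Let the DBMS aggregate and filter; only the needed rows come back."""
    rows = conn.execute(
        "SELECT node_id, SUM(value) AS score FROM vote "
        "GROUP BY node_id HAVING SUM(value) > 0")
    return dict(rows.fetchall())

# Both functions return the same mapping of node id to positive score,
# but the second transfers two rows here instead of six.
print(scores_in_app())
print(scores_in_db())
```

With six rows the difference is invisible; with millions of votes, the first version ships the whole table to the application on every request.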
