Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^11: Architecture design for full stack development.

by erix (Prior)
on Jun 25, 2017 at 06:30 UTC ( [id://1193480]=note: print w/replies, xml ) Need Help??


in reply to Re^10: Architecture design for full stack development.
in thread Architecture design for full stack development.

Update: Ah, didn't see your update in time. Good, we seem to (more or less) agree then :) /Update

No, you are still mistaken (or, just possibly, do not put forward your point clearly enough).

Let's read what's on the page that you mentioned:

https://www.postgresql.org/docs/current/static/sql-altertable.html

It says:

CLUSTER ON This form selects the default index for future CLUSTER operations. + It does not actually re-cluster the table.

So it says: 'future CLUSTER operations'. That means the next CLUSTER command. That is the CLUSTER ON command. On the html-page, that CLUSTER is actually a link into sql-cluster.html. This is the CLUSTER I was talking about all along (naturally, as there is no other CLUSTERing in postgres).

So no, Postgres does not have clustered indexes, except in the sense of the CLUSTER ON command (which re-organises the underlying table on the basis of a /single/ index; and which is non-persistent (as you rightly say))

Replies are listed 'Best First'.
Re^12: Architecture design for full stack development.
by anonymized user 468275 (Curate) on Jun 25, 2017 at 06:55 UTC
    Yes and I am grateful to you for putting me right - I could have gone on thinking that the clustered index could be relied upon to persist rather than requiring scheduled cluster operations (e.g. a cron or autosys job) instead.

    One world, one people

      ... cron or autosys job

      In view of the 'billions of records' you mentioned earlier, in combination with a busy site:

      Keep in mind that a table is completely blocked for both read and write during a CLUSTER operation. This often simply means that it is not feasible to use CLUSTER (unless you can suffer the 'downtime').

      I'd say, just forget about primary key- (or index-) 'fragmentation' as a problem. It is not, really. It should not be an application level concern.

      I did a comparison between gapped-values versus contiguous-values tables (starting out with 1 billion rows (=34GB), deleting 90%, then comparing to same-size contiguous-numbered table), and it turns out that indeed, there is no difference in performance.

      $ ./testfrag.sh -- rowcount1 [1_000_000_000] -- rowcount2 [100000000] -- gap_size [10] -- running tests... fragmented: average (100 searches): 23.754 ms contiguous: average (100 searches): 26.553 ms

      (The difference seen here is coincidence -- after another CLUSTER it might be the other way round)

      The test was done on my old desktop, and above are cold-cache timings (cached, both go down to ~2.0 ms (again no difference)).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1193480]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-19 13:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found