http://qs321.pair.com?node_id=1193370

anonymized user 468275 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am developing a website from scratch which will hopefully have plenty of activity some time in the future after it goes live. Although I am still developing it on Windows, I will migrate it to Linux/Apache, with Perl and Postgres completing the rest of the picture before and after migration. I am inclined to avoid PHP for performance reasons.

While loading the database with some initial reference data (about 1GB), using one-off code that got fixed as I went along, I somehow managed to "lose" some PK values, because sequences do not roll back if accessed with nextval.
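For anyone who hasn't hit this behaviour: a minimal DBI sketch (connection details, table and sequence names are placeholders, not from the OP) showing that a Postgres sequence value consumed inside a rolled-back transaction is not reissued:

```perl
use strict;
use warnings;
use DBI;

# Placeholder connection details; AutoCommit off so we get a transaction.
my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'pass',
                        { AutoCommit => 0, RaiseError => 1 } );

my ($before) = $dbh->selectrow_array(q{SELECT nextval('items_id_seq')});
$dbh->rollback;    # undoes data changes, but NOT the sequence advance
my ($after)  = $dbh->selectrow_array(q{SELECT nextval('items_id_seq')});

# $after is $before + 1: the rolled-back value is permanently skipped,
# leaving a gap in any PK column fed from this sequence.
```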

But it occurred to me that when the system goes live, it will need to manage deleted values. Although I was careful to use bigint for PK values where necessary, I don't want a situation where the PK values are fragmented and fragmentation increases over time.

It was at this point that I realised I needed DBIx::Class, otherwise I would have layers of raw SQL hiding in any homegrown DBMS/Perl framework and that had to be avoided at all costs.

So I am at the point where I have installed DBIx::Class and feel obliged to determine the rest of the architecture (MVC, perhaps) at the same level of detail, before writing any methods to dynamically defragment my keys (probably through an idpool table/class where inserters will look first, calling nextval only if nothing is in the pool for that table).
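The idpool idea described above could be sketched roughly like this (the `IdPool` result class, its `table_name`/`id_value` columns, and the `next_id` method are my own invention, not an established pattern; the row lock keeps two concurrent inserters from claiming the same pooled id):

```perl
# Hypothetical helper on a DBIx::Class schema: reuse a reclaimed id
# from the pool if one exists, otherwise fall back to the sequence.
sub next_id {
    my ($schema, $table) = @_;
    return $schema->txn_do(sub {
        # Lock at most one pooled id for this table.
        my $pooled = $schema->resultset('IdPool')->search(
            { table_name => $table },
            { rows => 1, for => 'update' },
        )->single;
        if ($pooled) {
            my $id = $pooled->id_value;
            $pooled->delete;    # id is being reissued, remove from pool
            return $id;
        }
        # Pool empty: take the next value from the table's sequence.
        my ($id) = $schema->storage->dbh->selectrow_array(
            qq{SELECT nextval('${table}_id_seq')}
        );
        return $id;
    });
}
```

A delete trigger (or application code) would then push freed PK values into the pool table.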

It becomes clear that the defrag code belongs in the "Controller" part of the framework. But having never built a Controller before, I am hoping for advice on what the classes should look like in there. For example, should I have a dbController class that uses methods in DBIx::Class? Should it inherit from the Schema, and/or should I have some kind of parent-of-all utility class? I am keen to make a good choice now rather than have to make fundamental architecture mods later. Many thanks in advance,

Simon

One world, one people


Replies are listed 'Best First'.
Re: Architecture design for full stack development.
by erix (Prior) on Jun 23, 2017 at 15:29 UTC

    don't want to have the situation where the PK values stay fragmented and increase fragmentation over time.

    Can you explain the rationale behind this? 'Primary Key Fragmentation' seems a strange worry to have, generally, and it seems peculiar to quote it as a reason to use/choose an ORM (in this case, DBIx::Class).

      There is always a first requirement that comes along to justify an architectural feature, and it is purely a matter of chance which requirement it is. As it happens, in this case, where the index is clustered (true for some of the join tables, which will grow to billions of records over time), contiguity of values has a direct effect on how well the index performs, because of the way clustering works. I already stated in the OP why this prompted an ORM.

      One world, one people

        CLUSTER dependent on the contiguity of /values/? I have to say, I find this a bit hard to believe. Do you perhaps have any recent postgres-based evidence that explains or supports this idea? Or, if you could perhaps outline an approach that demonstrates it, so that I can implement that, and see how it turns out? Is it measurable at all? I would really like to see this in action.

Re: Architecture design for full stack development.
by talexb (Chancellor) on Jun 23, 2017 at 18:20 UTC
      While loading the database with some initial reference data (about 1GB), using one-off code that got fixed as I went along, I somehow managed to "lose" some PK values, because sequences do not roll back if accessed with nextval.

    To simplify, it sounds like you're worried that a list of numbers with a gap (1, 2, 3, 7, 8, 9) won't sort as efficiently as a list without a gap (1, 2, 3, 4, 5, 6). I believe you don't need to worry about this, because a) it's not going to be an issue, and b) anyway, it's the database's problem.

      It was at this point that I realised I needed DBIx::Class, otherwise I would have layers of raw SQL hiding in any homegrown DBMS/Perl framework and that had to be avoided at all costs.

    Hmm. I don't understand how this follows from the issue of having gaps in your Primary Key values. DBIx::Class is a fine module, and I'm sure it will work well .. but having raw SQL code in your codebase isn't a terrible situation.

    Looking forward to hearing about the launch! :)

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      raw SQL not terrible? Think about it like this: I am at the beginning of a complex development where the database is most likely going to undergo many architectural changes. If I can rely on DBIx::Class::Schema::Loader to update my model classes, I wouldn't want to undermine that with raw SQL, but should co-operate with this gift I am given.
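For context, the Schema::Loader workflow being relied on here looks roughly like this (schema class name and connection details are placeholders; `make_schema_at` is the documented entry point of DBIx::Class::Schema::Loader):

```perl
use strict;
use warnings;
use DBIx::Class::Schema::Loader qw(make_schema_at);

# Regenerate the Result classes under ./lib from the live database.
# Re-run after each schema change to keep the model in sync.
make_schema_at(
    'MyApp::Schema',
    {
        dump_directory => './lib',
        components     => ['InflateColumn::DateTime'],
    },
    [ 'dbi:Pg:dbname=myapp', 'dbuser', 'dbpass' ],
);
```

The same thing can be done from the command line with the `dbicdump` script that ships with the module.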

      One world, one people

        This really depends on your familiarity with Object Oriented Database design and coding, and your development goals. Sometimes the primary goal is getting a Minimum Viable Product done, in order to show investors and potential users. Other times, you have the luxury of time, familiarity with the tools, and the desire to build it right from the ground up.

        I have used DBIx::Class on the job, and a few years back, I re-worked the examples in the POD -- but I really can't say that I'm comfortable with the module. If I were going to build a large project using this framework, I'd know ahead of time that I might be a bit slow at the beginning. I might even build a toy CRUD/BREAD application to get my eye in before starting on the real project.

        Alex / talexb / Toronto


Re: Architecture design for full stack development.
by Mr. Muskrat (Canon) on Jun 23, 2017 at 19:06 UTC

    Worrying about gaps in a sequence is silly. Thinking that DBIx::Class will fix that problem is even more silly as it will produce the same gaps under the same circumstances (a transaction created new PK values and the transaction was rolled back).

      There are good reasons to prevent gaps. One is that otherwise there is a security loophole: an attacker who can force gaps can drive your sequence to its limit over time, creating a denial of service. The limit depends on the size of the integer, so it is not so easily reached for bigint, but I use sequences even on smallint PKs in some cases where I don't want the number of values to exceed a low ceiling. It just isn't good maintainability thinking to let gaps multiply like rabbits if one expects a lot of activity, with users allowed to delete, insert and maintain records. For clustered indexes it will ultimately have exactly the same effect as fragmented blocks on a disk.

      In regard to why this prompts an ORM: judging from the replies, it looks like I wasn't clear in the OP. The moment I have to consider embedding SQL in my Perl to deal with something across multiple tables (PKs have FKs in other tables) is the moment I need an ORM. It has nothing to do with the particular issue of keys; if not this requirement, another requirement is bound to come along that prompts awkward-to-maintain embedded SQL, which is better factored into an ORM instead.

      One world, one people

        Thanks for the follow up on why you need an ORM.

        If you're concerned about an attacker then take steps to harden your system (not just the database). If you're concerned about running out of values for a sequence then you should consider non-integer approaches; one such approach would be UUIDs.
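To make the UUID suggestion concrete, here is a hedged sketch (the module choice and column setup are mine, not Mr. Muskrat's) of generating a version 4 UUID key in Perl:

```perl
use strict;
use warnings;
use UUID::Tiny ':std';

# Random (v4) UUIDs: no sequence, so no gaps to worry about, and no
# realistic exhaustion concern with a 128-bit value space.
my $id = create_uuid_as_string(UUID_V4);

# Store it in a Postgres "uuid" column; alternatively, let the server
# generate it with pgcrypto's gen_random_uuid() as the column default.
```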
