Re^2: key value in text format

by pwagyi (Monk)
on Nov 06, 2019 at 06:35 UTC ( [id://11108351] )

in reply to Re: key value in text format
in thread key value in text format

Yes, that sounds great. SQLite is quite lightweight too. Just to clarify: the record data are textual only (numeric/string). But how would a relational DB like SQLite address potentially sparse key-value pairs?

Replies are listed 'Best First'.
Re^3: key value in text format
by Corion (Patriarch) on Nov 06, 2019 at 08:40 UTC

    I think ideally you will model your (composite) key as separate columns. Then you can query either all key columns or just a subset of them.

    If you want to keep things simple, keep the optional key-value pairs at the end as a single string. If you also want to query them, a slightly better approach is to format and store them as JSON. Then you can query them in the database almost as if they were additional columns. The ideal way is to convert these optional things either into a fixed set of additional columns, or to add another table consisting of three columns, (row-key, keyname, value). But doing that makes the queries somewhat uglier.
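    A minimal sketch of that three-column approach, using Python's built-in sqlite3 for brevity (the SQL itself is the same from Perl's DBI; all table and column names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE records (row_key INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE TABLE extras  (row_key INTEGER, keyname TEXT, value TEXT);
""")
con.execute("INSERT INTO records VALUES (1, 'bob', '75')")
con.execute("INSERT INTO extras VALUES (1, 'hair', 'balding')")

# Query a record together with one of its optional key/value pairs
row = con.execute("""
    SELECT r.field1, e.value
      FROM records r JOIN extras e ON e.row_key = r.row_key
     WHERE e.keyname = 'hair'
""").fetchone()
print(row)  # ('bob', 'balding')
```

    Missing keys simply have no row in the extras table, so the storage stays sparse.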

      As long as there are no security concerns, another table could store the sparse records from any number of the databases.

      It would have a primary key of the filename, along with the three columns Corion already suggested. The records db would then hold one extra column with a boolean value denoting whether the record has sparse data in that table; you check the boolean and retrieve from the other table when the sparse values are required.

      Another approach that has just occurred to me, of the ugly variety and with additional overhead, would be to hash the filenames before entering them into the database. The first row's primary key would be the filename itself hashed, with a secondary row keyed on the filename concatenated with, for example, the term 'sparse' before being hashed. Already I can feel the glares.

      The issue with this would be losing the key information: you would need to know beforehand the key names of the sparse data. Using an additional table to record the keys could be a solution. At that stage it becomes a matter of performance requirements, size of records, and whether there is any advantage to having relatively few empty record rows (with additional tables denoting the keys for each db, plus the overhead of hashing every lookup) versus additional, mostly empty column(s).

      Likely the better solution is to serialise into a format such as JSON to keep the db self-contained. But at that point, would you not just store the whole optional hash record as a JSON blob anyway, meaning there would only need to be the one additional column?

      On reflection, this is a partial conversion of the binary data, but for this kind of mixed data there is more than one step needed.

Re^3: key value in text format
by The_Dj (Sexton) on Nov 06, 2019 at 15:34 UTC
    I second SQLite. I use it a lot, and if your dataset is big(ish), you don't want to be parsing flat files for every lookup.

    How you store this depends on just how many [ key1=value1, key2=value2 ] pairs there are.
    If there is a fixed list of 'key1','key2' and it's not too many: bite the bullet and add the columns.

    Otherwise, you get to use a link table (this is Database 101; search for 'database normalization').

    Each record should be
    [id], field1, field2

    then in a separate table
    [link_id], key1, value1
    [link_id], key2, value2

    You can also have a 3rd table with
    [link_id], source_file_name
    or whatever other metadata you need to keep.

    SQLite very kindly has a magic ROWID column. It won't be returned by SELECT *, but will be by SELECT ROWID,*.
    Of course you get to do two queries for each lookup: SELECT ROWID,* FROM main_data and then SELECT * FROM extra_data WHERE link_id = ?


    To clarify: if your 'real' data is
    name=bob, age=75, [ hair=balding, glasses=bifocal ]
    name=john, age=20, [ sport=chess ]

    Then you get:
    Primary table:
    1, bob, 75
    2, john, 20

    lookup table:
    1, hair, balding
    1, glasses, bifocal
    2, sport, chess
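    A sketch of the two-query lookup described above, using Python's built-in sqlite3 for brevity (the same SQL works from DBI; the table and column names follow the example and are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE main_data  (name TEXT, age INTEGER);
    CREATE TABLE extra_data (link_id INTEGER, key TEXT, value TEXT);
""")
con.executemany("INSERT INTO main_data VALUES (?, ?)",
                [("bob", 75), ("john", 20)])
con.executemany("INSERT INTO extra_data VALUES (?, ?, ?)",
                [(1, "hair", "balding"), (1, "glasses", "bifocal"),
                 (2, "sport", "chess")])

# First query: fetch the row plus its implicit ROWID
rowid, name, age = con.execute(
    "SELECT ROWID, * FROM main_data WHERE name = ?", ("bob",)).fetchone()

# Second query: fetch the sparse key/value pairs for that row
extras = dict(con.execute(
    "SELECT key, value FROM extra_data WHERE link_id = ?", (rowid,)))
print(extras)  # {'hair': 'balding', 'glasses': 'bifocal'}
```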

      The link method to handle key/value pairs is a good idea, but I think I might just go with a JSON column for all key/value pairs. What is the trade-off between the two approaches?


        Using a linked table, you can do
        SELECT * FROM main_data WHERE ROWID IN ( SELECT link_id FROM extra_data WHERE key1 = ? AND value1 LIKE ? )

        Consider also:
        do you want to put a storage method (JSON) inside a different storage method (SQL)?
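        For comparison, a JSON column can still be queried inside SQLite via the JSON1 functions (json_extract and friends, compiled in by default in modern SQLite builds). A sketch in Python's built-in sqlite3, with an illustrative schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE main_data (name TEXT, age INTEGER, extras TEXT)")
con.execute("INSERT INTO main_data VALUES (?, ?, ?)",
            ("bob", 75, '{"hair": "balding", "glasses": "bifocal"}'))
con.execute("INSERT INTO main_data VALUES (?, ?, ?)",
            ("john", 20, '{"sport": "chess"}'))

# Filter on a key buried inside the JSON column
row = con.execute(
    "SELECT name FROM main_data WHERE json_extract(extras, '$.sport') = 'chess'"
).fetchone()
print(row)  # ('john',)
```

        The rough trade-off: the link table keeps the data relational and indexable, while the JSON column keeps the schema simple at the cost of nesting one storage format inside another.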