PerlMonks

Reducing application footprint: large text files

by Anonymous Monk
on Feb 28, 2018 at 20:38 UTC (id://1210091)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Firstly, I must 'fess up: I don't know Perl. I need to move a large Perl application to a small embedded system, and it needs to go on a serious diet. The application has many 10's of MB of computer generated data files that contain text like this:
    %my_flds = (
        "DIS"  => [ 0, 33, 1, 0x0000000200000000, 0x00, 1, 0x0000000000000000, "" ],
        "L2"   => [ 1, 24, 8, 0x00000000FF000000, 0x00, 1, 0x0000000000000000, "" ],
        "L1"   => [ 2, 16, 8, 0x0000000000FF0000, 0x00, 1, 0x0000000000000000, "" ],
        "L0"   => [ 3, 15, 1, 0x0000000000008000, 0x00, 0, 0x0000000000000000, "" ],
        "LDIS" => [ 4, 14, 1, 0x0000000000004000, 0x00, 1, 0x0000000000000000, "" ],
        "LCNT" => [ 5, 13, 1, 0x0000000000002000, 0x00, 1, 0x0000000000000000, "" ],
    );
    %my_def = (
        NAME    => "CONFIG",
        ADDRESS => 0x400000,
        LENGTH  => 64,
        FLAGS   => 0x001,
        NOTE    => "",
        RESET   => 0x0000000000000000,
        FIELDS  => \%my_flds
    );
Can this be packed down into a simple binary file? If so, how would I modify the app to read this binary file? Are there any other strategies you would recommend for reducing the footprint of an application? Thanks, Matt.
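Perl's built-in pack and unpack can answer the first question. Here is a minimal sketch that packs one row of %my_flds into a binary record and reads it back. The layout is an assumption inferred only from the six sample rows: elements 0 to 2 and 4 to 5 are treated as small integers that fit in one byte, elements 3 and 6 as 64-bit values, and the last as a short string. The Q template only works on a Perl built with 64-bit integer support, so check that on the embedded target. The helper names pack_field and unpack_field are hypothetical.

```perl
use strict;
use warnings;

# Assumed row layout (inferred from the sample data only):
#   name     : length-prefixed string, up to 255 bytes   (C/a*)
#   [0..2]   : small integers, one byte each              (C3)
#   [3]      : 64-bit value, big-endian                   (Q>)
#   [4..5]   : small integers, one byte each              (C2)
#   [6]      : 64-bit value, big-endian                   (Q>)
#   [7]      : length-prefixed note, up to 65535 bytes    (n/a*)
my $FIELD_TMPL = 'C/a* C3 Q> C2 Q> n/a*';

# Turn one "NAME => [ ... ]" entry of %my_flds into a binary record.
sub pack_field {
    my ($name, $vals) = @_;
    return pack $FIELD_TMPL, $name, @$vals;
}

# Turn a binary record back into (name, array ref of 8 values).
sub unpack_field {
    my ($bytes) = @_;
    my ($name, @vals) = unpack $FIELD_TMPL, $bytes;
    return ($name, \@vals);
}
```

With this template the "DIS" row becomes a 27-byte record, compared with roughly 75 characters of source text. Byte order is fixed as big-endian (>) so the file reads the same regardless of which CPU wrote it.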

Re: Reducing application footprint: large text files
by LanX (Saint) on Feb 28, 2018 at 21:28 UTC
      Thanks, I'll look into pack and unpack.

      I think the workflow would then become:

      1. During the build procedure, run the PMs through pack to create a binary version of each file.
      2. At runtime, read the binary file via unpack.
      Is that what a Perl expert would do?
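For step 1, the build script first has to get the data out of the existing files. Because the sample assigns to package variables (there is no my), a build-time script can run each file with do and read %my_flds and %my_def afterwards. This is a sketch under two assumptions: each file has no package statement, and it assigns exactly these two hashes. load_data_file is a hypothetical name. Since Perl 5.26, do no longer searches the current directory, so pass an absolute or ./-prefixed path.

```perl
use strict;
use warnings;

# Run one generated data file and return refs to the two hashes it fills.
# Assumes the file has no 'package' line, so its unqualified
# '%my_flds = (...)' and '%my_def = (...)' assign to package main.
sub load_data_file {
    my ($path) = @_;                 # absolute or ./-prefixed path
    our (%my_flds, %my_def);
    local (%my_flds, %my_def);       # start empty; old values restored on exit
    my $rv = do $path;
    die "error compiling $path: $@" if $@;
    die "cannot read $path: $!" unless defined $rv;
    # These refs outlive the 'local', so $def->{FIELDS} still points
    # at the same hash that is returned as $flds.
    return (\%my_flds, \%my_def);
}
```

The build script would then pass the returned references to whatever pack-based writer is chosen for step 2.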

      Matt.

        Most of the comments I've seen have assumed this is data. But your phrasing of "run the PM's thru pack..." implies that this isn't data, per se, but an actual Perl module (a .pm file that you load via use Some::Module). If that's true, I'm not sure that rolling your own is the best choice. You might want to clarify that point. Is the file you're trying to read, which you called "the PM's", pure data, data in Perl format, data plus other Perl code (such as functions, for loops, etc.), or something else?

        I don't know of a specific CPAN module that allows loading a compressed module, but it would surprise me if there weren't one (a quick search for "perl compress module" finds Perl modules that compress something else, not modules that let you compress your own source code). Or perhaps something like Acme::Buffy, which modifies the source code itself. I just don't know what that module would be... but maybe my phrasing will spark something in a more experienced monk.
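Core Perl already has a hook for this: a code reference placed in @INC (documented under require in perlfunc) is called for each module being loaded and may return a filehandle to read the source from. Below is a sketch that serves gzip-compressed modules through that hook, using the core IO::Uncompress::Gunzip. The directory layout ("$dir/Foo/Bar.pm.gz") and the name install_gz_inc_hook are hypothetical. Note that this shrinks the files on disk, not the memory used once the module is compiled.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Install a hook so that 'use Foo::Bar' / 'require Foo::Bar' can be
# satisfied from a gzip-compressed copy at "$dir/Foo/Bar.pm.gz".
sub install_gz_inc_hook {
    my ($dir) = @_;
    unshift @INC, sub {
        my (undef, $file) = @_;      # $file looks like "Foo/Bar.pm"
        my $gz = "$dir/$file.gz";
        return unless -e $gz;        # not ours: let the rest of @INC try
        my $source;
        gunzip($gz => \$source) or die "gunzip $gz failed: $GunzipError";
        open my $fh, '<', \$source or die "cannot open in-memory source: $!";
        return $fh;                  # perl compiles the module from this handle
    };
}
```

Because use runs at compile time, the hook has to be installed in a BEGIN block (or by a small module loaded first) before any use of a compressed module.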

        I hesitate to recommend Module::Crypt, because it doesn't really do what its name implies: never rely on Module::Crypt to protect your source code from prying eyes; it will not keep it secret! But I mention it nonetheless because the XS output from Module::Crypt might be smaller than your 10MB++ Perl module. I don't know whether it would be, but it might be worth a try.

        > Is that what a Perl expert would do?

        A Perl expert would ask for more details.

        You can certainly do what you described...

        ...just probably there is an even better solution.

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Wikisyntax for the Monastery

Re: Reducing application footprint: large text files
by johngg (Canon) on Feb 28, 2018 at 22:53 UTC

    Difficult to tell without seeing more data: for instance, are those fields consistent in number or do they vary, and what are the max and min values of the hex numbers? If the fields vary to an extent that would make pack templates impractical, you might want to have a look at the core Storable module, perhaps in conjunction with IO::Compress::Bzip2 and its Uncompress sibling.
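A minimal sketch of that combination, using only core modules: Storable's nfreeze turns a whole Perl structure into a byte string, IO::Compress::Bzip2 compresses it at build time, and IO::Uncompress::Bunzip2 plus thaw reverse the process at runtime. nfreeze (network byte order) is used instead of freeze on the assumption that the build host and the embedded target may have different CPUs; the target's Storable must also be able to read the format written on the build host. save_compressed and load_compressed are hypothetical names.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Build time: serialize any Perl data structure and write it bzip2-compressed.
sub save_compressed {
    my ($file, $data) = @_;
    my $frozen = nfreeze($data);     # network byte order, portable across CPUs
    bzip2(\$frozen => $file) or die "bzip2 $file failed: $Bzip2Error";
    return;
}

# Run time: decompress and rebuild the structure.
sub load_compressed {
    my ($file) = @_;
    my $frozen;
    bunzip2($file => \$frozen) or die "bunzip2 $file failed: $Bunzip2Error";
    return thaw($frozen);
}
```

Saving { flds => \%my_flds, def => \%my_def } as one structure keeps FIELDS pointing at the same hash after thaw, because Storable preserves shared references within a single freeze.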

    Cheers,

    JohnGG

        Very interesting. Sereal deserves some attention too. I'll read through that.

        Thanks, Matt.

      There are two data structures that remain the same... the first structure describes bit-fields within a 64-bit register. The 2nd structure describes some meta-attributes of the register.

      Min to max will be 0 to 2^64 - 1.

      So, given that these data structures don't vary, it sounds like pack templates might be the way to go. Perhaps there will be a challenge in that the 1st data structure is an array with varying numbers of elements, although the structure will always be the same.
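Because each row has a fixed shape, a varying number of rows is easy to handle: write a row count, then each row as a length-prefixed record, and let unpack repeat a group until the data runs out. A sketch, assuming the same per-row layout inferred from the sample (one-byte small integers and 64-bit Q> values, so a 64-bit Perl is required). Sorting by the first array element, which looks like a field index in the sample, is also an assumption. pack_fields and unpack_fields are hypothetical names.

```perl
use strict;
use warnings;

# Assumed per-row layout (inferred from the sample): name, three small
# ints, a 64-bit value, two small ints, a 64-bit value, a note string.
my $ROW = 'C/a* C3 Q> C2 Q> n/a*';

# Encode a hash like %my_flds with any number of rows.
sub pack_fields {
    my ($flds) = @_;
    my @names = sort { $flds->{$a}[0] <=> $flds->{$b}[0] } keys %$flds;
    my $blob  = pack 'n', scalar @names;             # row count first
    $blob .= pack 'n/a*', pack($ROW, $_, @{ $flds->{$_} }) for @names;
    return $blob;
}

# Decode: read the count, then repeat the length-prefixed group to the end.
sub unpack_fields {
    my ($blob) = @_;
    my ($count, @rows) = unpack 'n (n/a*)*', $blob;
    die "expected $count rows, found " . scalar(@rows) . "\n"
        unless $count == @rows;
    my %flds;
    for my $row (@rows) {
        my ($name, @vals) = unpack $ROW, $row;
        $flds{$name} = \@vals;
    }
    return \%flds;
}
```

The explicit count is redundant with the (n/a*)* group, but it gives a cheap integrity check on truncated files.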

      Thanks for pointing out Storable and BZip2. That is more food for thought along the way.

      Thanks, Matt.

        There are two data structures that remain the same... the first structure describes bit-fields within a 64-bit register. The 2nd structure describes some meta-attributes of the register. Min to max will be 0 to 2^64 - 1. So, given that these data structures don't vary, it sounds like pack templates might be the way to go. Perhaps there will be a challenge in that the 1st data structure is an array with varying numbers of elements, although the structure will always be the same.

        The OP shows two hashes, one of which is a hash of arrays. Above you say "the 1st data structure is an array with varying numbers of elements"? The OP mentions "many 10's of MB of computer generated data files" and shows two small data structures. My point is that you are not giving us clear information. If you want actual help rather than speculative possibilities, you need to be clearer and more accurate in specifying the problem.

        I.e., is this two files, each containing a huge version of one of the OP data structures? Or are there myriad files for each type of data structure? Or myriad files, each containing both of the OP data structures?

        • How many MBs?
        • Spread across how many files?
        • Are the sub data structures fixed or variable in length?

          Note: If the top-level entity in a file has a variable length, that's easily accommodated; but if the sub-structures vary in length, that's harder. I.e., if the hash of arrays contains a variable number of hash elements but the values are fixed-length arrays, that's easily handled; but if the arrays vary in length, that's much harder.

        • Does the application need to load all of the "10s of MBs" at once for every run, or does it only use a small subset for each run?
        • So many more questions before I would choose an approach to solving your problem.

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
Re: Reducing application footprint: large text files
by LanX (Saint) on Feb 28, 2018 at 22:52 UTC
    Apart from pack and unpack...

    Storable might be another and easier option, but I don't know much about the achievable compression.

    The 3rd option is to zip the raw data. You could use Archive::Zip for it.

    Best choice depends on the details...

    Cheers Rolf

Re: Reducing application footprint: large text files
by QM (Parson) on Mar 01, 2018 at 10:59 UTC
    These are essentially Perl structures compatible with JSON. A simple idea comes to mind: convert these to JSON and zip them. (If necessary, you can convert the live structures into JSON and write files.) On the embedded system, you can unzip the data, stream it into a Perl script, convert from JSON to Perl structures, and populate Perl variables.
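A sketch of that idea using core modules only: JSON::PP for the conversion, with gzip (IO::Compress::Gzip / IO::Uncompress::Gunzip) swapped in for zip. JSON has no notion of Perl references, so %my_def is stored without its FIELDS entry and the link to %my_flds is restored after loading, which covers the "last piece of the puzzle". This assumes a 64-bit Perl; integers too long for a native int may come back from JSON::PP as strings unless allow_bignum is enabled, so the top of the 0 to 2^64 - 1 range is worth testing. save_json_gz and load_json_gz are hypothetical names.

```perl
use strict;
use warnings;
use JSON::PP ();
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build time: write %my_flds and %my_def as one gzip-compressed JSON document.
sub save_json_gz {
    my ($file, $flds, $def) = @_;
    my %def_copy = %$def;
    delete $def_copy{FIELDS};        # a Perl reference; re-linked on load
    my $json = JSON::PP->new->canonical->encode({ flds => $flds, def => \%def_copy });
    gzip(\$json => $file) or die "gzip $file failed: $GzipError";
    return;
}

# Run time: gunzip, decode, and refill the hashes the application expects.
sub load_json_gz {
    my ($file, $flds, $def) = @_;    # e.g. \%my_flds, \%my_def
    my $json;
    gunzip($file => \$json) or die "gunzip $file failed: $GunzipError";
    my $data = JSON::PP->new->decode($json);
    %$flds = %{ $data->{flds} };
    %$def  = %{ $data->{def} };
    $def->{FIELDS} = $flds;          # restore FIELDS => \%my_flds
    return;
}
```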

    I haven't looked farther than that, but I suspect there is some simple boilerplate to read in a zipped JSON file and get a ref to the structure. The last piece of the puzzle is putting the data into the expected variables (you have shown %my_flds and %my_def).

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Reducing application footprint: large text files
by Anonymous Monk on Feb 28, 2018 at 21:17 UTC
    Well, could you put that data into SQLite database tables?
      SQLite is a nice suggestion. Unfortunately, I don't have that option in this case.
Re: Reducing application footprint: large text files
by Anonymous Monk on Feb 28, 2018 at 22:51 UTC
    One thing that I would definitely do is to put all of this logic into one – possibly two – Perl modules which are tasked with maintaining the entire storage system. I would also "future-proof" the design by prefixing the file with a file-version identifier so that the file, whatever it turns out to be, is "self-describing" to programming that is in the know ... programming which occurs in exactly one place or set of places.
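A minimal sketch of such a version prefix for a binary data file: a fixed magic tag plus a 16-bit format version at the start of every file, checked in one place before anything else is decoded. The tag 'RDEF', the version number, and the helper names add_header and strip_header are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical header: 4-byte magic tag + 16-bit big-endian format version.
my $MAGIC   = 'RDEF';
my $VERSION = 1;

# Prefix an already-encoded payload with the header.
sub add_header {
    my ($payload) = @_;
    return pack('a4 n', $MAGIC, $VERSION) . $payload;
}

# Validate the header and return the payload that follows it.
sub strip_header {
    my ($blob) = @_;
    die "file too short\n" if length $blob < 6;
    my ($magic, $version) = unpack 'a4 n', $blob;
    die "not a register-definition file\n" unless $magic eq $MAGIC;
    die "unsupported format version $version\n" unless $version == $VERSION;
    return substr $blob, 6;
}
```

A reader that finds a newer version number can then fail loudly instead of misreading the bytes.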

      @theOP: One thing you definitely want to do is ignore the guy (sundialsvc4 "incognito") I'm responding to.

      See http://perlmonks.com/?node=worst+nodes and scroll to the bottom to see why.


        I went to that link, scrolled down to the bottom (and middle, and toward the bottom :)), but the reference eludes me, sorry.

        Matt.

      The designers of this (not me) are on the same page. This data file is a separate module with a versioned module name.

      Good to know this, at least, meets the best practice.

      Thanks,
      Matt.

Node Type: perlquestion [id://1210091]
Approved by Corion
Front-paged by LanX