PerlMonks  


1. Read in a file into a scalar
2. Split scalar into an array using split(/\n{2}/, $scalar)

That's rather wasteful. You can probably do it all in one pass, if you're slick, iterating over each line of the file as you go. You can probably get #3 and #4 in there too. For #5 (the phone numbers), maybe keep a temporary hash outside the loop, keyed on phone number: as you go along, if a number isn't in the hash yet, add it; if it is, drop the record. The biggest speed increase, I would imagine, would come from getting all of these points into a single loop over the file, and I believe it can be done.
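To make the temporary-hash idea concrete, here is a minimal sketch of the usual `%seen` idiom, reading line by line. The `PHONE: ...` line format and the `unique_phones` name are assumptions for illustration only; the real file format isn't shown in this thread.

```perl
use strict;
use warnings;

# Return phone numbers in first-seen order, skipping repeats.
# ASSUMPTION: phone lines look like "PHONE: 555-1234" (hypothetical format).
sub unique_phones {
    my ($fh) = @_;
    my (%seen, @unique);
    while (my $line = <$fh>) {
        next unless $line =~ /^\s*PHONE:\s*(\S+)/;
        # $seen{$1}++ is false the first time a number shows up, true afterwards.
        push @unique, $1 unless $seen{$1}++;
    }
    return @unique;
}

# Demo on an in-memory filehandle so the sketch runs without a data file.
my $sample = "PHONE: 555-1234\nNAME: a\nPHONE: 555-9999\nPHONE: 555-1234\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print join(", ", unique_phones($fh)), "\n";   # prints "555-1234, 555-9999"
```

Only one line is held in memory at a time, which is the point of avoiding the slurp-then-split approach.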

It's a lot of code, and I don't want to embarrass myself by coming up with something right now, but here are some ideas:

Iterate over each line of the file, of course. Keep a variable handy for the current serial number (if any) and another for whether a record is open or not (in case there's any line noise between records and such; probably not important). You'll also need your records hash, and a hash of phone numbers that you're just going to get rid of in the end.

When you get a "SERIAL NUMBER (\d+)" line, put $1 in your current serial number variable. When you get a { line, set open to true, and when you get a } line, set it to false. Anything else, while it's open, is stuff to shove into the hash. And, when you get a phone number, check to see if it already exists (in your phone number hash), and if it does, you can just delete() the current record out of the hash (or do the check when you get the closing brace, so that if there is stuff after the phone number, it won't re-enter the record).
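A rough sketch of that loop might look like the following. It is untested against the real data: the `KEY: value` field layout, the `PHONE` field name, and the `parse_records` name are all assumptions, and it uses the second variant described above (checking the phone number at the closing brace).

```perl
use strict;
use warnings;

# ASSUMED format: "SERIAL NUMBER 12", a "{" line, "KEY: value" lines, a "}" line.
sub parse_records {
    my ($fh) = @_;
    my (%records, %seen_phone);      # %seen_phone is thrown away at the end
    my ($serial, $open, %current);   # state carried from line to line

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^SERIAL NUMBER (\d+)/) {
            $serial = $1;
        }
        elsif ($line =~ /^\s*\{\s*$/) {
            $open    = 1;
            %current = ();
        }
        elsif ($line =~ /^\s*\}\s*$/) {
            if ($open && defined $serial) {
                my $phone = $current{PHONE};
                # Store the record unless its phone number was already seen.
                $records{$serial} = { %current }
                    unless defined $phone && $seen_phone{$phone}++;
            }
            $open = 0;
            undef $serial;
        }
        elsif ($open && $line =~ /^\s*(\w+):\s*(.*?)\s*$/) {
            $current{$1} = $2;       # anything else inside an open record
        }
        # Lines outside a record (noise) fall through and are ignored.
    }
    return \%records;
}

my $sample = <<'END';
SERIAL NUMBER 1
{
NAME: Alice
PHONE: 555-1234
}
some line noise
SERIAL NUMBER 2
{
NAME: Bob
PHONE: 555-1234
}
SERIAL NUMBER 3
{
NAME: Carol
PHONE: 555-9999
}
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $records = parse_records($fh);
print join(", ", sort { $a <=> $b } keys %$records), "\n";   # prints "1, 3"
```

Record 2 is dropped because it repeats record 1's phone number; the noise line between records is skipped because no record is open at that point.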

Oh, I forgot: when you get a SERIAL NUMBER line, you can do the check there for repeated numbers and figure out a new one. It doesn't matter much that you haven't read all the records yet, because if your new number is taken by a later record, that later record will be incremented too. However, if that's not the behavior you want, my entire suggestion goes out the window.
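For the renumbering, something along these lines should do, assuming "figure out a new one" simply means bumping the number until it is free. The `next_free_serial` helper is a made-up name for this sketch.

```perl
use strict;
use warnings;

# Return $serial if it is unused, otherwise the next higher unused number,
# and mark whatever is returned as used.
# ASSUMPTION: a clashing serial number is resolved by incrementing it.
sub next_free_serial {
    my ($serial, $used) = @_;
    $serial++ while $used->{$serial};
    $used->{$serial} = 1;
    return $serial;
}

my %used;
my @assigned = map { next_free_serial($_, \%used) } (7, 7, 8, 7);
print "@assigned\n";   # prints "7 8 9 10"
```

Note how the record that originally had serial number 8 ends up as 9, because an earlier duplicate 7 already took 8. That is the "later record will be incremented too" behavior described above.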

I hope you could follow. I could send you some actual code but it would take me some time to write up and test, so /msg me or something if you want some code.

local $_ = "0A72656B636148206C72655020726568746F6E41207473754A"; while(s/..$//) { print chr(hex($&)) }


In reply to RE: Efficiency and Large Arrays by reptile
in thread Efficiency and Large Arrays by Kozz
