PerlMonks  

Re: RFC: XML::Entities

by hossman (Prior)
on Nov 18, 2007 at 02:02 UTC ( [id://651478]=note )


in reply to RFC: XML::Entities

At a minimum, you should heed jdporter's suggestions in response to the snippet you already posted, particularly since you now have all those chr calls inside a function body, so using the same set twice will redo all the computation.
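To illustrate the recomputation point, here is a minimal sketch of caching a set the first time it is requested. The names `entity_set`, `_build_set` and the three-entry table are hypothetical stand-ins, not the code from the posted module; the `//=` operator assumes perl 5.10 or later.

```perl
use strict;
use warnings;

my %cache;          # set name => hashref, filled on first request
my $builds = 0;     # counts how often the expensive part actually runs

# Hypothetical builder: the real one would hold the full entity list.
sub _build_set {
    my ($name) = @_;
    $builds++;
    return { 'amp' => chr(38), 'lt' => chr(60), 'gt' => chr(62) };
}

# Returns the same hashref on every call for a given set name,
# so the chr calls run only once per set.
sub entity_set {
    my ($name) = @_;
    return $cache{$name} //= _build_set($name);
}

my $first  = entity_set('example');
my $second = entity_set('example');
print "builds: $builds\n";
```

Asking for the same set twice hands back the cached hashref, and a set nobody asks for is never built at all.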

Personally: I don't like duplicating data in a new format, because you never know when the "authoritative" copy might change. I much prefer to have a tool that translates the data from the authoritative format to the format I want.

In this case, instead of a module with a bunch of hardcoded data structures -- how about an XML::Entities module that knows how to parse the raw .ent files to generate the data structures? This would have the added benefit of working with all the various known entity sets and not just the one set you are currently interested in (not to mention any custom entity sets someone else might make in the future).
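A rough sketch of what such a parser could look like. It assumes every declaration has the shape `<!ENTITY name "&#225;">` or `<!ENTITY name "&#xE1;">` with a single numeric character reference; real .ent files may contain other forms, which this version simply skips. The function name `parse_ent` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Parse entity declarations from the text of a .ent file.
# Assumption: each declaration holds exactly one numeric character
# reference, decimal or hexadecimal; anything else is ignored.
sub parse_ent {
    my ($text) = @_;
    my %entity;
    while ($text =~ /<!ENTITY\s+([\w.-]+)\s+"&#(x[0-9A-Fa-f]+|\d+);"\s*>/g) {
        my ($name, $ref) = ($1, $2);
        # Hex references start with "x"; the rest are decimal.
        my $code = $ref =~ /^x(.+)/ ? hex($1) : $ref + 0;
        $entity{$name} = chr($code);
    }
    return \%entity;
}

# Made-up sample input in the assumed format.
my $sample = <<'END';
<!ENTITY aacute "&#225;"> <!-- small a, acute accent -->
<!ENTITY agrave "&#xE0;">
END

my $set = parse_ent($sample);
printf "%s => U+%04X\n", $_, ord $set->{$_} for sort keys %$set;
```

Because the parser only needs the text of a file, it would work the same way on a custom entity set as on a standard one, provided the declarations follow the assumed shape.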

That module could be used by the Makefile.PL of other modules (named things like XML::Entity::ISO8879) to automatically download the .ent files and generate the perl data structures, written out as source code for fast reuse. (Or at the very least, you could use a module like the one I describe at build time to easily generate modules like the one you've already made -- but other people could use it too.)
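The "write it out as source code" step could be sketched roughly like this with the core Data::Dumper module. The package name `XML::Entity::Example`, the `$ENTITY` variable and the `module_source` helper are hypothetical, and the download and Makefile.PL wiring are deliberately left out.

```perl
use strict;
use warnings;
use Data::Dumper;

# Turn a name => character hash into the source text of a module.
# The package name and the $ENTITY variable are hypothetical.
sub module_source {
    my ($package, $entities) = @_;
    local $Data::Dumper::Useqq    = 1;   # emit non-ASCII characters as escapes
    local $Data::Dumper::Sortkeys = 1;   # stable output across builds
    local $Data::Dumper::Terse    = 1;   # no "$VAR1 =" prefix
    local $Data::Dumper::Indent   = 1;
    chomp(my $dump = Dumper($entities));
    return "package $package;\nuse strict;\nuse warnings;\n"
         . "our \$ENTITY = $dump;\n1;\n";
}

my $source = module_source('XML::Entity::Example',
                           { aacute => chr(225), amp => '&' });
print $source;
```

A build script would write the returned string to a .pm file, so the installed module carries plain literal data and does no parsing at load time.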

Re^2: RFC: XML::Entities
by Sixtease (Friar) on Nov 18, 2007 at 12:39 UTC

    Yes, you're definitely right. I did automate the process of retrieving the entities from the web pages, but I actually parsed them from the .html files. :-) And I use bash, wget and perl for it, which I find much more comfortable in this case than pure perl.

    I wrapped the data in subs so that it doesn't get evaluated if someone is only interested in one set and not in the others. I should definitely cache the functions' return values. And are you sure it's better to add the semicolons with a map? I didn't really benchmark it, but I think my way should be a bit faster, and the code is simpler and more transparent, if twice as big. Dunno... it seems to me this is the better way, but I am certainly open to counterarguments.
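For comparison, a minimal sketch of the map variant under discussion. It assumes the table carries each entity name both with and without a trailing semicolon, which is a guess about the original snippet; the two-entry `%base` table is only a stand-in.

```perl
use strict;
use warnings;

# Base table without semicolons (a tiny illustrative subset).
my %base = ( amp => chr(38), lt => chr(60) );

# Derive the semicolon-terminated names instead of hardcoding them.
# The parentheses make perl parse the braces as a block, not a hash.
my %with_semicolons = ( %base, map { ("$_;" => $base{$_}) } keys %base );

print join(', ', sort keys %with_semicolons), "\n";
```

The source stays half the size and the two forms cannot drift apart, at the cost of one pass over the keys when the table is built; whether that cost matters would need the benchmark mentioned above.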

    Update: Looking at the .ent files, I see now that half of my effort was needless. :-) Oh well...
