PerlMonks  

Is Text::RE::Foo a good name space?

by fernandes (Monk)
on Oct 11, 2007 at 16:49 UTC ( [id://644277]=perlquestion )

fernandes has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I would like to upload 5 very simple modules to CPAN and, following the suggested protocol, I'm here to ask about their name spaces. These modules provide regular expressions (RE) for Text::Statistics' delimiters, so they are mostly useful within that framework. Like other RE providers, though, they can be put to any other imaginable use as well, mostly as script-based delimiters for text/natural language processing.

For example, Text::RE::Latin provides the following utf-8 RE:
/[-@]|[\[-`]|[{-¿]|[ɐ-˩]|[ʹ-�]/
The remaining name spaces I'm proposing are:

Text::RE::Arabic
Text::RE::Cyrillic
Text::RE::Devanagari
Text::RE::GreekandCoptic

This is an update to the Text::Statistics framework, meant to save storage space on CPAN.

Thank you!

Replies are listed 'Best First'.
Re: Is Text::RE::Foo a good name space?
by moritz (Cardinal) on Oct 11, 2007 at 16:54 UTC
    Sorry for not having a better idea, but the name is not good at all. When I read Text::RE::Latin I have no idea what the module does (and since you didn't describe it in detail, I can only guess).

    Perhaps Text::RE::Statistics::Foo or Text::Statistics::RE::Foo might be better.

      Thank you!
      I will follow your suggestion, Text::Statistics::RE::Foo. It was actually my second option. Thanks again.
Re: Is Text::RE::Foo a good name space?
by ikegami (Patriarch) on Oct 11, 2007 at 17:16 UTC

    Not a comment on the question, but on the regexp.

    /[a-b]|[c-d]|[e-f]/
    can be written as a single character class
    /[a-bc-de-f]/

    Also, it might be easier to specify what's valid and negate the set.
    /[\x00-@\[-`...]/
    would become
    /[^a-zA-Z...]/
    Both are equivalent, but the latter is more readable.
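The equivalence claimed above can be checked mechanically. The following is a minimal sketch, assuming only the ASCII range (0x00 to 0x7F) matters; the "..." in the patterns above stands for further ranges that are not spelled out in this thread, so they are left out here. The variable names are illustrative, and `@` is escaped so it can never be read as an array.

```perl
use strict;
use warnings;

# Three spellings of "an ASCII character that is not a letter".
my $alternation = qr/[\x00-\@]|[\[-`]|[{-\x7F]/;   # one class per range, joined with |
my $merged      = qr/[\x00-\@\[-`{-\x7F]/;         # the same ranges in a single class
my $negated     = qr/[^a-zA-Z]/;                   # say what is valid, then negate

# Compare the three patterns on every ASCII code point.
my $mismatches = 0;
for my $code (0 .. 127) {
    my $char = chr $code;
    my @hits = map { $char =~ $_ ? 1 : 0 } $alternation, $merged, $negated;
    $mismatches++ unless $hits[0] == $hits[1] && $hits[1] == $hits[2];
}

print "mismatches: $mismatches\n";   # prints "mismatches: 0"
```

Note that the negated form also matches every non-letter above 0x7F, so outside ASCII it is only equivalent if the spelled-out ranges cover the same characters.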

      Your mother language is probably English, a language with a low frequency of commas. So your spell-out module (this is a Chomskyan expression) processes [a-bc-d] "more" productively than [a-b]|[c-d]. But try to think outside your box, and you will see that it is not necessarily so. To me, what you gain in reduced length by suppressing the OR operators, you lose in readability. Anyway, thanks for the free wisdom.

        I must admit I realized it was less readable as I was posting. I wonder if there's a significant speed difference, since this regexp is likely to be used repeatedly. Either way, the regex could be built dynamically when the module is loaded. That would allow for much more readable code than any of the alternatives we've seen in this thread.
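One way the "build it when the module is loaded" idea might look is sketched below. Everything in it is an assumption for illustration: the `@DELIMITER_RANGES` table, its descriptions, and the `delimiter_re` accessor are hypothetical names, and only the ASCII ranges are listed, not the full set from the original regexp.

```perl
use strict;
use warnings;

# Hypothetical table of delimiter ranges: [first, last, description].
# Each range gets its own line and comment, which is where the
# readability comes from.
my @DELIMITER_RANGES = (
    [ 0x00, 0x40, 'controls, space, digits, punctuation up to @' ],
    [ 0x5B, 0x60, '[ through `' ],
    [ 0x7B, 0x7F, '{ through DEL' ],
);

# Build a single character class once, at load time.
my $class = join '',
    map { sprintf '\\x{%X}-\\x{%X}', $_->[0], $_->[1] } @DELIMITER_RANGES;
my $DELIMITER_RE = qr/[$class]/;

# Hypothetical accessor a module could export.
sub delimiter_re { return $DELIMITER_RE }

# Usage: split a string on runs of delimiters.
my $re    = delimiter_re();
my @words = grep { length } split /$re+/, 'Hello, monks! 5 modules';
print join('|', @words), "\n";   # prints "Hello|monks|modules"
```

Because the pattern is compiled once with `qr//`, repeated use costs no more than a hand-written class. Whether the alternation form is measurably slower could be checked with `cmpthese` from the core Benchmark module.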

Node Type: perlquestion [id://644277]
Approved by ikegami