Research ideas

Wodin has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re (tilly) 1: Research ideas by tilly (Archbishop) on Feb 20, 2001 at 04:49 UTC
Well I had an idea for how to optimize NFA RE engines that I think has potential, but I am not planning on doing any time soon. OK, it isn't directly Perl, but it does involve parsing and if it works well it would be appropriate for including in the RE engines of languages like Perl and Python. Take a look at this proposed optimization, and see if trying to implement and do basic tests on that interests you. If it does, feel free to contact me privately. (Not that I have done anything with it other than some preliminary analysis...) For the record the point is that, at the cost of extra work at compilation, it should allow any RE which can be solved by a DFA to be solved by an NFA without the exponential backtracking disasters that are traditionally problems for NFA engines.	[reply]
Re: Re (tilly) 1: Research ideas by gryng (Hermit) on Feb 20, 2001 at 05:05 UTC
You mean solved by the DFA right? Like egrep? Sounds like a good idea :) Ciao, Gryn	[reply]
Re (tilly) 3: Research ideas by tilly (Archbishop) on Feb 20, 2001 at 05:23 UTC
No, I meant solved by an NFA. Today NFAs can solve all problems that DFAs can solve, but it is possible to by accident write an NFA that will take several years to finish. My suggestion allows you to convert an NFA into an equivalent NFA which backtracks at fewer points. However it remains an NFA, and where an NFA would find a different match than a DFA, it will find the match that an NFA would find, and not the one that a DFA would. (Thereby allowing it to be used in an NFA engine without changing the behaviour.) Now if the RE is one that you could use a DFA to solve, then you can convert the NFA into an NFA with no backtracking anywhere. However it still retains NFA characteristics. For instance with alternation it will prefer to match the first alternative in the sequence, and won't match the second if that allows a longer overall match. Unfortunately optimizing the NFA that far may result in a combinatorial explosion from n states to n!, but since the optimization goes in steps, you can just optimize until you hit some threshold, and then stop. (Or you could run, profile, incrementally optimize, wash, rinse, and repeat.)	[reply]
Re: Re (tilly) 3: Research ideas by gryng (Hermit) on Feb 20, 2001 at 05:50 UTC
Re (tilly) 5: Research ideas by tilly (Archbishop) on Feb 20, 2001 at 10:12 UTC
Some notes below your chosen depth have not been shown here
Re: Re: Re (tilly) 3: Research ideas by demerphq (Chancellor) on Mar 22, 2002 at 16:37 UTC
Re: Re (tilly) 1: Research ideas by Anonymous Monk on Mar 21, 2002 at 05:24 UTC
Are you sure you know the theory?	[reply]
Re: Re: Re (tilly) 1: Research ideas by tilly (Archbishop) on May 01, 2003 at 15:32 UTC
I'm positive that I know the theory, and that my suggestion will work. Figuring things like this out is what I studied math for. :-)	[reply]
Re: Research ideas by chromatic (Archbishop) on Feb 20, 2001 at 04:59 UTC
I'd like to see a module that can take a code reference and dump out a rough approximation of the code that created it. It would end up something like Data::Dumper or Storable, but it would allow me to serialize highly complex data structures. I guess the tricky bits are closures and code with eval EXPR bits in them. It's still worth some research, in my opinion.	[reply]
Re: Re: Research ideas by Trinary (Pilgrim) on Feb 20, 2001 at 05:09 UTC
I believe Damian Conway is working on something like this. Doing incredibly weird things with closures and such that are far beyond my understanding. Definately worth the effort, however. Trinary	[reply]
Re: Research ideas by Trinary (Pilgrim) on Feb 20, 2001 at 03:34 UTC
one of my favorites was the transitions from directed graphs interpreted as state machines to regular expressions. I did some translation back and forth in some undergrad theory classes, but never got to deal with doing things like traversal problems while the data is in regexp form. From a pure theory standpoint it offers an interesting way to use the solution of a known set of problems (graph coverage, etc) to be effective in a new space (operating on regexps). I wish I'd explored them further. G'luck Trinary	[reply]
A Research idea by gryng (Hermit) on Feb 20, 2001 at 03:28 UTC
Wodin, My slant is towards AI -- so my suggestion is genetic programming. That is, a program that creates programs to solve problems. I'm not sure if this is the kind of thing you are looking for, or allowed to do, but it's very interesting (at least to me :) ). Good luck :) Ciao, Gryn	[reply]
Re: A Research idea by t'mo (Pilgrim) on Feb 20, 2001 at 16:51 UTC
If "originality" is your goal, lots of experiments with GP have already been done. There should be some room for more, however. Some resources to examine first (that don't necessarily stay within the bounds of Perl) are: the garage geneticprogramming.com look under 'evolutionary computation'; if you can get hold of one of his papers, they're quite interesting CS bibliography; a useful resource; you'll probably also have to go to a library to do further research As an interesting and strange coincidence, I just started reading a book titled The Monk in the Garden by Robin Marantz Henig. If you didn't guess from the title, it's about Gregor Mendel.	[reply]
Re: A Research idea by ryan (Pilgrim) on Feb 20, 2001 at 14:23 UTC
I also find AI very interesting. During an expert systems uni course a while ago we looked at several very 'budget' expert system generation utilities such as 'First Class', 'Level 5' and probably a better one 'Prolog' etc.. they each cost tens of thousands of dollars in most cases, but are very primitive for tackling REAL issues. Perhaps expert system development could be an option. There are several AI modules around but I doubt any from what I have seen can provide full expert system generation or perhaps trained and untrained neural network creation, jNeural looks like it might one day - but not yet.	[reply]
Re: Research ideas by chorg (Monk) on Feb 20, 2001 at 04:00 UTC
You like parsers and the like? How about working on the B::CC backend, and making it work? That would be great!! _______________________________________________ "Intelligence is a tool used achieve goals, however goals are not always chosen wisely..."	[reply]
Re: Re: Research ideas by Wodin (Acolyte) on Feb 20, 2001 at 04:06 UTC
I like them, but I don't have a lot of experience with them. A non-grammar based parser for Quake 3 logfiles is the extent of my working parsers so far. Now, I've come a decent way since then, but I'm not the guy to fix B::CC -- I don't know C, for one thing. Thanks for the interesting tip though -- reading the POD for it was fun.	[reply]
(Coyote) Re: Re: Re: Research ideas by Coyote (Deacon) on Feb 20, 2001 at 04:32 UTC
How about a pure perl XML parser? This is from Simon Cozens weekly summary of the p5p list for the week of 2/13 to 2/19. You can find a copy of this at http://www.perl.com. Parsing XML One of the targets for Perl 5.8 is to speed up XML parsing, but nobody really has any idea how to do that. The current XML::Parser uses an external library, which means that a lot of speed is lost in flapping around in XS. The idea was mentioned of a pure-Perl version, which we would then be able to ship in core. Jarkko says: Okay, I lied. I do have an opinion: relying on an external library to do XML parsing is weird. expat is nice and is a de facto standard, and reinventing the wheel that has already been extensively invented and debugged is silly in the extreme -- but we are, after all, supposed to be The Text Processing Language. Matt Sergeant, as ever, had good XML suggestions: If you do that, I suggest/recommend at least doing it the Python way - by letting XML experts (i.e. a SIG) discuss what would be the best way to do it. Note also that ActivePerl ships with XML::Parser, though in a few months it may not necessarily be the best option any more. He also pointed out that: The speed problem is that expat is basically a callback/event based parser, so you have a storm of events crossing the XS/Perl barrier, meaning that you're constantly building SV's. Orchard can get around this by doing the parsing to a tree structure in C. (Note that Orchard is also based on expat). Or it can also do SAX based event passing, but again that's about as slow as XML::Parser. Doing it all in Perl is possible, but not entirely trivial to get exactly right. XML has a lot of annoying nuances that were left in from SGML, mostly to do with DTDs. And while I don't use most of the annoying features, I think I'd be upset if the core Perl started shipping an "XML" parser that wasn't fully XML compliant. I wonder if the perl-xml mailing list could go chew on that and let us know the best way to proceed. End of quote... ---- Coyote	[reply]
Re: Research ideas by BrentDax (Hermit) on Feb 20, 2001 at 14:07 UTC
I've always thought it would be interesting to see a language with hybrid Perl/C++ish syntax. For example, you actually have specifiers like class, but you keep Perl things like (roughly) typeless variables and different funny symbols for different ways of organizing data. Maybe you could even make the funny symbols extensible, so you could, for example, define + to mean a tree (so +foo is a tree named foo). The whole thing would be parsed and converted into Perl for execution, so you wouldn't have to re-write everything. I'm probably just insane, but it's an idea... =cut --Brent Dax `@HPAJ=split("", "rekcaH lreP rentonA tsuJ"); print reverse @HPAJ; #sucky but who cares?` [download]	[reply] [d/l]
Re: Research ideas by 2501 (Pilgrim) on Feb 20, 2001 at 04:38 UTC
How about work with natural language? Even with all sorts of modules, handling natural language is something huge. You could set a small goal to improve a piece of functionality in that area. For example, search engines. I put in the phrase "Is there a site where I can obtain pearl modules?" work on code which through a bit of intelligence and a bit of brute force realizes I typo'd and what I want is computer related PERL modules and not a module for a pearl or a module made of pearl. There is so much that can be done with natural languages it can make ones head spin just trying to grok it all. The only problem you might run into is biting off more then you can chew for the scope of the project. thanks, 2501	[reply]
Re: Research ideas by Wodin (Acolyte) on Apr 08, 2001 at 13:15 UTC
Update: Well, thank you for all of the nice suggestions -- I corresponded with tilly about his, and wrote up a proposal about it. Sadly, it came to a rather untimely end, as I was denied funding -- I guess they want to refocus the summer research grants on harder sciences this year. Thanks to those of you who put time and effort into helping come up with ideas. It was greatly appreciated, and there's the possibility I'll still tackle some of the stuff sometime in here.	[reply]


Come for the quick hacks, stay for the epiphanies.
	PerlMonks