Form validation/Search script

No-Lifer has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Form validation/Search script by JediWizard (Deacon) on Oct 24, 2005 at 18:27 UTC
I highly recommend `use CGI;` as it will (among other things) make reading your parameters much easier: `use CGI qw(:standard); my %params; foreach my $param (param()) { $params{$param} = param($param); }` Just makes life easier for us all. See CGI. They say that time changes things, but you actually have to change them yourself. - Andy Warhol
Re: Form validation/Search script by ikegami (Patriarch) on Oct 24, 2005 at 18:37 UTC
To do a regexp search for "foo" and "bar", in any order, any distance from each other, one would use `^(?=.*foo)(?=.*bar)`. In the following snippet, a regexp of that form is constructed dynamically:

    my $text = 'Perl is a general-purpose programming language originally developed for text manipulation and now used for a wide range of tasks including system administration, web development, network programming, GUI development, and more.';
    # Text source: perlintro [http://perldoc.perl.org/perlintro.html]

    foreach my $keywords (
        'Perl development',
        'Perl sucks',
    ) {
        my $re = '^' . join '', map { "(?=.*\\b$_\\b)" } map quotemeta, split ' ', $keywords;
        print("$re\n");
        print("$keywords: ", $text =~ /$re/ ? "match" : "no match", "\n");
    }

outputs

    ^(?=.*\bPerl\b)(?=.*\bdevelopment\b)
    Perl development: match
    ^(?=.*\bPerl\b)(?=.*\bsucks\b)
    Perl sucks: no match

Note: The use of `\b` is questionable. What if the keywords start or end with characters that don't match `\w`? This issue is left unresolved. By the way, you should use the core module CGI instead of handling the CGI request in your own code. It's much more reliable and maintainable. You should also check whether a module already does what I just coded.
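One possible way around the `\b` problem noted above, offered only as a sketch: replace `\b` with the lookarounds `(?<!\w)` and `(?!\w)`, which ask "no word character next to the keyword" rather than "a word boundary here". The sub name `all_keywords_re` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Build a regex that matches only if every keyword occurs somewhere in the text.
# (?<!\w) and (?!\w) stand in for \b, so a keyword such as "C++" still works.
sub all_keywords_re {
    my ($keywords) = @_;
    my $re = join '',
        map { '(?=.*(?<!\w)' . quotemeta($_) . '(?!\w))' }
        split ' ', $keywords;
    return qr/^$re/is;
}

my $text = 'I write C++ and Perl for web development.';
for my $query ('C++ Perl', 'C++ Python') {
    my $re = all_keywords_re($query);
    print "$query: ", ($text =~ $re ? 'match' : 'no match'), "\n";
}
```

With `\b`, the keyword "C++" would never match here, because there is no word boundary between "+" and the following space. Note that an empty keyword string yields a regex that matches everything, so the caller should reject empty input first.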
Re: Form validation/Search script by Limbic~Region (Chancellor) on Oct 24, 2005 at 18:49 UTC
No-Lifer, This is the first node I have seen you write on this topic, so I am unfamiliar with the history. It sounds like you are building a rudimentary search engine. It also sounds like you want to do this on your own instead of using a pre-built wheel. There is nothing wrong with this approach in general, but it is also sometimes useful to learn about existing technology:

Adding Search Functionality to Perl Applications
Building a Vector Space Search Engine in Perl
Find What You Want with Plucene

Cheers - L~R
Re^2: Form validation/Search script by No-Lifer (Initiate) on Oct 24, 2005 at 19:09 UTC
Limbic-Region, Cheers for the reply. As to what I'm doing - spot on. It's actually a bit of coursework I'm working on for University - I have a mandatory "Introduction to perl/cgi" class which I'm sucking at, but I'm determined to get it finished quite soon. So, yes, I'm building a very simple search engine to go through a few pages (try www.ally.nu - searching for "perl"). I've got a few bits and bobs working, thanks to the other Monks here, and nearly have an application I could submit. Bear in mind that they're not expecting miracles from us - we're not programming students! I'm trying to do it in the most straightforward way possible - this is my first experience with perl (shudder). The things that are stumping me at the moment are: first, if "submit" is pressed without any form data, how to display an "error" page; and second, the question above - an "AND" type search. My full code is below - I know it's a complete mess, will tidy it up at the end! Thank goodness we're not being assessed on code pretty-ness, purely on search engine function!
    #!/usr/bin/perl -w

    # The following code deals with the form data
    if ($ENV{'REQUEST_METHOD'} eq 'POST') {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
        @pairs = split(/&/, $buffer);
        foreach $pair (@pairs) {
            ($name, $value) = split(/=/, $pair);
            $value =~ tr/+/ /;
            $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
            $value =~ s/["]//gi;
            $value =~ s/[+]/ /gi;
            $FORM{$name} = $value;
        }
    }

    $keyword=$FORM{keyword};
    chdir("/home/1008/gnicoll/www.abernyte.net/public_html");
    opendir(DIR, ".");

    print "Content-type: text/html\n\n";
    print"<STYLE>";
    print"BODY {FONT-FAMILY: arial,sans-serif}";
    print"TD {FONT-FAMILY: arial,sans-serif}";
    print"DIV {FONT-FAMILY: arial,sans-serif}";
    print"P {FONT-FAMILY: arial,sans-serif}";
    print"A {FONT-FAMILY: arial,sans-serif}";
    print"UNKNOWN {COLOR: #0000cc}";
    print"</STYLE>";
    print"<BODY bgColor=#ffffff topMargin=2 marginheight=2>";
    print"<TABLE cellSpacing=2 cellPadding=0 width=100% border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD width=1% height=69 vAlign=top><a href=http://www.ally.nu><IMG height=59 alt=Go to Noogle Home hspace=3 src=http://www.ally.nu/logo.gif width=143 vspace=5 border=0></a></TD>";
    print"<TD width=868></TD>";
    print"</TR>";
    print"</TBODY>";
    print"</TABLE>";
    print"<TABLE cellSpacing=0 cellPadding=0 width=100% border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD bgColor=#3366cc><IMG height=1 width=1></TD></TR></TBODY></TABLE>";
    print"<TABLE cellSpacing=0 cellPadding=2 width=100% border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD bgColor=#e5ecf9 colSpan=4><B>Search Results</B> - Your Search for the keyword(s) <strong>$keyword</strong> returned the following results:</TD></TR></TBODY></TABLE><BR>";
    print"<TABLE cellSpacing=0 cellPadding=2 width=100% border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD width=133 rowspan=2 vAlign=top noWrap bgColor=#ffffff><P><SMALL><A href=http://www.ally.nu>Noogle Home</A><BR><BR><A href=http://www.ally.nu/docs>Documentation</A><BR><br><A href=http://www.ally.nu/credits>Credits</A><br><BR><A href=http://www.ally.nu/docs/faq>FAQ<br></A><BR><A href=http://www.ally.nu/quiz>Quiz</A><BR>";
    print"<BR><BR>";
    print"</SMALL></P></TD>";
    print"<TD width=1 height=37 vAlign=bottom></TD>";
    print"<TD width=1 rowspan=2 vAlign=bottom background=http://www.ally.nu/dot2.gif><IMG height=1 src=http://www.ally.nu/dot2.gif width=1></TD>";
    print"<TD width=1 vAlign=bottom></TD>";
    print"<TD width=100% valign=top><P><B><FONT size=-1>Search Results</FONT></B></P></TD>";
    print"</TR>";
    print"<TR>";
    print"<TD height=598 vAlign=bottom></TD>";
    print"<TD vAlign=bottom></TD>";
    print"<TD valign=top></p>";
    print"<p></p>";
    print"<p></p>";
    print"<p></p>";

    while($file = readdir(DIR)) {
        next if ($file !~ /.html/);
        open(FILE, $file);
        $foundone = 0;
        $title = "";
        while (<FILE>) {
            if (/$keyword/i) {
                $foundone = 1;
            }
            if(/<title>/) {
                chop;
                $title = $_;
                $title =~ s/<title>//g;
                $title =~ s/<\/title>//g;
            }
            if(/<TITLE>/) {
                chop;
                $title = $_;
                $title =~ s/<TITLE>//g;
                $title =~ s/<\/TITLE>//g;
            }
            if($title eq "") {
                $title = $file;
            }
            if(/<META NAME="description" CONTENT="/i) {
                chop;
                $content = $_;
                $content =~ s/<META NAME="description" CONTENT="//g;
                $content =~ s/">//g;
            }
            if(/<META NAME="author" CONTENT="/i) {
                chop;
                $author = $_;
                $author =~ s/<META NAME="author" CONTENT="//g;
                $author =~ s/">//g;
            }
            if($content eq "") {
                $content = "No Meta-tag page information available";
            }
            if($author eq "") {
                $author = "No Meta-tag author information available";
            }
            $count++ while /$keyword/ig;
        }
        if($foundone) {
            print "<A HREF=/$file>$title</A><br>";
            print"<table width=100% border=0 align=center bgcolor=#e5ecf9>";
            print"<tr>";
            print"<td height=10><font size=-1><b>Results</b>: <i>$count</i> occurrence(s) of the word(s) <i>\"$keyword\"</i> on this page.<br> <b>Page Description</b>: $content<br><b>Page Author</b>: $author<br><b>URL</b>:<font color=#008000>http://www.ally.nu/$file</td>";
            print"</tr>";
            print"</table>";
            print"<br>";
            $count = 0;
            $listed=1;
        }
        close(FILE);
    }

    if($listed ne 1) {print "<p><br>Sorry, your search returned <b>$foundone</b> results. <A HREF=/index.html>Search Again?</A>";}
    else {print "<P><br>Do you want a <A HREF=/index.html>new search?</A>";}
    print"</TD>";
    print"</TR>";
    print"</TBODY>";
    print"</TABLE>";
    print"<BR>";
    print"<CENTER>";
    print"<TABLE cellSpacing=0 cellPadding=0 width=100% border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD bgColor=#3366cc><IMG height=1 width=1></TD></TR></TBODY></TABLE>";
    print"<TABLE cellSpacing=0 cellPadding=2 width=100% bgColor=#e5ecf9 border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD noWrap bgColor=#e5ecf9>";
    print"<TABLE cellSpacing=0 cellPadding=0 width=100% border=0>";
    print"<TBODY>";
    print"<TR>";
    print"<TD noWrap align=middle><FONT size=-1>©2005 Noogle - Napier University Server Side Languages Coursework <A href=http://www.ally.nu>Noogle Home</A> - <A href=http://www.ally.nu/docs>Documentation</A> - <A href=http://www.ally.nu/credits>Credits</A> - <A href=http://www.ally.nu/docs/faq>FAQ</A> - <A href=http://www.ally.nu/quiz>Quiz</A></FONT></TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE></CENTER></BODY></HTML>";
    closedir(DIR);
    exit;

I've also cannibalised quite a bit from other scripts - it's all cobbled together really. But at least I'm understanding what's happening! Cheers, NL.
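For the first sticking point (submit pressed with no form data), a minimal sketch of one way to handle it: trim the submitted value and branch on whether anything is left before any results markup is printed. The sub names `trim_keyword` and `error_page` and the error page wording are hypothetical; the value passed in is assumed to be `$FORM{keyword}` from the script above (or `param('keyword')` if CGI.pm is used).

```perl
use strict;
use warnings;

# Return the submitted keyword with surrounding whitespace removed;
# a missing (undef) value is treated the same as an empty one.
sub trim_keyword {
    my ($raw) = @_;
    $raw = '' unless defined $raw;
    $raw =~ s/^\s+|\s+$//g;
    return $raw;
}

# Hypothetical error page; the real one would reuse the Noogle layout.
sub error_page {
    return "Content-type: text/html\n\n"
         . "<HTML><BODY><P>Please enter at least one keyword. "
         . "<A HREF=/index.html>Search Again?</A></P></BODY></HTML>\n";
}

# Demonstration with a few possible submissions.
for my $submitted (undef, '', '   ', ' perl ') {
    my $keyword = trim_keyword($submitted);
    if (length $keyword) {
        print "search for '$keyword'\n";
    } else {
        # In the CGI script this branch would be: print error_page(); exit;
        print "show error page\n";
    }
}
```

The check has to run before the script prints its own `Content-type` header, otherwise the error page would be appended to a half-built results page.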
Re: Form validation/Search script by Zaxo (Archbishop) on Oct 24, 2005 at 18:28 UTC
That depends on what you're searching. A database will more or less do it for you via DBI and SQL. Text search in files may use system grep or an index of some kind, or can use pure perl. Detail what you want. If a regex is needed, you can join the search terms with '|' to make an alternation in the regex. That will find a match each time it finds one term. Your "standard" treatment of the posted form is not so standard any more. Just saying, `use CGI; my $q = CGI->new;` takes care of all that. After Compline, Zaxo
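The alternation idea could look roughly like this. It is a sketch of an "OR" search only, not the "AND" search asked about; the sub name `any_keyword_re` and the sample line are invented for illustration.

```perl
use strict;
use warnings;

# Join the quoted search terms with | so the regex matches any one of them.
sub any_keyword_re {
    my ($keywords) = @_;
    my $alternation = join '|', map { quotemeta($_) } split ' ', $keywords;
    return qr/(?:$alternation)/i;
}

my $line = 'Perl is used for web development.';
my $re   = any_keyword_re('perl python development');

# Count every hit on the line, whichever term produced it.
my $hits = () = $line =~ /$re/g;
print "$hits hit(s)\n";    # prints "2 hit(s)"
```

`quotemeta` keeps characters such as `+` or `.` in a search term from being read as regex syntax. An empty term list produces an empty alternation that matches everywhere, so empty input should be rejected beforehand.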
Re: Form validation/Search script by wfsp (Abbot) on Oct 24, 2005 at 19:47 UTC
Hi No-Lifer! One way to go about this is to first build a list of keywords and then all you need do is a lookup. You could use HTML::TokeParser to extract the text from each page and something like this to extract the words (what are your plans for common words, accents, hyphens, numbers etc.?). Then load the words into a DB (I use DBM::Deep) with a reference to each file that contains each word (I use another D::D for the file refs). All you have to bear in mind is to apply the same rules to the words submitted as you did when you built the index. Then perhaps HTML::Template to format the results and CGI::Session to display them a page at a time. At the moment I build the index locally and upload it. There are about 2k pages and the index comes in at a shade under 42k words (12MB). And, fingers crossed, it's working well. I would be interested in seeing an outline of how you plan to go about this. We may be able to make some suggestions before you commit yourself to any particular strategy. Good luck! After writing this I saw your reply to Limbic~Region. My advice above still stands.
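A rough sketch of the index-then-lookup idea, using plain in-memory hashes instead of DBM::Deep and hard-coded page text instead of HTML::TokeParser output, so it runs on its own. The page names, their text, and the sub names `words_of` and `files_matching_all` are all made up for illustration.

```perl
use strict;
use warnings;

# Normalise text into lowercase words. The same rule is applied to the
# pages when indexing and to the query when searching.
sub words_of {
    my ($text) = @_;
    return map { lc($_) } $text =~ /(\w+)/g;
}

# Hypothetical page texts standing in for the extracted HTML content.
my %pages = (
    'index.html' => 'Perl search engine coursework',
    'docs.html'  => 'Documentation for the search form',
    'quiz.html'  => 'A short Perl quiz',
);

# Inverted index: word => { file => occurrence count }
my %index;
for my $file (keys %pages) {
    $index{$_}{$file}++ for words_of($pages{$file});
}

# "AND" lookup: keep only the files that contain every query word.
sub files_matching_all {
    my ($query) = @_;
    my @words = words_of($query);
    return () unless @words;
    my %seen;
    for my $word (@words) {
        $seen{$_}++ for keys %{ $index{$word} || {} };
    }
    my @files = sort grep { $seen{$_} == @words } keys %seen;
    return @files;
}

print join(', ', files_matching_all('perl search')), "\n";    # prints "index.html"
```

Because the lookup only touches the index, the cost of a query no longer depends on reading every page, which is the point of building the index ahead of time.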
Re: Form validation/Search script by marto (Cardinal) on Oct 24, 2005 at 20:13 UTC
Hi No-Lifer, further to JediWizard's comment (Re: Form validation/Search script) I would suggest having a read of Ovid's CGI Course - Why use CGI.pm. Hope this helps. Martin
Re: Form validation/Search script by cees (Curate) on Oct 25, 2005 at 00:33 UTC
If you want to build a search mechanism that will scale to lots of pages, then you need to use an indexer of some sort. Have a look at CGI::Application::Search, which integrates the swish-e search index into a CGI app using the CGI::Application framework.

