Help from the Perliest monks

by perlmonknoob (Initiate)
on Jan 23, 2014 at 21:43 UTC ( [id://1071828]=perlquestion )

perlmonknoob has asked for the wisdom of the Perl Monks concerning the following question:

Hello all! I recently joined a new company and they've started me on some very basic Perl coding, but I'm having trouble with a bit of Perl using regex, so any help would be much appreciated! Problem: I need to make a list of names in a document (such as perl monk etc.), then I need to create a script that takes these names, searches through another document for anything matching, and saves it all to a new document. This is what I've come up with so far, but I'm having trouble now as it's getting confusing:
    #!/usr/bin/perl
    # Namecheck Program
    $n = 0;  #this is what we will increment to go throug the list
    $name = 'Defined Name list';  # Name of the file
    $newName = 'name list to check defined list again';  # Name of the second file
    open(Definedlist, $name);  # opens the file and places the contents into Namelist
    open(BigNamelist, $newName);  # opens the second file and places the contents into Newlist
    @firstArray = <Definedlist>;  # Read it into an array
    @secondArray = <BigNamelist>  # Read it into the second array
    close(Namelist) || die("Namelist Problem, stopped");  # Close the file
    close(Newlist) || die("Newlist Problem, stopped");  # Close the file
    foreach $Nameline (@secondArray)  # go through each item in array inside the document
    {
        egrep @firstArray[$n++] < $Nameline > newName.txt  # go through array one and print all the names with that name
        # e.g (shaun is chosen so print out all the details of that line inside the new document)
    }

    TEST DATA
    __Definedlist__
    Test Mai Program please
    __BigNamelist__
    test has scored 20 today with his program and around the world the program worked this is a stick situation Mai
So any help or direction would be great, my wise monks! Thank you!

Replies are listed 'Best First'.
Re: Help from the Perliest monks
by kennethk (Abbot) on Jan 23, 2014 at 23:05 UTC
    Okay, a brief list of things to consider:
    1. The use strict pragma is a great tool, particularly if you are just learning. It catches a lot of silly errors. Similarly for use warnings. See Use strict warnings and diagnostics or die for a discussion of why.

    2. Rather than two-argument open with a bareword filehandle, it's probably a better idea to use three-argument open with an indirect filehandle. See perlopentut for lots of gory detail. You should also check whether the open actually succeeded with an or die clause. In this case, your opens might look like:

      open(my $Definedlist, '<', $name) or die "Open failed on $name: $!";  # open the defined-name list for reading
      open(my $BigNamelist, '<', $newName) or die "Open failed on $newName: $!";  # open the second (big) name list for reading
    3. Your close statements are being run against non-existent filehandles. Using indirect filehandles with strict would have caught this error. Also, you don't usually need to close indirect filehandles, since they close automatically when they go out of scope.

    4. There is no Perl egrep built-in (function? method? sub? I'm never sure what to call things). There is a grep. This particular task is also a FAQ. See How can I tell whether a certain element is contained in a list or array?
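
    As a rough illustration of points 1 to 4 taken together, here is a minimal sketch rather than a definitive solution. The sub name matching_lines is hypothetical, the sample data is adapted from the question with assumed line breaks, and the in-memory filehandles (open on a reference to a string) merely stand in for the two real files. It treats a "match" as a case-insensitive substring match, which is also an assumption about what the question wants.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $big_file that contain any
# name from $name_file (case-insensitive substring match).
# Each argument is a file name, or a reference to a string for an
# in-memory file.
sub matching_lines {
    my ($name_file, $big_file) = @_;

    # Three-argument open with lexical filehandles, checked with "or die".
    # Both handles are closed automatically when they go out of scope.
    open(my $names_fh, '<', $name_file) or die "Open failed on $name_file: $!";
    open(my $big_fh,   '<', $big_file)  or die "Open failed on $big_file: $!";

    chomp(my @names = <$names_fh>);
    @names = grep { length } @names;    # ignore blank lines

    my @lines = <$big_fh>;

    # Keep each line for which at least one name matches.
    # \Q...\E quotes any regex metacharacters in the name.
    return grep {
        my $line = $_;
        scalar grep { $line =~ /\Q$_\E/i } @names;
    } @lines;
}

# Sample data adapted from the question; the line breaks are assumed.
my $defined = "Test\nMai\nProgram\nplease\n";
my $big     = "test has scored 20 today with his program\n"
            . "and around the world the program worked\n"
            . "this is a stick situation\n"
            . "Mai\n";

print matching_lines(\$defined, \$big);
```

    With this sample data, three of the four lines are printed; "this is a stick situation" contains none of the names. To save the result to a new document instead, open a third handle with '>' and print to it.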

    You should probably pick up an intro text to help you get started. You can't beat the price of Beginning Perl, though it's pretty out of date. See a whole list of books at http://learn.perl.org/books/. Anonymous Monk's suggestion of reading perlintro is also a good idea.

    HTH.


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Thank you very much for your wisdom. I will indeed buy these books; Perl seems a very fun language.
Re: Help from the Perliest monks
by Laurent_R (Canon) on Jan 23, 2014 at 23:20 UTC

    You basically have two possible strategies. Choosing one or the other will depend on several factors, the main ones being the relative size of the two lists and how well defined the names of the first list appear in the second one.

    Suppose your list of names is very short and your document quite large. For example, the document is the King James Bible and the list of names has only four names: (God, David, Mary, Jesus). You will probably want to read each line of the document and use a regular expression to print out each line that matches the regex. Something like this:

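    # ...
    while (<$INPUT>) {
        # With a lexical filehandle, $_ must be passed explicitly:
        # a bare "print $OUT" would print the handle itself to STDOUT.
        print {$OUT} $_ if /God/ or /David/ or /Mary/ or /Jesus/;
        # could also be written:
        # print {$OUT} $_ if /God|David|Mary|Jesus/;
    }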
    The first solution seems to be slightly faster than the one in the commented-out line, but the difference is essentially irrelevant because both are really fast anyway (about 0.1 second with the edition of the Bible that I used).
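
    When the names come from a file rather than being typed into the regex, the alternation can be built from the list. A minimal sketch, assuming the names are already in an array; the sub name names_regex is hypothetical, and quotemeta is used so that a name containing regex metacharacters is matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled alternation, equivalent to
# /God|David|Mary|Jesus/ for the four names used above.
sub names_regex {
    my @names = @_;
    die "No names given" unless @names;

    # quotemeta escapes characters such as "." or "|" inside a name.
    my $alternation = join '|', map { quotemeta($_) } @names;
    return qr/$alternation/;
}

my $re = names_regex(qw(God David Mary Jesus));

for my $line ("In the beginning God created the heaven and the earth.\n",
              "no listed name on this line\n") {
    print $line if $line =~ $re;    # prints only the first sample line
}
```

    The pattern is compiled once with qr// and reused for every line. Add the /i flag inside qr// if case should not matter.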

    The opposite case is when your name list is very large (say for example 10,000 words or more) and the document quite small. In this case, it is probably better to first load your name list into a hash, and then to read the document line by line, split each line into words and check if the word exists in the hash. Something like this (untested):

    IN: while (<$INPUT>) {
        my @words = split /\b/, $_;
        foreach my $word (@words) {
            print $_ and next IN if exists $name_hash{$word};
        }
    }
    With the same small list as above and the same document, execution time is at least 15 times longer (about 1.5 sec). (But I would not care in many cases; 0.1 sec. versus 1.5 sec. is often an irrelevant difference.) But if the name list has a few hundred words or more, or if the document is significantly shorter, this second solution is likely to be the better one.
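
    The loop above assumes %name_hash has already been filled. A minimal sketch of that step, assuming the names are in an array; the sub name has_listed_name is hypothetical and simply wraps the same word-by-word lookup so it can be tried on a single line:

```perl
use strict;
use warnings;

my @names = qw(God David Mary Jesus);

# One hash key per name; only the existence of the key matters.
my %name_hash = map { $_ => 1 } @names;

# Hypothetical helper: true if any word of the line is a key of %name_hash.
sub has_listed_name {
    my ($line) = @_;
    for my $word (split /\b/, $line) {
        return 1 if exists $name_hash{$word};
    }
    return 0;
}

for my $line ("And David said unto him\n", "Godly men walked by\n") {
    print $line if has_listed_name($line);    # prints only the first sample line
}
```

    Unlike the regex version, this is an exact, case-sensitive whole-word lookup: "Godly" does not match "God". If case should not matter, lowercasing both the hash keys and each word with lc is one option.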

    Quite possibly you don't even care about speed, because it is so fast anyway; then choose the easiest algorithm (probably the first one).

Re: Help from the Perliest monks (perlintro?)
by Anonymous Monk on Jan 23, 2014 at 22:24 UTC
