Help from the Perliest monks

perlmonknoob has asked for the wisdom of the Perl Monks concerning the following question:

Hello all! I recently joined a new company and they've started me on some very basic Perl coding, but I'm having trouble with a bit of Perl using regex, so any help would be much appreciated! Problem: I need to make a list of names in a document (such as perl, monk, etc.), then create a script that takes these names, searches through another document for anything matching, and saves it all to a new document. This is what I've come up with so far, but I'm having trouble now as it's getting confusing.


#!/usr/bin/perl
# Namecheck Program

$n = 0;    #this is what we will increment to go through the list

$name = 'Defined Name list';     # Name of the file

$newName = 'name list to check defined list again';    
# Name of the second file

open(Definedlist, $name);         # opens the file and places the contents into Namelist

open(BigNamelist, $newName);    # opens the second file and places the contents into Newlist

@firstArray = <Definedlist>;     # Read it into an array

@secondArray = <BigNamelist>     # Read it into the second array
 
close(Namelist) || die("Namelist Problem, stopped");             # Close the file

close(Newlist) || die("Newlist Problem, stopped");             # Close the file


foreach $Nameline (@secondArray) # go through each item in array inside the document
{
    egrep @firstArray[$n++] < $Nameline > newName.txt  # go through array one and print all the names with that name e.g
                                                       # (shaun is chosen so print out all the details of that line inside the new document)
        
}



TEST DATA



__Definedlist__

Test
Mai
Program
please


__BigNamelist__

test has scored 20 today with his program
and around the world the program worked
this is a stick situation Mai

So any help or a direction would be great, my wise monks! Thank you!


Replies are listed 'Best First'.
Re: Help from the Perliest monks by kennethk (Abbot) on Jan 23, 2014 at 23:05 UTC
Okay, a brief list of things to consider:

The `use strict` pragma is a great tool, particularly if you are just learning. It catches a lot of silly errors. Similarly for `use warnings`. See Use strict warnings and diagnostics or die for a discussion of why.

Rather than two-argument open with a bareword filehandle, it's probably a better idea to use three-argument open with an indirect filehandle. See perlopentut for lots of gory detail. You should also check whether the open actually succeeded with an `or die` clause. In this case, your opens might look like:

    open(my $Definedlist, '<', $name) or die "Open failed on $name: $!";          # opens the first file for reading
    open(my $BigNamelist, '<', $newName) or die "Open failed on $newName: $!";    # opens the second file for reading

Your close statements are being run against non-existent filehandles. Using indirect filehandles with strict would have caught this error. Also, you don't usually need to close indirect filehandles, since they close automatically when they go out of scope.

There is no Perl `egrep` built-in (function? method? sub? I'm never sure what to call things). There is a `grep`.

This particular task is also a FAQ. See How can I tell whether a certain element is contained in a list or array?

You should probably pick up an intro text to help you get started. You can't beat the price of Beginning Perl, though it's pretty out of date. See a whole list of books at http://learn.perl.org/books/. Anonymous Monk's suggestion of reading perlintro is also a good idea.

HTH.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
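For what it's worth, the points in the reply above can be pulled together into a rough sketch like the one below. It is only an illustration, not the original poster's finished script: the sub names `read_lines` and `matching_lines` are made up for this example, the three file names are expected on the command line, and the case-insensitive substring match is an assumption based on the test data (where `Test` should presumably find `test`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every line of a file into a list, with newlines removed.
# (read_lines is a hypothetical helper name, not a Perl built-in.)
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Open failed on $path: $!";
    chomp(my @lines = <$fh>);
    close($fh) or die "Close failed on $path: $!";
    return @lines;
}

# Return the lines that contain at least one of the names.
# Assumption: matching is case-insensitive and on substrings;
# \Q...\E makes any punctuation in a name match literally.
sub matching_lines {
    my ($names, $lines) = @_;
    return grep {
        my $line = $_;
        grep { $line =~ /\Q$_\E/i } @$names;   # count of names found in this line
    } @$lines;
}

# Only touch the file system when three file names are given, e.g.
#   perl namecheck.pl names.txt biglist.txt matches.txt
if (@ARGV == 3) {
    my ($name_file, $text_file, $out_file) = @ARGV;
    my @names = grep { length } read_lines($name_file);   # skip blank lines
    my @lines = read_lines($text_file);
    open(my $out, '>', $out_file) or die "Open failed on $out_file: $!";
    print {$out} "$_\n" for matching_lines(\@names, \@lines);
    close($out) or die "Close failed on $out_file: $!";
}
```

The inner `grep` is evaluated in scalar context, so it yields the number of names found in the line, and the outer `grep` keeps the line when that number is non-zero. Blank lines in the name file are skipped because an empty name would match every line.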
Re^2: Help from the Perliest monks by perlmonknoob (Initiate) on Jan 24, 2014 at 00:11 UTC
Thank you very much for your wisdom. I will indeed buy these books; Perl seems a very fun language.
Re: Help from the Perliest monks by Laurent_R (Canon) on Jan 23, 2014 at 23:20 UTC
You basically have two possible strategies; choosing one or the other will depend on several factors, the main ones being the relative size of the two lists and how well defined the names of the first list appear in the second one.

Suppose your list of names is very short and your document quite large. For example, the document is the King James Bible and the list of names has only four names: (God, David, Mary, Jesus). You will probably want to read each line of the document and use a regular expression to print out each line that matches the regex. Something like this:

    # ...
    while (<$INPUT>) {
        print $OUT $_ if /God/ or /David/ or /Mary/ or /Jesus/;
        # could also be written: print $OUT $_ if /God|David|Mary|Jesus/;
    }

The first solution is probably slightly faster than the one in the commented-out line, but that is essentially irrelevant because it is really fast anyway (about 0.1 second with the edition of the Bible that I used).

The opposite case is when your name list is very large (say, 10,000 words or more) and the document quite small. In this case, it is probably better to first load your name list into a hash, and then to read the document line by line, split each line into words and check whether each word exists in the hash. Something like this (untested):

    IN: while (<$INPUT>) {
        my @words = split /\b/, $_;
        foreach my $word (@words) {
            print $_ and next IN if exists $name_hash{$word};
        }
    }

With the same small list as above and the same document, execution time is at least 15 times longer (about 1.5 sec). (But I would not care in many cases; 0.1 sec or 1.5 sec is often an irrelevant difference.) But if the name list has a few hundred words or more, or if the document is significantly shorter, this second solution is likely to be the better one.

Quite possibly you don't even care about speed, because it is so fast anyway; then choose the easiest algorithm (probably the first one).
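Neither snippet in the reply above shows how the name list gets turned into a regex or a hash, so here is a rough sketch of both setups side by side. The sub names (`build_name_regex`, `build_name_hash`, `line_has_name`) and the sample lines are invented for illustration; in a real script the names and the document would be read from files.

```perl
use strict;
use warnings;

# Strategy 1 (short name list): build one alternation regex from the list.
# quotemeta escapes any regex metacharacters inside a name.
sub build_name_regex {
    my @names = @_;
    my $alternation = join '|', map { quotemeta($_) } @names;
    return qr/$alternation/;
}

# Strategy 2 (long name list): load the names into a hash for fast lookup.
sub build_name_hash {
    my %name_hash = map { $_ => 1 } @_;
    return \%name_hash;
}

# True if any word of the line is a key of the hash.
sub line_has_name {
    my ($line, $name_hash) = @_;
    foreach my $word (split /\b/, $line) {
        return 1 if exists $name_hash->{$word};
    }
    return 0;
}

# Made-up sample data, standing in for the name list and the document.
my @names    = qw(God David Mary Jesus);
my @document = (
    'a line that mentions God',
    'a line that mentions nobody',
    'a line that mentions David and Mary',
);

my $regex     = build_name_regex(@names);
my $name_hash = build_name_hash(@names);

print "regex: $_\n" for grep { $_ =~ $regex } @document;
print "hash:  $_\n" for grep { line_has_name($_, $name_hash) } @document;
```

One difference to keep in mind: the regex version matches substrings, so a name such as `Mary` would also hit `Maryland`, while the hash version only matches whole words with exactly the same capitalisation. Which behaviour is wanted depends on how the names appear in the second document.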
Re: Help from the Perliest monks (perlintro?) by Anonymous Monk on Jan 23, 2014 at 22:24 UTC
Have you read perlintro?

