Function for reading file

shabird has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I have a file that I read and extract information from. The file contents are as follows:

DATA    Numbers five; Crystal; Reliable chromo; Onion; Apple;
            Salt.
CHEM      Synechocystis sp. PCC 6803 substr. Kazusa
  NAMES  Synechocystis sp. PCC 6803 substr. Kazusa
            Bacteria; Cyanobacteria; Synechococcales; Merismopediaceae;
            Synechocystis; unclassified Synechocystis.

I am extracting only DATA and its corresponding values. For this I wrote the following program:

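    # Read the whole file into one string
    open(FH, '<', "/Users/Desktop/file.txt") or die "Cannot open file: $!";
    my $content = join("", <FH>);
    close(FH);
    # In list context, /g returns every captured DATA value
    my @matches = $content =~ /DATA\s+([A-Za-z\W]+)\n/g;
    print "Data:@matches\n";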

Now everything works fine, but I want to use a subroutine for this task. The subroutine should take the content of the file as its input argument and return all the data (DATA) from the file. The return value should include only the DATA and nothing else.

How can I achieve this? Any input will be much appreciated :)

Re: Function for reading file by hippo (Bishop) on Mar 25, 2020 at 09:43 UTC
Your code as it stands isn't doing what you think because you open FH, but read from GB. Regardless, if you do have `$content` correctly populated somehow then you can abstract the matching away to a subroutine like this:

    my @matches = foo ($content);
    print "Data:@matches\n";

    sub foo {
        return shift =~ /DATA\s+([A-Za-z\W]+)\n/g;
    }

See perlsub and our subroutine tutorials for further info.
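For anyone wanting to try this end to end, here is a minimal sketch of the same idea with the input held in a string. The name `extract_data` is made up for illustration, the pattern is the one from the question (unchanged), and the sample is just the first three lines of the posted data.

```perl
use strict;
use warnings;

# Hypothetical helper name; the pattern is the one from the question, unchanged.
sub extract_data {
    my ($content) = @_;
    # In list context, a /g match returns every captured group.
    return $content =~ /DATA\s+([A-Za-z\W]+)\n/g;
}

# Sample input: the first three lines of the data shown in the question.
my $content = <<'END';
DATA    Numbers five; Crystal; Reliable chromo; Onion; Apple;
            Salt.
CHEM      Synechocystis sp. PCC 6803 substr. Kazusa
END

my @matches = extract_data($content);
print "Data:@matches\n";
```

On this sample the single capture runs from `Numbers five;` to `Salt.`, including the line break in between. It stops there only because the following line contains digits, which the character class cannot match; the regex then backtracks to the last newline before them.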
Re^2: Function for reading file by shabird (Sexton) on Mar 25, 2020 at 09:54 UTC
My bad, I have updated the code; it was a mistake. But your solution works well :)
Re: Function for reading file by davido (Cardinal) on Mar 25, 2020 at 15:15 UTC
`[A-Za-z\W]` is a really strange regex. In an ASCII-only character set it will match every character except 0-9 and underscore (_). It may be exactly what you are after, but it's unusual.

Dave
Re^2: Function for reading file by shabird (Sexton) on Mar 28, 2020 at 13:11 UTC
How can I use it without the word regex \W?
Re^3: Function for reading file by davido (Cardinal) on Mar 28, 2020 at 23:36 UTC
Is your goal really to match every character in the ASCII character set except for 0-9 and _ (underscore)? I couldn't suggest a better alternative without knowing exactly what you want. However, again if you're only working with ASCII and not Unicode, these expressions do the same thing, in slightly different ways:

`[A-Za-z\W]` - Match A-Z, or a-z, or any character that is not A-Z, a-z, 0-9, _. This reduces to matching all characters that are not 0-9, _.
`[^_\d]` - Match any character that is not _ or 0-9. Much simpler way to state the previous expression.
`(?!_)\D` - Don't match if the next character will be an underscore, and don't match if that character is 0-9. This one uses look-ahead to check first if the next thing would not be _, then advances the position in the string and checks if the current thing is not 0-9. It uses lookahead, and then look-at semantics.
`\D(?<!_)` - Don't match if the current character is 0-9. Also if that character was _ (underscore), fail to match. This checks the character at the current position, and then after advancing the marker in the string looks back to see that it was also not underscore. It uses look-at, and then lookbehind semantics.

Of these options the second one is certainly the easiest to read. But my point in my previous post was that I'm doubtful this is exactly what you want. It seems very suspect to allow matching \t, \n, (, ), ^, -, A, B, z, ' ' (space), ',' (comma), and so on, but to disallow any numeric digit and the underscore. It doesn't seem like it's doing what you want it to be doing. But you didn't make clear to me what it is that you actually want to do.

Furthermore, if you are dealing with Unicode semantics, the number of characters that are matched by that pattern is enormous, and even weirder.

If you suggested what you're trying to match we might be able to help come up with a more specific expression.

Dave
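The claim that these four expressions accept the same ASCII characters can be checked mechanically. The following is only a sketch: the `\A` and `\z` anchors are an addition (not part of the expressions in the post) so that each pattern is tested against exactly one character, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The four single-character expressions from the post, anchored with \A and \z
# (an addition for this check) so each is tested against exactly one character.
my @patterns = (
    qr/\A[A-Za-z\W]\z/,
    qr/\A[^_\d]\z/,
    qr/\A(?!_)\D\z/,
    qr/\A\D(?<!_)\z/,
);

# For every ASCII code point, record which patterns accept it.
my @accepted      = (0) x @patterns;
my $disagreements = 0;
for my $code (0 .. 127) {
    my $char = chr $code;
    my @hits = map { $char =~ $_ ? 1 : 0 } @patterns;
    $accepted[$_] += $hits[$_] for 0 .. $#hits;
    # Count the code point if any pattern differs from the first one.
    $disagreements++ if grep { $_ != $hits[0] } @hits;
}

print "accepted per pattern: @accepted\n";
print "disagreements: $disagreements\n";
```

Each pattern should accept 117 of the 128 ASCII code points (everything except the ten digits and the underscore), with no disagreements between them. This says nothing about Unicode input, where the sets are much larger.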
Re^3: Function for reading file by AnomalousMonk (Archbishop) on Mar 29, 2020 at 01:42 UTC
"... the word regex \W ..."

It's important to understand what you're dealing with. The `\W` (that's big-W) character class (see perlrecharclass) matches any character that is not a `\w` (little-w) character. The `\w` characters are sometimes called "word" characters, but IIRC they originate with the set of characters that are allowed in a C- or Perl-language identifier; that's why `_` (underscore) is included, but `-` (hyphen), for instance, is not. So `\W` is better described as the anti-word regex! And I agree with davido's point here that if `[A-Za-z\W]` really does the trick for you, then `[^_\d]` is more clear, readable, maintainable, and IMHO preferable.

Update: Made "identifier" into a Wikipedia link.

Give a man a fish: `<%-{-{-{-<`
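A quick way to see the `\w` / `\W` split for a few ASCII characters, as a sketch only. The helper name `word_class` is hypothetical and the sample characters are chosen just for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of the two classes matches one character.
sub word_class {
    my ($char) = @_;
    return $char =~ /\A\w\z/ ? '\w' : '\W';
}

# A letter, a digit, the underscore, a hyphen, and a space.
for my $char ('a', '7', '_', '-', ' ') {
    print "'$char' is matched by ", word_class($char), "\n";
}
```

The letter, the digit, and the underscore land in `\w`; the hyphen and the space land in `\W`.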

