If
you have a question on how to do something in Perl, or
you need a Perl solution to an actual real-life problem, or
you're unsure why something you've tried just isn't working...
then this section is the place to ask.
However, you might consider asking in the chatterbox first (if you're a
registered user). The response time tends to be quicker, and if it turns
out that the problem/solutions are too much for the cb to handle, the
kind monks will be sure to direct you here.
The zsh version on Alpine is 5.9 (x86_64-alpine-linux-musl); on macOS it is zsh 5.8.1 (x86_64-apple-darwin22.0). "which command" reports "command: shell built-in command" on both the Mac and the Docker Alpine container.
Any ideas what the problem might be?
$PM = "Perl Monk's";
$MC = "Most Clueless FriarAbbotBishopPontiffDeaconCuratePriestVicar Parson";
$nysus = $PM . ' ' . $MC;
I feel like I may be asking for the impossible, but seeing as there are many programmers here of superior ability, I am hoping there are significant improvements that I would never have thought of. (And speed is a rare need for me, as most of my scripts are one-time-use or quick enough that efficiency is of minor concern.)
I'm searching what amounts to a table of 30,000+ lines, with one to five "columns." In Perl, each of these "columns" is an array (list), and every array has the same number of rows/items. The search applies a regular expression to each column in each row, and the expression may differ from column to column.
For example, suppose we have a table like this:
Row 1:
Main: The red fox jumped over the hollow log.
Comp. 1: The vixen jumped over the brown log.
Comp. 2: The red fox leaped over the hollow log.
Comp. 3: The gray fox jumped over the big log.
Comp. 4: The lazy fox did not jump over the log.

Row 2:
Main: The tall tree towered over the animals.
Comp. 1: The tree swayed smartly in the field.
Comp. 2: The animals looked up to the tall tree.
Comp. 3: The tall tree grew in the field.
Comp. 4: The towering tree shaded the animals.

Row 3:
Main: The tawny deer jumped over the fence.
Comp. 1: The doe jumped over the barrier.
Comp. 2: The brown doe leaped over the picket fence.
Comp. 3: The red deer cleared the tall fence.
Comp. 4: The doe-eyed doe sprang over wall.
And queries like this:
Main: Match: (hollow log)|(fence)
Comp. 1: Match: (jumped)
Comp. 2: NOT match: (vixen)|(animals)
Comp. 3: NOT match: (red fox)
Comp. 4: Match: (jump)|(leap)|(sprang)
This is just an English example to help understand the situation. The actual "columns" may represent different languages, i.e. translations, of the same thing, and each language/column will be searched with its own regular expression.
The data coming in to the subroutine includes the regex to use for each column (five--but if some are empty, that column does not need handling), the array for each column (which is skipped/empty if no regex for that column was provided), and whether or not the regex should match OR NOT match (boolean values for each column).
The expected results from the query would be the values of the first column for rows 1 and 3. The central row would not match. Only the first column's values get returned, the other columns are merely for comparison purposes--looking for similarities or contrasts to the "original" (main) column.
Here are the important bits of my code (abridged for better focus and readability):
sub processComparison {
# INCOMING COMPARISON-COLUMN DETAILS
my (
$table, #MAIN COLUMN NAME
$ACCP_searchver1, #COMPARISON COLUMN NAMES/SELECTIONS
$ACCP_searchver2,
$ACCP_searchver3,
$ACCP_searchver4,
$ACCP_comp1, #ORIGINAL USER-ENTERED QUERY
$ACCP_comp2,
$ACCP_comp3,
$ACCP_comp4,
$ACCP1_regex, #USER SELECTION FOR REGEX vs. SIMPLE MATCH
$ACCP2_regex,
$ACCP3_regex,
$ACCP4_regex,
$accpyn1, #USER SELECTION FOR MATCH/NO MATCH
$accpyn2,
$accpyn3,
$accpyn4
) = @_; # INCOMING ARRAY
my $regex1 = &composeRegex($ACCP1_regex,$ACCP_comp1);
my $regex2 = &composeRegex($ACCP2_regex,$ACCP_comp2);
my $regex3 = &composeRegex($ACCP3_regex,$ACCP_comp3);
my $regex4 = &composeRegex($ACCP4_regex,$ACCP_comp4);
my @main = &getTableFC($table);
# "my ... if COND" has undefined behavior (see perlsyn), so use a ternary instead
my @crosscheck1 = $regex1 ? &getTableFC($ACCP_searchver1) : ();
my @crosscheck2 = $regex2 ? &getTableFC($ACCP_searchver2) : ();
my @crosscheck3 = $regex3 ? &getTableFC($ACCP_searchver3) : ();
my @crosscheck4 = $regex4 ? &getTableFC($ACCP_searchver4) : ();
my $linecount=0;
my ($line, $line1, $line2, $line3, $line4) = ('','','','','');
foreach $line ( @main ) {
$line1 = $crosscheck1[$linecount];
$line2 = $crosscheck2[$linecount];
$line3 = $crosscheck3[$linecount];
$line4 = $crosscheck4[$linecount];
# /g IS NEEDED SO THAT BOTH LEADING AND TRAILING WHITESPACE ARE STRIPPED
$line =~ s/^\s+|\s+$//g; chomp $line;
$line1 =~ s/^\s+|\s+$//g; chomp $line1;
$line2 =~ s/^\s+|\s+$//g; chomp $line2;
$line3 =~ s/^\s+|\s+$//g; chomp $line3;
$line4 =~ s/^\s+|\s+$//g; chomp $line4;
my ($r1,$r2,$r3,$r4) = (0,0,0,0); #USING THESE TO TALLY MATCHES
# CHECK REGEX MATCHES FOR COMPARISON COLUMNS
if ($regex1) {
$r1++;
if ($accpyn1) {
if ($line1 =~m/$regex1/) { $r1++ }
} else {
if ($line1 !~ m/$regex1/) { $r1++ }
}
};
if (($regex2) && ($r1!=1)) {
$r2++;
if ($accpyn2) {
if ($line2 =~m/$regex2/) { $r2++ }
} else {
if ($line2 !~ m/$regex2/) { $r2++ }
}
};
if (($regex3)&&($r1!=1)&&($r2!=1)) {
$r3++;
if ($accpyn3) {
if ($line3 =~m/$regex3/) { $r3++ }
} else {
if ($line3 !~ m/$regex3/) { $r3++ }
}
};
if (($regex4)&&($r1!=1)&&($r2!=1)&&($r3!=1)) {
$r4++;
if ($accpyn4) {
if ($line4 =~m/$regex4/) { $r4++ }
} else {
if ($line4 !~ m/$regex4/) { $r4++ }
}
};
# LINE UP MATCH RESULTS TALLIES
if ( ($r1!=1) && ($r2!=1) && ($r3!=1) && ($r4!=1) )
{ #PASSED COMPARISON FILTER
#DO CODE TO FORMAT & RETURN $line FOR MAIN COLUMN
}
} # end foreach $line
} #END SUB processComparison
As you may notice, I have attempted to exit early from loops that are found to be no longer necessary. If any single column fails to match its regex, the entire row will fail--so there is no need to extensively test the remaining columns. I have also pre-established my regular expressions, though I am not sure if there is a way to improve this. The expression used for each column will remain the same for all rows in the table.
At present, the average return, if using a regular expression of moderate complexity, is between 45 and 65 seconds, and in checking the processing times for various segments, this time is mostly (98%) concentrated in the regex-matching section. If it is strictly a "match"/"no match" for a simple word, without any regex alternations or other complexities involved, I have seen as little as 11 or so seconds. Even that seems a little bit longer than I wish, seeing as the results will be returned to the client's browser.
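The early-exit idea described above can be sketched more compactly as a loop over the columns. This is only an illustration under stated assumptions, not a measured speed-up: the helper name filter_rows and the [ column, pattern, want-match ] filter layout are hypothetical and do not appear in the original code. Each pattern is compiled once with qr// before the row loop, and the first column that fails rejects the whole row.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post). Each filter is
# [ \@column, $pattern, $want_match ]: $want_match is true for "Match"
# and false for "NOT match".
sub filter_rows {
    my ($main, @filters) = @_;

    # Compile every pattern once, before the row loop.
    my @compiled = map {
        my ($col, $pat, $want) = @$_;
        [ $col, qr/$pat/, $want ? 1 : 0 ];
    } @filters;

    my @hits;
    ROW: for my $i (0 .. $#$main) {
        for my $f (@compiled) {
            my ($col, $re, $want) = @$f;
            my $cell    = defined $col->[$i] ? $col->[$i] : '';
            my $matched = $cell =~ $re ? 1 : 0;
            next ROW if $matched != $want;    # first failure rejects the row
        }
        push @hits, $main->[$i];
    }
    return @hits;
}

# The example table and queries from the post.
my @main  = ('The red fox jumped over the hollow log.',
             'The tall tree towered over the animals.',
             'The tawny deer jumped over the fence.');
my @comp1 = ('The vixen jumped over the brown log.',
             'The tree swayed smartly in the field.',
             'The doe jumped over the barrier.');
my @comp2 = ('The red fox leaped over the hollow log.',
             'The animals looked up to the tall tree.',
             'The brown doe leaped over the picket fence.');
my @comp3 = ('The gray fox jumped over the big log.',
             'The tall tree grew in the field.',
             'The red deer cleared the tall fence.');
my @comp4 = ('The lazy fox did not jump over the log.',
             'The towering tree shaded the animals.',
             'The doe-eyed doe sprang over wall.');

my @found = filter_rows(\@main,
    [ \@main,  '(hollow log)|(fence)',   1 ],
    [ \@comp1, '(jumped)',               1 ],
    [ \@comp2, '(vixen)|(animals)',      0 ],
    [ \@comp3, '(red fox)',              0 ],
    [ \@comp4, '(jump)|(leap)|(sprang)', 1 ],
);
print "$_\n" for @found;    # prints the Main text of rows 1 and 3
```

If the captured text is never used, writing the alternations without capturing parentheses, for example hollow log|fence, spares the regex engine some bookkeeping. Whether that makes a noticeable difference on 30,000 rows would have to be measured.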
I'm asking sincerely; I'm not looking to criticize or start a religious war. I'm curious, on a technical level, why installing Perl modules takes so long compared to, say, Python modules. In my experience, most Python modules and their dependencies install in seconds, whereas Perl modules usually take minutes and come with dozens of dependencies. I have observed this while developing Perl and Python on my local machines, both at work and at home, and the same holds true when building Docker images remotely, so it's not as if Perl is just slow on "my machine".
I hate to say it, but the only other module system I'm familiar with that takes as long or longer to build out modules is npm.
Hello Monks, I'm using Mojo to parse html and am trying to print the array that I have grabbed and I am getting the following when I print $r2:
r2 = Mojo::Collection=ARRAY(0x7fa51ac83968).
How can I print the content of the array? This is the code I used to grab the html that I wanted.
I am trying to solve day 8, part a. of Advent of Code: https://adventofcode.com/2022/day/8 . I am being flooded with uninitialized value warnings for lines 52, 55, 72, and 75. Here is one:
Use of uninitialized value in numeric lt (<) at ./day8a-l.pl line 55, <$line> line 98
My input is 99 lines with 99 numbers ranging from 0-9 on each line(no whitespace). Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
my @row;
my $current = 0;
my $visible = 198;
my $found = 0;
open(my $lines, '<', 'input8.txt');
foreach (<$lines>)
{
chomp;
for my $l (0..length()-1)
{
my ($digit) = /^.{$l}(.)/;
if (defined $digit)
{
$row[$current][$l] = $1;
}
}
$current += 1;
}
close $lines;
open(my $line, '<', 'input8.txt');
$current = 0;
while (<$line>)
{
chomp;
if ($current > 0 and $current < 98)
{
$visible += 2;
for (my $i = 1; $i < 98; $i++)
{
my $counter = 1;
$found = 0;
while ($row[$current][$i-$counter] < $row[$current][$i] and $found == 0)
{
$counter += 1;
if ($counter == $i)
{
$visible += 1;
$found = 1;
$counter = 1;
}
}
while ($row[$current][$i+$counter] < $row[$current][$i] and $found == 0)
{
$counter += 1;
if ($counter == $i)
{
$visible += 1;
$found = 1;
$counter = 1;
}
}
while ($row[$current-$counter][$i] < $row[$current][$i] and $found == 0)
{
$counter += 1;
if ($counter == $current)
{
$visible += 1;
$found = 1;
$counter = 1;
}
}
while ($row[$current+$counter][$i] < $row[$current][$i] and $found == 0)
{
$counter += 1;
if ($counter == $current)
{
$visible += 1;
$found = 1;
$counter = 1;
}
}
}
}
$current += 1;
}
close $line;
print $visible . "\n";
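One plausible source of the warnings, offered as a guess rather than a diagnosis: an index such as $i+$counter or $current+$counter can run past the last column or row, and the element read there is undefined. A minimal sketch that builds each line of sight with explicit bounds, so no index leaves the grid, might look like this. The sub name count_visible is hypothetical, and the small grid is the sample from the puzzle page, not the 99x99 input.

```perl
use strict;
use warnings;

# Hypothetical helper: counts trees visible from outside the grid.
# $grid is a reference to an array of rows, each row an array of digits.
sub count_visible {
    my ($grid) = @_;
    my $rows = @$grid;
    my $cols = @{ $grid->[0] };
    my $visible = 0;

    for my $r (0 .. $rows - 1) {
        for my $c (0 .. $cols - 1) {
            my $h = $grid->[$r][$c];

            # Trees between this one and each edge. The ranges are
            # empty for trees on an edge, so nothing is read out of bounds.
            my @directions = (
                [ map { $grid->[$r][$_] } 0 .. $c - 1 ],           # left
                [ map { $grid->[$r][$_] } $c + 1 .. $cols - 1 ],   # right
                [ map { $grid->[$_][$c] } 0 .. $r - 1 ],           # up
                [ map { $grid->[$_][$c] } $r + 1 .. $rows - 1 ],   # down
            );

            # A direction is blocked if any tree in it is at least as tall.
            my $blocked = grep {
                my $dir = $_;
                grep { $_ >= $h } @$dir;
            } @directions;

            $visible++ if $blocked < 4;
        }
    }
    return $visible;
}

# Sample grid from the puzzle description.
my @sample = map { [ split //, $_ ] } qw(30373 25512 65332 33549 35390);
print count_visible(\@sample), "\n";    # prints 21
```

This trades speed for clarity: it rebuilds four lists per tree, which is fine for a 99x99 grid.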
Suppose we wish to highlight search terms returned from a user's query. Now, suppose the user has entered a regular expression, all of which must have matched to be returned from the query, but which also contains captured groups in the expression, and we wish to highlight only those terms in the results.
For example:
my $text = 'This is just an arbitrary example of text.';
#ONLY TWO WORDS ARE CAPTURED--WE WANT TO HIGHLIGHT ONLY THOSE
my $query = qr~(?:(?:This)|(?:That)).*?(just).*?(arbitrary).*?$~;
#THIS WOULD HIGHLIGHT THE ENTIRE LINE
$text =~ s~($query)~<span class="highlight">$1</span>~g;
How could we upgrade that last line to where it highlighted only the expressions that had been properly captured, i.e. "arbitrary" and "just" in this example?
NOTE: For compatibility purposes, it needs to work with Perl 5.12.4. This excludes the use of the @{^CAPTURE} variable that was not made available until Perl 5.25.7.
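One possible approach that should work on Perl 5.12 uses the @- and @+ offset arrays, which long predate @{^CAPTURE}. The sketch below is an illustration under assumptions: the sub name highlight_captures is hypothetical, it handles only the first match of the query, and it assumes the capture groups are not nested inside one another.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps each captured group of the first match
# in a highlight span and returns the modified copy of the text.
sub highlight_captures {
    my ($text, $query) = @_;
    return $text unless $text =~ $query;

    # Collect [start, end] offsets of every group that took part in the match.
    my @spans;
    for my $i (1 .. $#+) {
        next unless defined $-[$i];
        push @spans, [ $-[$i], $+[$i] ];
    }

    # Insert from right to left so earlier offsets stay valid.
    for my $span (sort { $b->[0] <=> $a->[0] } @spans) {
        my ($from, $to) = @$span;
        substr($text, $to,   0, '</span>');
        substr($text, $from, 0, '<span class="highlight">');
    }
    return $text;
}

my $text  = 'This is just an arbitrary example of text.';
my $query = qr~(?:(?:This)|(?:That)).*?(just).*?(arbitrary).*?$~;
print highlight_captures($text, $query), "\n";
# prints: This is <span class="highlight">just</span> an <span class="highlight">arbitrary</span> example of text.
```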
When a button is in the disabled state it does not respond, as advertised, but once the button is enabled again it responds to all the presses it received while it was disabled. I can't figure out how to eliminate this behavior. I do not want the button to respond to anything that happened while it was disabled, ever. Can anyone help me with this? Here's a program that demonstrates the problem: it disables all buttons for 4 seconds whenever one is pressed. Thanks
use Tk;
my $mw = MainWindow->new;
my %btns;
for (qw(alpha beta gamma)){
my $name = $_;
$btns{$_} = $mw->Button(
-text => $_,
-command => sub{foo($name)},
)->pack;
}
MainLoop;
sub foo{
my $name = shift;
for (keys %btns){
$btns{$_}->configure(-state => 'disabled');
}
my %dispatch =(
alpha => \&alpha,
beta => \&beta,
gamma => \&gamma,
);
$dispatch{$name}->($name);
for (keys %btns){
$btns{$_}->configure(-state => 'normal');
}
}
sub alpha{
print "alpha\n";
sleep 4;
}
sub beta{
print "beta\n";
sleep 4;
}
sub gamma{
print "gamma\n";
sleep 4;
}
I have a text file converted from HTML. I need to format this file and output it into a standard size (8.5 X 11 inches), similar to the size of the regular MS word page, but still in text format. Below is my code. Any comments are greatly appreciated. Thank you!
use strict;
use warnings;
my($data, $input, $output, $newdata);
$input = 'C:\Users\xxx\Documents\input.txt';
$output = 'C:\Users\xxx\Documents\output.txt';
open (INFILE, '<', $input);
open (OUTFILE, '>', $output);
local $/;
$data = <INFILE>;
print OUTFILE "$newdata";
close INFILE;
close OUTFILE
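A plain text file has no physical page size, so one way to approximate a letter-size page is to wrap to a fixed number of columns and insert a form feed after a fixed number of lines. The sketch below uses the core Text::Wrap module. The figures of 65 columns and 54 lines per page are assumptions (roughly 10 characters per inch and 6 lines per inch with one-inch margins), and the sub name paginate is hypothetical.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumed page geometry, not taken from the original post.
my $cols           = 65;
my $lines_per_page = 54;

# Hypothetical helper: re-wraps each paragraph and separates pages
# with a form feed character.
sub paginate {
    my ($data) = @_;

    # wrap() keeps every line shorter than $Text::Wrap::columns.
    local $Text::Wrap::columns = $cols + 1;

    my @lines;
    for my $para (split /\n\s*\n/, $data) {
        $para =~ s/\s+/ /g;        # collapse existing line breaks
        $para =~ s/^ | $//g;       # trim the ends
        next if $para eq '';
        push @lines, split(/\n/, wrap('', '', $para)), '';
    }

    my $out = '';
    while (@lines) {
        $out .= join("\n", splice(@lines, 0, $lines_per_page)) . "\n";
        $out .= "\f" if @lines;    # page break between pages
    }
    return $out;
}

my $demo = join(' ', ('word') x 400) . "\n\n" . join(' ', ('another') x 300);
print paginate($demo);
```

In the posted script, the result of paginate($data) would be what gets assigned to $newdata before it is printed to OUTFILE.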
Note the second call of string_reverse. From ibidem:
"At the end of the program we call reverse_string with undef, which gets translated to C as NULL. This allows it to free the output buffer so that the memory will not leak."
A smart aleck might jump to the conclusion to replace malloc with GC_MALLOC as described here.
Is this recommended or will it cause problems in the future not yet imaginable now?
Update: Thank you all for the kind and very informative responses. I now consider this a research project for the future. So far I don't even know if the mentioned library compiles on the Mac. I will report back.
Snippets of code should be wrapped in <code> tags, not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).