
Modifying a parameter to a recursive function

by CoDeReBeL (Initiate)
on Apr 08, 2009 at 20:19 UTC

CoDeReBeL has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

I'm having trouble with the following code. It contains a subroutine that calls itself recursively. It modifies the value of its input parameter, but that value immediately reverts to its former state once the function exits. I've pretty much come to the conclusion that it's a shortcoming of Perl and there's no way around it. I read somewhere that in order to modify a subroutine's argument you have to use @_ or $_, so I did. The same article said that as soon as you set some other variable to $v = $_ it will be impossible to modify the value of $_ afterwards. I think this may be happening when I call a method on the passed object reference or use a regular expression on a string. The subroutine takes a list of scalar parameters, the value of which may be either a reference to an object or a string. There's no way to do anything useful with it unless you know which one it is, obviously, so you have to at least check that.

The program below "works," in that the input file(s) is processed and the output file is written without any errors (depending on the input file, I suppose) or warnings. The print STDERR $_ ; at the very bottom prints a string that has been modified exactly the way you'd expect, but when the function returns (usually to itself, in the case of a string) the string reverts to its previous value.

I've tried various re-writings of the function to no avail. I've made $tree a global variable and that failed, too. I've pretty much decided that this is not going to work because of some shortcoming of Perl, but I've really only been playing around with Perl for about half a week now, so I'm pretty sure that the wisdom of the Perl Monks far surpasses mine.

It's not a big deal or anything because there are several other ways to do it (and other languages to do it in, for that matter) but it just keeps nagging at me in the back of my head and I'd just like to know what the deal is here to satisfy my curiosity.

use strict;
use warnings;
use diagnostics;
use HTML::TreeBuilder;
use HTML::Entities;
use HTML::Element;

sub traverse ;

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new ;
    $tree->parse_file($file_name);
    $tree->dump ;
    print "\n\nWhere would you like to put the output file? " ;
    my $output = <STDIN> ;
    open OUTPUT_FILE, "> $output" or die $! ;
    select OUTPUT_FILE ;
    traverse ($tree);
    $tree = $tree->delete ;
    close OUTPUT_FILE or die $!;
}

sub traverse {
    foreach (@_) {
        if ($_) {
            if (ref $_) {
                print STDERR $_->tag(), "\n\n" ;
                if ($_->tag() ne "head" && $_->tag() ne "script" && $_->tag() ne "img"
                    && $_->tag() ne "object" && $_->tag() ne "applet") {
                    my @contents = $_->content_list() ;
                    traverse (@contents) ;
                }
                if (!$_->parent) {
                    my $s = $_->as_HTML ("",{}) ;
                    $s =~ s/></>\n</g ;
                    print $s ;
                }
            }
            else {
                print STDERR "Processing a string element...\n" ;
                $_ =~ s/\s&\s/ &amp; /g ;
                $_ =~ s/</&lt;/g ;
                $_ =~ s/>/&gt;/g ;
                $_ =~ s/'em\s/&rsquo;em /g ;
                $_ =~ s/'tis\s/&rsquo;tis /g ;
                $_ =~ s/'twas\s/&rsquo;twas /g ;
                $_ =~ s/'Twas\s/&rsquo;Twas /g ;
                $_ =~ s/'Tis\s/&rsquo;Tis / ;
                $_ =~ s/'\s/&rsquo; /g ;
                $_ =~ s/^'/&lsquo;/g ;
                $_ =~ s/(\s)'/$1&lsquo;/g ;
                $_ =~ s/"'/&ldquo;lsquo;/g ;
                $_ =~ s/'"/&rsquo;&rdquo;/g ;
                $_ =~ s/\s"/ &ldquo;/g ;
                $_ =~ s/^'/&lsquo;/g ;
                $_ =~ s/^"/&ldquo;/g ;
                $_ =~ s/"\s/&rdquo; /g ;
                $_ =~ s/'$/&rsquo;/g ;
                $_ =~ s/"$/&rdquo;/g ;
                $_ =~ s/(,|\.)'/$1&rsquo;/g ;
                $_ =~ s/(,|\.)"/$1&rdquo;/g ;
                $_ =~ s/(\S)'(\S)/$1&rsquo;$2/g ;
                print STDERR ($_ , "\n\n");
            }
        }
    }
}

Update:

I’ve been playing around with this code for a while and thought I'd post the latest version. The "main" function is more like it was when I started...

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new ;
    $tree->parse_file($file_name);
    print "\n\nWhere would you like to put the output file for $file_name? " ;
    my $output = <STDIN> ;
    open OUTPUT_FILE, "> $output" or die $! ;
    $tree = traverse ($tree);
    print OUTPUT_FILE $tree->as_HTML (""," ",{}) ;
    $tree = $tree->delete ;
    close OUTPUT_FILE or die $!;
}

...with all the output going on here again. The only reason any output was being done inside the traverse sub to begin with was because I figured, "Hey, if I can't get at the modified strings here I'll just go in there and get them," but that hasn't panned out. The new version of traverse goes like this...

sub traverse ($) {
    my $element = $_[0] ;
    if ($element) {
        if (ref $element) {
            print $element->tag(), "\n\n" ;
            if (go_ahead ($element)) {
                for my $child ($element->content_list()) {
                    $child = traverse ($child) ;
                }
            }
        }
        else {
            print "Processing a string element...\n" ;
            $element = curly_quotes ($element) ;
            print ($element , "\n\n");
        }
    }
    return $element ;
}

go_ahead() and curly_quotes() just do what the chunks of code they've replaced did, and they are working fine. Running the program gives me the following output at the command line:

C:\ ... \Programs\HTMLify>htmlify3.pl test.html

Where would you like to put the output file for test.html? output.txt
html
head
body
h1
Processing a string element...
Testing...&rsquo;
p
Processing a string element...
&ldquo;lsquo;I&rsquo;d rather ne&rsquo;er have been here,&rsquo;she said,&rdquo; I said. &ldquo;So what?&rdquo; I &lsquo;wondered.&rsquo;
pre
p
Processing a string element...
Jack &amp; Jill went &lt; the hill to &gt; a pail of water. &amp; &lt;&gt;
p
Processing a string element...
&ldquo;I&rsquo;m really happy!&rdquo;

but after running it the file output.txt looks like this...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta content="text/html; charset=iso-8859-1" http-equiv="content-type" />
<meta content="javascript" http-equiv="Content-Script-Type" />
<title>HTMLify - Convert text to HTML paragraphs</title>
</head>
<body lang="en-US" xml:lang="en-US">
<h1>Testing...'</h1>
<p>"'I'd rather ne'er have been here,'she said," I said. "So what?" I 'wondered.'</p><pre></pre><p>Jack & Jill went < the hill to > a pail of water. & <></p>
<p>"I'm really happy!"</p>
</body>
</html>

...and I'm really wondering right now what that’s gonna look like, but anyway...

Given bellaire’s response and the other examples, I’m convinced that it really should work. I’m especially mystified that returning $element doesn’t work. I agree with ig and mr_mischief that the HTML::Element involved here is probably conspiring against me and just not letting me change its content no matter what I do, probably by feeding me a copy of a copy of the string instead of the real one.

I’ve experimented with the content_refs_list method as they suggested but so far without success. The main problem with that seems to be telling the difference between a reference to a string and a reference to an HTML::Element. I keep getting yelled at by Perl for calling methods on unblessed references and such, which makes me think that my strings are getting through the if (ref ${$element}) test, which they shouldn’t.

I also had a mysterious incident at one point. I inadvertently left this line...

${element} = curly_quotes (${element}) ;

...like that after changing the rest of the function back to what it had been, so that...

sub traverse ($) {
    my $element = $_[0] ;
    if ($element) {
        if (ref $element) {
            print $element->tag(), "\n\n" ;
            if (go_ahead ($element)) {
                for my $child ($element->content_list()) {
                    $child = traverse ($child) ;
                }
            }
        }
        else {
            print "Processing a string element...\n" ;
            ${$element} = curly_quotes (${element}) ;
            print ($element , "\n\n");
        }
    }
    return $element ;
}

...actually ran and actually didn’t crash or complain or anything. I don’t see how (in a logical world, anyway) $element in the routine above got past the if (ref $element) test and into the else clause below it if it could be dereferenced safely. But I’m not pretending to be an expert on Perl after 3 or 4 days, either. Maybe Perl itself just expects occasional mistakes like that and works around them.
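
One possible explanation for the incident above, offered as a sketch and not as a diagnosis of the original program: in Perl, `${element}` is only another spelling of `$element` (the braces delimit the variable name), whereas `${$element}` is a real dereference. The variable names below are made up for illustration.

```perl
use strict;
use warnings;

my $element = "plain string";

# ${element} is the same variable as $element; the braces only delimit the name.
${element} = uc ${element};
print "$element\n";                 # prints "PLAIN STRING"

# ${$ref} (or $$ref) is a real dereference and needs a reference.
my $ref = \$element;
${$ref} = lc ${$ref};
print "$element\n";                 # prints "plain string"

# Dereferencing a plain string under "use strict" is a run-time error.
my $ok = eval { my $copy = ${$element}; 1 };
print "deref of a string failed: ", ($ok ? "no" : "yes"), "\n";
```

Read this way, a line written as `${element} = curly_quotes (${element})` is ordinary assignment to `$element`, so there is nothing for Perl to complain about.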

ig probably has the answer here...

{
    my @contents = $_->content_list() ;
    print STDERR "before: @contents\n";
    traverse (@contents) ;
    print STDERR "after: @contents\n";
    $_->detach_content();
    $_->push_content(@contents);
}

...or something very close to that. It's probably just a matter of figuring out where exactly to call those methods, etc.

Thanks, everybody, for helping out.

Update 2

It works!

sub traverse {
    for my $element (@_) {
        if (Scalar::Util::blessed ($element)) {
            if (go_ahead($element)) {
                my @contents = $element->content_list() ;
                print "Before: ", @contents, "\n\n" ;
                traverse(@contents) ;
                print "After: ", @contents, "\n\n" ;
                $element->detach_content() ;
                $element->push_content (@contents) ;
            }
        }
        else {
            print "Processing a string: " ;
            $element = curly_quotes($element) ;
            print $element, "\n\n" ;
        }
    }
}

There’s the working version, in case anyone is interested. Took me awhile to get it going, though. For a while I kept getting yelled at by Perl for calling methods on unblessed references until I finally found the Scalar::Util::blessed() sub mentioned in the online documentation for Perl. Couldn't (or just didn't happen to) find it in the ActivePerl documentation. I still think that either one of these should have worked, but they didn’t...

sub traverse {
    my $element = $_[0] ;
    if (Scalar::Util::blessed ($element)) {
        if (go_ahead($element)) {
            my @contents = $element->content_list() ;
            print "Before: ", @contents, "\n\n" ;
            for my $child (@contents) {
                traverse ($child) ;
                print "Middle: ", $child, "\n\n" ;
            }
            print "After: ", @contents, "\n\n" ;
        }
    }
    else {
        print "Processing a string: " ;
        $element = curly_quotes($element) ;
        print $element, "\n\n" ;
    }
}

The only difference between that version and the one that works is that the one that works does the for (@_) loop and this one just handles one parameter at a time. Shouldn’t make a difference but it does.
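
A small sketch of why the two shapes behave differently: `my $element = $_[0]` takes a copy of the argument, while `for my $element (@_)` makes the loop variable an alias for each argument in turn. The sub names and data below are hypothetical and have nothing to do with HTML::Element.

```perl
use strict;
use warnings;

# Copies its argument: changes stay inside the sub.
sub shout_copy {
    my $text = $_[0];
    $text = uc $text;
    return;
}

# Aliases its arguments: changes reach the caller's variables.
sub shout_alias {
    for my $text (@_) {
        $text = uc $text;
    }
    return;
}

my @words = ("one", "two");
shout_copy($_) for @words;
print "after copy:  @words\n";    # prints "after copy:  one two"
shout_alias(@words);
print "after alias: @words\n";    # prints "after alias: ONE TWO"
```

In the single-parameter version, assigning to `$_[0]` directly instead of to the copy should reach the caller in the same way the loop alias does.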

sub traverse {
    my $element = $_[0] ;
    if (Scalar::Util::blessed (${element})) {
        if (go_ahead(${element})) {
            my @contents = ${element}->content_refs_list() ;
            print "Before: ", @{contents}, "\n\n" ;
            for my $child (@contents) {
                traverse (${${child}}) ;
                print "Middle: ", ${${child}}, "\n\n" ;
            }
            print "After: ", @{contents}, "\n\n" ;
        }
    }
    else {
        print "Processing a string: " ;
        ${element} = curly_quotes(${element}) ;
        print ${element}, "\n\n" ;
    }
}

This one did some weird stuff. For some reason it will only run if you dereference child twice, which doesn’t make sense to me at all. It makes even less sense to me that it doesn’t work even though it runs, but both the versions above print an unaltered string in the print "Middle... line just after the traversal of $child. Oh, well.

Although it did save me some work, the HTML::Element class gave me a lot of aggravation, too, and I found 3 major faults with it that I'm pretty sure are not my fault:

  1. The test if ($h->tag() == "pre") returns undefined when it should obviously be true because the element in question quite obviously has that tag.
  2. Anything in a <pre> tag fails to show up in the string returned by $h->as_HTML().
  3. And my favorite: Calling $h->dump on a <pre> element with <![CDATA[...]]> in it caused Perl to output my source code from the spot where I called it to the end of my source file. That was entertaining.
  4. I find it extremely distasteful to mix object references and text strings in a list just generally speaking. If you don’t know which is coming when, there’s not much really useful about it since you can’t call a method on a string. If there’s not a package for a string class somewhere on CPAN (and I can’t believe there’s not even though I haven’t actually looked yet), there certainly should be. Even if there’s not whoever wrote the HTML package should have made an HTML::String object of their own descended from HTML::Element if they wanted (not like there’s actually much choice) to mix strings and elements in lists.
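
On the first item, one thing worth ruling out (this may or may not be what happened in the original program): `==` compares numerically, and a non-numeric string such as "pre" is treated as 0, so `==` cannot tell tag names apart. String comparison uses `eq`. A minimal sketch with a hard-coded tag name in place of `$h->tag()`:

```perl
use strict;
use warnings;

my $tag = "pre";

my ($num_same, $num_other);
{
    # Silence the "isn't numeric" warnings that == on strings would raise.
    no warnings 'numeric';
    $num_same  = ($tag == "pre") ? "true" : "false";
    $num_other = ($tag == "div") ? "true" : "false";
}
my $str_same  = ($tag eq "pre") ? "true" : "false";
my $str_other = ($tag eq "div") ? "true" : "false";

print "==: pre/pre is $num_same, pre/div is $num_other\n";   # true, true
print "eq: pre/pre is $str_same, pre/div is $str_other\n";   # true, false
```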

    I would have preferred to be using an XHTML package but I couldn’t find one. There’s an XML package but its parser looks like a pain in the butt to deal with (although, in hindsight, perhaps it would have been less painful to just dig into it anyway) and the XML::Simple package says that it won’t handle embedded content such as <b> tags inside of <p> tags. Oh, well.

    Thanks again for the wisdom.

    Cheers!

Replies are listed 'Best First'.
Re: Modifying a parameter to a recursive function
by GrandFather (Saint) on Apr 08, 2009 at 21:04 UTC

    The immediate 'problem' is that the for loop variable ($_ in your case) is an alias to each element in turn of the list being iterated over. It really isn't the same $_ that is used outside the loop. The same thing happens with $_ in map and grep. If you really want to retain the last processed value in a for loop you need to copy it to another variable explicitly:

    sub traverse {
        my $result;
        for my $element (@_) {
            ...
            $result = 'my interesting value to be retained on the last iteration';
        }
        return $result;
    }

    You really, really, really ought to use a variable instead of the default variable for your for loop variable btw. Using the default variable for more than a couple of lines and expecting it not to change by magic is just asking for trouble!

    Oh, and you should use the three parameter version of open and lexical file handles:

    open my $outputFile, '>', $output or die "Failed to create $output: $!\n";
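
A slightly fuller sketch of that advice, assuming a temporary file in place of the name read from STDIN (the handle and variable names are illustrative only). It writes through the lexical handle directly, so no `select` is needed, and reads the line back to show it arrived.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for the user-supplied output name.
my (undef, $output) = tempfile(UNLINK => 1);

open my $out_fh, '>', $output or die "Failed to create $output: $!\n";
print {$out_fh} "<p>Hello</p>\n";    # goes to the file, no select() needed
close $out_fh or die "Failed to close $output: $!\n";

open my $in_fh, '<', $output or die "Failed to read $output: $!\n";
my $line = <$in_fh>;
close $in_fh;
print "read back: $line";
```

In the original program the name read with `<STDIN>` still carries its trailing newline, so a `chomp` before the `open` would probably be wanted as well.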

    True laziness is hard work
      While your points are definitely good advice, I'm not sure that's what he's asking about. I think the OP is asking about modifying the parameters to the subroutine in-place. For example:
      @a = (1..10);
      sub x {
          foreach (@_) {
              $_++;
          }
      }
      x(@a);
      print @a;   #prints 234567891011
      So in principle, you can modify a subroutine's arguments using the default variable $_ inside a foreach loop. Question is, why isn't it working inside his recursive function?

        I don't think that is really the OP's problem. What you suggest can be checked trivially by:

        use strict;
        use warnings;

        my @array = (1 .. 8);

        print "Before: @array\n";
        nastyModifyParamsSub (@array);
        print "After: @array\n";

        sub nastyModifyParamsSub {
            for my $element (@_) {
                ++$element;
            }
        }

        Prints:

        Before: 1 2 3 4 5 6 7 8
        After: 2 3 4 5 6 7 8 9

        Which shows that modifying the parameters is possible and that there is nothing magical about requiring $_ to do it. The following though shows one way that OP's code could be going awry:

        use strict;
        use warnings;

        my @array = (1 .. 8);

        print "Before: @array\n";
        nastyModifyParamsSub (@array);
        print "After: @array\n";

        sub nastyModifyParamsSub {
            for (@_) {
                ++$_;
                clobberDefVar ();
            }
        }

        sub clobberDefVar {
            $_ = 0;
        }

        Prints:

        Before: 1 2 3 4 5 6 7 8
        After: 0 0 0 0 0 0 0 0

        The bottom line, though, is that the OP's description is fuzzy and unclear, and there is too much code and no data at all.


        True laziness is hard work
Re: Modifying a parameter to a recursive function
by Roy Johnson (Monsignor) on Apr 08, 2009 at 21:11 UTC
    The only recursive call you have passes an array that you just declared and assigned in a minimal scope around the recursive call. The values you're modifying within the call are thrown away when that scope ends. It's not clear to me what you want to be modified. Do you expect $_->content_list() to return different values after the recursive call?
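
To illustrate the point about the throwaway array, here is a sketch with a hypothetical `Node` class standing in for the real element object (it is not part of HTML::Element). Its `content_list` hands back copies, so in-place edits reach the caller's array but never the node itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node; not part of HTML::Element.
package Node {
    sub new          { my ($class, @kids) = @_; return bless { kids => [@kids] }, $class }
    sub content_list { my ($self) = @_; return @{ $self->{kids} } }   # returns copies
}

# Modifies its arguments in place through the loop alias.
sub add_bang {
    for my $item (@_) {
        $item .= "!";
    }
    return;
}

my $node     = Node->new("a", "b");
my @contents = $node->content_list;
add_bang(@contents);

my $in_node = join " ", $node->content_list;
print "copy: @contents\n";    # prints "copy: a! b!"
print "node: $in_node\n";     # prints "node: a b"
```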

    bellaire gave a good example of the sort of value-modification you seem to want.


    Caution: Contents may have been coded under pressure.
Re: Modifying a parameter to a recursive function
by ig (Vicar) on Apr 08, 2009 at 21:33 UTC

    Please don't give up on perl just yet. As others have pointed out, it certainly is possible for a subroutine to modify a variable of the caller. In fact, your script is doing that but you then throw the modified value away without using it.

    Try the following:

    use strict;
    use warnings;
    use diagnostics;
    use HTML::TreeBuilder;
    use HTML::Entities;
    use HTML::Element;

    sub traverse ;

    foreach my $file_name (@ARGV) {
        my $tree = HTML::TreeBuilder->new ;
        $tree->parse_file($file_name);
        $tree->dump ;
        print "\n\nWhere would you like to put the output file? " ;
        my $output = <STDIN> ;
        open OUTPUT_FILE, "> $output" or die $! ;
        select OUTPUT_FILE ;
        traverse ($tree);
        $tree = $tree->delete ;
        close OUTPUT_FILE or die $!;
    }

    sub traverse {
        foreach (@_) {
            if ($_) {
                if (ref $_) {
                    print STDERR $_->tag(), "\n\n" ;
                    if ($_->tag() ne "head" && $_->tag() ne "script" && $_->tag() ne "img"
                        && $_->tag() ne "object" && $_->tag() ne "applet") {
                        my @contents = $_->content_list() ;
                        print STDERR "before: @contents\n";
                        traverse (@contents) ;
                        print STDERR "after: @contents\n";
                    }
                    if (!$_->parent) {
                        my $s = $_->as_HTML ("",{}) ;
                        $s =~ s/></>\n</g ;
                        print $s ;
                    }
                }
                else {
                    print STDERR "Processing a string element...\n" ;
                    $_ =~ s/\s&\s/ &amp; /g ;
                    $_ =~ s/</&lt;/g ;
                    $_ =~ s/>/&gt;/g ;
                    $_ =~ s/'em\s/&rsquo;em /g ;
                    $_ =~ s/'tis\s/&rsquo;tis /g ;
                    $_ =~ s/'twas\s/&rsquo;twas /g ;
                    $_ =~ s/'Twas\s/&rsquo;Twas /g ;
                    $_ =~ s/'Tis\s/&rsquo;Tis / ;
                    $_ =~ s/'\s/&rsquo; /g ;
                    $_ =~ s/^'/&lsquo;/g ;
                    $_ =~ s/(\s)'/$1&lsquo;/g ;
                    $_ =~ s/"'/&ldquo;lsquo;/g ;
                    $_ =~ s/'"/&rsquo;&rdquo;/g ;
                    $_ =~ s/\s"/ &ldquo;/g ;
                    $_ =~ s/^'/&lsquo;/g ;
                    $_ =~ s/^"/&ldquo;/g ;
                    $_ =~ s/"\s/&rdquo; /g ;
                    $_ =~ s/'$/&rsquo;/g ;
                    $_ =~ s/"$/&rdquo;/g ;
                    $_ =~ s/(,|\.)'/$1&rsquo;/g ;
                    $_ =~ s/(,|\.)"/$1&rdquo;/g ;
                    $_ =~ s/(\S)'(\S)/$1&rsquo;$2/g ;
                    print STDERR ($_ , "\n\n");
                }
            }
        }
    }

    On a simple test file:

    <html>
    <head>
    <title>test</title>
    </head>
    <body>
    This is some "text" in the body.
    </body>
    </html>

    This produced the following output:

    <html> @0
      <head> @0.0
        <title> @0.0.0
          "test"
      <body> @0.1
        " This is some "text" in the body. "

    Where would you like to put the output file? test.ou
    html
    before: HTML::Element=HASH(0x841e528) HTML::Element=HASH(0x841e5c8)
    head
    body
    before: This is some "text" in the body.
    Processing a string element...
    This is some &ldquo;text&rdquo; in the body.
    after: This is some &ldquo;text&rdquo; in the body.
    after: HTML::Element=HASH(0x841e528) HTML::Element=HASH(0x841e5c8)

    Note that the "after" value of @contents is different from the "before" value - the subroutine is modifying the caller's variable. But this variable is a lexical (my) variable within the scope of the block of the if statement. When it goes out of scope it is discarded without your having done anything with it.

    update: If you want to modify the content of one of the nodes you might find the content_refs_list method useful. From the HTML::Element documentation:

    This returns a list of scalar references to each element of $h’s content list. This is useful in case you want to in-place edit any large text segments without having to get a copy of the current value of that segment value, modify that copy, then use the "splice_content" to replace the old with the new. Instead, here you can in-place edit:

    It will be well worth your time to read HTML::Element carefully.

    Using this method will require some change to your traverse sub.
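
A rough sketch of what such a change might look like, assuming HTML::Element (from the HTML-Tree distribution) is installed. The small tree is built by hand, and `curly_quotes` here is a much simplified, hypothetical stand-in for the OP's sub. Each item returned by `content_refs_list` is a reference to a slot in the content list, so `$$item_ref` is either a child element or a text string.

```perl
use strict;
use warnings;
use HTML::Element;

# Build a tiny tree by hand: <p>"hi" <b>"there"</b></p>
my $p    = HTML::Element->new('p');
my $bold = HTML::Element->new('b');
$bold->push_content('"there"');
$p->push_content('"hi" ', $bold);

# Hypothetical, much simplified stand-in for the OP's curly_quotes().
sub curly_quotes {
    my ($text) = @_;
    $text =~ s/"(.*?)"/&ldquo;$1&rdquo;/g;
    return $text;
}

sub traverse {
    my ($element) = @_;
    for my $item_ref ($element->content_refs_list) {
        if (ref $$item_ref) {
            traverse($$item_ref);                     # child element: recurse
        }
        else {
            $$item_ref = curly_quotes($$item_ref);    # text: edit in place
        }
    }
    return;
}

traverse($p);
my $text = $p->as_text;    # text parts only, read back from the tree itself
print "$text\n";
$p->delete;
```

Because the edit goes through the reference, the change lands in the tree and there is no separate array to write back afterwards.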

    Alternatively, you can explicitly replace the node content as you traverse the tree with something like the following:

    {
        my @contents = $_->content_list() ;
        print STDERR "before: @contents\n";
        traverse (@contents) ;
        print STDERR "after: @contents\n";
        $_->detach_content();
        $_->push_content(@contents);
    }
Re: Modifying a parameter to a recursive function
by mr_mischief (Monsignor) on Apr 08, 2009 at 21:24 UTC
    I'm not sure what proof you have that your strings are reverting. Be aware that both @contents and $s are the results of calling a method on an object. If you are expecting the strings returned by content_list to be refs into the tree, then you really want content_refs_list instead. The HTML::Element method content_list only returns a list, not a list of refs. It seems you're changing something other than what you want.

    As for being able to change your arguments in Perl, take a look at this simple test case:

    my @foo = qw( aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff);

    sub foo {
        for ( @_ ) {
            $_ =~ s/(.)\1/$1/ || return;
            print $_ . "\n";
            foo( $_ );
        }
    }

    foo @foo;
    print @foo;

    The above produces this:

Re: Modifying a parameter to a recursive function
by jethro (Monsignor) on Apr 08, 2009 at 21:39 UTC
    $_ and @_ are completely different variables and $_ has nothing to do with the parameters of a subroutine

    "The same article said that as soon as you set some other variable to $v = $_ it will be impossible to modify the value of $_ afterwards"

    This is wrong. Maybe you misread or misinterpreted it.
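
A short sketch of both points, using made-up names: the sub below changes its caller's variable through `@_` alone, never touching `$_`, and copying the argument into another variable first does not stop the later modification.

```perl
use strict;
use warnings;

# Modifies the caller's variable through @_ only; $_ is never involved.
sub add_suffix {
    my $before = $_[0];      # taking a copy does not "lock" $_[0]
    $_[0] .= "-done";
    return $before;
}

my $job = "task";
my $old = add_suffix($job);
print "old: $old, new: $job\n";    # prints "old: task, new: task-done"
```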
