Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Janitors Thread Retitler v1

by davido (Cardinal)
on Nov 30, 2004 at 20:55 UTC ( [id://411280]=sourcecode: print w/replies, xml ) Need Help??
Category: PerlMonks Related Scripts
Author/Contact Info davido
Description:

There is a newer version of this code posted at Janitors Thread Retitler v2. Both of these have been replaced by Janitors Thread Retitler v3.1.

This is a new retitler for the Janitors to use in retitling entire threads. It won't work if you're not a janitor, as it requires an edit page only available to the Monastery's cleanup crew. If you're a Janitor, and are tired of manually retitling nodes one by one, and would like an alternative to Janitors' Tools - Bulk node retitler, read on.

Usage is:

retitle -id NNNNNN -title "New Title Here" [-debug]

To configure for your system you just need to configure the script with your username and password, on or around line 13 and 14, $user and $passwd.

It works by fetching the XML Node Thread ticker for the thread specified by the command-line arg "-id NNNNNN". It walks the thread's XML footprint picking each node ID along the way. It then grabs the original title of the root node, and asks for confirmation before proceeding.

Finally, it uses the editor's view to update the titles for each node id. It leaves "Re:" and "Re^N:" alone, and skips any node whos title is different from the root node's title. That way if someone has intentionally used a different title in a followup it will leave that alone.

In debug mode, this script simply skips the final step where it normally would submit the change.

It requires WWW::Mechanize version 0.74 or newer. The version currently available on ActiveState PPM is 0.72, so if you're an ActiveState user you'll have to upgrade WWW::Mechanize through some other means. The most recent version is 1.05, but anything from 0.74 on is fine.

This was my first dive into using WWW::Mechanize to fill in forms, and it was an educational experience. I'm aware that a similar tool already exists, but I wanted to work through it myself for the practice. My tool uses the XML ticker, which may be less fragile than Corion's bulk-node retitler (no offense to Corion; it was he who pointed out this fact, and the XML ticker wasn't available when he wrote his retitler).

Since it's a learning experience, I welcome any comments or suggestions for improvement.

Janitors, enjoy!

Updated to expand usefulness to older style "Re: Re:" titles too.

use strict;
use warnings;

use Getopt::Long;

use WWW::Mechanize 0.74;
use XML::Simple;

# Use your own username and password here.
my $site = "http://www.perlmonks.org/";
my $user   = 'someuser';
my $passwd = 'somepassword';
# ----------------------------------------

our $DEBUG = 0;

my( $target, $new_title );

GetOptions( 'id=i'    => \$target, 
            'title=s' => \$new_title,
            'debug'   => \$DEBUG );
            
unless( defined( $target ) 
        and defined( $new_title )
        and ( $target =~ m/\d+/ ) 
        and ( length( $new_title ) > 3 ) 
    ) {
        die   "Usage: retitle -id nnnnnn -title \"New title...\""
            . " [-debug].\n"; 
}

my $login = "op=login;user=$user;passwd=$passwd;expires=+10y;";

my $agent = WWW::Mechanize->new( 'autocheck' => 1 );

$agent->env_proxy();

my @node_ids = sort { $a <=> $b } 
    get_node_ids( $target, $login, $agent );

sleep 2;

my $old_title = get_root_title( $node_ids[0], $agent );

print "Original Title: '$old_title'. Continue? (y/n):\n";
my $ok = <STDIN>;
die "Stopped.\n" unless $ok =~ m/^y/i;

foreach my $id ( @node_ids ) {
    sleep 2;
    edit_title( $id, $agent, $old_title, $new_title );
}



sub get_node_ids {
    my( $target, $login, $agent ) = @_;
    print "Fetching thread node ID's for id = $target.\n";
    $agent->get(  
          $site . '?'
        . $login
        . "node_id=180684;id=$target"
    );

    $agent->success() 
        or die "Unable to fetch thread for id = $target.\n";
    
    my $xmlref = XMLin( $agent->content(), 
                        ForceArray => 1, 
                        KeepRoot => 1 );
    my @node_ids = traverse( $xmlref );
    print "\tFetched " 
        . scalar( @node_ids ) 
        . " node IDs.\n\n";
    return @node_ids;
}

sub traverse {
    my @nodes;
    foreach my $key ( keys %{$_[0]} ) {
        if ( ref( $_[0]->{$key} ) ) {
            push @nodes, traverse( $_[0]->{$key} );
        }
        if ( $key =~ m/^\d+$/ ) {
            push @nodes, $key;
        }
    }
    return @nodes;
}

sub get_root_title {
    my( $target, $agent ) = @_;
    print "Fetching title for id = $target.\n";
    $agent->get(
          "$site?displaytype=xml;node_id="
        . $target
    );
    $agent->success()
        or die "Unable to fetch title for id = $target.\n";
    my $xmlref = XMLin( $agent->content(), 
                        ForceArray => 1, 
                        KeepRoot => 1 );
    print   "\tTitle: ", 
            $xmlref->{'node'}{$target}{'title'}, 
            "\n\n";
    return $xmlref->{'node'}{$target}{'title'};
}

sub edit_title {
    my( $target, $agent, $from, $to ) = @_;
    $agent->get(
          "$site?displaytype=editors;node_id="
        . $target
    );
    $agent->success()
        or die "Unable to fetch editors form for id://$target.\n";
    my $form = $agent->form_name('edit_node');
    unless ( $form ) { 
        die "Couldn't find 'edit_node'.\n"; 
    };
    my $old_title = $form->value( 'update_title', 1 );
    if ( $old_title !~ m/^(?:Re(?:\^\d+)?:\s*)*\Q$from\E$/i ) {
        print "$old_title doesn't match $from. Skipping node.\n";
        return;
    }
    my $new_title = $old_title;
    $new_title =~ 
        s/^(Re(?:\^\d+)?:\s*)*\Q$from\E$/(defined($1)?$1:'').$to/ie;
    print   "Retitling ($target):\n\t'$old_title' =>\n"
          . "\t'$new_title'.\n\n";
    $agent->field( 'update_title', $new_title );
    unless( $DEBUG ) {
        $agent->click_button( 'value' => 'update' );
        $agent->success() or
            die "Couldn't edit $old_title.\n";
    }
}
Replies are listed 'Best First'.
Re: Janitors Thread Retitler v1
by castaway (Parson) on Dec 02, 2004 at 08:23 UTC
    Nice work davido!

    A couple of points:

    1. Please add a <p><small>"Edit by [$user] - Retitle from '$orig_title'"</small></p> to the end of the root node while retitling, else we'll still have to do that bit by hand.

    2. I'd prefer to be able to enter the original title on the command line too, so that I can just start it and forget, and not have to wait for it to ask if I want to continue.

    3. If I enter an invalid node id, the xml node thread ticker gives back a valid but empty list, you should probably check if the title it gives has a length. (Someone should probably patch the ticker not to do this, see my post in the Scriptorium).

    4. I have WWW::M 0.70, so I replaced your line

    $agent->click_button( 'value' => 'update' );
    with
    $agent->current_form->value('blah','update'); $agent->click('blah');
    stolen from Corions version, and it seems to work fine.

    C.

      I'd prompt for a 'from' and 'to' and not try to parse "Re^2:" but just s/\Q$from/$to/ on the titles. Perhaps slightly more robust would be

      s/(?!(?<=\w)\w)\Q$from\E(?!(?<=\w)\w)/$to/

      Which is like \b\Q$from\E\b except that it is more DWIM in cases where $from starts or ends with \W characters.

      And I'd prompt for what to add to the bottom of retitled nodes (with what castaway proposed as a default) since I'd only rarely want to include the full original title, especially that many times.

      I didn't check, but I hope it allows me to retitle subtrees of a thread.

      - tye        

        "that many times" ? I thought we only put the janitor-comment on the top node. Or perhaps I have no idea what you mean?

      All good points, castaway. I'll have time to work on implementing them this evening (localtime california).

      One thought. The reason that I opted to not require the original title to be typed in on the command line is because I felt that it is inconvenient to have to carefully type it in exactly as it appears in the original post, without any missed spaces, punctuation, and so on. How about if I implement an optional commandline switch, "-orig"? If you use the switch, the script runs from the commandline without the runtime prompt. If you don't use the switch, the script still grabs the original title itself (like it does now) and prompts to ensure it's correct.

      Thanks for the suggestion on how to overcome the WWW::Mechanize 0.74 requirement. I'll implement that too.


      Dave

        Well, I dunno about you, but on any occasion where I pass a node to a/the retitler, I've always gone for the "cut&paste" method so far, both on original title, and new title suggestion. Your suggestion to make it optional is fine by me :)

        C.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://411280]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2024-04-16 07:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found