PerlMonks  

Correct link to put in my WWW:Mechanize

by Anonymous Monk
on Mar 12, 2022 at 18:45 UTC (#11142033)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!
I am trying to submit requests to a website with the WWW::Mechanize module so that I don't have to upload my sequences one by one (I have ~100 to run), but I can't find the correct page to put in my script. The tool I am trying to use is this one:
http://www.csbio.sjtu.edu.cn/bioinf/MemBrain/. If I just put this:
    my $url  = 'http://www.csbio.sjtu.edu.cn/MEMBRAIN/';
    my $mech = WWW::Mechanize->new(timeout => 1000);
    $mech->post($url);
I get: Error POSTing http://www.csbio.sjtu.edu.cn/cgi-bin/MEMBRAIN.cgi/: Internal Server Error. I get the same result using get instead of post.
By looking at the page source, I see:
<form name="myform" action="/cgi-bin/MEMBRAIN.cgi" method="post" onSubmit="javascript: return checkform();">

so I am thinking that my URL might be wrong. Any ideas how to point to the correct page so that it gets executed?

Replies are listed 'Best First'.
Re: Correct link to put in my WWW:Mechanize
by LanX (Sage) on Mar 12, 2022 at 18:48 UTC
    > If I just put this:

    my $url = 'http://www.csbio.sjtu.edu.cn/MEMBRAIN/';

    It's

    http://www.csbio.sjtu.edu.cn/bioinf/MemBrain/
                                        --------

    and CASE matters too.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Thanks Rolf, I tried that, but I still get an error (Error GETing http://www.csbio.sjtu.edu.cn/bioinf/MEMBRAIN/: Not Found), with both post and get. Do I perhaps need to add the cgi-bin part somewhere?
        In the meantime I've updated my post to say that cAsE matters.

        Anything after the domain is treated like a file path and follows Unix rules, so it is case-sensitive.

        You should try to open those URIs in the browser.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
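
To illustrate the point about paths, here is a minimal sketch, assuming the URI module is available (it is normally installed together with LWP and WWW::Mechanize). It shows how the action="/cgi-bin/MEMBRAIN.cgi" attribute from the page source would resolve against the page URL, and that only the scheme and host are case-insensitive. The variable names are illustrative and not taken from the thread.

```perl
use strict;
use warnings;
use URI;

# Page that carries the form, spelled exactly as in the tool's address.
my $page = 'http://www.csbio.sjtu.edu.cn/bioinf/MemBrain/';

# The action attribute copied from the page source quoted in the question.
my $action = '/cgi-bin/MEMBRAIN.cgi';

# A leading slash makes the action relative to the host root,
# so the /bioinf/MemBrain/ part of the page URL is dropped.
my $target = URI->new_abs($action, $page);
print "form posts to: $target\n";

# canonical() lowercases the scheme and host only; the path is left alone.
my $shouted = URI->new('HTTP://WWW.CSBIO.SJTU.EDU.CN/bioinf/MemBrain/')->canonical;
print "canonical:     $shouted\n";

# The upper-cased path from the failed attempt is a different path.
my $miscased = URI->new('http://www.csbio.sjtu.edu.cn/bioinf/MEMBRAIN/');
my $same     = $miscased->path eq URI->new($page)->path ? 'yes' : 'no';
print "same path?     $same\n";
```

So the cgi-bin part does not need to be added by hand: a client that loads the page and submits the form resolves the action in the same way.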

Re: Correct link to put in my WWW:Mechanize
by hippo (Bishop) on Mar 13, 2022 at 11:37 UTC

    As has already been advised, the URL in your script does not match the URL in your description. Fixing it gives this SSCCE, which runs to completion with no errors for me:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $url  = 'http://www.csbio.sjtu.edu.cn/bioinf/MemBrain/';
    my $mech = WWW::Mechanize->new;
    $mech->post ($url);
    die $mech->status unless $mech->success;

    You can run this code for yourself, confirm that it works and then start to expand it from there. That might be easier for you than trying to condense what you already have into an SSCCE.
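
One possible next step when expanding the SSCCE is to load the page with get and list what the form actually contains before submitting anything. The following is a rough sketch under assumptions, not a tested recipe for this site: describe_form is a hypothetical helper name, 'myform' is the form name quoted from the page source in the question, and the real field names are whatever the listing prints.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a page, select a form by its name attribute
# and return a one-line-per-input description. Nothing is submitted.
sub describe_form {
    my ($mech, $url, $form_name) = @_;

    $mech->get($url);    # autocheck is on by default, so HTTP errors die here

    my $form = $mech->form_name($form_name)
        or die "no form named '$form_name' at $url\n";

    my @lines = map {
        sprintf '%-10s %-12s %s',
            $_->type,
            defined $_->name  ? $_->name  : '(unnamed)',
            defined $_->value ? $_->value : '';
    } $form->inputs;

    return join "\n", 'action: ' . $form->action, @lines;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    print describe_form(WWW::Mechanize->new, $ARGV[0], 'myform'), "\n";
}
```

Saved under a name of your choice, for example inspect_form.pl, it could be run as: perl inspect_form.pl http://www.csbio.sjtu.edu.cn/bioinf/MemBrain/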


    🦛

Re: Correct link to put in my WWW:Mechanize
by bliako (Monsignor) on Mar 13, 2022 at 17:23 UTC

    The <form ...> above has onSubmit="javascript: return checkform();". AFAIU this calls a js function checkform() prior to submitting. This function may be adding extra fields to the form, e.g. "I have checked the form, fields validate, it's ok for the cgi script to process it." By accessing the cgi script directly, you may be omitting some of these fields. For example, look at the B1=Submit field (re: Re^8: Correct link to put in my WWW:Mechanize). Where does that come from? You don't mention that field in Re^11: Correct link to put in my WWW:Mechanize, but I saw it with my browser. Is that field the important difference between a successful and a failed submission? So you need to inspect that js function and see if and how it modifies the submitted form. Then you can call the cgi directly.

    Anyway, may I suggest that you leave automation to someone who knows what they are doing lest these free resources find out that there are hundreds of failed attempts and put a captcha-wall and spoil it for the community? Not that I am the internet police or anything, just an opinion. BTW your question has nothing to do with Perl or Mech.

    bw, bliako

      > AFAIU this calls a js function checkform() prior to submitting.

      It's a classic JS form validator: a submission is only possible if it returns true.

      see e.g. https://www.w3schools.com/js/js_validation.asp

      > For example, look at the B1=Submit field. Where does that come from?

      I can see this with JS disabled

      <input style="font-size: 13pt" type="submit" value="Submit" name="B1">

      I don't have the impression that JS is adding anything to the form. Frankly, I doubt the author had the necessary skills to do so.

      But my advice here can only be superficial, because I don't know what the correct genetic input has to look like. And the author decided to let his web service simply crash when the JS validation is bypassed to send broken input. 🤷🏽

      DISCLAIMER: I don't expect biologists to be good hackers, but academics should be capable of asking proper questions.

      ANYWAY: I agree with you that this is not really a Perl issue, it's about HTML/HTTP.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery
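
The origin of B1=Submit can be checked offline. This is a minimal sketch, assuming the HTML::Form module (a dependency of WWW::Mechanize) is installed. The HTML is only the two fragments quoted in this thread, so the real page's other inputs are missing, and $base and $html are illustrative names. click builds the request that the submit button would produce but does not send it.

```perl
use strict;
use warnings;
use HTML::Form;

# Address of the page that carries the form (from the question).
my $base = 'http://www.csbio.sjtu.edu.cn/bioinf/MemBrain/';

# Only the form tag and the submit button quoted in this thread;
# the real page has further inputs that are not reproduced here.
my $html = <<'HTML';
<form name="myform" action="/cgi-bin/MEMBRAIN.cgi" method="post" onSubmit="javascript: return checkform();">
<input style="font-size: 13pt" type="submit" value="Submit" name="B1">
</form>
HTML

# Parse the markup; no JavaScript is involved at any point.
my ($form) = HTML::Form->parse($html, $base);

# Build (but do not send) the request a click on the B1 button would make.
my $request = $form->click('B1');

print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

The B1=Submit pair in the request body comes from the name and value attributes of the submit button itself, which fits the observation that the field is present with JS disabled.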

      > Anyway, may I suggest that you leave automation to someone who knows what they are doing lest these free resources find out that there are hundreds of failed attempts and put a captcha-wall and spoil it for the community? Not that I am the internet police or anything, just an opinion.

      It seems there is even an option to download the whole program plus data. A quick look at the zip file shows some Python programs, so it would make much more sense to run a local copy and modify it to batch process (if there is not already such an option).

      perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'

        yep, that would be the best option. It looks like we entered the meta-python era ...

Node Type: perlquestion [id://11142033]
Approved by LanX