PerlMonks  

Unable to get the JSON data in the website

by Perl_Love (Acolyte)
on May 10, 2016 at 15:18 UTC ( [id://1162636]=perlquestion )

Perl_Love has asked for the wisdom of the Perl Monks concerning the following question:

Hello, everybody. When I open http://www.tianyancha.com/company/398459114 in Mozilla Firefox, I can see the page's information. When I use HttpFox to open http://www.tianyancha.com/company/398459114.json, I can see the information too. I want to fetch this information with Perl, so I wrote the code below, but it only returns {"state":"error","message":"","data":null}. Please tell me how to get this web page's information. Thank you!
use strict;
use warnings;
use HTTP::Cookies;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);

$| = 1;

my $ua = LWP::UserAgent->new(
    agent   => 'Mozilla/5.0 (Windows NT 10.0; rv:46.0) Gecko/20100101 Firefox/46.0',
    timeout => 10,
);
my $cookie_jar = HTTP::Cookies->new();
$ua->cookie_jar($cookie_jar);

# Request headers copied from the browser session
my %h = (
    'Tyc-From'        => 'normal',
    'Accept-Encoding' => 'gzip, deflate',
    'Accept'          => 'application/json, text/plain, */*',
    'Referer'         => 'http://www.tianyancha.com/company/398459114',
);

# as_string returns the status line, headers, and raw (undecoded) body
my $ba = $ua->request( GET 'http://www.tianyancha.com/company/398459114.json', %h )->as_string;

open( my $out, '>', 'test_get.txt' ) or die "Cannot open test_get.txt: $!";
print $out $ba;
close $out;
Hello! Thank you for the quick reply! Is there a better way to use Perl to fetch this web page or its JSON data? Must I go through http://antirobot.tianyancha.com/captcha/verify first? Also, would the following approach work: when access is blocked, download the captcha, work out the position by hand, submit that manually entered position with a module such as LWP, and then request the web page or the JSON data? Any suggestions would be appreciated. Thanks a lot!

Replies are listed 'Best First'.
Re: Unable to get the JSON data in the website
by tangent (Parson) on May 10, 2016 at 16:55 UTC
    When I try your script I get:
    HTTP/1.1 403 Forbidden ... {"state":"error","message":"","data":null}
    When I go to http://www.tianyancha.com/company/398459114 in my browser I am re-directed to http://antirobot.tianyancha.com/captcha/verify where I need to fill in a captcha.

    So it looks like the page is designed to prevent you from doing what you want (hence the "antirobot" in the URL - not a relative :)
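The 403 status and the redirect described above can both be read off the response object before anything is written to disk. The following is a minimal sketch, not the poster's method: describe_response is a hypothetical helper name, and treating a 403 or an "antirobot" host as "blocked" is an assumption based only on the behaviour reported in this thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Summarise an HTTP::Response: its status line, the URL that finally
# answered (after any redirects LWP followed), and whether the reply looks
# like the anti-robot gate. The 403 / 'antirobot' test is an assumption
# taken from what was reported in this thread.
sub describe_response {
    my ($res) = @_;
    my $final_url = $res->request ? $res->request->uri->as_string : '';
    return {
        status    => $res->status_line,
        ok        => $res->is_success ? 1 : 0,
        final_url => $final_url,
        hops      => scalar( $res->redirects ),    # number of redirects followed
        blocked   => ( $res->code == 403 || $final_url =~ m{//antirobot\.} ) ? 1 : 0,
    };
}

# Make a live request only when a URL is given on the command line.
if (@ARGV) {
    my $ua   = LWP::UserAgent->new( timeout => 10 );
    my $info = describe_response( $ua->get( $ARGV[0] ) );
    printf "%-9s %s\n", $_, $info->{$_} for sort keys %$info;
}
```

Run with the URL as the only argument. If blocked comes back as 1, the body is the site's error reply rather than the company data, so changing request headers alone is unlikely to help.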

Re: Unable to get the JSON data in the website
by talexb (Chancellor) on May 10, 2016 at 16:22 UTC

    I suggest you read up on JSON.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
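For reference, the body quoted in the question is itself valid JSON and can be decoded with the core JSON::PP module. This is a rough sketch only: extract_data is a hypothetical helper, and reading a state of "error" or a null data member as failure is an assumption, since the thread never shows what a successful reply looks like.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Decode a JSON reply and return (data, undef) on success or
# (undef, reason) on failure. Treating state "error" or a null "data"
# member as failure is an assumption; the thread only shows the error reply.
sub extract_data {
    my ($json_text) = @_;
    my $doc = eval { decode_json($json_text) };
    return ( undef, "not valid JSON: $@" ) unless defined $doc;
    return ( undef, 'unexpected JSON shape' ) unless ref $doc eq 'HASH';
    my $state = $doc->{state} // '';
    return ( undef, "server reported state '$state'" )
        if $state eq 'error' || !defined $doc->{data};    # JSON null decodes to undef
    return ( $doc->{data}, undef );
}

# The body returned to the original poster.
my $body = '{"state":"error","message":"","data":null}';
my ( $data, $why ) = extract_data($body);
print defined $data ? "got data\n" : "no data: $why\n";
```

Decoding this way makes it clear that the server answered with a well-formed error object, so the problem lies in the request being refused, not in how the reply is parsed.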

Node Type: perlquestion [id://1162636]
Approved by Tanktalus