PerlMonks  

Re^2: how do I scrape this web page

by Your Mother (Archbishop)
on Mar 05, 2020 at 00:19 UTC


in reply to Re: how do I scrape this web page
in thread how do I scrape this web page

FWIW, the prices are in the raw HTML, no JS needed. They're just buried in noise. Here is the raw text, stripped down a little:

Gold Price Today Gold Price $1,638.17

Re^3: how do I scrape this web page
by Marshall (Canon) on Mar 05, 2020 at 00:29 UTC
    Wow, then my mistake. I stand corrected. Thanks! I saw so much JS noise, and when I searched for "638" to find the actual price I got no result, which probably means I made a search mistake. Oops! This does make the job quite a bit easier. It also brings back the question of the "terms of use" and whether using this page is "legal" for the described usage. That I don't know.

    I found this below. Geez, this page's code is a mess!

    <div class="metal-title"> Gold Price </div> <div class="nfprice">&#36;1,638.93</div> <div class="table-variations"> <div class="single-variation-currency">
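For a fragment this small, a regex is enough to pull the number out. The following is only a rough sketch that assumes the markup looks exactly like the snippet quoted above (a `metal-title` div immediately followed by an `nfprice` div); the helper name `extract_price` is made up for illustration, and a real HTML parser would be more robust if the page layout changes.

```perl
use strict;
use warnings;

# The fragment quoted in the post (the real page has far more markup around it).
my $html = '<div class="metal-title"> Gold Price </div> <div class="nfprice">&#36;1,638.93</div>';

# Hypothetical helper: return the numeric price that follows the given
# metal title, or nothing if the expected markup is not found.
sub extract_price {
    my ($html, $metal) = @_;
    my ($raw) = $html =~ m{
        <div \s+ class="metal-title"> \s* \Q$metal\E \s* </div> \s*
        <div \s+ class="nfprice"> \s* (?:&\#36;|\$)? \s* ([\d,]+(?:\.\d+)?) \s* </div>
    }xs;
    return unless defined $raw;
    $raw =~ tr/,//d;    # drop thousands separators: "1,638.93" -> "1638.93"
    return $raw + 0;    # numify
}

my $price = extract_price($html, 'Gold Price');
printf "Gold: %.2f\n", $price if defined $price;
```

The `&#36;` entity is just an encoded dollar sign, so the pattern accepts either form and captures only the digits.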
    Update: I followed my own advice and googled "commodity api data". There appear to be lots of options. I haven't investigated any of the sources, nor do I endorse them. But one says: "XXXX offers commodity prices data for almost 100 commodities, including gold prices, silver prices and oil prices from multiple sources. XXXX's simple API gives access to daily spot prices and historical commodity prices." That one, or something similar, sounds promising.

    The API documentation for XXXX says a free user gets: "Authenticated users have a limit of 300 calls per 10 seconds, 2,000 calls per 10 minutes and a limit of 50,000 calls per day." Paying users can go faster. This is much better than fiddling around with a web page full of fancy graphics. The data is returned in a format that is easy for computers to understand. Well geez, as it should be if the short-term "throttle" on a free account allows bursts of 30 requests per second!
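To put the quoted limits side by side, here is a small back-of-the-envelope calculation. It uses only the three numbers from the quote; the helper name `calls_per_second` is just for illustration.

```perl
use strict;
use warnings;

# Limits as quoted in the post: [allowed calls, window length in seconds].
my %limits = (
    'per 10 seconds' => [ 300,    10 ],
    'per 10 minutes' => [ 2_000,  600 ],
    'per day'        => [ 50_000, 86_400 ],
);

# Average rate permitted by one limit.
sub calls_per_second {
    my ($calls, $seconds) = @_;
    return $calls / $seconds;
}

# Print the limits from the shortest window to the longest.
for my $name (sort { $limits{$a}[1] <=> $limits{$b}[1] } keys %limits) {
    printf "%-15s %6.2f calls/second\n", $name, calls_per_second(@{ $limits{$name} });
}
```

The 10-second window works out to 30 calls per second, but the daily cap averages out to well under one call per second, so the 30/s figure is a burst rate rather than something that can be sustained all day.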

    Additional update: https://blog.quandl.com/getting-started-with-the-quandl-api shows how to get the data you want as JSON or CSV files. The way to use Perl is to fetch this JSON data and do what you want with it. Look at https://docs.quandl.com/docs/in-depth-usage for some examples.

    Scraping a web page meant for human users is not the right way to get this info. Get the right API for the data that you need and then use Perl to just go crazy with this JSON, CSV or HTML data. Your Mother did find the HTML representation of the gold price on the original page, and yes, parsing that page can get the number, but it is not the "right way". Using an API to get the data you want is the "right way", and these APIs are designed to be very performant. I mean geez, this API is designed so that you can hit it 50k times per day without even paying anything! If you need this data more often than that, you are into something much more advanced than your question indicates!
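Once the JSON text is in hand, decoding it in Perl takes only a few lines with the core module JSON::PP. The sketch below is not the actual response format of any API: the payload, its field names (`commodity`, `columns`, `rows`), the dates, and the helper `latest_row` are all made up for illustration, so check the provider's documentation for the real layout. Fetching the text over HTTP (for example with the core HTTP::Tiny module) is left out so the example runs offline.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# HYPOTHETICAL payload: layout, field names and dates are invented for this
# example and do not come from any real API.
my $json = <<'END_JSON';
{
  "commodity": "gold",
  "columns": ["date", "usd"],
  "rows": [
    ["2020-03-03", 1638.17],
    ["2020-03-04", 1638.93]
  ]
}
END_JSON

# Hypothetical helper: decode the payload and return the most recent row
# as an array reference [date, price]. ISO dates sort correctly as strings.
sub latest_row {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    my ($latest) = sort { $b->[0] cmp $a->[0] } @{ $data->{rows} };
    return $latest;
}

my ($date, $usd) = @{ latest_row($json) };
printf "%s: \$%.2f\n", $date, $usd;
```

Compared with the regex approach to HTML, nothing here depends on page layout: the structure is stated by the data itself, which is the main argument for using an API.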
