PerlMonks  

parse html for specific things

by Gtforce (Sexton)
on Dec 15, 2017 at 11:07 UTC ( [id://1205567]=perlquestion )

Gtforce has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I've been struggling with this data (I'm only a few months into Perl). The data comes from an HTML page, stripped down to just this. I'm looking to pick out specific fields such as buyQuantity1, buyQuantity2 and buyQuantity3. Any advice on how to do this?

The data:

{"futLink":"\/live_market\/dynaContent\/live_watch\/get_quote\/GetQuoteFO.jsp?underlying=SHREECEM&instrument=FUTSTK&expiry=28DEC2017&type=-&strike=-","otherSeries":["EQ"],"lastUpdateTime":"15-DEC-2017 14:55:53","tradedDate":"15DEC2017","data":[{"extremeLossMargin":"5.00","cm_ffm":"21,823.45","bcStartDate":"25-JUL-17","change":"-146.90","buyQuantity3":"3","sellPrice1":"17,751.40","buyQuantity4":"5","sellPrice2":"17,758.50","priceBand":"No Band","buyQuantity1":"3","deliveryQuantity":"93,925","buyQuantity2":"2","sellPrice5":"17,775.00","quantityTraded":"1,09,030","buyQuantity5":"2","sellPrice3":"17,772.10","sellPrice4":"17,772.15","open":"18,044.00","low52":"13,140.30","securityVar":"5.21","marketType":"N","pricebandupper":"19,688.10","totalTradedValue":"2,741.00","faceValue":"10.00","ndStartDate":"-","previousClose":"17,898.30","symbol":"SHREECEM","varMargin":"7.50","lastPrice":"17,751.40","pChange":"-0.82","adhocMargin":"-","companyName":"Shree Cements Limited","averagePrice":"17,850.84","secDate":"14DEC2017","series":"EQ","isinCode":"INE070A01015","surv_indicator":"-","indexVar":"-","pricebandlower":"16,108.50","totalBuyQuantity":"7,050","high52":"20,538.00","purpose":"DIVIDEND - RS 24 PER SHARE","cm_adj_low_dt":"23-DEC-16","closePrice":"0.00","isExDateFlag":false,"recordDate":"-","cm_adj_high_dt":"15-MAY-17","totalSellQuantity":"7,255","dayHigh":"18,197.90","exDate":"21-JUL-17","sellQuantity5":"1","bcEndDate":"31-JUL-17","css_status_desc":"Listed","ndEndDate":"-","sellQuantity2":"1","sellQuantity1":"1","buyPrice1":"17,750.25","sellQuantity4":"2","buyPrice2":"17,750.20","sellQuantity3":"2","applicableMargin":"12.50","buyPrice4":"17,750.00","buyPrice3":"17,750.10","buyPrice5":"17,735.05","dayLow":"17,701.00","deliveryToTradedQuantity":"86.15","basePrice":"17,898.30","totalTradedVolume":"15,355"}],"optLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=SHREECEM&instrument=-&date=-&segmentLink=17&symbolCount=2"}

Replies are listed 'Best First'.
Re: parse html for specific things (JSON)
by LanX (Saint) on Dec 15, 2017 at 11:22 UTC
    This looks like JSON not HTML.

    use JSON; # imports encode_json, decode_json, to_json and from_json.

    # simple and fast interfaces (expect/generate UTF-8)
    $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery
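As a rough sketch of how the decode_json suggestion above could be applied to the fields named in the question: the record of interest sits in the first element of the "data" array, so the three quantities can be read straight from that hash. The helper name pick_fields and the shortened $sample string are made up for illustration (the values are copied from the posted data), and JSON::PP, which has shipped with Perl since 5.14, is swapped in so the example runs without installing anything; `use JSON;` exports the same decode_json.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; `use JSON;` provides the same decode_json

# Hypothetical helper: return key => value pairs for the requested keys,
# taken from the first element of the top-level "data" array.
sub pick_fields {
    my ($json_text, @keys) = @_;
    my $quote = decode_json($json_text)->{data}[0];
    return map { $_ => $quote->{$_} } @keys;
}

# Shortened stand-in for the posted data (same structure, fewer fields).
my $sample = '{"tradedDate":"15DEC2017","data":[{"buyQuantity3":"3",'
           . '"sellPrice1":"17,751.40","buyQuantity1":"3","buyQuantity2":"2"}]}';

my %picked = pick_fields($sample, qw(buyQuantity1 buyQuantity2 buyQuantity3));
print "$_ = $picked{$_}\n" for sort keys %picked;
```

Note that every value in this feed is a JSON string, so "3" comes back as a Perl string; Perl will treat it as a number in numeric context, but values with thousands separators such as "17,751.40" would need the commas removed first.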

Re: parse html for specific things
by ablanke (Prior) on Dec 15, 2017 at 11:39 UTC
    Hi,

    Yes, it really looks like JSON.

    Try this:

    use strict;
    use warnings;
    use JSON;
    use Data::Dumper;

    my $json = JSON->new->allow_nonref;
    my $json_text = do { local $/; <DATA> };    # read DATA block
    my $ref = $json->decode( $json_text );      # get Perl data structure
    my %data = %{$ref->{data}->[0]};            # giving the shortcut a name
    my %buyQuantity = map { $_ =~ m/buyQuantity/ ? ($_, $data{$_}) : () } keys %data;    # extract data you are looking for
    print Dumper(\%buyQuantity);                # observe results
    __DATA__
    {"futLink":"\/live_market\/dynaContent\/live_watch\/get_quote\/GetQuoteFO.jsp?underlying=SHREECEM&instrument=FUTSTK&expiry=28DEC2017&type=-&strike=-","otherSeries":["EQ"],"lastUpdateTime":"15-DEC-2017 14:55:53","tradedDate":"15DEC2017","data":[{"extremeLossMargin":"5.00","cm_ffm":"21,823.45","bcStartDate":"25-JUL-17","change":"-146.90","buyQuantity3":"3","sellPrice1":"17,751.40","buyQuantity4":"5","sellPrice2":"17,758.50","priceBand":"No Band","buyQuantity1":"3","deliveryQuantity":"93,925","buyQuantity2":"2","sellPrice5":"17,775.00","quantityTraded":"1,09,030","buyQuantity5":"2","sellPrice3":"17,772.10","sellPrice4":"17,772.15","open":"18,044.00","low52":"13,140.30","securityVar":"5.21","marketType":"N","pricebandupper":"19,688.10","totalTradedValue":"2,741.00","faceValue":"10.00","ndStartDate":"-","previousClose":"17,898.30","symbol":"SHREECEM","varMargin":"7.50","lastPrice":"17,751.40","pChange":"-0.82","adhocMargin":"-","companyName":"Shree Cements Limited","averagePrice":"17,850.84","secDate":"14DEC2017","series":"EQ","isinCode":"INE070A01015","surv_indicator":"-","indexVar":"-","pricebandlower":"16,108.50","totalBuyQuantity":"7,050","high52":"20,538.00","purpose":"DIVIDEND - RS 24 PER SHARE","cm_adj_low_dt":"23-DEC-16","closePrice":"0.00","isExDateFlag":false,"recordDate":"-","cm_adj_high_dt":"15-MAY-17","totalSellQuantity":"7,255","dayHigh":"18,197.90","exDate":"21-JUL-17","sellQuantity5":"1","bcEndDate":"31-JUL-17","css_status_desc":"Listed","ndEndDate":"-","sellQuantity2":"1","sellQuantity1":"1","buyPrice1":"17,750.25","sellQuantity4":"2","buyPrice2":"17,750.20","sellQuantity3":"2","applicableMargin":"12.50","buyPrice4":"17,750.00","buyPrice3":"17,750.10","buyPrice5":"17,735.05","dayLow":"17,701.00","deliveryToTradedQuantity":"86.15","basePrice":"17,898.30","totalTradedVolume":"15,355"}],"optLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=SHREECEM&instrument=-&date=-&segmentLink=17&symbolCount=2"}
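A possible variation on the script above, offered only as a sketch: the same filtering can be written with grep and a hash slice, and sorting the keys gives stable output, since Perl does not guarantee hash order. The function name fields_matching and the shortened $sample are hypothetical (values copied from the posted data); JSON::PP is used in place of JSON because it is a core module with the same decode_json.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; `use JSON;` provides the same decode_json

# Hypothetical helper: return every key/value pair of data[0]
# whose key matches the given pattern.
sub fields_matching {
    my ($json_text, $pattern) = @_;
    my $quote = decode_json($json_text)->{data}[0];
    my @keys  = grep { /$pattern/ } keys %$quote;
    my %subset;
    @subset{@keys} = @{$quote}{@keys};    # hash slice copies the matching pairs
    return %subset;
}

# Shortened stand-in for the posted data (same structure, fewer fields).
my $sample = '{"data":[{"buyQuantity3":"3","sellPrice1":"17,751.40",'
           . '"buyQuantity4":"5","buyQuantity1":"3","buyQuantity2":"2",'
           . '"buyQuantity5":"2","totalBuyQuantity":"7,050"}]}';

# Anchored pattern: only buyQuantity followed by digits.
my %buy = fields_matching($sample, qr/^buyQuantity\d+$/);
print "$_ => $buy{$_}\n" for sort keys %buy;
```

The anchored pattern is a design choice rather than a correction: the unanchored m/buyQuantity/ already skips totalBuyQuantity because the match is case-sensitive, but anchoring makes the intent explicit.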
Re: parse html for specific things
by Laurent_R (Canon) on Dec 15, 2017 at 22:34 UTC
    Yes, probably correct JSON, but badly formatted JSON, pretty much unreadable.
      pretty much unreadable

      Probably because it is not intended to be read by humans. JSON parsers will have no problem with lack of irrelevant white space, so why bother adding white space? Compare with the HTML delivered by the Google start pages, or the "compressed" version of jQuery. Both are size-optimized input for the browser's parser, not intended to be read by humans in this form.

      My guess is that the JSON blob is loaded into a web page or into an application that parses the data and generates a nice, human-readable view of the data.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        JSON is human-readable. I think the comment relates to posting it on a site. Perhaps, for posting purposes, in the same way the guidelines suggest a sensible layout for code/data examples, it would be wise for users to post pretty-printed JSON to make it easier for the humans here reading and responding to questions. Of course Perl can help here, or there are online tools for doing so, for example http://jsonprettyprint.com.

        Probably because it is not intended to be read by humans. JSON parsers will have no problem with lack of irrelevant white space, so why bother adding white space?
        Yes, I fully appreciate that it doesn't matter for parsers. But, still, when I have to write some JSON or some HTML, I usually prefer to insert at least some vertical space (EOLs), because there is almost always a time somewhere in the future when I or another human will need to read the source or change it. My take on this is similar to what I do with Perl or C code: to make it friendly to the human reader. But I certainly understand your point that you may also need to optimize your document's size, especially if it's going to be transmitted zillions of times over a network which may sometimes be slow.
      Readability of JSON is solved by a one-liner:
      perl -MJSON -we 'local $/; print JSON->new->pretty->canonical->encode(decode_json(<>))'
      Too bad that doesn't work with anything Turing-complete, like JS code or Perl itself...
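For anyone who prefers a script to a one-liner, the same idea might look roughly like this. It is a sketch, not a drop-in tool: the function name pretty_json and the $compact sample are invented for illustration, and JSON::PP (core since Perl 5.14) stands in for JSON; both offer the pretty and canonical options used here.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; JSON->new offers the same pretty/canonical options

# Hypothetical helper: re-encode compact JSON text with indentation
# (pretty) and alphabetically sorted keys (canonical).
sub pretty_json {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return JSON::PP->new->pretty->canonical->encode($data);
}

# Small made-up sample in the same shape as the posted data.
my $compact = '{"symbol":"SHREECEM","data":[{"buyQuantity2":"2","buyQuantity1":"3"}]}';
print pretty_json($compact);
```

The canonical option sorts keys, which makes two dumps of the same feed easy to diff; drop it if the original key order matters.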
Re: parse html for specific things
by Anonymous Monk on Dec 16, 2017 at 01:04 UTC
