Gtforce has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks!
I've been struggling with this data (I've only been using Perl for a few months). The data comes off an HTML page, stripped down to just this. I'm looking to pick out specific values like buyQuantity1, buyQuantity2 and buyQuantity3. Any advice on how to do this?
The data:
{"futLink":"\/live_market\/dynaContent\/live_watch\/get_quote\/GetQuoteFO.jsp?underlying=SHREECEM&instrument=FUTSTK&expiry=28DEC2017&type=-&strike=-",
"otherSeries":["EQ"],"lastUpdateTime":"15-DEC-2017 14:55:53","tradedDate":"15DEC2017",
"data":[{"extremeLossMargin":"5.00","cm_ffm":"21,823.45","bcStartDate":"25-JUL-17",
"change":"-146.90","buyQuantity3":"3","sellPrice1":"17,751.40","buyQuantity4":"5",
"sellPrice2":"17,758.50","priceBand":"No Band","buyQuantity1":"3","deliveryQuantity":"93,925",
"buyQuantity2":"2","sellPrice5":"17,775.00","quantityTraded":"1,09,030","buyQuantity5":"2",
"sellPrice3":"17,772.10","sellPrice4":"17,772.15","open":"18,044.00","low52":"13,140.30",
"securityVar":"5.21","marketType":"N","pricebandupper":"19,688.10","totalTradedValue":"2,741.00",
"faceValue":"10.00","ndStartDate":"-","previousClose":"17,898.30","symbol":"SHREECEM",
"varMargin":"7.50","lastPrice":"17,751.40","pChange":"-0.82","adhocMargin":"-",
"companyName":"Shree Cements Limited","averagePrice":"17,850.84","secDate":"14DEC2017",
"series":"EQ","isinCode":"INE070A01015","surv_indicator":"-","indexVar":"-",
"pricebandlower":"16,108.50","totalBuyQuantity":"7,050","high52":"20,538.00",
"purpose":"DIVIDEND - RS 24 PER SHARE","cm_adj_low_dt":"23-DEC-16","closePrice":"0.00",
"isExDateFlag":false,"recordDate":"-","cm_adj_high_dt":"15-MAY-17","totalSellQuantity":"7,255",
"dayHigh":"18,197.90","exDate":"21-JUL-17","sellQuantity5":"1","bcEndDate":"31-JUL-17",
"css_status_desc":"Listed","ndEndDate":"-","sellQuantity2":"1","sellQuantity1":"1",
"buyPrice1":"17,750.25","sellQuantity4":"2","buyPrice2":"17,750.20","sellQuantity3":"2",
"applicableMargin":"12.50","buyPrice4":"17,750.00","buyPrice3":"17,750.10","buyPrice5":"17,735.05",
"dayLow":"17,701.00","deliveryToTradedQuantity":"86.15","basePrice":"17,898.30",
"totalTradedVolume":"15,355"}],"optLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=SHREECEM&instrument=-&date=-&segmentLink=17&symbolCount=2"}
Re: parse html for specific things (JSON)
by LanX (Saint) on Dec 15, 2017 at 11:22 UTC
This looks like JSON, not HTML.
use JSON; # imports encode_json, decode_json, to_json and from_json.
# simple and fast interfaces (expect/generate UTF-8)
$perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
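For the question as asked, there is no need to search the keys at all: once decoded, the wanted values sit in the first (and only) element of the "data" array. A minimal sketch, assuming a trimmed-down copy of the posted data and using JSON::PP, the pure-Perl JSON module that ships with Perl since 5.14 and exports the same decode_json:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module; same decode_json interface as JSON.pm

# Trimmed-down copy of the posted data (assumption: only the keys used here).
my $json_text = '{"tradedDate":"15DEC2017","data":[{"buyQuantity1":"3",'
              . '"buyQuantity2":"2","buyQuantity3":"3","buyPrice1":"17,750.25"}]}';

my $quote = decode_json($json_text);   # hash reference for the top-level object
my $row   = $quote->{data}[0];         # "data" is an array holding one hash

# A hash slice picks exactly the keys asked about, in the order given.
my @wanted     = qw(buyQuantity1 buyQuantity2 buyQuantity3);
my @quantities = @{$row}{@wanted};

print "$wanted[$_] = $quantities[$_]\n" for 0 .. $#wanted;
```

With the full data, the same two lines (decode, then slice) work unchanged, since the extra keys are simply ignored.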
Re: parse html for specific things
by ablanke (Prior) on Dec 15, 2017 at 11:39 UTC
Hi,
Yes, it really looks like JSON.
Try this:
use strict;
use warnings;
use JSON;
use Data::Dumper;
my $json = JSON->new->allow_nonref;
my $json_text = do { local $/; <DATA> };    # read the DATA block
my $ref = $json->decode( $json_text );      # get the Perl data structure
my %data = %{$ref->{data}->[0]};            # give the shortcut a name
my %buyQuantity = map {
    $_ =~ m/buyQuantity/ ? ($_, $data{$_}) : ()
} keys %data;                               # extract the data you are looking for
print Dumper(\%buyQuantity);                # observe the results
__DATA__
{"futLink":"\/live_market\/dynaContent\/live_watch\/get_quote\/GetQuoteFO.jsp?underlying=SHREECEM&instrument=FUTSTK&expiry=28DEC2017&type=-&strike=-",
"otherSeries":["EQ"],"lastUpdateTime":"15-DEC-2017 14:55:53","tradedDate":"15DEC2017",
"data":[{"extremeLossMargin":"5.00","cm_ffm":"21,823.45","bcStartDate":"25-JUL-17",
"change":"-146.90","buyQuantity3":"3","sellPrice1":"17,751.40","buyQuantity4":"5",
"sellPrice2":"17,758.50","priceBand":"No Band","buyQuantity1":"3","deliveryQuantity":"93,925",
"buyQuantity2":"2","sellPrice5":"17,775.00","quantityTraded":"1,09,030","buyQuantity5":"2",
"sellPrice3":"17,772.10","sellPrice4":"17,772.15","open":"18,044.00","low52":"13,140.30",
"securityVar":"5.21","marketType":"N","pricebandupper":"19,688.10","totalTradedValue":"2,741.00",
"faceValue":"10.00","ndStartDate":"-","previousClose":"17,898.30","symbol":"SHREECEM",
"varMargin":"7.50","lastPrice":"17,751.40","pChange":"-0.82","adhocMargin":"-",
"companyName":"Shree Cements Limited","averagePrice":"17,850.84","secDate":"14DEC2017",
"series":"EQ","isinCode":"INE070A01015","surv_indicator":"-","indexVar":"-",
"pricebandlower":"16,108.50","totalBuyQuantity":"7,050","high52":"20,538.00",
"purpose":"DIVIDEND - RS 24 PER SHARE","cm_adj_low_dt":"23-DEC-16","closePrice":"0.00",
"isExDateFlag":false,"recordDate":"-","cm_adj_high_dt":"15-MAY-17","totalSellQuantity":"7,255",
"dayHigh":"18,197.90","exDate":"21-JUL-17","sellQuantity5":"1","bcEndDate":"31-JUL-17",
"css_status_desc":"Listed","ndEndDate":"-","sellQuantity2":"1","sellQuantity1":"1",
"buyPrice1":"17,750.25","sellQuantity4":"2","buyPrice2":"17,750.20","sellQuantity3":"2",
"applicableMargin":"12.50","buyPrice4":"17,750.00","buyPrice3":"17,750.10","buyPrice5":"17,735.05",
"dayLow":"17,701.00","deliveryToTradedQuantity":"86.15","basePrice":"17,898.30",
"totalTradedVolume":"15,355"}],"optLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=SHREECEM&instrument=-&date=-&segmentLink=17&symbolCount=2"}
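Because %buyQuantity is a hash, Dumper prints its keys in no particular order (setting $Data::Dumper::Sortkeys sorts them, but as strings). If the levels should come out in numeric order and paired with the matching buyPrice, a sketch like the following, again using JSON::PP and a trimmed copy of the posted data, pulls the level number out of each key name and sorts on it. Pairing buyQuantityN with buyPriceN is an assumption based on the key names.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Trimmed-down copy of the posted data (only part of the buy side).
my $json_text = <<'END_JSON';
{"data":[{"buyQuantity3":"3","buyQuantity1":"3","buyPrice2":"17,750.20",
"buyQuantity2":"2","buyPrice1":"17,750.25","buyPrice3":"17,750.10"}]}
END_JSON

my %data = %{ decode_json($json_text)->{data}[0] };

# Extract the level number from each buyQuantityN key, then sort numerically,
# because keys %data returns the keys in no defined order.
my @levels = sort { $a <=> $b }
             map  { /^buyQuantity(\d+)$/ ? $1 : () } keys %data;

# Collect [level, quantity, price] rows and print them in level order.
my @rows;
for my $n (@levels) {
    push @rows, [ $n, $data{"buyQuantity$n"}, $data{"buyPrice$n"} ];
    printf "level %d: %s at %s\n", @{ $rows[-1] };
}
```

Sorting numerically matters once there are ten or more levels, where a string sort would put buyQuantity10 before buyQuantity2.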
Re: parse html for specific things
by Laurent_R (Canon) on Dec 15, 2017 at 22:34 UTC
Yes, it's probably valid JSON, but badly formatted JSON, pretty much unreadable.
pretty much unreadable
Probably because it is not intended to be read by humans. JSON parsers will have no problem with lack of irrelevant white space, so why bother adding white space? Compare with the HTML delivered by the Google start pages, or the "compressed" version of jQuery. Both are size-optimized input for the browser's parser, not intended to be read by humans in this form.
My guess is that the JSON blob is loaded into a web page or into an application that parses the data and generates a nice, human-readable view of it.
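A quick way to confirm that whitespace between tokens is irrelevant to a parser: the sketch below decodes a compact and a spread-out version of the same small object (an illustrative excerpt, not the full data) and re-encodes both with canonical(), which sorts the keys so that equal structures produce equal strings.

```perl
use strict;
use warnings;
use JSON::PP;

# The same object, once compact (as served) and once spread over several lines.
my $compact = '{"series":"EQ","data":[{"buyQuantity1":"3","isExDateFlag":false}]}';
my $spaced  = <<'END_JSON';
{
   "series" : "EQ",
   "data" : [
      { "buyQuantity1" : "3", "isExDateFlag" : false }
   ]
}
END_JSON

# canonical() sorts object keys, so equal data always encodes to the same string.
my $json = JSON::PP->new->canonical;
my $from_compact = $json->encode( $json->decode($compact) );
my $from_spaced  = $json->encode( $json->decode($spaced) );

print $from_compact eq $from_spaced ? "same data\n" : "different data\n";
```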
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
JSON is human readable. I think the comment relates to posting it on a site. Perhaps, for posting purposes, in the same way the guidelines suggest a sensible layout for code/data examples, it'd be wise for users to post pretty-printed JSON, to make things easier for the humans here who read and respond to questions. Of course Perl can help here, or there are simple online tools for doing so, for example http://jsonprettyprint.com.
Probably because it is not intended to be read by humans. JSON parsers will have no problem with lack of irrelevant white space, so why bother adding white space?
Yes, I fully appreciate that it doesn't matter for parsers. But, still, when I have to write some JSON or some HTML, I usually prefer to insert at least some vertical space (EOLs), because there is almost always a time somewhere in the future when I or another human will need to read the source or change it. My take on this is similar to what I do with Perl or C code: to make it friendly to the human reader. But I certainly understand your point that you may also need to optimize your document's size, especially if it's going to be transmitted zillions of times over a network which may sometimes be slow.
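The size trade-off is easy to measure. This sketch, using JSON::PP and a few buy-side values copied from the question into a hand-built excerpt, encodes the same structure once compactly and once with pretty(), then prints both lengths and the pretty version.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-built excerpt shaped like the "data" entry in the question.
my $quote = {
    data => [ {
        buyQuantity1 => "3", buyPrice1 => "17,750.25",
        buyQuantity2 => "2", buyPrice2 => "17,750.20",
        buyQuantity3 => "3", buyPrice3 => "17,750.10",
    } ],
};

my $compact = JSON::PP->new->canonical->encode($quote);          # one line, no spaces
my $pretty  = JSON::PP->new->canonical->pretty->encode($quote);  # indented, one key per line

printf "compact: %d bytes, pretty: %d bytes\n", length $compact, length $pretty;
print $pretty;
```

Both strings decode to the same structure; the pretty one only adds indentation, newlines and spaces around the colons.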
Readability of JSON is solved by a one-liner:
perl -MJSON -we 'local $/; print JSON->new->pretty->canonical->encode(decode_json(<>))'
Too bad that doesn't work with anything Turing-complete, like JS code or Perl itself...
Re: parse html for specific things
by Anonymous Monk on Dec 16, 2017 at 01:04 UTC