PerlMonks  

Re^2: youtube parser/scrabber

by bliako (Monsignor)
on Aug 19, 2021 at 12:47 UTC ( [id://11135962] )


in reply to Re: youtube parser/scrabber
in thread youtube parser/scrabber

I have added a ; at the end of said regex and now have this: ^.+ytplayer\.config\s*=\s*(\{.*?};)

For this particular use case the above regex extracts the JSON, although JSON's decode_prefix() will ignore any trailing non-JSON content (e.g. the Javascript I mentioned) anyway. Now, regarding the problem of unquoted keys and values: there is an allow_barekey() option to the JSON parser which allows keys to be unquoted.
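A minimal sketch of the two points above, assuming JSON::PP (where allow_barekey() is available) and a made-up page fragment in $page; the real YouTube page will look different:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical page fragment: the config object followed by more Javascript.
my $page = 'var x=1; ytplayer.config = {"args":{"title":"demo"}};ytplayer.load();';

# The regex from above; the capture includes the trailing ';'.
my ($captured) = $page =~ /^.+ytplayer\.config\s*=\s*(\{.*?};)/;

# decode_prefix() parses the leading JSON value and also returns how many
# characters it consumed; anything after that is simply left alone.
my ($data, $consumed) = JSON::PP->new->decode_prefix($captured);
my $rest = substr($captured, $consumed);    # the non-JSON leftover

# allow_barekey() accepts unquoted keys; values must still be valid JSON.
my $bare = JSON::PP->new->allow_barekey->decode('{args:{title:"demo"}}');

print "title: $data->{args}{title}\n";
print "bare-key title: $bare->{args}{title}\n";
```

So the extra ; captured by the regex does no harm as long as decode_prefix() is used instead of decode().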

That leaves the remaining problem of unquoted values. Unquoted values may be indicative of a much bigger problem: that the values in the "JSON" (which is actually a Javascript hash) are function calls, other hash values, variables etc.! For example, this is the line that _get_args() looks for:

if(createPlayer){ if(window.ytplayer.bootstrapPlayerResponse){ window.ytplayer.config={args:{raw_player_response:window.ytplayer.bootstrapPlayerResponse}}; ...

There is a reason why it is unquoted I think ...
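To illustrate (a sketch only, again assuming JSON::PP and using just the config part of the line quoted above): even with allow_barekey() switched on, the decoder refuses the value, because it is a Javascript expression and not JSON data.

```perl
use strict;
use warnings;
use JSON::PP;

# The object from the line above: bare keys, and a variable reference as value.
my $js = '{args:{raw_player_response:window.ytplayer.bootstrapPlayerResponse}}';

# allow_barekey() copes with the unquoted keys, but the value is not a JSON
# string, number, object, array or literal, so decode() dies.
my $data  = eval { JSON::PP->new->allow_barekey->decode($js) };
my $error = $@;

print defined $data ? "decoded\n" : "decode failed: $error\n";
```

No parser option can fix that: the actual data lives in another Javascript variable and is only filled in when the browser runs the script.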

So, yes, the scraper looks outdated (though very recently updated) and you are better off using something else.

bw, bliako
