How to get raw node content

by roboticus (Chancellor)
on Sep 15, 2017 at 18:25 UTC (#1199471=monkdiscuss)

Hello, gang:

I was wondering whether there is an API that would let me fetch the raw content of a PerlMonks node.

Every once in a while, I see a node that I'm interested in dismantling, but it gets a bit butchered by the HTMLification/templating/(or something). The latest example is Reaped: Re: .pl to .exe, where I'm wanting to figure out just what the code is doing. But with the HTML entities, font rendering and whatnot, I can't really tell exactly what the code *was*. So rather than hand-editing it and trying to put it in a form that I can analyze, I'd love to be able to occasionally fetch just the raw node contents.

I did a little googling and trying to browse the SiteDocClan nodes, but I'm not a member, so there's a limit to what I can see (e.g., I can't see sdc to-do wiki, SDC Wiki, the PMDev, editor, cabalist and pedagogue wikis...).

I've found some public information, like What XML generators are currently available on PerlMonks? and WWW::PerlMonks, but I didn't see anything that provides the raw node content. I'm not asking for a new feature if it's not available, just a pointer to it if it exists.



When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re: How to get raw node content
by LanX (Cardinal) on Sep 15, 2017 at 18:30 UTC


      Thanks, I just gave that a try, and it worked nicely for a non-reaped node. That'll give me what I'm looking for, 95% of the time.

      However, I can't seem to make it work for a reaped node. I tried:;displaytype=xml

      But in return I got:

      <node id="1199455" title="Reaped: Re: .pl to .exe" created="2017-09-15 09:28:14" updated="2017-09-15 09:28:14">
        <type id="11">note</type>
        <author id="52855">NodeReaper</author>
        <data>
          <field name="doctext"> This node was taken out by the [NodeReaper] on [localtime://2017-09-15 16-15-40]<BR>Reason: &#91;[hippo]]: Unformatted, without context and apparently off-topic<p>You may view [href://?op=viewreaped;node_id=1199455|the original node and the consideration vote tally].</p> </field>
          <field name="root_node">288628</field>
          <field name="parent_node">288628</field>
        </data>
      </node>

      So I tried changing it to:;node_id=1199455;displaytype=xml

      And got:

      <node id="53641" title="Visit Reaped Nodes" created="2001-01-23 00:08:15" updated="2005-08-22 15:36:03">
        <type id="14">superdoc</type>
        <author id="937169">InGoodGraces</author>
      </node>

      Edit: on re-reading your reply, I looked for and found the XML link you mentioned. I hadn't noticed it. Unfortunately, it also fails on a reaped node.


      When your only tool is a hammer, all problems look like your thumb.
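
A minimal sketch of how the displaytype=xml output discussed above could be unpacked once it has been downloaded. It assumes the XML has the shape quoted in this thread (a `node` element with `field` children, the post body in the field named `doctext`) and that the field content arrives XML-escaped. The base URL, the helper names `node_xml_url` and `extract_doctext`, and the sample document are illustrative assumptions, not part of any official PerlMonks API. As noted above, this only helps for non-reaped nodes.

```python
import xml.etree.ElementTree as ET

# Assumption: base address of the site. The thread itself only shows the
# "node_id" parameter and the ";displaytype=xml" suffix.
BASE_URL = "https://www.perlmonks.org/"


def node_xml_url(node_id: int) -> str:
    """Build an XML-view URL for a node, using ';' between parameters."""
    return f"{BASE_URL}?node_id={node_id};displaytype=xml"


def extract_doctext(xml_text: str) -> str:
    """Return the text of <field name="doctext">, or '' if it is absent."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == "doctext":
            return field.text or ""
    return ""


# Hypothetical sample shaped like the XML quoted in the thread
# (not real site output).
SAMPLE = """<node id="1" title="Example" created="2017-09-15 09:28:14" updated="2017-09-15 09:28:14">
  <type id="11">note</type>
  <author id="2">SomeMonk</author>
  <data>
    <field name="doctext">&lt;p&gt;Hello [id://1199471]&lt;/p&gt;</field>
    <field name="root_node">288628</field>
  </data>
</node>"""

if __name__ == "__main__":
    print(node_xml_url(1199471))
    print(extract_doctext(SAMPLE))
```

The XML parser undoes one level of escaping, so the `doctext` comes back with its original markup and unexpanded `[...]` links, which is closer to what the poster typed than the rendered page.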

        > However, I can't seem to make it work for a reaped node.

        I think this is intentional (or simply nobody cared to provide the XML interface here).

        Reaped nodes are meant to become invisible to discourage spammers.

        And I'm not sure whether all input text is really passed through unchanged, since that might expose the viewer to a vulnerability.

        Anyway, I was once able to download all reaped nodes in order to train a spam filter, but I can't recall how I did it.

        Probably I only got it in the HTML form.

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!


        Actually I see an xml-displaytype link in the original text of reaped nodes, but only in my role as pm-dev.

        And it doesn't show the raw text but the perl-code producing this node.

        In other words, this "show original and vote tally" node with op=viewreaped; belongs to a totally different class of nodes.

        You need to be logged in (and to use the correct link),

        or you can use Corion's backup.

Re: How to get raw node content
by huck (Parson) on Sep 15, 2017 at 19:53 UTC
      Actually, your approach is better for seeing whitespace and line breaks directly in the browser, since the XML view in FF doesn't show them.

      FWIW, here is a nodelet hack that opens a JS alert() with the HTML of the post you are replying to.

      <script><!--
      function show_quote() {
        alert(document.querySelector("div.preview").innerHTML.match(/^[^]*?(?= +<hr> <div class="editnodetext">)/)[0]);
      }
      --></script>
      <a href='javascript:show_quote()'>show_quote</a>

      (you need to be in a "comment on" node to make it work)

      It was part of my plan to extend my wiki syntax with convenient quoting of a user's post...

      ... of course milking the XML-displaytype would be more reasonable here.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!


      Thanks, but that still has the HTML entities encoding and bracket ([ ]) munging.


      When your only tool is a hammer, all problems look like your thumb.
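
For the HTML entity encoding mentioned above, a minimal sketch of undoing it with Python's standard library. It assumes the copied text is otherwise intact and only contains character references such as `&#91;` or `&lt;`. The helper name `decode_entities` and the sample string are made up for illustration. It does not reverse link expansion or any other bracket munging.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML character references back into literal characters."""
    return html.unescape(text)


if __name__ == "__main__":
    # Hypothetical munged snippet, as it might look when copied from page source.
    munged = "print $a&#91;0&#93; if $a&#91;1&#93; &gt; 1 &amp;&amp; $ok;"
    print(decode_entities(munged))
```

Running the decoder only once matters: text that was escaped twice would need a second pass, and text that was escaped once would be damaged by one.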

        How raw can we get?

        lynx -dump > dump.txt


        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

        perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

      Sorry for nitpicking ...

      ... it's very close but that's not the raw input of the poster.

      For instance [links] are expanded and code tags have an extra download link.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

Node Type: monkdiscuss [id://1199471]
Approved by LanX
Front-paged by LanX