PerlMonks  


Your challenge is to 'golf' some Perl code (produce code that requires the fewest [key] strokes -- fewest characters) that mostly just does s/--/¬-/g, but with some simple restrictions. I was surprised that I implemented this simple task over a dozen times before I finally got it right. I golfed mine down to 80 characters, so I wanted to see what y'all can come up with. Getting a correct solution may be a bigger challenge than golfing the solution.

Background

A 'de facto HTML comment' is started by "<!--" and ended by "-->" and can contain anything between those two delimiters except, of course, "-->". This is such a nice, simple, easy-to-parse definition that it has advantages over a standard HTML comment.
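
For illustration only, here is a non-golfed sketch that is not a solution to the challenge. The helper name `de_facto_comments` is made up. The definition above maps onto a non-greedy regex: a comment starts at "<!--" and runs to the first "-->" after it. The sketch assumes the closing "-->" may not reuse dashes from the opening "<!--", so "<!--->" by itself is not a complete comment. Check that assumption against the browsers you care about.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between each "<!--" and the
# first "-->" that follows it. The /s flag lets "." match newlines,
# and ".*?" is non-greedy so each match stops at the first "-->".
# Assumption: the closing "-->" may not share characters with the
# opening "<!--", so "<!--->" on its own is not a complete comment.
sub de_facto_comments {
    my ($html) = @_;
    return $html =~ /<!--(.*?)-->/gs;
}

my @bodies = de_facto_comments("a <!-- foo -- -- bar --> b <!---->");
print "[$_]\n" for @bodies;
```

On that sample string it prints "[ foo -- -- bar ]" and "[]", which are the bodies of the two comments.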

Some (notorious but still very popular) browsers only handle de facto HTML comments. Many browsers only handle standard HTML comments.1

Your task is to golf some code that will adjust de facto HTML comments so that they are also standard HTML comments. I'll let those who are curious about the details of standard HTML comments visit Google. The only detail we need to worry about for the golf is that "--" inside of a de facto HTML comment is the problem.

Although "<!-- foo -- -- bar -->" is a valid HTML comment according to both the standard and de facto definitions, I'll make the task much easier by just requiring that all occurrences of "--" be replaced inside of the de facto comments. But we want to change as few pixels as possible so we'll transform the above comment to something like "<!-- foo ¬- -¬ bar -->".

If you can code a solution that changes even fewer characters but still makes sure each de facto comment ends up also being a standard comment, then you'll get bonus points (in the tradition of Whose Line Is It Anyway).

I chose "¬" (the "not" symbol, "\xAC", &#xAC;=¬) because it looks a lot like "-" in most fonts and is still in Latin-1. The soft hyphen (&#xAD;=&shy;) looks even closer to "-" but shouldn't be displayed at all in most cases, so I rejected it. The en dash ("–", &#x2013;, &ndash;) is "\x96" in Windows-1252 (Microsoft's extension to Latin-1, which is nearly the de facto interpretation of "Latin-1"), and it looks even more like "-". But some browsers are still standards-compliant enough that they won't display it. How does your browser display it (–)?
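
One way to see the encoding issue described above is to decode the same bytes under both character sets with Perl's core Encode module. This is only a sketch. It shows the byte-to-code-point mapping and says nothing about how any particular browser renders the result.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode single bytes and report the Unicode code point each one becomes.
my @cases = (
    [ 'iso-8859-1', "\xAC", 'not sign in Latin-1'          ],
    [ 'iso-8859-1', "\xAD", 'soft hyphen in Latin-1'       ],
    [ 'cp1252',     "\x96", 'en dash in Windows-1252'      ],
    [ 'iso-8859-1', "\x96", 'C1 control in strict Latin-1' ],
);

for my $case (@cases) {
    my ($encoding, $byte, $label) = @$case;
    my $char = decode($encoding, $byte);
    printf "0x%02X as %-10s => U+%04X (%s)\n",
        ord($byte), $encoding, ord($char), $label;
}
```

It reports U+00AC, U+00AD, U+2013 and U+0096. Under strict Latin-1, byte 0x96 is a C1 control character rather than a dash, so a standards-minded browser may refuse to display it.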

The rules

  1. Insert as few characters as possible into the following code:
    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    $/ = '';
    for( <DATA> ) {
        #2345678 1 2345678 2 2345678 3 2345678...
        # Replace this line with your code
        ;
        print;
    }
    Some sample input is shown later.
  2. Your code must make it so that, for each "<!--" that starts a de facto HTML comment, the next occurrence of "--" after it is the first two characters of "-->" (which ends the comment). Bonus points for instead making each comment valid according to the HTML standards.
  3. Your code should change as few characters as possible.
    • So it should not change any characters outside of de facto HTML comments. (If there is a "<!--" that is never followed by a "-->" then your code can either treat the rest of the string as being inside a comment or outside, whatever makes your code shorter.)
    • Rerunning your code on output from your code should make no changes.
    • Your code must only change "-" to "¬". So running tr/\xAC/-/ on the input and output should give the same results.
    Points deducted for changing too many characters but even more points deducted for not producing comments that fit both definitions.
  4. You can assume the input and output are 8-bit Latin-1. Or you can assume UTF-8 strings if you prefer. Other encodings might be legal, though I can't think of any advantage to using them.
  5. You get penalized for causing global side effects. This means that using "$a" instead of "my $x" isn't going to be a net win here. You can use global variables for their intended purposes but you'll get a small penalty if you change them and don't change them back (either to their previous value or to their standard default value).
  6. You get penalized for causing warnings.
  7. Please hide your solutions like spoilers (such as using a table or similar to set identical foreground and background colors and/or using READMORE tags and putting "spoilers" in your node title).
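
Some of these rules can be checked mechanically. Here is a non-golfed sketch of such a checker. It assumes the same non-overlapping reading of the comment delimiters, and the sub names are hypothetical. `comments_ok` checks rule 2, `only_dashes_changed` checks the tr rule from rule 3, and `is_idempotent` takes a code ref to a (hypothetical) fixing sub that returns the fixed string and checks the rerun rule. None of them checks that characters outside comments were left alone.

```perl
use strict;
use warnings;

# Rule 2: after each "<!--" that starts a de facto comment, the next "--"
# must be the first two characters of the "-->" that ends it.
# Assumes the closing "-->" never reuses dashes from the opening "<!--".
sub comments_ok {
    my ($html) = @_;
    while ($html =~ /<!--(.*?)-->/gs) {
        my $body = $1;
        # A "--" inside the body fails, and so does a body ending in "-",
        # because that dash plus the first dash of "-->" forms an early "--".
        return 0 if $body =~ /--|-\z/;
    }
    return 1;
}

# Rule 3: mapping every "\xAC" back to "-" must make input and output equal.
sub only_dashes_changed {
    my ($in, $out) = @_;
    my @copies = ($in, $out);
    tr/\xAC/-/ for @copies;    # edits the copies, not the caller's strings
    return $copies[0] eq $copies[1];
}

# Rule 3: running the fix a second time must change nothing.
# $fix is a hypothetical code ref that takes a string and returns it fixed.
sub is_idempotent {
    my ($fix, $in) = @_;
    my $once = $fix->($in);
    return $fix->($once) eq $once;
}

my $in  = "a -- <!-- foo -- -- bar --> b";
my $out = "a -- <!-- foo \xAC- -\xAC bar --> b";
print comments_ok($in)  ? "input:  ok\n" : "input:  bad\n";
print comments_ok($out) ? "output: ok\n" : "output: bad\n";
print only_dashes_changed($in, $out) ? "tr:     ok\n" : "tr:     bad\n";
```

With the "<!-- foo -- -- bar -->" example from the background, it prints "input:  bad", "output: ok" and "tr:     ok".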

Later I'll post my solution and some test code that covers some of the rules. For now, I don't want to hint at techniques to try.

Here is some test data (but don't assume this is the only data you need to handle):

__END__
---<!-- -->---> <--!-- <!-- -- --> --> <!---->--<!----->-<!------>---<!-------> <!---><!----> <!--->--<!----> <!--->---<!----> <!--->----<!----> -<!-->--<!-->--<!-->---<!--> <!--><!-->-<!-->--<!-->--<!-->---<!-->-- <!-- - - --> <!--- ---> <!---- ---->

1 Some browsers don't manage to get either definition right. I have a copy of Opera that appears to require < and > to be balanced inside of HTML comments. Opera impresses me both with its nice features and with how it manages to have bugs that are just so, well, stupid. (:

- tye        


In reply to Golf: Fix de facto HTML comments by tye
