PerlMonks
Before showing you the code, some notes.

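  • You are mixing text (the boundary delimiters) and binary data. This is bad: the binary data can contain the very bytes you use as delimiters.
  • You are writing out hexadecimal representations of numbers, but you want to save space. This makes little sense: a hex digit carries only 4 bits in a whole byte, so every number takes twice the space of its binary form.
  • You should really use pack.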
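Both points are easy to check. A minimal sketch, using an arbitrary number (10) as the example: the fixed-width hex text is twice the size of the packed binary, and the packed bytes happen to include a newline, which is exactly the kind of collision a text delimiter cannot survive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n = 10;                         # arbitrary example number

my $hex    = sprintf "%08x", $n;    # fixed-width hex text: "0000000a"
my $packed = pack "N", $n;          # 32-bit big-endian binary: "\0\0\0\n"

printf "hex: %d bytes, packed: %d bytes\n", length $hex, length $packed;

# The packed form of 10 ends in byte 0x0A, which is "\n": a newline used
# as a delimiter would be indistinguishable from this data.
print "packed data contains a newline\n" if index($packed, "\n") >= 0;

# unpack recovers the original number from the 4 bytes
print "round trip: ", unpack("N", $packed), "\n";
```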

So I changed the encoding. It is now:

  • one long (32-bit unsigned, big-endian, as packed by N): the length of the index, in bytes
  • the index. Each entry in the index is made of:
    1. one long: the index number
    2. one long: the string length
    3. some bytes: the string itself
  • the content, as a series of longs, one index number per word
And now, the code:

Encoder:

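    #!/usr/bin/perl
    use strict;
    use warnings;

    my @lines = <>;
    binmode STDOUT;                # raw bytes out, no newline translation

    # number each distinct word once
    my %index;
    my $x = 1;
    foreach my $ln (@lines) {
        # split ' ' skips leading whitespace (split /\s+/ would yield an empty "word")
        foreach my $word (split ' ', $ln) {
            $index{$word} //= $x++;
        }
    }

    # index: per word, its number (N), then the length-prefixed word (N/a*)
    my $idx = '';
    foreach my $key (keys %index) {
        $idx .= pack("N(N/a*)", $index{$key}, $key);
    }
    print STDOUT pack("N", length($idx)), $idx;

    # content: one long per word (line breaks are not preserved)
    foreach my $ln (@lines) {
        foreach my $word (split ' ', $ln) {
            print STDOUT pack("N", $index{$word});
        }
    }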

Decoder:

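    #!/usr/bin/perl
    use strict;
    use warnings;

    # binmode applies to STDIN, so read STDIN rather than <> (which may open a file)
    binmode STDIN;
    my $i = do { local $/; <STDIN> };     # undef $/: slurp the whole input

    my $is  = unpack "N", $i;             # index length
    my $ind = substr($i, 4, $is);         # the index itself
    my $con = substr($i, 4 + $is);        # the content
    my %index = unpack "(N(N/a))*", $ind; # number => word
    my @con   = unpack "N*", $con;
    print join(' ', @index{@con}), "\n";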

Note the use of binmode, and the unsetting of $/ (the input record separator) so that the whole file is read in one go.
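The whole layout can be exercised without touching the disk. A minimal sketch, assuming a made-up word list and an in-memory filehandle in place of real files: it builds the index and content as laid out above, reads the result back in :raw (binary) mode with $/ undefined, and decodes it. Each distinct word is numbered only once here (//=), so the numbers stay small and contiguous.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; any whitespace-separated words would do.
my @words = qw(the cat saw the dog);

# Build the word -> number index, one number per distinct word.
my %index;
my $next = 1;
for my $w (@words) {
    $index{$w} //= $next++;
}

# Index entries: number (N), then the length-prefixed word (N/a*).
my $idx = '';
$idx .= pack "N(N/a*)", $index{$_}, $_ for sort keys %index;

# Whole "file": index length, index, then one long per word.
my $encoded = pack("N", length $idx) . $idx . pack("N*", @index{@words});

# Read it back through an in-memory filehandle in raw mode,
# with $/ undefined so everything arrives as one string.
open my $fh, '<:raw', \$encoded or die "open: $!";
my $data = do { local $/; <$fh> };
close $fh;

# Decode: split off the index, rebuild number -> word, map the content.
my $len     = unpack "N", $data;
my %word_of = unpack "(N(N/a))*", substr($data, 4, $len);
my @nums    = unpack "N*", substr($data, 4 + $len);
my $decoded = join ' ', @word_of{@nums};

print "$decoded\n";    # the cat saw the dog
```

The keys are sorted only to make the index order repeatable; the decoder does not depend on the order of index entries, since it rebuilds a hash from them.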

-- 
dakkar - Mobilis in mobile

In reply to Re: help needed on encoding a text file by dakkar
in thread help needed on encoding a text file by bfdi533
