comment on

Hi,

Without reading through all the replies....To be honest you have had quite a few.

My advice would be that there is always the potential to compress. Even in random sequences you will get repeated patterns. The difficulty is finding those patterns. You want a pattern that is easy to find first.

My advice would be to sort the text of interest and then count for each character type. That might be best done in a database. You might want to split the text up as well and do that bit by bit. With this information you will be able to better find opportunities for compression.

Update__________________

With infinite computing power LanX would be right.

For very large file sizes it would be difficult if not impossible to find the best compression solution. In that situation I would be right.

I know that the question states 50%. But really if you think about it you could compress the stored algorithms that do the transformation. It just goes on and on. Do people understand what I am saying?

Sorting is a good place to start maybe because the algorithm's (code) can be modified from that point in order to preserve information that will allow the recreation of the original file. There are a range of different sorting algorithms that I have in a book here. If anyone want me to post any of those I will.

In reply to Re: Data compression by 50% + : is it possible? by betmatt
in thread Data compression by 50% + : is it possible? by baxy77bax

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Don't ask to ask, just ask
	PerlMonks