I have a medium-sized file of about 6,000 records. They are diagnosis codes for emergency room patient visits, but that's beside the point. Consider each record a comma-separated string of numbers arranged like:

396.04,305.1,894.2,V321,908.3,,,,,
299.5,785,432.1,305.1,,,,,,
112.3,312,685,422.1,V566,433.2,987.5,,,, etc.

There can be as few as 2 numbers in each record or as many as 10. I need to filter out a certain subset of records. In each set (row) of numbers there is at least one number between 296.1 and 314.0. However, if the row contains "305.1", the record can be excluded if "305.1" is the only number between 296.1 and 314.0 in the row. In the sample rows above, row 1 would be discarded and rows 2 and 3 kept.

The person requesting this data suggested that I dump it into an Excel spreadsheet, sort the data, and manually remove the records that I don't need. That seems way too labor-intensive to me, and I would think that Perl would have some quick and easy way to sort this out. I can't quite get my mind around the best way to do it, however.

I was thinking that I'd read each row and use a regular expression to find rows with 305.1 and then check for the existence of another qualifying number. Based on that I could then either delete the rows I don't need or save the ones I want to a new file. I'm a little rusty with Perl right now, and I don't even know how to start. I thought if I organized it into a node and tossed it out here that some discussion might help get me going. I'd appreciate any suggestions. Thanks.
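For discussion, here is a minimal sketch of one way that idea might look. Instead of a regular expression hunt for 305.1, it splits each row on commas and checks every code numerically. The helper name keep_row is made up for illustration. The sketch also assumes that the range bounds are inclusive, that non-numeric codes such as V321 never qualify, and that a row with no qualifying code at all is dropped along with the 305.1-only rows.

```perl
use strict;
use warnings;

# Assumed bounds of the qualifying range (treated as inclusive here).
my ($LOW, $HIGH) = (296.1, 314.0);
# The code that does not count when it is the only one in range.
my $EXCLUDED = 305.1;

# Hypothetical helper: returns 1 if the row holds at least one code
# in the range other than 305.1, and 0 otherwise.
sub keep_row {
    my ($row) = @_;
    for my $field (split /,/, $row) {
        (my $code = $field) =~ s/^\s+|\s+$//g;     # trim spaces and newline
        next unless $code =~ /^\d+(?:\.\d+)?$/;    # skip blanks and V-codes
        next if $code < $LOW || $code > $HIGH;     # outside the range
        next if $code == $EXCLUDED;                # 305.1 alone is not enough
        return 1;
    }
    return 0;
}

# The three sample rows from the post.
my @sample = (
    '396.04,305.1,894.2,V321,908.3,,,,,',
    '299.5,785,432.1,305.1,,,,,,',
    '112.3,312,685,422.1,V566,433.2,987.5,,,,',
);

# Prints the second and third rows; the first is discarded.
print "$_\n" for grep { keep_row($_) } @sample;
```

To run it over the real file, the sample loop could be swapped for something like while (my $line = <>) { print $line if keep_row($line) } with the output redirected to a new file, which leaves the original data untouched.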

In reply to Seeking Algorithm by WhiteBird
