http://qs321.pair.com?node_id=11101591


in reply to Re^7: Sort alphabetically from file
in thread Sort alphabetically from file

the provided sample data was not '\t'. as far as i can tell it was not tab separated. it was a double space. there is no reason under the sun (that i can think of) to have two white spaces between data

Well, even if you can't think of a reason, the OP's file format appears to use it ;-) The main point here is this: we don't know the OP's real file format. Depending on where the source code was copied and pasted from, tabs could have been converted to spaces. The data is so simple that it most likely isn't the real data the OP is working with. And if it is, then it's most likely a homework assignment, and if I were an instructor, in my next assignment I might specifically design my input file format to allow single whitespace characters within a column and require two or more whitespace characters between columns, just to teach people how to handle strange situations like that. People tend to get pretty creative in their file formats.
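To make that hypothetical format concrete, here is a minimal sketch in Python (the same regex would work with Perl's split). The sample rows and the name split_columns are made up for illustration; they are not the OP's data. It assumes columns are separated by two or more whitespace characters and may contain single spaces.

```python
import re

def split_columns(line):
    """Split a line on runs of two or more whitespace characters.

    A single space (or a single tab) is treated as part of a column.
    """
    return re.split(r"\s{2,}", line.strip())

# Hypothetical sample rows, not the OP's real data.
rows = [
    "Smith  New York",
    "van der Berg  Den Haag",
    "Jones\t\tLos Angeles",
]
for row in rows:
    print(split_columns(row))
```

Note that under this assumption a single tab between columns would not split the line, which is exactly the kind of detail that depends on knowing the real format.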

spaces between data is kind of flawed in itself really, your better off using a comma or some other separator that is not normally used

I absolutely agree!

you also took what i said out of context for the most part, what i said was: As long as that does not corrupt your data set it should be fine (and i am sure it is fine)

If it had just been the first part of the sentence, without the part in parentheses, then I think it would have been a great way to word it. But the part in parentheses expresses a level of certainty that we just can't have. Even re-reading the sentence now, I don't see another way to understand its wording; if I'm mistaken, please feel free to explain what you meant. I quoted that part because that's what I was objecting to, plus a little more so the quote would make more sense. And if someone was missing context, your post is still there :-) I've updated my post though.

To put it a different way: It sounded like you were saying not to worry about it, but not thinking about these kinds of issues is what contributes to people designing some "strange" file formats :-)

In such cases I find it better to ask the OP to be specific about their file format (providing a hex dump if necessary), to design a solution as robust and defensively coded as possible based on the data given (i.e. it rejects data that isn't exactly like the sample data), and/or to provide a solution but explain all of the assumptions and limitations.
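As a rough sketch of the "reject anything that isn't exactly like the sample" approach, the Python below assumes a hypothetical format: exactly two columns, separated by exactly two spaces, single spaces allowed inside a column, and no tabs anywhere. The names LINE_RE, parse_line and parse_and_sort are made up for illustration. The error message prints the offending line with repr(), which shows tabs as \t and serves as a poor man's hex dump.

```python
import re

# Assumed format: "<column>  <column>", exactly two spaces between columns,
# single spaces allowed inside a column, no other whitespace.
LINE_RE = re.compile(r"(\S+(?: \S+)*)  (\S+(?: \S+)*)")

def parse_line(line, lineno):
    """Return (first, second) or raise ValueError if the line deviates."""
    text = line.rstrip("\n")
    match = LINE_RE.fullmatch(text)
    if match is None:
        # repr() makes tabs and stray spaces visible in the message.
        raise ValueError(f"line {lineno}: unexpected format: {text!r}")
    return match.group(1), match.group(2)

def parse_and_sort(lines):
    """Parse every line strictly, then sort case-insensitively by column 1."""
    records = [parse_line(line, n) for n, line in enumerate(lines, start=1)]
    return sorted(records, key=lambda rec: rec[0].lower())

# Hypothetical input, not the OP's real data.
print(parse_and_sort(["smith  New York\n", "Adams  Boston\n"]))
```

Failing loudly on a tab or a third space is deliberate: it forces the question about the real file format to be answered instead of silently guessing.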

By the way, it looks like you've edited your post without mentioning the edit. Please see "How do I change/delete my post?", in particular "It is uncool to update a node in a way that renders replies confusing or meaningless".