Re: Best X-delimited format?
by Abigail-II (Bishop) on Sep 18, 2002 at 16:59 UTC
I doubt everyone will agree what the "optimal" delimited
format is. A few points:
- For a program, it doesn't matter what the delimiter is -
an "a" is as easy as a tab or a comma.
- For humans, it matters.
- I give much kudos to things that are debuggable with
vi and telnet.
- Tabs lose points, because they are not always easy to
distinguish from spaces. Furthermore, it's not uncommon
to configure an editor to expand tabs to spaces.
- Printable punctuation characters are better than letters,
digits or control characters.
- The delimiter should be chosen so that it's not a
common character in the data, to avoid having to escape it with a backslash.
Don't use a dot as a delimiter when delimiting decimal numbers.
- I have preferences for colons (because important files in /etc
use them), semicolons, dots, and hyphens (all three because they are
natural), and "horizontal whitespace", that is, any sequence
of one or more spaces or tabs. With that, you can make columns.
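A minimal Python sketch of the two splitting styles preferred above, assuming one record per line: a colon-delimited line in the style of /etc/passwd, and columns separated by "horizontal whitespace". The function names are hypothetical.

```python
import re

def split_colon(line):
    """Split an /etc/passwd-style record on colons; empty fields are kept."""
    return line.rstrip("\n").split(":")

def split_columns(line):
    """Split on horizontal whitespace: any run of one or more spaces or tabs."""
    return re.split(r"[ \t]+", line.strip())

print(split_colon("root:x:0:0:root:/root:/bin/bash\n"))
print(split_columns("apple   1.25\t  3"))
```

Because the column version treats any run of blanks as a single separator, it cannot represent an empty field and its fields cannot contain spaces; the colon version has neither limitation but would need an escape if a colon ever appeared in the data. The decimal point in 1.25 survives because the delimiter is not a dot.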
Abigail
Re: Best X-delimited format?
by katgirl (Hermit) on Sep 18, 2002 at 14:18 UTC
I use "|" (pipe) for most of mine... another question:
Is there any delimiter that should definitely not be used? Ever ever under any circumstances? On pain of... er... pain?
That probably depends on the nature of the data that you plan to delimit. I find that when I'm delimiting numbers that have decimals or commas, a decimal or comma delimiter tends to get really confusing :^). The same goes for delimiting strings with any common punctuation...I tend to use Text::ParseWords and don't run into much trouble.
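Text::ParseWords is a Perl module; as a rough analogue rather than a port of it, Python's standard csv module will split on a chosen delimiter while honouring double-quoted fields, so a comma inside a number does not split the field. `parse_line` and `format_line` are hypothetical helper names.

```python
import csv
import io

def parse_line(line, delimiter=","):
    # csv.reader accepts any iterable of strings; quoted fields stay intact.
    return next(csv.reader([line], delimiter=delimiter))

def format_line(fields, delimiter=","):
    # csv.writer quotes only the fields that need it (QUOTE_MINIMAL by default).
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(fields)
    return buf.getvalue().rstrip("\r\n")

line = format_line(["widget", "1,234.50", "3"])
print(line)
print(parse_line(line))
```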
Jason
Re: Best X-delimited format?
by mce (Curate) on Sep 18, 2002 at 14:23 UTC
Hi,
There are plenty of standards for delimiters.
The easiest are CSV (comma-separated) and tabs, as you mention,
but the de facto standard nowadays for delimiting data is XML, or more generally SGML.
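For comparison with a delimited line, here is a minimal sketch of similar records written as XML with Python's standard xml.etree.ElementTree. The element names (records, record, name, price, qty) are made up for illustration; the point is that markup, not a delimiter character, separates the fields, so commas or semicolons in the data need no special treatment.

```python
import xml.etree.ElementTree as ET

records = [
    {"name": "Tom's junk", "price": "4.32", "qty": "13"},
    {"name": "primary john; secondary phil;", "price": "1,000.00", "qty": "1"},
]

def to_xml(records):
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            # Each field becomes a child element named after its key.
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    root = ET.fromstring(text)
    # An empty element has .text == None, so map that back to "".
    return [{child.tag: child.text or "" for child in item} for item in root]

xml_text = to_xml(records)
print(xml_text)
print(from_xml(xml_text) == records)
```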
---------------------------
Dr. Mark Ceulemans
Senior Consultant
IT Masters, Belgium
XML does result in larger files. That said, file size is very rarely a problem these days. If you are working with flat files so large that encoding them as XML would be prohibitive, then the flat files are probably too big already.
XML also allows a lot of flexibility in adding, moving, and rearranging the data. It's obviously not the one answer for every situation, but it's the best answer for most situations.
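A rough way to check both points, sketched in Python with the standard csv and xml.etree.ElementTree modules on made-up rows: measure the encoded size of identical data in each format, and read the XML by element name so that an added or reordered field does not disturb an existing reader. The row contents and helper names are hypothetical, and the actual overhead depends on tag names and field sizes.

```python
import csv
import io
import xml.etree.ElementTree as ET

rows = [{"id": str(i), "name": f"item{i}", "price": "4.32"} for i in range(100)]

def csv_size(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))

def to_xml(rows):
    root = ET.Element("rows")
    for row in rows:
        el = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(el, key).text = value
    return ET.tostring(root, encoding="unicode")

def read_prices(xml_text):
    # Look fields up by name; extra or reordered elements are ignored.
    return [row.findtext("price") for row in ET.fromstring(xml_text)]

xml_size = len(to_xml(rows).encode("utf-8"))
print(f"CSV: {csv_size(rows)} bytes, XML: {xml_size} bytes")

# A later writer adds a "note" field and puts it first; the reader still works.
print(read_prices("<rows><row><note>new</note><price>1.00</price></row></rows>"))
```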
Re: Best X-delimited format?
by sauoq (Abbot) on Sep 18, 2002 at 16:50 UTC
I argued any character would be as "optimal" as any other character (although I prefer \t).
I tend to agree with this in theory, but in real life commas, tabs, and pipes are probably used most frequently.
You get the most efficiency in the case where you are positive that your delimiter won't show up in your data. Otherwise, you really need at least two characters: the delimiter and a quoting (or escaping) character. The less either of those shows up in your data, the less processing you will have to do and the smaller your input will be. So it is better to use a less frequently used character as the delimiter.
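To make the two-character scheme concrete, here is a minimal Python sketch assuming a pipe as the delimiter and a backslash as the escape character (both choices are illustrative). Every occurrence of either character in the data costs one extra byte, which is why a rarely used delimiter keeps both the output and the parsing work smaller. The function names are hypothetical.

```python
DELIM = "|"   # assumed delimiter
ESC = "\\"    # assumed escape character (a single backslash)

def join_fields(fields):
    # Escape the escape character first, then the delimiter.
    return DELIM.join(
        f.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM) for f in fields
    )

def split_fields(line):
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESC and i + 1 < len(line):
            current.append(line[i + 1])  # the escaped character is taken literally
            i += 2
            continue
        if ch == DELIM:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields

def escape_overhead(fields):
    # Bytes added by escaping, beyond the raw data plus one delimiter per gap.
    raw = sum(len(f) for f in fields) + len(fields) - 1
    return len(join_fields(fields)) - raw

fields = ["tell mark | paul", "C:\\temp", ""]
line = join_fields(fields)
print(line)
print(split_fields(line) == fields, escape_overhead(fields))
```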
I disagree with your ex-supervisor's assertion that tab is "the standard." There simply is no standard.
-sauoq
"My two cents aren't worth a dime.";
Re: Best X-delimited format?
by fglock (Vicar) on Sep 18, 2002 at 14:14 UTC
it's not used as much in data
That's it: it depends on what data you have.
Anyway, I like both \t and #, but I also use
',' and ';' sometimes.
update: \t is not so good, because some
text editors will convert it to spaces, and that
can get very hard to debug.
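One cheap safeguard against that, sketched in Python: if every record should have a fixed number of tab-separated fields, count the tabs on each line and report the lines that don't match, which is typically where an editor has expanded tabs to spaces. The expected field count and the sample data are assumptions for illustration.

```python
def find_bad_lines(text, expected_fields):
    """Return 1-based numbers of lines whose tab count doesn't fit the schema."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.count("\t") != expected_fields - 1:
            bad.append(lineno)
    return bad

data = "alice\t30\tParis\nbob     25\tRome\ncarol\t41\tOslo\n"
print(find_bad_lines(data, 3))  # line 2 had its first tab turned into spaces
```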
Re: Best X-delimited format?
by Zaxo (Archbishop) on Sep 18, 2002 at 18:57 UTC
If da boss wants standards, you can go retro. There is the ASCII control set, chars 0..31. It has RS == chr(30), FS == chr(28), ESC == chr(27), and more to choose from for fancier formats.
That violates much of the good sense in the other replies: they aren't printable, so text editors and human readers will have trouble with them. On the plus side, efficiency for text data is great, since these characters practically never occur in ordinary text.
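A minimal Python sketch of the retro approach, using RS (chr(30)) between records and US (chr(31), the ASCII unit separator) between fields; FS (chr(28)) could be used in the same way. It does no escaping at all, so it assumes those two characters never occur in the data.

```python
RS = chr(30)  # ASCII record separator
US = chr(31)  # ASCII unit separator

def dump(records):
    return RS.join(US.join(fields) for fields in records)

def load(data):
    # An empty string means no records at all.
    return [record.split(US) for record in data.split(RS)] if data else []

records = [["Tom's junk.", "13 @ $4.32 each"], ["a,b\tc\nd", ""]]
data = dump(records)
print(repr(data))
print(load(data) == records)
```

Commas, tabs, and even newlines pass through untouched. The one ambiguity in this sketch is that a single record holding a single empty field dumps to the same empty string as no records at all.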
After Compline, Zaxo
Re: Best X-delimited format?
by zengargoyle (Deacon) on Sep 19, 2002 at 05:22 UTC
I just want to jump in for my favorite, "^".
It seems to be rarely used. If you have a "comment" field for humans to use, almost all of the other punctuation on the keyboard will be used by somebody.
- Tom's junk.
- 13 @ $4.32 each
- primary john; secondary phil;
- upstairs (behind the door)
But rarely 'tell mark ^ paul'. Plus it hangs high in the line and has space below for visual hooking.
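That reasoning can be turned into a quick check, sketched in Python: count how often each candidate delimiter appears in a sample of the real data and take the least used one. The candidate list and its order (ties go to the earlier candidate) are assumptions; the sample is the comment list above.

```python
CANDIDATES = "^|~:;,\t"  # assumed candidates, in order of preference

def pick_delimiter(values, candidates=CANDIDATES):
    """Return (delimiter, occurrences) for the least-used candidate."""
    counts = {ch: sum(v.count(ch) for v in values) for ch in candidates}
    best = min(candidates, key=lambda ch: counts[ch])  # first wins on ties
    return best, counts[best]

comments = [
    "Tom's junk.",
    "13 @ $4.32 each",
    "primary john; secondary phil;",
    "upstairs (behind the door)",
]
print(pick_delimiter(comments))
print(pick_delimiter(comments + ["tell mark ^ paul"]))
```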