Simple or not

apachi15 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Simple or not by toolic (Bishop) on Apr 06, 2009 at 14:46 UTC
Welcome to the Monastery. Firstly, when posting example code, please use "code" tags, as described in Writeup Formatting Tips. This makes it easier for others to understand and download your code. Secondly, it would have been better to create a more meaningful title for the posting: "Simple or not" conveys no useful meaning. Now, on to (what I think is) your problem. If you are trying to replace all whitespace in your input with tabs while preserving the newline, the following should accomplish that task (untested): `use strict; use warnings; open my $fh_in , '<', 'test1.txt' or die "Can not open file test1.txt: $!"; open my $fh_out, '>', 'test2.txt' or die "Can not open file test2.txt: $!"; while (<$fh_in>) { s/\s+/\t/g; print $fh_out "$_\n"; } close $fh_out; close $fh_in;` In my opinion, you never clearly stated what your goal was. It would have been more useful if you had supplied a few sample lines of your input file, your actual output and your expected output.
Re: Simple or not by BrowserUk (Patriarch) on Apr 06, 2009 at 16:25 UTC
The most efficient way to squash (convert multiple characters to a single instance) is to use tr///s: `$line =~ tr/ /\t/s;`
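For anyone following along, here is a minimal sketch of what the tr///s suggestion does to a line. The sub name squash_spaces and the sample string are made up for illustration; only the tr/ /\t/s expression comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every space into a tab and squash each run of
# translated characters into a single tab. The newline is left alone because
# only ' ' appears in the search list.
sub squash_spaces {
    my ($line) = @_;
    $line =~ tr/ /\t/s;
    return $line;
}

print squash_spaces("a   b  c\n");
```

Note that a single space is also turned into a tab here, so this only fits input where fields never contain a lone space.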
Re: Simple or not by almut (Canon) on Apr 06, 2009 at 14:38 UTC
"I do have problems with the last \s since I need my \n" Maybe you could explicitly match spaces in the substitution pattern (instead of whitespace). Or chomp off the newline first and then append it back on when you're done with the whitespace -> tab substitution. If I understood the problem correctly, that is...
Re: Simple or not by kennethk (Abbot) on Apr 06, 2009 at 14:42 UTC
First, please read Writeup Formatting Tips, as wrapping your code in `<code>` tags would dramatically improve the formatting. Rather than rolling your own, I'd recommend using Text::CSV and setting the sep_char attribute to a space, a la: `my $csv = Text::CSV->new({sep_char => ' '});`
Re: Simple or not by ELISHEVA (Prior) on Apr 06, 2009 at 15:09 UTC
Your regular expression: `$line =~ s/ s{2,}/\t/s` is converting " ss" and " ssss" to tabs, and it is only converting the first one because you don't have a global flag at the end. To replace runs of whitespace with tabs you would need `s/\s\s+/\t/g` or `s/\s{2,}/\t/g`. Note the backslash before the "s". As for keeping the final newline, I would recommend stripping it before you apply the regex and adding it after you are done. There are ways to do that with regular expressions alone but your regular expression would be quite complicated. So the code to prep the line should strip the newline, replace runs of two or more whitespace characters with tabs, and add the newline back, something like this: `while (my $line = <FH>) { chomp $line; $line =~ s/\s\s+/\t/g; $line .= "\n"; ... }` where the `...` stands for whatever you do with $line afterwards. Best, beth
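The chomp, substitute, re-append sequence described above can be sketched as a small sub. The name runs_to_tabs and the sample input are hypothetical; the regex is the `s/\s{2,}/\t/g` form from the post.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the three steps from the post above.
sub runs_to_tabs {
    my ($line) = @_;
    chomp $line;                # strip the newline so \s cannot match it
    $line =~ s/\s{2,}/\t/g;     # each run of two or more whitespace chars becomes one tab
    return "$line\n";           # put the newline back
}

print runs_to_tabs("a  b c   d\n");
```

A single space between "b" and "c" is kept as is, which matters if a field can itself contain a space.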
Re: Simple or not by apachi15 (Initiate) on Apr 07, 2009 at 07:25 UTC
Hi, thank you very much and sorry I did not read the rules (I did now), I posted my message a bit in a rush. Anyway I tried your suggestions, but still there are some problems with it. For example: `open my $fh_in , '<', 'test1.txt' or die "Can not open file test1.txt: $!"; open my $fh_out, '>', 'test2.txt' or die "Can not open file test2.txt: $!"; while (<$fh_in>) { s/\s+/\t/g; print $fh_out "$_\n"; } close $fh_out; close $fh_in;` The code does the job but still there is a problem in the 5th column since there is not always an entry. This is an example of my text file:
`MGI:1918918 381629 0610007C21Rik 5 syntenic 51374 C2orf28 2p23.3 B`
`MGI:1918917 71667 0610007L01Rik 5 syntenic F 55069 C7orf42 7q11.21 C`
`MGI:1923501 76251 0610007P08Rik 13 syntenic B3 375748 C9orf102 9q22.32 M`
`MGI:1915571 58520 0610007P14Rik 12 syntenic 11161 C14orf1 14q24.3 B`
In the lines with no entry in the 5th column there should be a second TAB. I currently work with an if(...){then....} I hope it is clearer now, Thanks Martin
Re^2: Simple or not by kennethk (Abbot) on Apr 07, 2009 at 18:24 UTC
Your issue here arises from the fact that you are starting with a fixed-width file, not a whitespace-delimited file; the latter cannot contain a null column within its body by construction. You can roll your own solution using substr or pack/unpack. However, it's probably a good idea to turn to CPAN and use one of the many fixed-width parsing modules available (e.g. search terms fixedwidth, fixedlength). Update: You reposted the problem outside this thread. You shouldn't break threads unnecessarily, and you really shouldn't double post.
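As a rough illustration of the roll-your-own unpack route, here is a minimal sketch. The column widths in $template are pure assumptions (the real widths would have to be measured from the actual file), the sub name fixed_to_tsv is hypothetical, and only the first seven columns of the sample records are used. The sample lines are built with pack so that they match the assumed layout exactly.

```perl
use strict;
use warnings;

# ASSUMED layout: these widths are invented for the sketch, not taken from the real file.
my $template = 'A12 A7 A14 A3 A9 A3 A6';

# Hypothetical helper: split one fixed-width line into fields and rejoin with tabs.
# The "A" format strips trailing spaces, so a blank column comes back as an
# empty string and still gets its own tab.
sub fixed_to_tsv {
    my ($line) = @_;
    chomp $line;
    return join("\t", unpack($template, $line)) . "\n";
}

# Two sample records, one with and one without an entry in the optional column.
my $with_entry    = pack($template, 'MGI:1918917', '71667',  '0610007L01Rik', '5', 'syntenic', 'F', '55069') . "\n";
my $without_entry = pack($template, 'MGI:1918918', '381629', '0610007C21Rik', '5', 'syntenic', '',  '51374') . "\n";

print fixed_to_tsv($with_entry);
print fixed_to_tsv($without_entry);
```

With this approach the record lacking the optional entry ends up with two consecutive tabs after "syntenic", which is the behaviour asked for. One of the CPAN fixed-width modules may still be the more robust choice.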

