PerlMonks  

Simple or not

by apachi15 (Initiate)
on Apr 06, 2009 at 14:25 UTC ( [id://755733]=perlquestion )

apachi15 has asked for the wisdom of the Perl Monks concerning the following question:

Hi to all Monks out there,

I think I have a simple question for advanced Perl Monks.

I have a text file which has to be imported into a database. Unfortunately the fields are separated with space characters and not tabs. This is the code I am trying:
    open(FH,"<test1.txt") or die "Can not find the file test.txt";
    open(OUT,">test2.txt");
    my @input=<FH>;
    close FH;
    foreach my $line(@input) {
        $line =~ s/ s{2,}/\t/s ;
    }
    print OUT @input;
    close(OUT);

I do have problems with the last \s since I need my \n

Thanks for every little help, Martin

Replies are listed 'Best First'.
Re: Simple or not
by toolic (Bishop) on Apr 06, 2009 at 14:46 UTC
    Welcome to the Monastery.

    Firstly, when posting example code, please use "code" tags, as described in Writeup Formatting Tips. This makes it easier for others to understand and download your code.

    Secondly, it would have been better to create a more meaningful title for the posting: "Simple or not" conveys no useful meaning.

    Now, on to (what I think is) your problem. If you are trying to replace every run of whitespace in your input with a tab while preserving the newline, the following should accomplish that task (untested):

        use strict;
        use warnings;

        open my $fh_in , '<', 'test1.txt' or die "Can not open file test1.txt: $!";
        open my $fh_out, '>', 'test2.txt' or die "Can not open file test2.txt: $!";
        while (<$fh_in>) {
            s/\s+/\t/g;
            print $fh_out "$_\n";
        }
        close $fh_out;
        close $fh_in;

    In my opinion, you never clearly stated what your goal was. It would have been more useful if you had supplied a few sample lines of your input file, your actual output and your expected output.

Re: Simple or not
by BrowserUk (Patriarch) on Apr 06, 2009 at 16:25 UTC
Re: Simple or not
by almut (Canon) on Apr 06, 2009 at 14:38 UTC
    I do have problems with the last \s since I need my \n

    Maybe you could explicitly match spaces in the pattern (instead of whitespace in general). Or chomp off the newline first and then append it back on when you're done with the whitespace -> tab substitution. If I understood the problem correctly, that is...
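    The first option can be sketched as follows. This is only an illustration, assuming the fields are separated by runs of two or more plain spaces; the helper name spaces_to_tabs and the sample line are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each run of two or more literal spaces
# into a single tab. The pattern uses a plain space rather than \s,
# so the trailing newline is never matched and stays in place.
sub spaces_to_tabs {
    my ($line) = @_;
    $line =~ s/ {2,}/\t/g;
    return $line;
}

# Made-up sample line; single spaces inside a field are left alone.
print spaces_to_tabs("alpha  beta gamma     delta\n");
```

    Because only literal spaces are matched, no chomp and re-append step is needed with this variant.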

Re: Simple or not
by kennethk (Abbot) on Apr 06, 2009 at 14:42 UTC

    First, please read Writeup Formatting Tips, as wrapping your code in <code> tags would dramatically improve the formatting.

    Rather than rolling your own, I'd recommend using Text::CSV and setting the sep_char attribute to a space, a la:

    my $csv = Text::CSV->new({sep_char => ' '});

Re: Simple or not
by ELISHEVA (Prior) on Apr 06, 2009 at 15:09 UTC

    Your regular expression, $line =~ s/ s{2,}/\t/s, converts " ss" and " ssss" (a space followed by two or more literal "s" characters) to tabs, and it converts only the first match because you don't have a global flag (/g) at the end. To replace runs of whitespace with tabs you would need s/\s\s+/\t/g or s/\s{2,}/\t/g. Note the backslash before the "s".
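    A small sketch makes the difference visible. The sample string is invented for the demonstration; it contains two-space gaps as well as a literal " ss".

```perl
use strict;
use warnings;

# Made-up sample text: two-space gaps plus a literal " ss".
my $sample = "one  two ss three  four";

# Pattern from the question: a space followed by two or more literal
# "s" characters. Without /g only the first match is replaced.
(my $original = $sample) =~ s/ s{2,}/\t/s;

# Corrected pattern: two or more whitespace characters, replaced globally.
(my $corrected = $sample) =~ s/\s{2,}/\t/g;

print "original:  $original\n";
print "corrected: $corrected\n";
```

    With the original pattern only the " ss" is turned into a tab and the two-space gaps survive; with the corrected pattern the gaps become tabs and the " ss" is left untouched.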

    As for keeping the final newline, I would recommend stripping it before you apply the regex and adding it after you are done. There are ways to do that with regular expressions alone but your regular expression would be quite complicated. So the code to prep the line should look something like this:

        while(my $line = <FH>) {
            #strip newline
            chomp $line;

            #replace runs of two or more spaces with tabs
            $line =~ s/\s\s+/\t/g;

            #add back newline
            $line .= "\n";

            #.... do whatever with $line ....
        }

    Best, beth

Re: Simple or not
by apachi15 (Initiate) on Apr 07, 2009 at 07:25 UTC
    Hi, thank you very much, and sorry that I did not read the rules (I have now); I posted my message in a bit of a rush. I tried your suggestions, but there are still some problems. For example:
        open my $fh_in , '<', 'test1.txt' or die "Can not open file test1.txt: $!";
        open my $fh_out, '>', 'test2.txt' or die "Can not open file test2.txt: $!";
        while (<$fh_in>) {
            s/\s+/\t/g;
            print $fh_out "$_\n";
        }
        close $fh_out;
        close $fh_in;
    The code does the job but still there is a problem in the 5th column since there is not always an entry. This is an example of my text file:
        MGI:1918918 381629 0610007C21Rik 5 syntenic 51374 C2orf28 2p23.3 B
        MGI:1918917 71667 0610007L01Rik 5 syntenic F 55069 C7orf42 7q11.21 C
        MGI:1923501 76251 0610007P08Rik 13 syntenic B3 375748 C9orf102 9q22.32 M
        MGI:1915571 58520 0610007P14Rik 12 syntenic 11161 C14orf1 14q24.3 B
    In the lines with no entry in the 5th column there should be a second tab. I currently work around this with an if(...){then....}. I hope it is clearer now. Thanks, Martin

      Your issue here arises from the fact that you are starting with a fixed-width file, not a whitespace-delimited file - the latter cannot contain a null column within its body by construction.

      You can roll your own solution using substr or pack/unpack. However, it's probably a good idea to turn to CPAN and use one of the many fixed width parsing modules available (e.g. search terms fixedwidth, fixedlength).
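      A minimal sketch of the unpack route, for illustration only. The column widths in the template are invented, since the real layout of the file is not shown in this thread; they would have to be replaced with the actual widths. The helper name fixed_to_tabs is likewise hypothetical, and the sample lines are built with sprintf so that they match the assumed widths.

```perl
use strict;
use warnings;

# ASSUMPTION: these widths are made up for the example and must be
# replaced with the real column widths of the input file.
# "A12" reads 12 characters and strips trailing spaces; "A*" takes the rest.
my $template = 'A12 A8 A15 A4 A10 A4 A*';

# Hypothetical helper: split one fixed-width line into fields and
# rejoin them with tabs. An empty column yields an empty field, so
# two tabs end up next to each other.
sub fixed_to_tabs {
    my ($line) = @_;
    chomp $line;
    my @fields = unpack $template, $line;
    return join("\t", @fields) . "\n";
}

# Sample rows padded to the assumed widths; the first one has an empty
# sixth field.
my @rows = (
    [ 'MGI:1918918', '381629', '0610007C21Rik', '5', 'syntenic', '',  '51374' ],
    [ 'MGI:1918917', '71667',  '0610007L01Rik', '5', 'syntenic', 'F', '55069' ],
);
for my $row (@rows) {
    my $padded = sprintf "%-12s%-8s%-15s%-4s%-10s%-4s%s\n", @$row;
    print fixed_to_tabs($padded);
}
```

      Unlike a whitespace substitution, this keeps the field count the same on every line, because the column positions decide where a field starts and ends, not the spaces between them.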

      Update: You reposted the problem outside this thread here. You shouldn't break threads unnecessarily, and you really shouldn't double post.

Node Type: perlquestion [id://755733]
Approved by toolic