Re: header footer
by kcott (Archbishop) on Mar 04, 2014 at 22:13 UTC
#!/usr/bin/env perl
use strict;
use warnings;
$^I = '.bak';
my $head_len = 5;
my $foot_len = 3;
while (<>) {
chomp;
print substr($_, $head_len, length($_) - $head_len - $foot_len), "\n";
}
Here are the starting files (before that script is run):
$ ls -l pm_1076973.*
-rwxr-xr-x 1 ken staff 210 5 Mar 08:50 pm_1076973.pl
-rw-r--r-- 1 ken staff 87 5 Mar 08:46 pm_1076973.txt
$ cat pm_1076973.txt
12345... content ...123
12345... more content ...123
12345... even more content ...123
Now run the script:
$ pm_1076973.pl pm_1076973.txt
Now the files look like this:
$ ls -l pm_1076973.*
-rwxr-xr-x 1 ken staff 202 5 Mar 08:56 pm_1076973.pl
-rw-r--r-- 1 ken staff 63 5 Mar 08:56 pm_1076973.txt
-rw-r--r-- 1 ken staff 87 5 Mar 08:46 pm_1076973.txt.bak
$ cat pm_1076973.txt
... content ...
... more content ...
... even more content ...
$ cat pm_1076973.txt.bak
12345... content ...123
12345... more content ...123
12345... even more content ...123
Does your code assume there is just one header and footer in the file? If not, I'm not seeing how it will go from one header/footer segment to the next. And thanks for the reading links; they definitely help.
"Does your code assume there is just one header and footer in the file?"
No it does not.
Furthermore, given I put in a fair amount of effort to show the exact input and output, I'm surprised you're asking.
"If not, i am not seeing how it will go from one header/footer segment to another."
That sounds like you didn't even try it.
Did you just have a quick look and decide it wouldn't work?
Changing the lengths from my demo 5 and 3 to your real application requirements of 50 and 30, your sample records (posted below):
HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF
HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF
become
1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777
1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777
Re: header footer
by Eily (Monsignor) on Mar 04, 2014 at 21:49 UTC
You probably want to use substr instead of a regex, because it takes a length parameter for counting from the start, or a negative offset for counting from the end. And to cut only the first 50 chars from the first line and the last 30 from the last line, you can use the variable $. (the current line number, which is 1 on the first line) and eof (which is true on the last line). So:
RemoveHeader() if $. == 1;
RemoveFooter() if eof;
If you intend to use your script on several files at once, have a look at eof on how to reset $. at the start of each file.
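A minimal sketch of the substr / $. / eof idea, under the same assumption this reply makes: a single 50-character header at the very start of the input and a single 30-character footer at the very end (the rest of the thread shows the OP's real files have a header and footer per record). The helper name strip_header_footer is hypothetical; it reads from a filehandle so it can be tried on an in-memory string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed layout (hypothetical): one 50-char header at the start of the
# input and one 30-char footer at the very end.
my $HEAD_LEN = 50;
my $FOOT_LEN = 30;

# Hypothetical helper: returns the chomped lines of $in, with the header
# cut from the first line and the footer cut from the last line.
sub strip_header_footer {
    my ($in) = @_;
    my @lines;
    while ( my $line = <$in> ) {
        chomp $line;
        # $. is the number of the line just read from $in: 1 for the first line.
        substr( $line, 0, $HEAD_LEN ) = '' if $. == 1;
        # eof($in) is true once the last line has been read; a negative
        # length tells substr to leave that many characters off the end.
        $line = substr( $line, 0, -$FOOT_LEN ) if eof($in);
        push @lines, $line;
    }
    return @lines;
}

# Try it on an in-memory "file" (a real file is opened the same way with '<').
my $sample = ( 'H' x $HEAD_LEN ) . "first line\nlast line" . ( 'F' x $FOOT_LEN ) . "\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for strip_header_footer($fh);    # prints "first line" and "last line"
```

When looping over several files with <>, the eof entry in perlfunc shows `close ARGV if eof;` at the end of the loop body to reset $. for each file; eof without parentheses tests the current file, while eof() tests the end of all input.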
Found another example that doesn't store the entire file in a variable but reads it line by line, like this:
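use strict;
use warnings;

my ( $input_file, $output_file ) = @ARGV;    # hypothetical: file names from the command line
open( my $input_fh,  "<", $input_file )  or die "Cannot open '$input_file': $!";
open( my $output_fh, ">", $output_file ) or die "Cannot open '$output_file': $!";
# while reads one line per iteration; foreach ( <$input_fh> ) would
# first read the entire file into a list.
while ( my $line = <$input_fh> )
{
    # A regex checks whether this line starts with HDR; if it does,
    # substr deletes the first 50 characters (the header).
    substr( $line, 0, 50 ) = '' if $line =~ /^HDR/;
    # Similarly, if the line contains FTR, delete the 30-character
    # footer that starts there.
    my $pos = index( $line, 'FTR' );
    substr( $line, $pos, 30 ) = '' if $pos >= 0;
    # Write what is left (possibly the entire line) to the output file.
    print {$output_fh} $line;
}
close( $input_fh );
close( $output_fh );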
I didn't quite get your suggestion about using the line number and eof. The file will have numerous records that have headers and footers.
Am I on the right track with the above approach?
Storing the file in a variable vs. reading it line by line: is one way better than the other?
use strict;
use warnings;
my %length = ( HDR => 5, FTR => 10 );
while (<DATA>)
{
    while (/(HDR|FTR)/)    # find either HDR or FTR
    {
        # $-[1] is the position where the first capture group (the parentheses) matched
        substr( $_, $-[1], $length{$1} ) = '';
    }
    print;
}
__DATA__
HDR--Hello this is a test FTR-------Should there be text here? HDR--Two records on the same line FTR-------
HDR--and here an incomplete footer FTR--
Re: header footer
by oiskuu (Hermit) on Mar 04, 2014 at 23:03 UTC
I'd suggest dividing the problem into parts. A: reading a single record; B: writing the modified record; ...
Most of the difficulty is in the reading part, and we cannot really offer you much advice without learning all the details about the file format. Is the record of arbitrary length? Can a single record span megabytes? Is the record size encoded in the header?
Update: May a record contain "HDR" or "FTR" in its body (as a substring)?
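A minimal sketch of that A/B split, assuming what the OP confirms further down the thread: a 50-character header beginning with HDR, a 30-character footer beginning with FTR, no HDR or FTR inside a body, and records that may span several lines. read_record and write_record are hypothetical helper names, not an existing API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed fixed lengths, per the OP's later answers.
my $HEAD_LEN = 50;
my $FOOT_LEN = 30;

# Part A (hypothetical helper): read one whole record, i.e. lines up to and
# including the one that contains FTR. Returns undef at end of input.
sub read_record {
    my ($in) = @_;
    my $record = '';
    while ( my $line = <$in> ) {
        $record .= $line;
        return $record if index( $line, 'FTR' ) >= 0;
    }
    return length $record ? $record : undef;    # unterminated last record, if any
}

# Part B (hypothetical helper): drop the header and footer, write the body.
sub write_record {
    my ( $out, $record ) = @_;
    substr( $record, 0, $HEAD_LEN ) = '';                    # header = first 50 chars
    my $pos = index( $record, 'FTR' );
    substr( $record, $pos, $FOOT_LEN ) = '' if $pos >= 0;    # 30-char footer
    print {$out} $record;
}

# Example on an in-memory "file" holding one two-line record.
my $sample = 'HDR' . ( 'h' x 47 ) . "body line 1\nbody line 2"
           . 'FTR' . ( 'f' x 27 ) . "\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
while ( defined( my $record = read_record($in) ) ) {
    write_record( \*STDOUT, $record );    # prints the two body lines
}
```

Keeping A and B separate means that if the format turns out to differ (for example, a length field in the header), only read_record has to change. Only one record (about 10 KB here) is held in memory at a time.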
No. It will only be there in the header and footer;
you won't find HDR or FTR in the body.
Hmm... I just realized some of the responses are not shown in full, just the header :) Sorry, I am reading through those now. Thanks.
Yes, records are of arbitrary length. But as I said, the header and footer lengths are always 50 and 30. An individual record is only about 10 KB, but there are a great many records. The record size is not encoded in the header.
So am I missing something? My thinking is:
read the file line by line
if the pattern HDR exists, cut the header and write the rest of the line to the outfile
write the following lines to the outfile unchanged
keep going until the pattern FTR is found, then cut from FTR to the end of the line and write what precedes it to the outfile
Would this not work?
Re: header footer
by Laurent_R (Canon) on Mar 04, 2014 at 22:33 UTC
Hmm, I have the feeling that there are some misunderstandings here. It seems that there are several headers and several footers, not just one of each in the file, possibly even one header and one footer for each record. It is also not entirely clear whether the headers and footers are on the same line as the data. Can you please give a sample of your file so that we can better understand its structure?
Oops, indeed.
Well, then maybe setting the input record separator ($/) to "HDR" would work.
Or s/HDR.{47}|FTR.{27}//g; but that doesn't appeal much to me.
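A minimal sketch of the $/ = "HDR" idea, assuming the input begins with "HDR", every header is 50 characters, every footer is "FTR" plus 27 characters on one line, and neither marker appears inside a body (as the OP confirms). strip_records is a hypothetical helper name. Because chomp removes whatever $/ holds, each chunk read loses the "HDR" that starts the next record, leaving 47 header characters to cut.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: copies $in to $out, minus every 50-char header and
# 30-char footer, reading one record per iteration.
sub strip_records {
    my ( $in, $out ) = @_;
    local $/ = 'HDR';    # a "line" now ends right after each "HDR"
    while ( my $chunk = <$in> ) {
        chomp $chunk;                    # chomp removes a trailing $/, i.e. "HDR"
        next unless length $chunk;       # the first read is just "HDR" itself
        substr( $chunk, 0, 47 ) = '';    # rest of the header: 50 - length 'HDR'
        $chunk =~ s/FTR.{27}//;          # the 30-char footer
        print {$out} $chunk;
    }
}

# Example: two one-line records in an in-memory "file".
my $sample = join '',
    map { 'HDR' . ( '.' x 47 ) . "record $_" . 'FTR' . ( 'D' x 27 ) . "\n" } 1 .. 2;
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
strip_records( $in, \*STDOUT );    # prints "record 1" and "record 2"
```

The s/HDR.{47}|FTR.{27}//g alternative needs no change to $/ and can be applied line by line, provided each header and each footer lies within a single line, as it does in the OP's sample.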
Re: header footer
by Kenosis (Priest) on Mar 04, 2014 at 22:34 UTC
Can you share a (redacted, if necessary) copy of one of those records? Are the headers and footers the same for each record? Also, what separates those records?
The file will have thousands of records, and each record has its own header and footer. The header and footer can be different for each record; the only constant is that the header and footer have fixed lengths of 50 and 30.
HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455
5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF
HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455
5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF
Sorry if there was a better way to send a sample file.
I copied the same record twice for simplicity's sake; the real records will differ, but the header and footer lengths will always be 50 and 30,
and every record through to the end of the file will have them.