header footer

gupr1980 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: header footer by kcott (Archbishop) on Mar 04, 2014 at 22:13 UTC
G'day gupr1980,

Welcome to the monastery.

Your substitution regex has little bearing on the task you describe. You might want to get up to speed on Perl's regular expressions by reading "perlrequick - Perl regular expressions quick start" and "perlretut - Perl regular expressions tutorial".

Having said that, I see no reason to use regular expressions here: substr is quite capable of doing this (and I'd expect it to be a lot faster).

Here's an example script, `pm_1076973.pl`, with much shorter header and footer lengths for demo purposes:

`#!/usr/bin/env perl
use strict;
use warnings;

$^I = '.bak';

my $head_len = 5;
my $foot_len = 3;

while (<>) {
    chomp;
    print substr($_, $head_len, length($_) - $head_len - $foot_len), "\n";
}`

Here are the starting files (before that script is run):

`$ ls -l pm_1076973.*
-rwxr-xr-x 1 ken staff 210 5 Mar 08:50 pm_1076973.pl
-rw-r--r-- 1 ken staff 87 5 Mar 08:46 pm_1076973.txt`

`$ cat pm_1076973.txt
12345... content ...123
12345... more content ...123
12345... even more content ...123`

Now run the script:

`$ pm_1076973.pl pm_1076973.txt`

Now the files look like this:

`$ ls -l pm_1076973.*
-rwxr-xr-x 1 ken staff 202 5 Mar 08:56 pm_1076973.pl
-rw-r--r-- 1 ken staff 63 5 Mar 08:56 pm_1076973.txt
-rw-r--r-- 1 ken staff 87 5 Mar 08:46 pm_1076973.txt.bak`

`$ cat pm_1076973.txt
... content ...
... more content ...
... even more content ...`

`$ cat pm_1076973.txt.bak
12345... content ...123
12345... more content ...123
12345... even more content ...123`

-- Ken
Re^2: header footer by gupr1980 (Acolyte) on Mar 04, 2014 at 22:27 UTC
Does your code assume there is just one header and footer in the file? If not, i am not seeing how it will go from one header/footer segment to another. And thanks for the reading links. It definitely helps.
Re^3: header footer by kcott (Archbishop) on Mar 04, 2014 at 23:44 UTC
"Does your code assume there is just one header and footer in the file?"

No, it does not. Furthermore, given I put in a fair amount of effort to show the exact input and output, I'm surprised you're asking.

"If not, i am not seeing how it will go from one header/footer segment to another."

That sounds like you didn't even try it. Did you just have a quick look and decide it wouldn't work?

Changing the lengths from my demo `5` and `3` to your real application requirements of `50` and `30`, your sample records (posted below):

HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF
HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF

become

1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777
1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455 5555555555555555555555555555566666666666666777777777777

-- Ken
Re^4: header footer by gupr1980 (Acolyte) on Mar 05, 2014 at 00:07 UTC
Re: header footer by Eily (Monsignor) on Mar 04, 2014 at 21:49 UTC
You probably want to use substr instead of a regex, because there is a length parameter for counting from the start, or a negative offset for counting from the end. And to only cut the first 50 chars in the first line, and the last 30 in the last, you can use the variable $. (the current line number, which is 1 once the first line has been read) and eof (which will be true on the last line). So:

`RemoveHeader() if $. == 1;
RemoveFooter() if eof;`

If you intend to use your script on several files at once, have a look at eof on how to reset $. at the start of each file.
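A minimal sketch of the substr idea, using the 5/3 demo lengths and sample line from kcott's reply. The helper name `strip_ends` is hypothetical, not something from the thread. It relies on substr accepting a negative LENGTH, which leaves that many characters off the end of the string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop $head_len characters
# from the front and $foot_len characters from the end of a string.
sub strip_ends {
    my ($text, $head_len, $foot_len) = @_;

    # Too short to hold anything besides header and footer.
    return '' if length($text) <= $head_len + $foot_len;

    # A LENGTH of 0 would mean "zero characters", so handle it separately.
    return substr($text, $head_len) if $foot_len == 0;

    # Positive OFFSET skips the header; negative LENGTH stops that many
    # characters before the end of the string.
    return substr($text, $head_len, -$foot_len);
}

print strip_ends('12345... content ...123', 5, 3), "\n";
```

For the real data the call would presumably be `strip_ends($record, 50, 30)`, provided `$record` holds exactly one complete record with no trailing newline.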
Re^2: header footer by gupr1980 (Acolyte) on Mar 04, 2014 at 22:22 UTC
Found another example that doesn't store the entire file in a variable but reads it line by line, like this:

`open ( my $input_fh, "<", $input_file );
open ( my $output_fh, ">", $output_file );

foreach my $line ( <$input_fh> ) {
    ##########
    # i have to use regex here to first see if this line start with a HDR - correct?
    # then if it does i would do a substring to delete the first 50 and write the rest to the output file
    # else write the entire line to output
    # similarly check if it starts with FTR and substring again
    ##########
}

close ( $input_fh );
close ( $output_fh );`

I didn't quite get your suggestion on using the line number and eof. The file will have numerous records that have headers and footers. Am I on the right track with the above approach? Storing in a variable vs. reading line by line: is one way better than the other?
Re^3: header footer by Eily (Monsignor) on Mar 04, 2014 at 23:01 UTC
Yeah, I just missed the multiple records in the same file. Mea culpa.

Then you can do something like:

`my %length = (HDR => 5, FTR => 10);
while (<DATA>)
{
    while (/(HDR|FTR)/) # find either HDR or FTR
    {
        substr($_, $-[1], $length{$1}) = ''; # $-[1] is the position of the first capture group (parentheses)
    }
    print;
}
__DATA__
HDR--Hello this is a test FTR-------Should there be text here? HDR--Two records on the same line FTR-------
HDR--and here an incomplete footer FTR--`
Re: header footer by oiskuu (Hermit) on Mar 04, 2014 at 23:03 UTC
I'd suggest dividing the problem into parts. A: reading of a single record; B: writing the modified record; ... Most of the difficulty is in the reading part, and we cannot really offer you much advice without learning all the details about the file format. Is the record of arbitrary length? Can a single record span megabytes? Is the record size encoded in the header? Update: May a record contain "HDR" or "FTR" in its body (as a substring)?
Re^2: header footer by gupr1980 (Acolyte) on Mar 05, 2014 at 00:04 UTC
No. It will only be there in the header and footer. You won't find an HDR or FTR in the body.
Re^3: header footer by gupr1980 (Acolyte) on Mar 05, 2014 at 00:21 UTC
Hmm.. I just realized some of the responses are not shown in full. Just the header :) Sorry, I am reading through those now. Thanks.
Re^4: header footer by gupr1980 (Acolyte) on Mar 05, 2014 at 01:04 UTC
Re^5: header footer by Kenosis (Priest) on Mar 05, 2014 at 01:20 UTC
Re^4: header footer by gupr1980 (Acolyte) on Mar 05, 2014 at 13:22 UTC
Re^2: header footer by gupr1980 (Acolyte) on Mar 04, 2014 at 23:08 UTC
The record is arbitrary in length, yes. But like I said, the header and footer lengths are always 50 and 30. An individual record is only about 10 KB, but there are a great many records. The record size is not encoded in the header.
Re^3: header footer by gupr1980 (Acolyte) on Mar 04, 2014 at 23:18 UTC
So am I missing something in thinking that this would work?

read line by line
check if pattern HDR exists - cut from end of header to rest of line and > outfile
rest of lines > outfile
keep going till find pattern FTR - cut from FTR to end of line > outfile

Would this not work?
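The line-by-line idea can be sketched roughly as follows. This is only a sketch under stated assumptions: a header or footer never straddles a line break, and HDR/FTR never occur in the body (as stated elsewhere in the thread). The helper name `strip_stream` and the demo record are made up for illustration. The demo uses in-memory filehandles so it runs without any files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Fixed lengths stated in the thread: 50-character header, 30-character footer.
my %LENGTH = (HDR => 50, FTR => 30);

# Hypothetical helper: copy $in_fh to $out_fh one line at a time, cutting
# every header and footer found on the line. Assumes a header or footer
# never straddles a line break and that HDR/FTR never occur in the body.
sub strip_stream {
    my ($in_fh, $out_fh) = @_;
    while (my $line = <$in_fh>) {
        for my $tag (qw(HDR FTR)) {
            my $pos;
            # 4-argument substr replaces the matched span with ''.
            while (($pos = index($line, $tag)) >= 0) {
                substr($line, $pos, $LENGTH{$tag}, '');
            }
        }
        print {$out_fh} $line;
    }
    return;
}

# Demo on in-memory filehandles; the record below is a made-up placeholder.
my $input = 'HDR' . ('h' x 47) . "body line one\n"
          . "body line two\n"
          . 'body line three' . 'FTR' . ('f' x 27) . "\n";

open my $in_fh,  '<', \$input     or die "Cannot read input: $!";
open my $out_fh, '>', \my $output or die "Cannot write output: $!";
strip_stream($in_fh, $out_fh);
close $in_fh;
close $out_fh;

print $output;
```

Lines with neither tag pass through unchanged, so memory use stays at one line at a time regardless of how many records the file holds.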
Re^4: header footer by Kenosis (Priest) on Mar 04, 2014 at 23:34 UTC
Re^4: header footer by graff (Chancellor) on Mar 04, 2014 at 23:34 UTC
Re: header footer by Laurent_R (Canon) on Mar 04, 2014 at 22:33 UTC
Hmm, I have the feeling that there are some misunderstandings here. It seems that there are several headers and several footers, not just one of each in the file. Possibly even one header and one footer for each record. It is also not entirely clear whether the headers and footers are on the same line as the data. Can you please give a sample of your file so that we can better understand its structure?
Re^2: header footer by Eily (Monsignor) on Mar 04, 2014 at 22:46 UTC
Oops, indeed. Well, then maybe setting the input record separator ($/) to "HDR" would work. Or `s/HDR.{47}|FTR.{27}//g;`, but it doesn't appeal much to me.
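A rough sketch of the `$/` idea, reading one whole record per iteration. It assumes every record starts with a 50-character header beginning "HDR", contains a 30-character footer beginning "FTR", and never has either tag in its body. The helper name `read_bodies` and the demo data are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $HEAD_LEN = 50;   # header length stated in the thread
my $FOOT_LEN = 30;   # footer length stated in the thread

# Hypothetical helper: read records from $in_fh using "HDR" as the input
# record separator and return the bodies with header and footer removed.
sub read_bodies {
    my ($in_fh) = @_;
    local $/ = 'HDR';          # each read now ends at the next "HDR"
    my @bodies;
    my $seen_first = 0;
    while (my $chunk = <$in_fh>) {
        chomp $chunk;          # chomp strips the trailing "HDR", if present

        # The first chunk is whatever precedes the first header; skip it.
        if (!$seen_first) { $seen_first = 1; next; }

        # "HDR" itself was consumed as the separator, so only the rest of
        # the header remains at the front of the chunk.
        substr($chunk, 0, $HEAD_LEN - length('HDR'), '');

        my $pos = index($chunk, 'FTR');
        substr($chunk, $pos, $FOOT_LEN, '') if $pos >= 0;

        push @bodies, $chunk;
    }
    return @bodies;
}

# Demo on an in-memory filehandle with two made-up, multi-line records.
my $data = join '', map {
    'HDR' . ('h' x 47) . "record $_ line a\nrecord $_ line b"
          . 'FTR' . ('f' x 27) . "\n"
} 1 .. 2;

open my $fh, '<', \$data or die "Cannot read data: $!";
my @bodies = read_bodies($fh);
close $fh;

print @bodies;
```

Because `$/` is set with `local`, the separator reverts to its previous value once the subroutine returns. Each read holds one record in memory, which seems reasonable for records of about 10 KB.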
Re: header footer by Kenosis (Priest) on Mar 04, 2014 at 22:34 UTC
Can you share a (redacted, if necessary) copy of one of those records? Are the headers and footers the same for each record? Also, what separates those records?
Re^2: header footer by gupr1980 (Acolyte) on Mar 04, 2014 at 22:47 UTC
The file will have 1000s of records; each record has its own header and footer. The header and footer can be different for each record. The only constant is that the header and footer have fixed lengths of 50 and 30.

HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455
5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF
HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX1STR HYTRES NAME PLACE DEST GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGG1111111111111111111112222222222222222222222222222333333333333333333333333333444444444444444444444444444444444455
5555555555555555555555555555566666666666666777777777777FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF

Sorry if there was a better way to send a sample file. I copied the same record twice for simplicity's sake. In the real file the records will differ, but the header and footer lengths will be 50 and 30, and every record will have them until the end of the file.
Re^3: header footer by gupr1980 (Acolyte) on Mar 04, 2014 at 22:50 UTC
Just to clarify: HDR.S287878877.DDDDD.DDDDDDXXXXXXXXXXXXXXXXXXXXXXX is the header and FTRDDDDDDDDDDFFFFFFFFFFFFFFFFF is the footer.
Re^4: header footer by Kenosis (Priest) on Mar 04, 2014 at 23:08 UTC

