dbach355 has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I am a bit of a novice at Perl and I may be able to resolve this with shell, but I think Perl is likely the better solution.
The input file has many fields, perhaps 100, that are space delimited. My goal is to output a delimited file with a unique delimiter such as \f. The layout is a bit more complicated in that some of the fields have spaces in them. To get around this, the preceding field contains the number of characters that are in the field. Below is a short example (not the entire file).
Field names (file is not comma delimited)
ssn, employee number, Number of characters of employee name, employee name, hire date, number of characters for address, address, state, number of characters for city, city, zip
123445678 45612 11 Steve Smith 11012015 16 1001 Main Street GA 7 Atlanta 30553
The number of fields is fixed, the number of fixed field lengths and variable field lengths varies. I was trying to figure out how to start processing the file and the first 10 fields are of fixed length, and then the variable ones start and are mixed in with some fixed fields.
I am thinking to do something such as an array, or hash to have the field name and maybe the type of field.
ssn,empNo,ncEmpName,empName,hireDate,ncAddr,addr,state,ncCity,city,zip (nc=number of characters)
f,f,v,d,f,v,d,f,v,d,f (f -fixed, v-stating characters for next field which is variable, d - data of the variable field)
I would then run a subroutine based on if the field is fixed, variable, or data, but then need some method to set the remaining characters to continue processing
Reading the first fixed fields is simple, and reading the first field that contains the number of characters of the next variable field is also simple; in my example it is easy to read ssn empNo ncEmpName. My thought was then to process what follows as a single-character array: when I know the number of characters, such as 11, I would pull that many characters for the field. But then I am not sure how to read the remainder of the data as new input to continue processing.
Sorry, this is likely pretty vague what I am trying to describe. I am looking for some suggestions of how to process, such as using single character array and start processing data that way, or if there are some other methods I am likely not familiar with.
Re: How to process variables length fields in delimited file.
by liverpole (Monsignor) on Oct 06, 2016 at 02:03 UTC
Hi dbach355,
My first approach would be to define, programmatically (ie. with a data structure), what the input file contains on each line.
Once that's in a script, you run it and prove to yourself that your data does in fact behave as expected.
Since each line is made up of space-delimited items, but some of them are count-prefixed, you could
define your line format with an array containing an array reference for each item. Each array reference
would hold the LABEL of the item (eg. 'ssn' for social-security, 'emp_num' for employee number, etc.), and
a compiled regular expression (that's the qr/.../ syntax) used to parse the item.
In cases where the item is prefixed with a count, specifying the length of the item, you could use a
string like 'COUNT' instead of a regex.
Here's an example for what you've defined:
my @line_format = (
    [ 'ssn',       qr/(\d{9})/ ],
    [ 'emp_num',   qr/(\d+)/ ],
    [ 'emp_name',  'COUNT' ],
    [ 'hire_date', qr/(\d{8})/ ],
    [ 'address',   'COUNT' ],
    [ 'state',     qr/([A-Z]{2})/ ],
    [ 'city',      'COUNT' ],
    [ 'zip',       qr/(\d{5})/ ],
);
Then you write a subroutine parse_line that you call for each line of your input file.
(I would also pass in the line number, in case the line doesn't match your formula, so you can
die with an error saying which line was invalid).
For each array ref in @line_format you either parse the COUNT, and pull off that number
of characters, or you apply the next regex. If the data validates, you assign it into a hash
local to the subroutine, with the label as the key. When the subroutine completes successfully,
you pass back a reference to that hash.
Here's how you might write the parse_line subroutine:
sub parse_line {
    my ($line, $linenum) = @_;
    my %parsed = ( );
    foreach my $format (@line_format) {
        my ($label, $expected) = @$format;
        if ($expected eq 'COUNT') {
            # Pull the COUNT off the beginning of the line and apply it
            if ($line !~ s/^\s*(\d+) //) {
                die "Error #1 parsing item '$label' (line #$linenum)\n";
            }
            my $count = $1;
            if ($line !~ s/^(.{$count})//) {
                die "Error #2 parsing item '$label' (line #$linenum)\n";
            }
            $parsed{$label} = $1;
        } else {
            # Pull off the next non-space word, and test it with the regex
            if ($line !~ s/^\s*(\S+)//) {
                die "Error #3 parsing item '$label' (line #$linenum)\n";
            }
            my $word = $1;
            if ($word !~ /^$expected$/) {
                die "Error #4 validating item '$label' (line #$linenum)\n";
            }
            $parsed{$label} = $word;
        }
    }
    return \%parsed;
}
When I call that subroutine with the data you defined for a single line:
use Data::Dumper::Concise;
my $line = "123445678 45612 11 Steve Smith 11012015 16 1001 Main Street GA 7 Atlanta 30553";
my $result = parse_line($line, 1);
print Dumper $result;
This simple program dumps as its result:
{
  address => "1001 Main Street",
  city => "Atlanta",
  emp_name => "Steve Smith",
  emp_num => 45612,
  hire_date => 11012015,
  ssn => 123445678,
  state => "GA",
  zip => 30553
}
So I know I'm on the right track.
The next steps would be something like:
- Read all the lines in the file
- Call the subroutine parse_line on each line (and line number), getting back a hash ref
- Add that hash ref to an array (or do whatever you want with it)
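The steps above can be sketched as follows. This is only a minimal sketch, not the code from this reply: `parse_record` and `@layout` are hypothetical stand-ins for parse_line and @line_format, cut down to four fields, and the input is read from an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical, shortened layout: a label plus either 'WORD' or 'COUNT'.
my @layout = (
    [ ssn      => 'WORD'  ],
    [ emp_num  => 'WORD'  ],
    [ emp_name => 'COUNT' ],
    [ zip      => 'WORD'  ],
);

# Parse one line into a hash ref; dies with the line number on bad input.
sub parse_record {
    my ($line, $linenum) = @_;
    my %parsed;
    for my $item (@layout) {
        my ($label, $kind) = @$item;
        if ($kind eq 'COUNT') {
            $line =~ s/^\s*(\d+) //
                or die "No count for '$label' (line $linenum)\n";
            my $count = $1;
            length($line) >= $count
                or die "Short '$label' (line $linenum)\n";
            # Four-argument substr removes the field from the line.
            $parsed{$label} = substr $line, 0, $count, '';
        }
        else {
            $line =~ s/^\s*(\S+)//
                or die "Missing '$label' (line $linenum)\n";
            $parsed{$label} = $1;
        }
    }
    return \%parsed;
}

# Sample input held in a string so the sketch is self-contained.
my $input = "123445678 45612 11 Steve Smith 30553\n"
          . "234256653 76467 8 Joe Blow 12345\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!\n";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;               # skip blank lines
    push @records, parse_record($line, $.); # $. is the current line number
}
close $fh;

print "$_->{emp_name} ($_->{zip})\n" for @records;
```

Keeping the hash refs in an array is just one option; with a large file it may be better to write each record out as soon as it has been parsed.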
Does that help?
Edit: fixed whom I'm responding to (thanks choroba)
s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: How to process variable length fields in delimited file.
by GrandFather (Saint) on Oct 06, 2016 at 01:37 UTC
If the fixed fields really are fixed length rather than space delimited then you can pull the lines apart using a template like this:
use strict;
use warnings;
my @template = (
    'ssn 9',
    'employee number 5', 'employee name *',
    'hire date 8',
    'address *', 'state 2', 'city *', 'zip 5'
);
while (my $line = <DATA>) {
    chomp $line;
    my %fields;
    for my $field (@template) {
        my ($name, $length) = $field =~ /(.*) (.+)/;
        $line =~ s/^\s+//;
        $length = substr $line, 0, index ($line, ' ') + 1, '' if $length eq '*';
        $fields{$name} = substr $line, 0, $length, '';
    }
    print "$_: $fields{$_}\n" for keys %fields;
}
__DATA__
123445678 45612 11 Steve Smith 11012015 16 1001 Main Street GA 7 Atlanta 30553
Prints:
employee number: 45612
state: GA
hire date: 11012015
city: Atlanta
zip: 30553
ssn: 123445678
employee name: Steve Smith
address: 1001 Main Street
For output I'd strongly recommend using a module like Text::CSV to generate correctly formatted CSV files.
Premature optimization is the root of all job security
Re: How to process variable length fields in delimited file.
by choroba (Cardinal) on Oct 06, 2016 at 07:44 UTC
You can create an unpack template that parses each line. The only problem is that, for unpack to use the length fields, each one must be terminated by a null byte, not a space. But it's easy to change spaces to nulls and then back:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Syntax::Construct qw{ /r };
my $template = join 'x',
'A9', # ssn
'A5', # employee number
'Z*/A', # employee name
'A8', # hire date
'Z*/A', # address
'A2', # state
'Z*/A', # city
'A5', # zip
;
while (<>) {
say join ',', map tr/\x0/ /r, unpack $template, tr/ /\x0/r;
}
Update: used tr instead of s .
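To see the length-prefix template at work on something smaller, here is a minimal sketch that uses only the first four fields of the sample line. It avoids the /r flag so it does not need Syntax::Construct; the shortened record and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Shortened sample record (assumption: only the first four fields).
my $record = '123445678 45612 11 Steve Smith 11012015';

# Turn every space into a null byte so that Z* can read each count.
(my $packed = $record) =~ tr/ /\0/;

# A9 and A5 are fixed widths, x skips one separator byte, and
# Z*/A reads a null-terminated count followed by that many characters.
my @fields = unpack 'A9 x A5 x Z*/A x A8', $packed;

# Restore the spaces that were embedded inside the variable-length field.
tr/\0/ / for @fields;

print join('|', @fields), "\n";
```

Whitespace inside an unpack template is ignored, so spacing the pieces out like this is purely for readability.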
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: How to process variable length fields in delimited file.
by Marshall (Canon) on Oct 06, 2016 at 01:29 UTC
Hi dbach355,
"My goal is to output a delimited file with a unique delimiter such as \f." I think that you will find that a CSV (Comma Separated Value) line using the "pipe" character, "|", as the delimiter will work out well. CSV is a generic term; you can use something other than a comma. I work with a few "|" separated DBs, some with a million+ records. If you use \f, "Form Feed", you will wind up with something that cannot be printed easily (one page per column is not too friendly!). It also has the problem of being "invisible". Using a tab character (\t) has the same visibility problem.
The real problem with your format is the embedded spaces. These first 10 columns can be handled in a number of ways. What do the other columns look like? Do they contain embedded spaces, like "John Smith"? Do they have a constant field width perhaps? Your goal is achievable. I just need a bit more info.
Update:
Once you have the data in "|" delimited form, Perl can process a line like that easily. An example is shown below. There are modules, like Text::CSV, that can be used. However, if the "|" does not appear anywhere in the data, there is no need for that. You are new to Perl and I don't want to overly complicate things if it is not necessary.
#!/usr/bin/perl
use strict;
use warnings;
my $line = "ssn|empNo|ncEmpName|empName|hireDate|ncAddr|addr|state|ncCity|city|zip";
my @columns = split /\|/, $line;
print "@columns[-1,-4,4,3]\n"; # "zip state hireDate empName"
Update with code:
I thought some more about this problem. If you have fixed width fields interspersed with space separated fields, you have a big mess.
One way to describe the fields and implement this is shown below.
A 'v' field contains no embedded spaces and is variable in length, an "f" field, fixed field is a certain number of characters. This code builds a Regex (Regular Expression) and then executes that regex on the input. Anybody's brain would go crazy to write a regex with 100 terms, hence the program does that from the input table.
I do suspect that your problem can be solved "easier" than this, but without more info about the other ~90 columns, I am unsure.
#!/usr/bin/perl
use strict;
use warnings;
#empNo|ncEmpName|empName|hireDate|ncAddr|addr|state|ncCity|city|zip";
my $line2 = "123445678 45612 11 Steve Smith 11012015 16 1001 Main Street GA 7 Atlanta 30553 x y z";
# Note: Looks like ncEmpName is "45612 11", a fixed width field
my @format_spec = qw(
v empNo
f8 ncEmpName
f11 empName
v hireDate
v ncAddr
f16 addr
v state
v ncCity
f7 city
v zip
v x
v y
v z
);
my $regex = "^";
while (@format_spec)
{
my $format = shift @format_spec; # pair wise in List::Util possible
my $name = shift @format_spec; # here keep it simple
if ($format =~ /v/) #variable length (no embedded spaces)
{
$regex .= '\s*(\S+)';
}
elsif ( (my $width) = $format =~ /\s*f(\d+)/) # fixed length,means
# embedded spaces
{
$regex .= '\s*(.{' . "$width})"; # \s cannot be within ""
}
print "$regex\n"; #for debug, comment this out later
}
my (@tokens) = $line2 =~ /$regex/;
print join ("|", @tokens), "\n";
__END__
The regex is built like this:
^\s*(\S+)
^\s*(\S+)\s*(.{8})
^\s*(\S+)\s*(.{8})\s*(.{11})
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)\s*(\S+)
+\s*(.{7})
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)\s*(\S+)
+\s*(.{7})\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)\s*(\S+)
+\s*(.{7})\s*(\S+)\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)\s*(\S+)
+\s*(.{7})\s*(\S+)\s*(\S+)\s*(\S+)
^\s*(\S+)\s*(.{8})\s*(.{11})\s*(\S+)\s*(\S+)\s*(.{16})\s*(\S+)\s*(\S+)
+\s*(.{7})\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)
The "|" separated line is like this:
123445678|45612 11|Steve Smith|11012015|16|1001 Main Street|GA|7|Atlanta|30553|x|y|z
Of course the fixed length fields can have trailing spaces, but that is easy to get rid of:
@tokens = map{s/\s*$//; $_;}@tokens; #delete trailing spaces
or some such similar formulation. Also, a very long but simple (no back-tracking) regex can execute quite quickly. I doubt that a regex approach will be a performance problem even if the regex is so long that it is incomprehensible to a human.
Re: How to process variable length fields in delimited file.
by Tux (Canon) on Oct 06, 2016 at 07:40 UTC
If the first 10 fields are of fixed length, I'd use unpack on that part. Using A10 for a 10-character-wide field will strip the trailing spaces. Work on from there.
my ($ssn, $empno, $empname, ...) = unpack "A10 A20 A12 ...", $buffer;
Enjoy, Have FUN! H.Merijn
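A small sketch of the trailing-space behaviour mentioned above, contrasting the `A` and `a` template letters. The column widths (10 and 20) and the sample values are made up for illustration and are not taken from the real file.

```perl
use strict;
use warnings;

# Build a fixed-width record: a 10-character and a 20-character column,
# both padded on the right with spaces (hypothetical widths).
my $buffer = sprintf '%-10s%-20s', '45612', 'Steve Smith';

# 'A' fields drop trailing padding; 'a' fields keep it.
my ($empno,     $name)     = unpack 'A10 A20', $buffer;
my ($empno_raw, $name_raw) = unpack 'a10 a20', $buffer;

printf "A: [%s] [%s]\n", $empno,     $name;
printf "a: [%s] [%s]\n", $empno_raw, $name_raw;
```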
Re: How to process variable length fields in delimited file.
by johngg (Canon) on Oct 06, 2016 at 11:20 UTC
This is similar to GrandFather's approach, moving along the line a field at a time, but it uses the @fieldNames array and a counter to determine whether we have an actual field or the width of the next field.
use strict;
use warnings;
use feature qw{ say };
open my $dataFH, q{<}, \ <<__EOF__ or die qq{open: < HEREDOC: $!\n};
123445678 45612 11 Steve Smith 11012015 16 1001 Main Street GA 7 Atlanta 30553
234256653 76467 8 Joe Blow 06072014 11 83 Low Road CO 6 Denver 12345
239879583 62098 10 Andy Pandy 03112012 13 10 The Strand NJ 13 Atlantic City 16345
__EOF__
my @fieldNames = qw{
ssn empNo ncEmpName
empName hireDate ncAddr
addr state ncCity city zip
};
while ( <$dataFH> )
{
    chomp;
    my $fieldCt = 0;
    my @fields;
    while ( length )
    {
        s{^\s*}{};
        # Take the next run of non-space characters; stop if none is left.
        last unless s{(\S+)}{};
        my $next = $1;
        if ( $fieldNames[ $fieldCt ] =~ m{^nc} )
        {
            # $next is a character count, so pull that many characters.
            s{^\s*}{};
            push @fields, substr $_, 0, $next, q{};
            $fieldCt ++;
        }
        else
        {
            push @fields, $next;
        }
        $fieldCt ++;
    }
    say join q{|}, @fields;
}
The output.
123445678|45612|Steve Smith|11012015|1001 Main Street|GA|Atlanta|30553
234256653|76467|Joe Blow|06072014|83 Low Road|CO|Denver|12345
239879583|62098|Andy Pandy|03112012|10 The Strand|NJ|Atlantic City|16345
I hope this is of interest.
Re: How to process variable length fields in delimited file.
by shadowsong (Pilgrim) on Oct 06, 2016 at 11:37 UTC
Hi dbach355
Seeing as how The number of fields is fixed, the number of fixed field lengths and variable field lengths varies - if all you need is another file with custom delimiters; you can achieve this with a one-liner:
perl -lawpe "$_=qq|$F[0]\\f$F[1]\\f$F[2]\\f$F[3]|" in.txt > out.txt
The offset within the @F array denotes each input field in your file; so offset 0 would represent ssn, offset 1 employee number and so on; simply craft your output line how you'd like it... See http://www.perl.com/pub/2004/08/09/commandline.html for additional command line options.
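For readers new to the command-line switches, the following sketch writes out by hand roughly what -l, -a and -p do in that one-liner. `convert_line` is a hypothetical helper name. Note that autosplit breaks the line at every run of whitespace, so a count-prefixed field with embedded spaces, such as the employee name, comes apart into several @F elements.

```perl
use strict;
use warnings;

# Roughly what the -l, -a and -p switches do, written out by hand.
# convert_line is a hypothetical helper name.
sub convert_line {
    my ($line) = @_;
    chomp $line;                       # -l strips the line ending
    my @F = split ' ', $line;          # -a splits on runs of whitespace
    return join "\f", @F[0 .. 3];      # first four fields, \f-delimited
}

my $out = convert_line("123445678 45612 11 Steve Smith 11012015\n");

(my $visible = $out) =~ s/\f/<FF>/g;   # make the form feeds visible
print "$visible\n";
```

The fourth element is only the first word of the name, which is why the count-aware approaches elsewhere in this thread are needed for this file.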
Cheers, Shadowsong
Re: How to process variable length fields in delimited file.
by dbach355 (Initiate) on Oct 06, 2016 at 13:28 UTC
Thank you all for your responses. I will review them for a better understanding. I am very much a novice in Perl, so I would like to read the details to get a good understanding of the proposed methods.
I could have posted the exact layout to begin with, but I did not want to get too long-winded; at times, though, details are better. One reason I was thinking of using the \f character is that I don't care about printing the data (I say that now): once the data is in a readable delimited file, it will pass to the SPLUNK application for end use. The problem is that the text contains just about every character. There are maybe 1,000,000 lines of text a day, and as the message below shows, this is text from network devices, which includes characters such as #@$^|}{[]<> and about every other character I could think of. It contains tabs as well. I finally grepped several days of output and did not find a single \f. Another possibility is a multi-character delimiter such as @#!, which is unlikely to appear together in ordinary text.
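A minimal sketch of the multi-character delimiter idea, with a check that refuses to write a record if the delimiter already occurs inside a field. `join_checked` is a hypothetical helper name and the sample field values are made up.

```perl
use strict;
use warnings;

# Join fields with a delimiter, refusing to build a record if the
# delimiter already occurs inside any field. join_checked is a
# hypothetical helper name.
sub join_checked {
    my ($delim, @fields) = @_;
    for my $field (@fields) {
        die "Delimiter found inside field '$field'\n"
            if index($field, $delim) >= 0;
    }
    return join $delim, @fields;
}

my $delim  = '@#!';
my $record = join_checked($delim, 'router|01', 'text with {odd} <chars>', 'GA');

# \Q...\E quotes the delimiter so split treats it as literal text;
# the -1 limit keeps trailing empty fields.
my @back = split /\Q$delim\E/, $record, -1;

print scalar(@back), " fields: $record\n";
```

Checking at write time is cheaper than discovering a clash later, when the downstream tool splits a record into the wrong number of columns.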
Here is the devil in the details of the true layout and an example of 1 data line. I will review and when I have time, comment on the solution. Thank you all
For each message:
1. Record Starter: "====>"
2. Message ID (uuid)
3. Condition ID (uuid, for future use)
4. Network Type of message node:
IP Node 1
Non IP Node 5
5. IP Address (see A.)
6. String length of the nodename
7. Nodename
8. Network Type of message generation node (see 4.)
9. IP Address of message generation node (see A.)
10. String length of the message generation nodename
11. Nodename of message generation node
12. Log only flag
13. Unmatched flag
14. Message source type
Console 0x0001
Message API 0x0002
Logfile 0x0004
Monitor 0x0008
SNMP 0x0010
Server MSI 0x0020
Agent MSI 0x0040
Legacy Link 0x0080
Schedule 0x0100
Internal 0x1000
Subproduct 0x2000
15. Notification flag
16. Trouble ticket flag
17. Acknowledge on troubleticket flag
18. Message creation date and time (see B. for the format)
19. Message receipt date and time (see B. for the format)
20. Unbuffer time
21. Severity
UNKNOWN 0x01
NORMAL 0x02
WARNING 0x04
CRITICAL 0x08
MINOR 0x10
MAJOR 0x20
22. Status of the auto action
Failed 2
Started 8
Finished 9
Defined 11
Undefined 12
23. Network Type of auto action node (see 4.)
24. IP address of the node where the auto action is executed (see A.)
25. String length of the nodename where the auto action is executed
26. Nodename of the node where the auto action is executed
27. Auto action creates annotation flag
28. Acknowledge flag of the auto action
29. Status of the operator initiated action (see 15.)
30. Network Type of operator initiated action node (see 4.)
31. IP address of the node where the operator initiated action is executed
32. String length of the nodename where the oper. initiated action is executed
33. Nodename of the node where the operator initiated action is executed
34. Operator initiated action creates annotation flag
35. Acknowledge flag of the operator initiated action
36. Time and date when the message has been acknowledged (see B. for the format)
37. String length of the operator who has acknowledged the message
38. Name of the operator who has acknowledged the message
39. String length of message source
40. Message source
41. String length of application
42. Application
43. String length of messagegroup
44. Messagegroup
45. String length of object
46. Object
47. String length of notification service name(s)
48. Notification service name(s)
49. String length of auto action call
50. Auto action call
51. String length of operator initiated action call
52. Operator initiated action call
53. String length of message text
54. Message text
55. String length of original message text
56. Original message text
57. Number of annotations
58. String length of message type
59. Message type
60. Escalate flag
61. Assign flag
62. Escalation type
63. Date and time when the message was escalated (see B. for the format)
64. Network Type of escalation node (see 4.)
65. Escalation server IP address
66. String length of escalation server node name
67. Escalation server node name
68. String length of the operator who has escalated the message
69. Name of the operator who has escalated the message
70. Instruction type:
No instruction 0
Instruction text 1
Instruction Interface 2
Internal instruction 3
71. Read only flag
72. Original message number (uuid)
73. Time difference in seconds between agent time zone and GMT
74. String length of instruction ID or name
75. Instruction ID, instruction interface name or message numbers
of internal instructions (depends on instruction type)
76. Length of Instruction Interface parameters
77. Instruction Interface parameters
78. String length of service name
79. Service name
80. String length of message key
81. Message key
82. Duplicate count
83. Date/time when last duplicate was received (see B. for the format).
This field is 0 if message has no duplicates.
84. CMA count. Number of custom message attributes.
For each CMA:
1. CMA record starter: "CMA"
2. String length of the CMA name
3. CMA name
4. String length of the CMA value
5. CMA value
For each annotation:
1. Annotation record starter: "ANNO"
2. Date and time of the annotation (see B. for the format)
3. Annotation number
4. String length of the author of the annotation
5. Author of the annotation
6. String length of the annotation text
7. Annotation text
A. All IP addresses are in binary format
the following script can be used to convert the IP address:
#cat convert.sh
#!/bin/ksh
# convert.sh
# usage convert <IP_ADDRESS_IN_BINARY_FORMAT>
OPC_IP_ADDR=$(echo $1| awk '{printf("%d.%d.%d.%d\n", \
((int($1)/16777216)%256), \
((int($1)/65536)%256), \
((int($1)/256)%256), \
((int($1))%256) \
)}')
echo "$1 = ${OPC_IP_ADDR}"
#end of convert.sh
B. All time specifications are in seconds since 1.1.1970 GMT
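Notes A and B can be sketched in Perl as follows. This is not part of the product documentation quoted above: `int_to_ip` and `epoch_to_utc` are hypothetical helper names, and masking to 32 bits is an assumption made so that addresses printed as negative numbers (like the second one in the example record) come out as ordinary octets.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Convert a 32-bit integer (possibly printed as a signed value) to
# dotted-quad notation, most significant byte first.
sub int_to_ip {
    my ($n) = @_;
    # Assumption: keep only the low 32 bits, so negatives wrap around.
    return join '.', unpack 'C4', pack 'N', $n & 0xFFFFFFFF;
}

# Format seconds since 1.1.1970 GMT as a UTC timestamp.
sub epoch_to_utc {
    my ($seconds) = @_;
    return strftime '%Y-%m-%d %H:%M:%S', gmtime $seconds;
}

# Values taken from the example record in this post.
print int_to_ip(175337506),     "\n";
print int_to_ip(-1413807702),   "\n";
print epoch_to_utc(1474214430), "\n";
```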
1 Example data line
====> 064191a8-7db9-71e6-12cc-abbb01aa0000 45f86528-d563-71e0-03bd-8a2
+39ed50000 1 175337506 39 router174.network.microsoft.com 1 -141380770
+2 44 syslog152.network.microsoft.com 1 0 4 0 0 0 1474214430 147421443
+1 0 2 12 0 0 0 0 0 12 0 0 0 0 0 1474214431 3 OpC 22 GNS_IOS_SYSLOG_
+2(1.71) 35 SYSLOG-cisco-ios-RADIUS-SERVERALIVE 4 DATA 13 mxgamdrnb08e
+ 0 0 0 116 RADIUS-6-SERVERALIVE: Group ACCT_GROUP: Radius server 1
+7.24.174.55:1645,1646 is responding again (previously dead). 235 2016
+-09-18T10:59:45.932408-05:00 mxgamdrnb08e.microsoft.com local7.info 2
+1395: Sep 18 15:59:44.907 GMT: %RADIUS-6-SERVERALIVE: Group ACCT_GRO
+UP: Radius server 17.24.174.55:1645,1646 is responding again (previou
+sly dead). 0 0 0 0 0 0 0 0.0.0.0 0 0 0 0 0000000000000000000000000
+00000000000 18000 0 0 44 systlog152.network.microsoft.com 70 SYSLOG
+:mxgamdrnb08e:RADIUS-SERVER_STATUS:17.24.174.55:1645,1646:good 0 1474
+214431 20 CMA 15 ATRIUM_CATEGORY 6 SWITCH CMA 13 ATRIUM_IMPACT 0 CMA
+ 17 ATRIUM_IP_ADDRESS 12 10.15.212.34 CMA 15 ATRIUM_MAILCODE 7 GA8-89
+5 CMA 19 ATRIUM_MANUFACTURER 5 CISCO CMA 17 ATRIUM_NODE_GROUP 50 MANA
+GENOC DATA SITE TYPE A2 CSCTG62793_DISABLE_RD CMA 15 ATRIUM_PRIORITY
+ 10 PRIORITY_5 CMA 14 ATRIUM_PRODUCT 18 Catalyst 3560x-24P CMA 13 ATR
+IUM_REGION 2 US CMA 17 ATRIUM_SITE_GROUP 5 US-GA CMA 14 ATRIUM_URGENC
+Y 0 CMA 13 ATRIUM_ciName 12 MXGAWDRNB08E CMA 13 MSC_IN_ATRIUM 1 Y CM
+A 11 EventSource 10 MS_Network CMA 15 REMEDY_ticketID 1 N CMA 14 cond
+ition_name 55 SYSLOG-cisco-ios-RADIUS-SERVERALIVE (resolution) [1628]
+ CMA 15 gns.alarm.class 8 BreakFix CMA 15 gns.alarm.state 10 REGISTER
+ED CMA 19 gns.alarm.subobject 22 17.24.174.55:1645,1646 CMA 25 gns.cm
+db.auto.ticket.flag 4 none
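The trailing CMA groups of such a record could be picked apart along these lines. This is a minimal sketch under assumptions: `take_counted` and `parse_cma` are hypothetical names, the input is only the CMA tail of a record (three groups copied from the example), and a zero length is taken to mean an empty value.

```perl
use strict;
use warnings;

# Read one count-prefixed string starting at the current pos() of $$ref.
sub take_counted {
    my ($ref) = @_;
    $$ref =~ /\G\s*(\d+)(?: |\z)/gc
        or die "Expected a length at offset " . (pos($$ref) // 0) . "\n";
    my $length = $1;
    my $start  = pos $$ref;
    die "Record too short at offset $start\n"
        if $start + $length > length $$ref;
    pos($$ref) = $start + $length;     # step over the value itself
    return substr $$ref, $start, $length;
}

# Collect every "CMA <len> <name> <len> <value>" group into a hash.
sub parse_cma {
    my ($text) = @_;
    my %cma;
    pos($text) = 0;
    while ($text =~ /\G\s*CMA\s+/gc) {
        my $name  = take_counted(\$text);
        my $value = take_counted(\$text);
        $cma{$name} = $value;
    }
    return \%cma;
}

my $tail = 'CMA 15 ATRIUM_CATEGORY 6 SWITCH CMA 13 ATRIUM_IMPACT 0 '
         . 'CMA 14 ATRIUM_PRODUCT 18 Catalyst 3560x-24P';
my $cma = parse_cma($tail);

print "$_ = '$cma->{$_}'\n" for sort keys %$cma;
```

Working with pos() and \G keeps a cursor into the string instead of repeatedly chopping the front off, which may matter with records this long; the same two helpers could be reused for the count-prefixed fields in the main part of the record.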
Here is the devil in the details of the true layout and an example of 1 data line
The squirrel is always in the details, since the devil is a squirrel. But I can't help you here with the data you provided (only one record? seriously?) since in "39 router174.network.microsoft.com" - well, "router174.network.microsoft.com" is just 31 chars long, not 39. Even with a NULL terminator it would be 32 chars long, not 39. Hence, the following is just bull - you know, garbage in => garbage out.
use strict;
use warnings;
my (@lengths, %names);
while (<>) {
    s/\r?\n//; # strip line endings
    # get field numbers and field description
    if (/\s{2,3}(\d+)\. (.+)/) {
        my ($number, $text) = ($1,$2);
        $number--; # since first element of an array is 0, not 1
        # the CMA and annotation sub-lists reuse numbers, so only
        # the first description seen for each number counts
        next if exists $names{$number};
        # if this field denotes a length, store it
        if ($text =~ /length of/i) {
            push(@lengths, $number);
        }
        # remember field number and text
        $names{$number} = $text;
        next; # nothing else to do for this line.
    }
    # now process the one line of data, if at hand
    if (/^====>/) { # Record Starter, right?
        # split line at whitespace
        my @array = split;
        # for all length indicators, concatenate
        # subsequent array elements into one
        # complain if the size doesn't fit
        for my $index (@lengths) {
            my $length = $array[$index];
            my $string = '';
            my $counter = 0;
            while (length($string) < $length and $index + $counter < $#array) {
                # join array elements with space to rebuild the field
                $counter++;
                my $piece = $array[$index + $counter];
                $string = $counter == 1 ? $piece : join " ", $string, $piece;
            }
            warn "length mismatch for $string: $length <=> ".length($string)."\n"
                if length($string) != $length;
            # replace the concatenated elements with the rebuilt field
            splice @array, $index + 1, $counter, $string;
        }
        # done, output the fields
        for (sort {$a <=> $b} keys %names) {
            print "$names{$_}: ", $array[$_] // '', "\n";
        }
    }
}
perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: How to process variable length fields in delimited file.
by dbach355 (Initiate) on Oct 17, 2016 at 20:02 UTC
Thank you all for your comments. I have not had a chance to review and test to see if I understand. The job gets in the way :) And the company is forcing 2 weeks off, so I have been trying to clean up some old tasks.
Again, thank you and I will update.
David