G'day IB2017,
Here's code which generates the regex I think you were after.
I've provided two sets of output: the one shown in your OP, and another which I think is more useful, as it gives you access to all of the actual values in the table (blank cells are represented as zero-length strings).
#!/usr/bin/env perl
use strict;
use warnings;
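# Slurp the whole table: every cell ends with BEL (\a), every row with one extra BEL.
my $content = join '', <DATA>;
# The header row is everything before the first double BEL.
my ($header, undef) = split /\a\a/, $content, 2;
# split in scalar context returns the number of fields, i.e. the column count.
my $cols = scalar split /\a/, $header;
# One row: $cols cells (each possibly empty) followed by BEL, then the row-ending BEL.
my $re = qr{((?:(?:|[^\a]+)\a){$cols}\a)};
{
    print "*** WANTED ***\n";
    while ($content =~ /$re/g) {
        my $row = $1;
        $row =~ s/\a/(BEL)/g;    # make the BEL characters visible
        print "$row\n";
    }
}
{
    print "\n*** PROBABLY MORE USEFUL ***\n";
    my @rows;
    while ($content =~ /$re/g) {
        my $row = $1;
        $row =~ s/\a$//;         # drop the row-ending BEL
        push @rows, [ split /\a/, $row ];
    }
    print join('|', @$_), "\n" for @rows;
}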
__DATA__
Agreement^GACAP^GACAP^GAccord^G^Galbatross^G^G^Galbatros^G^Galleged violation^G^G^Ginfraction présumée^G^Gallowable^G^G^Gadmissible^G^Ganchovy^G^G^Ganchois^G^Gangler fish, burbot^G^G^Glotte^G^G
Note: all of the '^G's are actually BELL (U+0007) characters which I embedded in the DATA section.
Output:
*** WANTED ***
Agreement(BEL)ACAP(BEL)ACAP(BEL)Accord(BEL)(BEL)
albatross(BEL)(BEL)(BEL)albatros(BEL)(BEL)
alleged violation(BEL)(BEL)(BEL)infraction présumée(BEL)(BEL)
allowable(BEL)(BEL)(BEL)admissible(BEL)(BEL)
anchovy(BEL)(BEL)(BEL)anchois(BEL)(BEL)
angler fish, burbot(BEL)(BEL)(BEL)lotte(BEL)(BEL)
*** PROBABLY MORE USEFUL ***
Agreement|ACAP|ACAP|Accord
albatross|||albatros
alleged violation|||infraction présumée
allowable|||admissible
anchovy|||anchois
angler fish, burbot|||lotte
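One thing to watch in the second block: split without a LIMIT discards trailing empty fields, so a row whose last cell is blank would come back with fewer elements than the header. None of the rows in the sample data end in a blank cell, so the output above is unaffected. The following is only a minimal sketch of one way to keep every column; the helper names row_regex and parse_rows and the two-column sample string are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the same row pattern as above, built for $cols columns.
sub row_regex {
    my ($cols) = @_;
    return qr{((?:(?:|[^\a]+)\a){$cols}\a)};
}

# Hypothetical helper: return one array ref of cell values per row,
# always with $cols elements, including trailing blank cells.
sub parse_rows {
    my ($content, $cols) = @_;
    my $re = row_regex($cols);
    my @rows;
    while ($content =~ /$re/g) {
        my $row = $1;
        $row =~ s/\a\z//;                   # drop the row-ending BEL
        my @cells = split /\a/, $row, -1;   # LIMIT of -1 keeps trailing empty fields
        pop @cells;                         # discard the empty field after the last cell's BEL
        push @rows, \@cells;
    }
    return @rows;
}

# Made-up two-column sample; the last cell of the final row is blank.
my $sample = "en\afr\a\a" . "anchovy\aanchois\a\a" . "lotte\a\a\a";
print join('|', @$_), "\n" for parse_rows($sample, 2);
```

With that sample, this prints "en|fr", "anchovy|anchois" and "lotte|"; the last row still has two elements, the second being a zero-length string.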