Thanks for your help with the regex.
The program is finished and some features have been added, namely reference line numbers at the beginning and end of each line in the output file.
TABs have been replaced by SPACEs for portability with Unicode.
The program reads the input file to collect the maximum line reference number (max_line), the number of lines containing text (regex /[a-zA-Z0-9]/) in the input file (valid_line), the number of groups of words read from the input file for the max array (nbr_max_tab), and the respective TAB stop positions in the max array.
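The collection pass described above can be tried on a few in-memory lines instead of a file. This is only a minimal sketch: the sample lines in @lines are made up for illustration, and the variable names simply follow the ones used in the program.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the input file (assumption).
my @lines = ("Bob\tthe rabbit\t\tjumps\n", "\n", "Rex, the dog\tran\n");

my ($valid_line, $nbr_max_tab) = (0, 0);
my @max = (0);    # $max[k] = furthest offset at which the k-th TAB run ends

for (@lines) {
    next unless /[a-zA-Z0-9]/;    # skip lines that contain no text
    $valid_line++;
    my $max_tab = 1;
    while (/\t+/g) {
        # $+[0] is the offset just past the end of the current match
        $max[$max_tab] = $+[0]
            if !defined($max[$max_tab]) || $max[$max_tab] < $+[0];
        $max_tab++;
    }
    $nbr_max_tab = $max_tab if $nbr_max_tab < $max_tab;
}

print "valid_line=$valid_line nbr_max_tab=$nbr_max_tab max=@max\n";
```

With these three sample lines, the empty line is skipped and the max array keeps the largest end offset seen for each TAB run.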
variable name | variable type | line number in program | description |
max_line | scalar number | {16 27} | the maximum line reference number when writing to output file by adding nbr_line to valid_line |
max_tab | scalar number | {3 8 10 11 13 22 25} | current index in the max array |
nbr_line | scalar number | {3 16 23 29} | current line number when writing line reference numbers |
nbr_max_tab | scalar number | {3 13} | number of index numbers (size) in the max array |
valid_line | scalar number | {3 8 16} | number of lines containing text (regex /[a-zA-Z0-9]/) in the input file |
max | array number | {3 10 25} | array containing the maximum TAB stop column number for each group of words read from the input file |
ARGV | array misc | {1 3 4 23 27} | array containing command line arguments passed to the program: 0 = input file to read from; 1 = output file to write to; 2 = number of 0s to prepend to line numbers when writing line reference numbers; 3 = starting number for line numbers when writing line reference numbers; 4 = number of SPACEs between reference numbers {2 3} and line content ($_) when writing to output file (1) |
LAST_MATCH_START/@-/$-[0] | array number | 25 | column where TABs start within line and between group of words |
LAST_MATCH_END/@+/$+[0] | array number | {10 25} | column where TABs end within line and between group of words |
$_ | scalar misc | 25 | last line read from the input file |
{F0 F1} | scalar pointer | {4 6 17 19 23 25 27 28 32} | position in input/output files |
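For readers unfamiliar with the @- and @+ entries in the table, here is a small hedged sketch of what $-[0] and $+[0] hold after a successful match. The sample string is an assumption chosen for illustration, not taken from input.txt.

```perl
use strict;
use warnings;

# Hypothetical line with one TAB run between two groups of words (assumption).
my $line = "cat\t\tdog";

my @spans;
while ($line =~ /\t+/g) {
    # $-[0] is the offset where the match starts,
    # $+[0] is the offset just past where it ends.
    push @spans, [ $-[0], $+[0] ];
    printf "TAB run from offset %d to %d\n", $-[0], $+[0];
}
```

Offsets are counted from 0, so the difference $+[0] - $-[0] is the length of the matched text, which is how the program computes the length of each group of words.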
The input.txt and output.txt files were prepared with Mousepad on Ubuntu (http://www.xfce.org/), which has vertical scrolling, something vi lacks.
The input has been purposely misaligned and contains empty lines.
input.txt
Bob, the rabbit jump above the fence Jack, the cat hid under th
+e porch of the red house Rex, the dog ran after Jack th
+e birds fly When the world is reduced to a single dark wood fo
+r our two pairs of dazzled eyes to a musical house for our cle
+ar understanding then I shall find you
When we are very strong who draws back? very happy
+ who collapses from ridicule When we are very bad what
+ can they do to us.
The taste of ashes in the air the smell of wood sweating in the
+ hearth steeped flowers the devastation of paths
+ drizzle over the canals in the fields why not already playthi
+ngs and incense?
Arousing a pleasant taste of Chinese ink a black powder gently
+rains on my night I lower the jets of the chandelier th
+row myself on the bed and turning toward thedark I see
+you O my daughters and queens!
In the output, the line reference numbers appear at the beginning and end of each line, and the groups of words are well aligned.
output.txt
00 Bob, the rabbit jump above the fence Jack, the cat hid
+under the porch of the red house
+ Rex, the dog ran after Jack
+ the birds fly
+
+ When the w
+orld is reduced to a single dark wood for our two pairs of dazzled ey
+es
+
+to a musical house for our clear understanding
+
+
+ then I shal
+l find you {00 .. 03}
01 When we are very strong who draws back?
+
+ very happy
+ who collapses fr
+om ridicule
+ When we ar
+e very bad
+
+
+what can they do to us.
+
+
+
+ {00 .. 03}
02 The taste of ashes in the air the smell of wood
+sweating in the hearth
+ steeped flowers
+ the devastation
+of paths
+ drizzle ov
+er the canals in the fields
+
+
+why not already playthings and incense?
+
+
+
+ {00 .. 03}
03 Arousing a pleasant taste of Chinese ink a black powder gen
+tly rains on my night
+ I lower the jets of the chandelier
+ throw myself on
+the bed
+ and turnin
+g toward thedark
+
+
+I see you
+
+
+ O my daught
+ers and queens! {00 .. 03}
Shell command to call the program with the right arguments:
usage: format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0's_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>
Example:
perl format-pre-post-nbr-SPACE.pl input.txt output-0.txt 2 0 8
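To see what the arguments 2 0 8 produce, the zero padding can be tried in isolation. This is a sketch under assumptions: $width, $start and $gap are my own names for ARGV[2], ARGV[3] and ARGV[4], and $last assumes four lines containing text. Note that, given how the program subtracts length($nbr_line), the third argument effectively acts as the total width of the number.

```perl
use strict;
use warnings;

# Values mirroring the example command line "2 0 8"
# (assumed to map to ARGV[2], ARGV[3] and ARGV[4]).
my ($width, $start, $gap) = (2, 0, 8);
my $last = $start + 3;    # hypothetical: four lines containing text

my @prefixes;
for my $n ($start .. $last) {
    # The program's approach: prepend "0"s until the number is $width characters wide.
    my $manual = ("0" x ($width - length($n))) . $n;
    # An equivalent formulation using sprintf with a run-time width.
    my $formatted = sprintf("%0*d", $width, $n);
    die "padding mismatch for $n" unless $manual eq $formatted;
    push @prefixes, $manual . (" " x $gap);
}

# Trailer written at the end of every output line.
my $trailer = sprintf("{%0*d .. %0*d}", $width, $start, $width, $last);

print "$_|\n" for @prefixes;
print "$trailer\n";
```

The sprintf form is just an alternative way to write the same padding; the program itself uses the "0" x ... form.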
code
die "usage: format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0's_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>\n" if $#ARGV < 4;
# counters start at 0; line numbering starts at the value given in ARGV[3]
$valid_line=$nbr_max_tab=0;$max[0]=0;$nbr_line=$ARGV[3];
open(F0, "<", $ARGV[0]) or die "cannot read $ARGV[0]: $!\n"; open(F1, ">", $ARGV[1]) or die "cannot write $ARGV[1]: $!\n";
# first pass: count the lines containing text and record the furthest TAB stop after each group of words
while(<F0>) {
    if (/[a-zA-Z0-9]/) {
        $valid_line++; $max_tab=1;
        while (/\t+/g) {
            $max[$max_tab] = $+[0] if !defined($max[$max_tab]) || $max[$max_tab] < $+[0];
            $max_tab++;
        }
        $nbr_max_tab = $max_tab if $nbr_max_tab < $max_tab;
    }
}
$valid_line--;$max_line=$nbr_line+$valid_line;
seek F0,0,0;
# second pass: write each line containing text with its reference numbers, padding with SPACEs instead of TABs
while(<F0>) {
    chomp; s/\r$//;
    if (/[a-zA-Z0-9]/) {
        $max_tab=1;
        print F1 "0" x ($ARGV[2] - length($nbr_line)), $nbr_line, " " x $ARGV[4];
        while (/[^\t]+/g) {
            print F1 substr($_, $-[0], ($+[0] - $-[0])), " " x ($max[$max_tab++] - ($+[0] - $-[0]));
        }
        print F1 " " x $ARGV[4], "{", "0" x ($ARGV[2] - length($ARGV[3])), $ARGV[3], " .. ", "0" x ($ARGV[2] - length($max_line)), $max_line, "}";
        print F1 "\n";
        $nbr_line++;
    }
}
close F0;close F1;
group of words: a run of text NOT containing TABs
/[^\t]+/
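As an illustration of this definition, the groups of words of a line can be pulled out in one step with a list-context match. This is a hedged sketch: the sample line is invented, and the fixed width of 20 columns is a simplification of mine, not the per-column maximum the program computes.

```perl
use strict;
use warnings;

# Hypothetical input line: three groups of words separated by TAB runs (assumption).
my $line = "Bob, the rabbit\t\tjumps\tabove the fence";

# In list context, /[^\t]+/g returns every maximal run of non-TAB characters.
my @groups = $line =~ /[^\t]+/g;

# Pad each group with SPACEs to a fixed, hypothetical width of 20 columns.
my $aligned = join "", map { sprintf "%-20s", $_ } @groups;
$aligned =~ s/ +$//;    # drop the padding after the last group

print scalar(@groups), " groups\n";
print "$aligned\n";
```

A fixed width keeps the sketch short; it only works as long as no group is longer than the chosen width.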