Thanks for your help with the regex.
The program is finished and some features have been added: line reference numbers now appear at the beginning and end of each line in the output file, and TABs have been changed to SPACEs for portability with Unicode.
The program first reads the input file to collect the number of lines containing text (valid_line, matched by the regex /[a-zA-Z0-9]/), from which the maximum line reference number (max_line) is derived, the number of groups of words read from the input file for the max array (nbr_max_tab), and the respective TAB stop positions, which are stored in the max array.
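The first pass described above can be sketched on lines held in memory instead of a file. This is only an illustration under assumptions: scan_lines and $slot are hypothetical names that do not appear in the program, and the sample strings are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: scan in-memory lines and return the number of
# text lines plus the largest TAB-run end offset seen for each slot.
sub scan_lines {
    my @lines = @_;
    my ($valid_line, @max) = (0);
    for (@lines) {
        next unless /[a-zA-Z0-9]/;    # skip empty or blank lines
        $valid_line++;
        my $slot = 1;                 # slot 0 stays unused, as in the program
        while (/\t+/g) {
            # $+[0] is the offset just past the current run of TABs
            $max[$slot] = $+[0] if !defined $max[$slot] || $max[$slot] < $+[0];
            $slot++;
        }
    }
    return ($valid_line, \@max);
}

my ($count, $max) = scan_lines("ab\tcd\tef\n", "\n", "abcd\t\tx\n");
my @stops = @{$max}[1 .. $#{$max}];
print "$count text lines; TAB stops: @stops\n";
```

The empty second line is not counted, which is why the text-line count and the number of lines read can differ.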


variable name | variable type | line number in program | description
max_line | scalar number | {16 27} | the maximum line reference number when writing to the output file, obtained by adding nbr_line to valid_line
max_tab | scalar number | {3 8 10 11 13 22 25} | current index in the max array
nbr_line | scalar number | {3 16 23 29} | current line number when writing line reference numbers
nbr_max_tab | scalar number | {3 13} | number of index numbers (size) in the max array
valid_line | scalar number | {3 8 16} | number of lines containing text (regex /[a-zA-Z0-9]/) in the input file
max | array number | {3 10 25} | array containing the maximum TAB stop column number for each group of words read from the input file
ARGV | array misc | {1 3 4 23 27} | array containing the command line arguments passed to the program, which are:
    0 | input file to read from
    1 | output file to write to
    2 | number of 0s to prepend to line numbers when writing line reference numbers
    3 | starting number for line numbers when writing line reference numbers
    4 | number of SPACEs between the reference numbers {2 3} and the line content ($_) when writing to the output file (1)
LAST_MATCH_START/@-/$-[0] | array number | 25 | column where TABs start within a line, between groups of words
LAST_MATCH_END/@+/$+[0] | array number | {10 25} | column where TABs end within a line, between groups of words
$_ | scalar misc | 25 | last line read from the input file
{F0 F1} | scalar pointer | {4 6 17 19 23 25 27 28 32} | position in the input/output files
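As a small illustration of the @- and @+ entries in the table, the sketch below records where each group of words starts and ends in a sample string. The string and the names $line and @groups are assumptions made for the example, not part of the program.

```perl
use strict;
use warnings;

my $line = "Bob\t\tJack\tRex";    # made-up sample, TAB separated
my @groups;
while ($line =~ /[^\t]+/g) {
    # $-[0] is the start offset of the match, $+[0] the offset just past it
    my ($start, $end) = ($-[0], $+[0]);
    push @groups, [ $start, $end, substr($line, $start, $end - $start) ];
}
printf "%2d..%2d  %s\n", @$_ for @groups;
```

The length of each group is simply the end offset minus the start offset, which is the third argument passed to substr.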




The input.txt and output.txt files were prepared with Mousepad (http://www.xfce.org/) on Ubuntu, which has vertical scrolling, which vi lacks. The input has been purposely misaligned and contains empty lines.
input.txt
Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!

In the output, the line reference numbers appear at the beginning and end of each line, and the groups of words are well aligned.
output.txt
00 Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you {00 .. 03}
01 When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us. {00 .. 03}
02 The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense? {00 .. 03}
03 Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens! {00 .. 03}

Shell command to call the program with the right arguments:
usage: format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0's_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>
ex: perl format-pre-post-nbr-SPACE.pl input.txt output-0.txt 2 0 8
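To show what the third and fourth arguments do, here is a minimal sketch of the zero padding applied to the line reference numbers. The helper name pad_number is hypothetical, and the clamp on negative counts is an addition for the example, since a number wider than the requested width needs no padding.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the "0" x (width - length) idiom.
sub pad_number {
    my ($number, $width) = @_;
    my $missing = $width - length $number;
    $missing = 0 if $missing < 0;    # wider numbers are left as they are
    return ("0" x $missing) . $number;
}

# With width 2 and starting number 0, four text lines are numbered 00 to 03.
my ($first, $last) = (pad_number(0, 2), pad_number(3, 2));
print "$first ... {$first .. $last}\n";
```

For non-negative integers, sprintf("%0*d", $width, $number) should give the same strings.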

code

die "usage: format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0\'s_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>\n" if $#ARGV < 4;

$valid_line=$nbr_max_tab=0;$max[0]=0;$nbr_line=$ARGV[3];
open(F0, '<', $ARGV[0]) or die "cannot read $ARGV[0]: $!\n"; open(F1, '>', $ARGV[1]) or die "cannot write $ARGV[1]: $!\n";

while(<F0>) {
    if (/[a-zA-Z0-9]/) {
        $valid_line++; $max_tab=1;
        while (/\t+/g) {
            $max[$max_tab] = $+[0] if $max[$max_tab] < $+[0] || $max[$max_tab] eq "";
            $max_tab++;
        }
        $nbr_max_tab = $max_tab if $nbr_max_tab < $max_tab;
    }
}
$valid_line--;$max_line=$nbr_line+$valid_line;
seek F0,0,0;

while(<F0>) {
    s/\r//;chomp;
    if (/[a-zA-Z0-9]/) {
        $max_tab=1;
        print F1 "0" x ($ARGV[2] - length($nbr_line)), $nbr_line, " " x $ARGV[4];
        while (/[^\t]+/g) {
            print F1 substr($_, $-[0], ($+[0] - $-[0])), " " x ($max[$max_tab++] - ($+[0] - $-[0]));
        }
        print F1 " " x $ARGV[4], "{", "0" x ($ARGV[2] - length($ARGV[3])), $ARGV[3], " .. ", "0" x ($ARGV[2] - length($max_line)), $max_line, "}";
        print F1 "\n";
        $nbr_line++;
    }
}
close F0;close F1;

A group of words is a run of text NOT containing TABs, matched by /[^\t]+/.
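For comparison, the same kind of alignment can also be obtained by padding each group to the widest group in its column. This is an alternative technique, not the method the program uses (the program pads from the TAB stop offsets kept in the max array), and the sample lines and variable names are made up.

```perl
use strict;
use warnings;

my @lines = ("Bob\tJack\t\tRex", "Arousing\tI\tO");    # made-up sample
my @rows  = map { [ split /\t+/ ] } @lines;            # groups of words per line

# Widest group seen in each column
my @width;
for my $row (@rows) {
    for my $i (0 .. $#$row) {
        my $len = length $row->[$i];
        $width[$i] = $len if !defined $width[$i] || $width[$i] < $len;
    }
}

# Left-justify every group to its column width, one SPACE between columns
my @aligned = map {
    my $row = $_;
    join ' ', map { sprintf '%-*s', $width[$_], $row->[$_] } 0 .. $#$row;
} @rows;
print "$_\n" for @aligned;
```

Padding by group width keeps the output lines as short as the data allows, at the cost of a separate pass to split every line.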

In reply to Re: misalined TABs using substr,LAST_MATCH_START/END,regex by perl_boy
in thread misalined TABs using substr,LAST_MATCH_START/END,regex by perl_boy
