Re^2: convert tags to punctuation


As a variation on Polyglot's solution, you can define the tags in a hash. The advantage is that the table is easy to extend if more tags are needed. I have chosen to specify the characters by name (charnames) because I find single punctuation marks embedded in quotes hard to read.

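use strict;
use warnings;
use charnames ':full';    # needed for \N{...} names on perls older than 5.16

# Map each numeric tag to the punctuation it stands for.
my %tags = (
    91 => "\N{FULL STOP}",          # '.'
    92 => "\N{APOSTROPHE}",         # "'"
    93 => "\N{COMMA}",              # ','
    94 => "\N{EXCLAMATION MARK}",   # '!'
);
my $line =
    'Text with unusual punctuation<91><91><91>'
   .'I<92>m not going to lie<93> this is odd text<94>'
;

# Replace every <NN> tag with its punctuation. Tags missing from %tags are
# left as they are instead of being deleted with an "uninitialized" warning.
$line =~ s/<(\d\d)>/exists $tags{$1} ? $tags{$1} : "<$1>"/ge;

print $line, "\n";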

Bill
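The snippet above converts a single string. A minimal sketch of applying the same hash lookup to a whole file, one line at a time, assuming no tag is split across a line break. The script name convert.pl and the sub convert_line are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same tags as in the example above; extend the table as new tags are identified.
my %tags = (91 => '.', 92 => "'", 93 => ',', 94 => '!');

# Replace each <NN> tag on one line; tags missing from %tags are left unchanged.
sub convert_line {
    my ($line) = @_;
    $line =~ s/<(\d\d)>/exists $tags{$1} ? $tags{$1} : "<$1>"/ge;
    return $line;
}

# Process the files named on the command line one line at a time, so memory
# use does not grow with file size.  Example: perl convert.pl in.txt > out.txt
if (@ARGV) {
    while (my $line = <>) {
        print convert_line($line);
    }
}
```

Reading line by line avoids slurping the whole file, at the cost of not matching a tag that happens to be split by a newline.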


Re^3: convert tags to punctuation by Anonymous Monk on Jan 15, 2021 at 19:01 UTC
Bill -- I think your code is more maintainable. The document I am messing with is about 600,000 lines long. Is there a way to speed this up? Is there a way to get a complete list of <ab> tags?
Re^4: convert tags to punctuation by BillKSmith (Monsignor) on Jan 16, 2021 at 16:47 UTC
You should ask the person who prepares your input file whether he can direct you to either a specification of the file format or the documentation of the program that created it. If that fails, I would write a Perl program to list all the tags. The only way I know to get their values is to use an editor to examine the tags in context and make your best guess. (It usually will be obvious.)

It is nearly impossible to guess what will or will not make a Perl program faster. The usual advice is to profile your program and work only on those parts that use the most time. Use Benchmark to measure any possible improvement. In your case, I/O is probably taking much longer than processing. Slurping the entire file into memory is probably not an option. Reading the file in large blocks may help, but it is not easy to get right. I recommend against any optimization unless it is absolutely necessary.

Bill
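A minimal sketch of the tag-listing program suggested above. It assumes every tag looks like <word>, a broader pattern than the \d\d used in the substitution, so that unexpected tags also show up. The names list_tags.pl and count_tags are hypothetical. Because it reads one line at a time, memory use stays small even for a 600,000-line file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every <tag> on the lines read from $fh. The pattern <(\w+)> is an
# assumption: it catches any run of letters, digits or underscores in
# angle brackets, not only the two-digit tags seen so far.
sub count_tags {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$1}++ while $line =~ /<(\w+)>/g;
    }
    return \%count;
}

# Report the tags found in each file named on the command line.
# Example: perl list_tags.pl input.txt
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = count_tags($fh);
    close $fh;
    print "$file:\n";
    printf "  <%s>  %d\n", $_, $count->{$_} for sort keys %$count;
}
```

The counts also hint at which tags matter most when you examine them in an editor to guess their meaning.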
Re^5: convert tags to punctuation by Anonymous Monk on Jan 16, 2021 at 17:26 UTC
I noticed something interesting about this document: if I view it with the 'more' filter, I see a bunch of black rectangles with the tags inside them. If I view it with gedit or ptked, I see \x{93}, \x{94}, \x{95}, etc. Does it matter what characters go in my s/ ... / line? What does Perl see?
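One way to find out what Perl sees is to read the file with no decoding layer and report every byte outside printable ASCII in hex. This is a sketch; the names show_bytes.pl and odd_bytes are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count each byte that is not printable ASCII, tab, CR or LF,
# keyed by its hex value (e.g. "0x93").
sub odd_bytes {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        while ($line =~ /([^\x20-\x7E\t\r\n])/g) {
            $count{ sprintf('0x%02X', ord $1) }++;
        }
    }
    return \%count;
}

# ':raw' means no decoding layer: Perl sees the file's bytes exactly.
# Example: perl show_bytes.pl input.txt
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $count = odd_bytes($fh);
    close $fh;
    printf "%s  %d\n", $_, $count->{$_} for sort keys %$count;
}
```

If the report lists single bytes such as 0x93, 0x94 and 0x95, one plausible explanation (an assumption to verify against your file) is Windows-1252 text, where those bytes are the left and right curly double quotes and a bullet. Read this way, each is one character to Perl, so a pattern such as s/\x93/.../g matches it no matter how an editor draws it. If the file turns out to be UTF-8, read it with an :encoding(UTF-8) layer instead and match the decoded characters.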
Re^6: convert tags to punctuation by haukex (Archbishop) on Jan 16, 2021 at 18:07 UTC
Re^4: convert tags to punctuation by LanX (Saint) on Jan 16, 2021 at 16:59 UTC
> Is there a way to speed this up?

What makes you think it's not fast enough?

Update: quoting davido from the CB: "who cares about how fast Perl runs; it's almost always the network or IO that are standing in the way."

Cheers Rolf
(addicted to the Perl Programming Language :)
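Before optimizing, it helps to measure. A sketch using the core Benchmark module to compare the hash-lookup substitution with a hypothetical alternative that runs one fixed substitution per tag. The sub names with_hash and one_per_tag are illustrative, and the relative numbers depend on your perl and your data, so treat the result as a measurement of your case rather than a general rule:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %tags = (91 => '.', 92 => "'", 93 => ',', 94 => '!');

# Variant A: one regex; the replacement is looked up in %tags.
sub with_hash {
    my ($s) = @_;
    $s =~ s/<(\d\d)>/exists $tags{$1} ? $tags{$1} : "<$1>"/ge;
    return $s;
}

# Variant B (hypothetical alternative): one fixed substitution per tag.
sub one_per_tag {
    my ($s) = @_;
    $s =~ s/<91>/./g;
    $s =~ s/<92>/'/g;
    $s =~ s/<93>/,/g;
    $s =~ s/<94>/!/g;
    return $s;
}

my $sample = 'Text with unusual punctuation<91><91><91>'
           . 'I<92>m not going to lie<93> this is odd text<94>';

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    with_hash   => sub { with_hash($sample) },
    one_per_tag => sub { one_per_tag($sample) },
});
```

This times only the in-memory substitution; comparing it with the time needed just to read the file shows whether the regex or the I/O dominates.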

In Section Seekers of Perl Wisdom
