- Declare the loop variable: add my before $paragraph in while(defined($paragraph=$enumerate->Next())). Under use strict, the undeclared variable is a compile-time error.
- Typo: Change ListParagraph to ListParagraphs.
- Typo: Change ListFormatListLevelNumber to ListFormat->ListLevelNumber.
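- You might need ListFormat->ListValue instead of (or in addition to) ListFormat->ListLevelNumber. ListLevelNumber is the nesting level of the list item; ListValue is the item's number within its list.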
- To match your VB code, check for $count->{Count} == 1 inside the if (defined $count) {...} block.
Working, tested code:

#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw( abs_path );
use Data::Dumper;

doc2pt( "C:/a/perls/pm/PM_932635_data.doc" );

sub doc2pt {
    die if @_ != 1;
    my ( $doc_path ) = @_;

    ( my $out_path = $doc_path ) =~ s{\.doc$}{.out}
        or die "doc_path '$doc_path' does not end in '.doc'";

    require Win32::OLE;
    require Win32::OLE::Enum;

    # NOTE: Win32::OLE appears to need abs path
    my $abs_doc_path = abs_path($doc_path);

    my $document = Win32::OLE->GetObject($abs_doc_path)
        or die "Can't GetObject($abs_doc_path) $!\n";

    print "Extracting Text ...\n";

    open my $out_fh, '>', $out_path
        or die "Can't open output file '$out_path': $!";

    my $debug = sub {
        die if @_ != 2;
        my ( $name, $value ) = @_;
        local $Data::Dumper::Useqq = 1;
        local $Data::Dumper::Terse = 1;
        printf {$out_fh} "%-10s ==> %s", $name, Dumper $value;
    };

    my $paragraphs = $document->Paragraphs();
    my $enumerate  = Win32::OLE::Enum->new($paragraphs);
    while ( my $paragraph = $enumerate->Next() ) {

        my $style = $paragraph->{Style}->{NameLocal};
        $debug->( style => $style );

        my $range = $paragraph->{Range};
        my $count = $range->ListParagraphs();
        if ( defined $count ) {
            my $real_count = $count->{Count};
            my $paranum    = $range->ListFormat->ListLevelNumber();
            my $paraval    = $range->ListFormat->ListValue();
            $debug->( real_count => $real_count );
            $debug->( paranum    => $paranum );
            $debug->( paraval    => $paraval );
        }

        my $text = $paragraph->{Range}->{Text};
        $text =~ tr{\n\r}{}d;
        $text =~ tr{\x0b}{\n};
        $debug->( text => $text );
    }

    close $out_fh;
}
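
The two tr statements near the end of the loop are easy to misread, so here is a minimal standalone sketch of what they do, runnable without Word or Win32::OLE. It assumes, as is usual for Word, that a paragraph's Range text ends in a carriage return and that a manual line break is stored as a vertical tab (\x0b). The helper name clean_paragraph_text and the sample string are made up for illustration; they are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper that mirrors the text cleanup done inside doc2pt.
sub clean_paragraph_text {
    my ($text) = @_;
    $text =~ tr{\n\r}{}d;     # delete every LF and CR (the paragraph mark)
    $text =~ tr{\x0b}{\n};    # turn each vertical tab (manual line break) into a newline
    return $text;
}

# Made-up sample: two lines joined by a manual line break, ending in a paragraph mark.
my $sample  = "First line\x0bSecond line\r";
my $cleaned = clean_paragraph_text($sample);
print $cleaned, "\n";
```

The order matters: if the vertical tabs were converted first, the newlines they produce would be deleted by the other tr.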