perl_seeker has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file like this:

try **until **you **succeed

I'm opening this file in a Tk text widget and searching for
the pattern **.
Then for each match found,
I need to extract the word next to the pattern
for example the word "until"
Here's the code I tried:

foreach $mitem(@missing) { $result=$t->search(-forwards,"**",'end'); #print"\nThe start pos of the pattern:"; #print "\n$result"; $start=$result+.2; #print "\n$start"; @chars=$t->get("$start","$start wordend"); print "\nThe word:"; print"\n@chars"; #$t->tagConfigure("wrong",-foreground=>"red"); #$t->tagAdd("wrong","$start","wordend");

The value of $result is printed as 1.4(5th char on 1st line)
Then $start holds the value 1.6(1.4 for the first *, 1.5
for the 2nd *,1.6 for the character 'u' in the word 'until')
Then I need to extract all the chars from 'u' to 'l'(all
chars in 'until')
But these lines in the code:

@chars=$t->get("$start","$start wordend"); print "\nThe word:"; print"\n@chars";
print only the first character i.e. 'u'.
What could be the problem here? Any help would be appreciated.

Replies are listed 'Best First'.
Re: Tk text widget indices.
by meredith (Friar) on Jul 16, 2003 at 13:41 UTC
    Hrm, why arent you using regexes? I'm no master, but I think this will help:
    m|\*\*(\w+)\b|g If you have all the text in a scalar, you can do this:
    while($text =~ m|\*\*(\w+)\b|g) { print "Found: " , $1 , "\n"; }
    Is this what you want? Or was there some reason you're using TK's functions??


    Update: explanation of regex
    m| \* #literal asterisk \* #ditto ( #start capture \w+ #one or more word characters aka [A-Za-z0-9_] ) #end capture \b #word boundary |gx #/x allows whitespace, /g makes it global (restart from last m +atch position in this case)
    In case you're not familar with regexes, run this: perldoc perlretut :)

    mhoward - at -
      Hi ,
      Thanks a lot for the reply.Is it possible to do
      regex matches in a Tk text widget? I need not only
      to search for the **'s, I need to extract
      the word next to the 2nd *, change it's font color
      and then delete the **'s.The user needs to see the text
      in a window, and the font color change for certain words
      on a button click.
      I need to mark certain word with
      the color change instead of the **'s I previously used.
Re: Tk text widget indices.
by tos (Deacon) on Jul 16, 2003 at 13:49 UTC

    the follwing test ...

    #!perl use strict; use warnings; use Tk; use vars qw/$TOP/; $TOP = MainWindow->new(); my $t = $TOP->Scrolled(qw/Text -scrollbars e/)->pack(); $t->insert('0.0', 'try **until **you **succeed'); my $result = $t->search(-forwards,"**",'end'); print "\$result: $result\n"; my $start=$result+.2; print "\$start: $start\n"; my @chars=$t->get("$start","$start wordend"); print "\nThe word:"; print"\n@chars\n"; MainLoop;
    ... with activePerl ...
    # aperl -v This is perl, v5.6.1 built for MSWin32-x86-multi-thread (with 1 registered patch, see perl -V for more detail) Copyright 1987-2001, Larry Wall Binary build 633 provided by ActiveState Corp. http://www.ActiveState. +com Built 21:33:05 Jun 17 2002
    ... on XP did it, at least i think, as you want.

    # aperl textwidget $result: 1.4 $start: 1.6 The word: until
    But are sure whether it's a good idea to do text/string-orientated things with widget-methods. Perhaps it would be easier and more efficient to try approaches with regexes or split.

    greetings, tos

      Hi, thanks a lot for the reply
      actually I am working with text in another language,
      not English text.I gave that example in my post just to
      state the problem(or so I thought).I had not at all tested
      with english text.

      In fact my code as well works with the english text file:

      x.txt try **until **you **succeed

      but strangely does not work for my text file which looks
      like this, if the display is set to an english font:

      y.txt ]] **]X] **]E] ]]

      In this case only the first character next to the 2nd
      asterisk is printed:

      $result: 1.4 $start: 1.6 The word: ]

      I need the whole word to be printed that is:


      What I then need to do is change the font color of
      the word so that it stands out in the widget window,
      and delete the **'s.
      I need to present the text in the Tk widget as a user
      interface.The user clicks a button, certain words get
      marked by the asterisks.But instead of the asterisks,
      I now want to change the font color of the words instead.

      I'm not sure how I can search within a widget window
      using regexps.

      here's my actual code, with a whole lot of it not relevant
      to this question omitted.

      #!/usr/local/bin/perl -w use Tk; #use strict; our($filename, $info); #Variables to be used in the su +bs &load_file and &save_file. #$filename stores the file name + typed by the user in the entry widget of the main window. #$info stores the text message +displayed at the bottom of the main window. my $mw = MainWindow->new; # Main window. # Create necessary widgets. my $f = $mw->Frame->pack(-side => 'top', -fill => 'x'); + #Create frame. $f->Label(-text => "Filename:")->pack(-side => 'left', -anchor => 'w') +; # Label widget. $f->Entry(-textvariable => \$filename)->pack(-side => 'left', -anchor +=> 'w', -fill => 'x', -expand => 1); # Entry widget. + #Button widgets. $f->Button(-text => "Sug", -command =>\&tged)->pack(-side => 'right'); + $f->Button(-text => "Save", -command => \&save_file)->pack(-side => 'r +ight', -anchor => 'e'); $f->Button(-text => "Load", -command => \&load_file)->pack(-side => 'r +ight', -anchor => 'e'); $f->Button(-text => "Det",-command => \&chfile )->pack(-side => 'right +', -anchor => 'e'); $f->Button(-text => "Show",-command => \&load_file)->pack(-side => 'ri +ght', -anchor => 'e'); $f->Button(-text => "Add?",-command => \&addit)->pack(-side => 'right' +, -anchor => 'e'); $mw->Label(-textvariable => \$info, -relief => 'ridge')->pack(-side => + 'bottom', -fill => 'x'); # Label widget. #Text widget. + my $t = $mw->Scrolled("Text",-font=>"{as-ttdurga} 24 {bold}")->pack(-s +ide => 'bottom', -fill => 'both', -expand => 1); MainLoop; + sub chfile { open LEX, 'soundex.txt' or die $!; my %lexicon; while(<LEX>){ chomp; my @words =split; @lexicon{@words} = (1) x @words; } close LEX; open FILE, "+<$filename" or die $!; my @missing; my @data; while(<FILE>){ push @missing, grep { ! $lexicon{$_} } split; push @data,split; } #print @missing; #print "\n@data"; seek FILE, 0, 0; # go to start of file truncate FILE, 0; foreach $ditem(@data){ $_ = process($_); print FILE; print FILE " "; foreach $mitem(@missing) { my $currentposition=tell FILE; seek FILE,0,1 ; if ($mitem eq $ditem){ syswrite FILE,"**",4 ; #print"\n$mitem"; $result=$t->search(-forwards,"**",'end'); # print"\nThe start pos of the pattern:"; # print "\n$result"; $start=$result+.2; #print "\n$start"; @chars=$t->get("$start","$start wordend"); print "\nThe word:"; print"\n@chars"; #$t->tagConfigure("wrong",-foreground=>"red"); #$t->tagAdd("wrong","$start","$start wordend"); } } } close FILE; sub process { return($ditem); } } 1;
      What is the alternative to using the 'wordend' index? That does not seem to work here. Thanx
        but strangely does not work for my text file which looks like this, if the display is set to an english font:
        y.txt ]] **]X] **]E] ]]
        Hmmm. Characters like these can cause unforseen behavior, because it's not ever clear whether they are taken as ESC-sequences or meta-characters.

        I googled a little with your as-ttdurga-font and saw that this is a kind of indic(assamese)-script-font. I like it's pretty appearance, but i wonder whether it can be fully represented by 8-Bit-ASCII-Code.

        In your code the font-declaration seems to be insignificant. This was my impression when i tested it.

        Do you actual see the desired font in your Tk-script ?

        I could imagine that you have to turn your line of thought in direction unicode. Therefore it's a good idea to use at least a perl-version >=5.6.1, i think.

        greetings, tos