PerlMonks  

Does OCR::PerfectCR work at all?

by saintmike (Vicar)
on Feb 11, 2007 at 01:19 UTC

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

Just poked around CPAN to find a module for a simple OCR (optical character recognition) task. I stumbled across OCR::PerfectCR, written by fellow perlmonk theorbtwo.

I realized that this module would only solve easy tasks, so I gave it a simple image containing a single character. The documentation is a bit sketchy (what's the format of a character map?), so I dumped out the result with Data::Dumper:

use strict;
use warnings;
use OCR::PerfectCR;
use GD;
use Data::Dumper;

my ($file) = @ARGV;
die "No file\n" unless defined $file;

my $recognizer = OCR::PerfectCR->new;
$recognizer->load_charmap_file("charmap");

my $image = GD::Image->new($file) or die "Cannot read image $file\n";

# Called in list context, so the per-glyph details come back
my @out = $recognizer->recognize($image);
print Dumper(\@out);
which printed the following:
'width'    => 18,
'str'      => "\x{fffd}",
'endcol'   => 18,
'startcol' => 0,
'color'    => '163',
'bgrgb'    => [ 148, 202, 212 ],
'chrwidth' => '18',
'md5'      => '9454596172a32923111c25373a801472',
'prespace' => 0
Hmm. So where's the result? Is it really "\x{fffd}"? Alternatively, can anyone recommend a solution that works?

As previously discussed, there are issues with getting proper OCR, but if you know of anything that actually works, please let me know.

Re: Does OCR::PerfectCR work at all?
by eric256 (Parson) on Feb 11, 2007 at 05:48 UTC

    Looking at the source, it provides a charmap with the md5 checksum followed by the letter. Perhaps you need to add this md5 sum and '3' to the file and then try again, thereby training it on your font? Just a guess based on what I'm seeing. Its tests use the phrase "about it", and the charmap file only contains entries for those letters. Good luck.

    Update: In fact that is exactly what to do. I just did it and it worked like a charm! ;)


    ___________
    Eric Hodges
      Really? Try it with this other image that also contains the single character "3". It comes up with a new checksum (4ca8f9278145f31c1999d5bb659bc493) which is totally different.

        I think if you reread the POD you might go a little easier on the module. "OCR::PerfectCR requires that your input is in perfect shape -- that it hasn't gone into the real world and been scanned, that each image represent one line of text, and nothing else, and most difficultly, that the font have a fairly wide spacing. This makes it very useful for converting image-based subtitle formats to text, and probably not much else. However, it is very good at doing that."


        ___________
        Eric Hodges

        I think theorbtwo's module only recognizes exactly identical images/letters. Your two images are not identical, so it won't recognize them as the same character.
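        The replies above come down to exact matching: as eric256 describes, the charmap maps an md5 checksum to a character, so only a bit-for-bit identical glyph hits an existing entry. Below is a minimal pure-Perl sketch of that idea, not OCR::PerfectCR's actual code. It assumes a glyph is given as a list of pixel-row strings (a hypothetical stand-in for the GD image data the module really works on) and uses the core Digest::MD5 module; the sub names `glyph_key` and `recognize_glyph` are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Illustration only, NOT OCR::PerfectCR's code. A glyph is modelled as a
# list of pixel rows ('#' = ink, '.' = background).
sub glyph_key {
    my (@rows) = @_;
    return md5_hex(join "\n", @rows);
}

# Look a glyph up in an md5 => character map. An unknown glyph gets a
# U+FFFD stub entry so it can be filled in by hand later.
# Returns (character, md5 key).
sub recognize_glyph {
    my ($charmap, @rows) = @_;
    my $key = glyph_key(@rows);
    $charmap->{$key} = "\x{FFFD}" unless exists $charmap->{$key};
    return ($charmap->{$key}, $key);
}

# Usage:
#   my %map;
#   my ($char, $key) = recognize_glyph(\%map, '###', '..#', '###', '..#', '###');
#   $map{$key} = '3';    # "train" the map on this exact glyph
```

        Changing a single pixel (a different font size, say, or different anti-aliasing) yields a completely different checksum, which is consistent with the second image of "3" coming up with a new, unknown md5.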

Re: Does OCR::PerfectCR work at all?
by theorbtwo (Prior) on Feb 12, 2007 at 01:31 UTC

    The charmap format isn't documented because there are both load and save functions, and trying to recognize unknown characters will automatically add a stub to the charmap in memory. Load the charmap (a blank file will work fine), try to recognize some text, then save the charmap. Load it up in a decent text editor, and modify it. Then you can repeat to taste.
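    To make the load / recognize / save / edit loop concrete, here is a plain-Perl mock-up of the save and load half. It is hypothetical throughout: `save_charmap` and `load_charmap` are made-up helpers, and the one-`md5<TAB>character`-per-line, UTF-8 file format is an assumption, not necessarily the format OCR::PerfectCR writes. With the real module, use its own load and save functions on its own charmap files.

```perl
use strict;
use warnings;

# HYPOTHETICAL format: one "md5<TAB>character" pair per line, UTF-8.
# Stub entries for unknown glyphs hold U+FFFD until edited by hand.
sub save_charmap {
    my ($file, $map) = @_;
    open my $fh, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!\n";
    print {$fh} "$_\t$map->{$_}\n" for sort keys %$map;
    close $fh or die "Cannot close $file: $!\n";
}

sub load_charmap {
    my ($file) = @_;
    my %map;
    open my $fh, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($md5, $char) = split /\t/, $line, 2;
        $map{$md5} = $char;
    }
    close $fh;
    return \%map;
}
```

    Saving, replacing a U+FFFD stub with the right character in a text editor, and loading the file again mirrors the round trip described above; repeat until every glyph in your input is known.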

    This should quite likely be clearer in the POD -- suggestions welcome. I'm not terribly good at forgetting what I know about the module in order to write the documentation for it. (And, alas, I'm not terribly good at follow-through -- I tend to get stuff working well enough for me and then stop working on it. This time I managed to get a little bit further, and do some cleanup, release, and then stop working on it.)

    PS -- /msging me would have been helpful -- fortunately, brother Limbic~Region noticed this and /msged me.


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Does OCR::PerfectCR work at all?
by dpavlin (Friar) on Feb 11, 2007 at 12:47 UTC
    AFAIK, the best OCR available under an open source license is Tesseract, which might fit the bill if you wrap it with some Perl code (it's a simple command-line utility).
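    A minimal sketch of such a wrapper, assuming the classic Tesseract command-line interface `tesseract IMAGE OUTBASE`, which writes the recognized text to OUTBASE.txt (check the usage message of your installed version). The sub name `tesseract_ocr` is hypothetical; the command is passed as an array reference so that a full path or extra options can be supplied.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Assumes "<cmd> IMAGE OUTBASE" writes its result to OUTBASE.txt.
sub tesseract_ocr {
    my ($image, $cmd) = @_;
    $cmd ||= ['tesseract'];
    my $dir     = tempdir(CLEANUP => 1);    # removed at program exit
    my $outbase = File::Spec->catfile($dir, 'ocr');

    # List form of system() bypasses the shell, so odd file names are safe.
    system(@$cmd, $image, $outbase) == 0
        or die "OCR command failed (status $?)\n";

    open my $fh, '<', "$outbase.txt" or die "No OCR output: $!\n";
    my $text = do { local $/; <$fh> };      # slurp the whole file
    close $fh;
    $text = '' unless defined $text;
    $text =~ s/\s+\z//;                     # drop trailing newlines/form feeds
    return $text;
}

# Usage (assumes tesseract is on PATH):
#   my $text = tesseract_ocr('digit.png');
```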

    Update: fixed the URI to point to Google Code, where the current project page is. There is also a great documentation site if you want to know more.


    2share!2flame...
Re: Does OCR::PerfectCR work at all?
by dirving (Friar) on Feb 11, 2007 at 20:27 UTC

    When I needed to do some simple character recognition from perl I just shelled out to gocr, which worked well enough for my purposes.
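    Shelling out needs no temporary file when the program prints its result on standard output, as gocr does by default. The sketch below reads that output through a list-form pipe open, which avoids the shell. `run_ocr` is a hypothetical helper name, and the `gocr -i FILE` invocation in the usage comment is an assumption to check against gocr's own help for your version.

```perl
use strict;
use warnings;

# Run an OCR program that writes its result to STDOUT and return the text.
# Dies if the program cannot be started or exits with non-zero status.
sub run_ocr {
    my (@cmd) = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my $text = do { local $/; <$pipe> };
    # close() on a piped open returns false if the child exited non-zero
    close($pipe) or die "$cmd[0] exited with status $?\n";
    $text = '' unless defined $text;
    $text =~ s/\s+\z//;
    return $text;
}

# Usage (assumes gocr is on PATH and the image is in a format it reads,
# e.g. PNM):
#   my $text = run_ocr('gocr', '-i', 'digit.pnm');
```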

    -- David Irving

Node Type: perlquestion [id://599412]
Approved by GrandFather
Front-paged by moklevat