Original BrickPi Bookreader: Installing the OCR Engine

In this project we make an ebook reading robot with the BrickPi.

8. Installing the OCR Engine

The OCR (Optical Character Recognition) engine converts the image we take of the book into text. We are using the Tesseract OCR engine: it runs well on the Raspberry Pi, does not require an Internet connection, and reliably converts images to text.

First, install Tesseract:

sudo apt-get install tesseract-ocr
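
Before moving on, it can help to confirm that the install put the tesseract binary on the PATH. The following is a minimal Python sketch, not part of the original project: the function name tesseract_version is our own, and it assumes Python 3 is available on the Pi.

```python
import shutil
import subprocess


def tesseract_version():
    """Return Tesseract's version banner, or None if the binary is not on the PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some releases print the banner to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()
```

Calling print(tesseract_version()) should show a version banner. If it prints None, repeat the apt-get step above.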

Next, test the OCR engine.

Take a clear image of a piece of text in a book and run Tesseract:

tesseract image.jpg o

Here image.jpg is the image taken by the Raspberry Pi Camera and o is the base name of the file in which the text will be saved. Tesseract adds the extension itself and writes o.txt, so there is no need to include it.
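
If you would rather drive this step from a script than type the command by hand, the call can be wrapped in Python. This is only a sketch under our own assumptions: ocr_command and run_ocr are hypothetical helper names, not part of Tesseract or of the original project, and the output is assumed to be UTF-8 text.

```python
import subprocess


def ocr_command(image_path, output_base):
    """Build the same command as above: tesseract <image> <output base>."""
    return ["tesseract", image_path, output_base]


def run_ocr(image_path, output_base):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(ocr_command(image_path, output_base), check=True)
    # Tesseract appends .txt to the output base on its own.
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

For example, text = run_ocr("image.jpg", "o") runs the command shown above and returns the contents of o.txt.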

Now wait a few minutes; the OCR takes a lot of processing power.

When it is done processing, open o.txt. In our experience, recognition accuracy was above 90%, and it improves with larger font sizes. If the OCR did not detect anything at all, try rotating the image and running Tesseract again.
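
The rotate-and-retry advice can also be automated. The sketch below is one possible approach, not the method used in the original project. It assumes ImageMagick is installed so that its convert command is available (an extra dependency this guide does not otherwise need), and the helper names rotate_command and ocr_with_rotation are hypothetical.

```python
import os
import subprocess


def rotate_command(src, dst, degrees):
    """Build an ImageMagick command that writes a rotated copy of src to dst."""
    return ["convert", src, "-rotate", str(degrees), dst]


def ocr_with_rotation(image, output_base, angles=(0, 90, 180, 270)):
    """Try each rotation until Tesseract produces non-empty text.

    Returns (angle, text), or (None, "") if every attempt came back empty.
    """
    for angle in angles:
        candidate = image
        if angle:
            # Write the rotated copy next to the original, e.g. image_rot90.jpg.
            root, ext = os.path.splitext(image)
            candidate = "{}_rot{}{}".format(root, angle, ext)
            subprocess.run(rotate_command(image, candidate, angle), check=True)
        subprocess.run(["tesseract", candidate, output_base], check=True)
        with open(output_base + ".txt", encoding="utf-8") as handle:
            text = handle.read().strip()
        if text:
            return angle, text
    return None, ""
```

Calling ocr_with_rotation("image.jpg", "o") first tries the image as taken, then the rotated copies. It only checks that some text came back, so it is still worth reading o.txt to judge the quality.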

Next: 9. Building the LEGO Platform