A testing system for OCR solutions by MIL Team

Apr 13, 2021

Many computer vision tasks focus on detecting and extracting objects in an image, and some of our projects deal with this task as well.

Our team's latest project is a testing system for Optical Character Recognition (OCR) solutions. An OCR solution is a set of methods and models that detect text in an image and recognize its characters. Its output is a list of quadrilateral coordinates (boxes) enclosing the words, together with the text inside each box.

By the end of the article you will know:

Why building a testing system for OCR solutions is not a trivial matter.
The disadvantages of the existing testing systems.
The solution our team proposes.

What’s wrong with existing libraries?

To compute most metrics, you first need to match the predicted boxes to the correct (ground-truth) boxes. Sounds easy, so what's the catch?

Usually, the correspondence between objects is determined by the Intersection over Union (IoU) measure: a box is considered correctly predicted when its IoU with a ground-truth box exceeds a set threshold. The most popular Python library for computing detection metrics is Object Detection Metrics. Its main disadvantages are:

It handles only boxes that are parallel to the coordinate axes; in other words, it does not support slanted text.
It uses a quadratic search over all pairs, which is slow when IoU is computed exactly and the image contains several hundred objects, as text documents typically do.
It doesn't account for false negative results, which doubles the counting work.
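
To make the first two points concrete, here is a minimal sketch of a generic baseline: axis-aligned IoU plus a quadratic search over all pairs. It is not the code of Object Detection Metrics; the `(x_min, y_min, x_max, y_max)` box format and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this format is an assumption.
Box = Tuple[float, float, float, float]


def axis_aligned_iou(a: Box, b: Box) -> float:
    """IoU of two rectangles whose sides are parallel to the axes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the rectangles do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def brute_force_match(
    gt: Sequence[Box], pred: Sequence[Box], threshold: float = 0.5
) -> List[Optional[int]]:
    """For each ground-truth box, return the index of the best prediction or None.

    Every ground-truth box is compared with every prediction,
    so the cost is len(gt) * len(pred) IoU computations.
    """
    matches: List[Optional[int]] = []
    for g in gt:
        best_idx, best_iou = None, threshold
        for j, p in enumerate(pred):
            score = axis_aligned_iou(g, p)
            if score > best_iou:
                best_idx, best_iou = j, score
        matches.append(best_idx)
    return matches
```

With several hundred words on a page, the double loop performs tens of thousands of IoU computations per image, and the rectangle format cannot describe a slanted word at all.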

What do we suggest?

We suggest building a matching engine that pairs predicted boxes with ground-truth boxes, keeps the useful properties of the existing library, and addresses its disadvantages. To do this, we use two main tools:

The Shapely library, for exact IoU computation. It has a user-friendly interface and works with arbitrary convex polygons.
The KD-tree implemented in sklearn, for finding the points nearest to a given point.
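
As an illustration of the first tool, the sketch below computes IoU for two arbitrary quadrilaterals with Shapely. The helper name `polygon_iou` and the sample coordinates are assumptions, not part of our system; the quad vertices are assumed to be listed in order around the boundary.

```python
from typing import Sequence, Tuple

from shapely.geometry import Polygon

Point = Tuple[float, float]


def polygon_iou(quad_a: Sequence[Point], quad_b: Sequence[Point]) -> float:
    """IoU of two polygons given as vertex lists (vertices in boundary order)."""
    a, b = Polygon(quad_a), Polygon(quad_b)
    union_area = a.union(b).area
    if union_area == 0:
        return 0.0  # both polygons are degenerate
    return a.intersection(b).area / union_area


# An axis-aligned square, a shifted copy, and a slanted box (diamond) around it.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted = [(1, 0), (3, 0), (3, 2), (1, 2)]
diamond = [(1, -1), (3, 1), (1, 3), (-1, 1)]

iou_shifted = polygon_iou(square, shifted)
iou_diamond = polygon_iou(square, diamond)
```

The diamond cannot be expressed as an axis-aligned rectangle without changing its area, which is why exact polygon IoU matters for slanted text.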

With these tools, we suggest the following algorithm:

Compute the centres of all predicted boxes and add them to a KD-tree.
For the centre of every ground-truth box, find the several nearest predicted centres (in practice, 5 is more than enough).
Among these nearest neighbours, take the one with the highest IoU with the ground-truth box; if that IoU exceeds the threshold, pair the two boxes.
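
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than our production code: the function name `match_boxes`, the use of the polygon centroid as the box centre, and the default values of `threshold` and `k` are all choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np
from shapely.geometry import Polygon
from sklearn.neighbors import KDTree

Point = Tuple[float, float]
Quad = Sequence[Point]


def iou(a: Polygon, b: Polygon) -> float:
    """Exact IoU of two polygons."""
    union_area = a.union(b).area
    return a.intersection(b).area / union_area if union_area > 0 else 0.0


def match_boxes(
    gt_quads: Sequence[Quad],
    pred_quads: Sequence[Quad],
    threshold: float = 0.5,
    k: int = 5,
) -> List[Optional[int]]:
    """For each ground-truth quad, return the index of its matched prediction or None."""
    if len(gt_quads) == 0:
        return []
    if len(pred_quads) == 0:
        return [None] * len(gt_quads)

    gt_polys = [Polygon(q) for q in gt_quads]
    pred_polys = [Polygon(q) for q in pred_quads]

    # Step 1: centres of all predicted boxes go into a KD-tree.
    # Assumption: the centroid is used as the "centre" of a box.
    pred_centres = np.array([[p.centroid.x, p.centroid.y] for p in pred_polys])
    tree = KDTree(pred_centres)

    # Step 2: for every ground-truth centre, find the k nearest predicted centres.
    gt_centres = np.array([[g.centroid.x, g.centroid.y] for g in gt_polys])
    k = min(k, len(pred_polys))  # query() requires k <= number of indexed points
    _, neighbours = tree.query(gt_centres, k=k)

    # Step 3: among the neighbours, keep the one with the highest IoU above the threshold.
    matches: List[Optional[int]] = []
    for g, candidates in zip(gt_polys, neighbours):
        best_idx, best_iou = None, threshold
        for j in candidates:
            score = iou(g, pred_polys[j])
            if score > best_iou:
                best_idx, best_iou = int(j), score
        matches.append(best_idx)
    return matches
```

Each ground-truth box is compared with at most `k` predictions, so the number of exact IoU computations grows linearly with the number of ground-truth boxes instead of quadratically.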

Why does this method work?

It is a sound heuristic for finding the best pair, and it works especially well when the predicted boxes do not overlap.
The same predicted box cannot be paired with several ground-truth boxes when the threshold is above 0.5. This follows directly from the IoU definition and matches our expectations. For thresholds below 0.5, a different matching algorithm would be the better option.
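
The second point can be spelled out in a short derivation. It relies on one assumption that the text above does not state explicitly: the two ground-truth boxes $G_1$ and $G_2$ do not overlap each other, which is the usual situation for words on a page. Here $P$ is a predicted box and $|\cdot|$ denotes area.

```latex
\begin{align*}
\mathrm{IoU}(P, G_i) = \frac{|P \cap G_i|}{|P \cup G_i|} > \tfrac{1}{2}
  &\;\Longrightarrow\;
  |P \cap G_i| > \tfrac{1}{2}\,|P \cup G_i| \ge \tfrac{1}{2}\,|P|,
  \qquad i = 1, 2, \\
  &\;\Longrightarrow\;
  |P \cap G_1| + |P \cap G_2| > |P|. \\
\text{But if } |G_1 \cap G_2| = 0: \quad
  |P \cap G_1| + |P \cap G_2| &= |P \cap (G_1 \cup G_2)| \le |P|.
\end{align*}
```

The two inequalities contradict each other, so at most one ground-truth box can have IoU above 0.5 with a given prediction. With a lower threshold the first step no longer yields more than half of $|P|$, and the argument breaks down.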

All in all, on text documents containing several hundred words, the algorithm by MIL Team runs in less than a second, whereas quadratic search can take up to half a minute.

We hope this method will be useful for your future research projects. Wishing you fast algorithms and high-quality models!


Written by Machine Intelligence Laboratory