Mar 5, 2016

Document Layout Analysis

An important part of any document recognition system is detecting and correcting skew in the image of a page. Page layout analysis and the preprocessing operations used for character recognition depend on an upright image or, at least, on knowledge of the skew angle. One example of a process that skew spoils is the use of horizontal and vertical projection profiles. Projection profiles have many applications in document image processing, and they rely on the horizontal and vertical lines of the page being aligned with the image axes.

Given a document image, we will try to analyze its structural layout. The goal is to produce the document image with (1) individual characters boxed, (2) individual words boxed, (3) lines boxed, (4) paragraphs boxed, and (5) paragraphs with margins boxed.

LOAD AN IMAGE AND CONVERT TO GRAYSCALE:
The most basic preprocessing step is loading the image and converting it to grayscale.
CODE:

USE PROJECTION PROFILE:


[Figures: code and output for the horizontal projection, followed by code and output for the vertical projection]


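A horizontal projection profile counts the text pixels in each row of the image, and a vertical projection profile counts them in each column. Horizontal projection is used to identify text lines and paragraphs, while vertical projection is used to detect columns in a document image. The pseudocode and the output for the horizontal and vertical projections of the given document image are shown above.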
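As a rough illustration of the idea, the sketch below computes both profiles with NumPy on a small synthetic page. It assumes a binary image in which text pixels are non-zero. The function names `projection_profiles` and `runs_above_zero` are hypothetical helpers, not the code from the original post.

```python
import numpy as np


def projection_profiles(binary):
    """Return (horizontal, vertical) projection profiles of a binary image.

    `binary` is a 2-D array in which text pixels are non-zero and
    background pixels are zero.
    """
    ink = (binary > 0).astype(np.int64)
    horizontal = ink.sum(axis=1)  # one value per row: text pixels in that row
    vertical = ink.sum(axis=0)    # one value per column: text pixels in that column
    return horizontal, vertical


def runs_above_zero(profile):
    """Return (start, end) index pairs of consecutive non-zero entries (end exclusive)."""
    mask = np.concatenate(([0], (np.asarray(profile) > 0).astype(np.int8), [0]))
    edges = np.flatnonzero(np.diff(mask))  # positions where a run starts or stops
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))


# Synthetic "page": two text lines, one in a left column and one in a right column.
page = np.zeros((12, 20), dtype=np.uint8)
page[2:4, 1:8] = 255
page[7:10, 12:18] = 255

horizontal, vertical = projection_profiles(page)
print(runs_above_zero(horizontal))  # row ranges that contain text
print(runs_above_zero(vertical))    # column ranges that contain text
```

Rows where the horizontal profile drops to zero separate text lines, and zero-valued stretches of the vertical profile separate columns. On a skewed page these gaps blur, which is why skew correction comes first.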

EXTRACTING INDIVIDUAL CHARACTERS:
After loading the document image and converting it to grayscale, we need a good threshold to binarize it. In my case, I used inverse binary thresholding, with the threshold value chosen by Otsu's method.
CODE:


Approach that I used to detect the text blocks:
1. Converted the image to grayscale.
2. Applied a threshold (a simple binary threshold, with a handpicked value of 150).
3. Applied dilation to thicken the lines in the image, which produces more compact objects and fewer white-space fragments. I used a high number of iterations, so the dilation is very heavy (13 iterations, also handpicked for optimal results).
4. Identified the contours of the objects in the resulting image using OpenCV's findContours function.
5. Drew a bounding box (rectangle) around each contoured object; each box frames a block of text.
6. Optionally discarded areas that are unlikely to be the objects of interest (e.g. text blocks) given their size, since the algorithm above can also find intersecting or nested objects (like the entire top area for the first card), some of which may be uninteresting for the purpose at hand.
