Introducing Tesseract
Tesseract is an open-source OCR engine developed with sponsorship from Google. Let's determine how well the Tesseract OCR engine performs on different document images. First, install Tesseract along with the image processing toolkit Leptonica on Ubuntu.
We will run Tesseract from the command line as shown below.
tesseract image.png output
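Scripting the same call is handy when you run many test images. The following is a minimal Python sketch that uses only the standard subprocess module. It assumes that the tesseract binary is on your PATH and that the English language data is installed. The helper names are illustrative and are not part of any Tesseract API.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def output_text_path(output_base):
    """Tesseract appends '.txt' to the output base name for plain-text output."""
    return Path(f"{output_base}.txt")


def run_tesseract(image, output_base="output", lang="eng"):
    """Run the Tesseract CLI on one image and return the recognized text.

    Assumption: the `tesseract` binary is on PATH and the requested language
    data is installed. Raises CalledProcessError if Tesseract exits non-zero.
    """
    cmd = build_tesseract_command(image, output_base, lang)
    subprocess.run(cmd, check=True, capture_output=True)
    return output_text_path(output_base).read_text(encoding="utf-8")
```

For example, run_tesseract("image.png", "output") would write output.txt and return its contents. English is Tesseract's default language, so passing -l eng only makes that choice explicit.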
In the command tesseract image.png output:
- tesseract is the command.
- image.png is the path to the image on which we are running OCR. I am assuming that image.png is in the current working directory (pwd).
- output is the base name of the output file. Tesseract adds the .txt extension, so the recognized text is stored in a text file named output.txt. By default, output.txt will be stored in the current directory.
1. Machine-printed documents
■ Font: Serif and Sans Serif (3 each)
Figure 1: Serif
Invictus
Ivy
William Ernest Henley (1349-1903)
pm of me night that covers me.
Black as me Pit from pole m pole.
miank whatever gods may be
For my nnconqnerable soul.
In «he fell clutch of circnmshance
1 have not winced nor cried aloud.
Under me blndgeonings of chance
My head is bloody, but nnbowed.
Beyond ans place of wraui and «ears
Looms but me Honor of the shade.
And yet «he menace of me years
Finds. and shall find. me unafraid.
It maflers not how strait the gate.
How charged with pnnishmems me scroll.
1 am uie maswr of my faw:
1 am «he caphain of my soul.
Figure 2: Sans Serif
Invictus
IVY
W Ialn Ernest Henley (1349-1903)
out of the night that covers me,
Black as the Fit from pole to pole,
I thank whatever gods may be
For my unconquerable soul.
in the fell clutch ofcircumstance
I have not winced nor cried aloud.
Under the bludgeonings of chance
My head is bloody, but unbowed.
Beyond this place of wrath and tears
Looms but the Honor ofthe shade,
And yet the menace of the years
Ends, and shall find, me unafraid.
It matters not how strait the gate,
How charged with punishments the scroll.
I am the master of my fate:
lam the captain of my soul.
Let's try another example:
Figure 3: Serif
SHORT RIDDLE
What is gteahar Lhan God,
more evil Lhan Lhe devil,
me poor have it.
me rich need it.
and if you eat it, you'll die?
Figure 4: Sans Serif
SHORT RIDDLE
What is greater man God,
more evil than me devil.
me poor have it.
me rich need it.
and if you eat it, you'll die?
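Deciding which of two outputs is better, as with Figures 3 and 4, is easier with a number than by eye. A common measure is the character error rate (CER). It is the edit (Levenshtein) distance between the OCR output and a hand-typed ground truth, divided by the length of the ground truth. The following is a plain-Python sketch. It assumes you type the ground truth yourself, because Tesseract does not produce one.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Character error rate: edit distance / length of the ground truth."""
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Lower is better. The visual judgment is confirmed if char_error_rate(truth, figure_4_text) comes out below char_error_rate(truth, figure_3_text). Insertions count as errors, so the CER can exceed 1.0 when the output contains many extra characters.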
Figure 4 (Sans Serif font) has a better OCR result than Figure 3. Let's take another example.
Figure 5: Serif
Output in OCR:
Stress management strategy: Avoid unnecessary stress
Not all stress can be avoided. and it's not healthy to avoid a situation that needs to be addressed.
You may be surprised, however. by the number of stressors in your life that you can eliminate.
Learn how to say “mi” — Know your limits and stick to them. whether in your personal or
professional life, taking on more than you can handle is a surerire recipe for stress.
Avoid people who stress you out — If someone consistently causes stress in your life and
you can't turn the relationship around. limit the amount or time you spend with that
person or end the relationship entirely.
Take control of your environment — If the evening news makes you anxious, tuni the TV
off. If tramcs got you tense. take a longer but less—Lraveled route. If going to the market
is an unpleasant chore, do your grocery shopping online.
Avoid hot-button topics — If you get upset over religion or politics, cross them off your
conversation list. If you repeatedly argue about the same subject with the same people,
stop bringing it up or excuse yourselrwhen it's the topic ordiscussion.
Pare down your to-do list — Analyze your schedule, responsibilities, and daily tasks. If
you've got too much on your plate. distinguish between the “shoulds" and the “musts."
Dmp tasks that aren't truly necessary to the bottom orthe list or eliminate them entirely.
Figure 6: Sans Serif
Stress management strategy: Avood unnecessary stre
ss
Neo aH slvess can be avmded, ane olfls neo neaoony oe avmd a solaaloan onao neeesoe be a
ddvessed vea may be savpnsed, hawevev, by one nambevm slvessms on yam noe onao y
on can ehrmnale
Leam new oe say "nu" — Knew yam nmos ane slockla onem Whelhev on yam pevs
enao av eveoessoenao noe, Iakmg en more onan yea ean naneoe os a savewe veeoeeo
av SIVESS
Avaid people wne sooess yea eao — no semeene oansoslermy eaases slress on yam
noe ane yea canllllam one veoaooensnoe amand, hrml one ameanoeoonne yea seene
wow onao pevsmo av ene one veoaooensnoe enoneoy
Take eemeo myauvenvivanmenl — no one evemng news makes yea anxoaas, lam
one TV eoo ooovaonens gal yea Iense, lake a oengev bal oess-ovayeoee male oogemg
oe one mavkel osan aneoeasanoeneve, ee yaavgvacevy sneeemg ennne
Avaid hat-human Iapics — no yea gel apsel avev vehgoan av pahlocs, cvass onem eoo y
aavoanvevsaloan nso no yea vepealecfly argue abaal one same saeoeeowoon one sa
me eeeeoe, slap brmgmg ol up av excuse yeavseoo wnen oons one Iapoc mdoscassoa
n
Pare dawn yauv Ia-da oiso — Anaoyze yam seneeaoe, vespansobohloes, ane eany Iask
s no yeanye gallaa maen en yaav eoaoeeosoongaosn belween one “shaa\ds“ ane on
e “masls “ Dmplaskslhal aven|]||va\y necessavyla one ballam eo one nso av ehmm
aoe onem enIwe\y
Surprisingly, in this example Figure 5 (the document in a Serif font) yields the better OCR result.
■ Font: Decorative style and script style
Figure 7: Decorative Style
ABCDEFGPHJKLMNOP
QRSTUVVVXYZabcdefg
hijklmnopqrstuvwxyz
0123456789
.,=;"’!?@#$%a*{(/I \)}
Figure 8: Script Style
Creamy Script
In this particular case, Figure 8 has the better result, perhaps because the figure contains a more constrained set of characters.
■ Layout: Left-aligned text (3 serif, 3 sans serif)
Figure 9: Serif, Left-aligned text
oaober 26, 2006
Ms. Glenna wngnt
l-lnman llesonrpes Manager
Fashion Department store
2000 me nme
Fmrfnx, VA 22030
Dear Ms. Wnght:
l enjoyed mtennemng mm yon dunng yonrrecrnrnng vlsll. to Vlxglnla Tech on October 25. The
management tmlnee program you ontlmecl sounds both cnallengmg and mwanilng and l look
forward to your decision oonoermng an on—slI.e vrslt.
M mentloned dunng tne mterylew, l wl.I.l be graclnanng In Deoemberwllh a laacnelor's degree In
Fashion lvlercnandrsmg. Through my eclncanon and expenenoe rye gamecl many skllls, ts well ts
an nnderstanmng of retarlmg concepts and deahng mm the general pllhllc. l have worked seven
years In the retarl lndllstry In vanons posltlons crom Salesderk to Assmtant Department Manager:
l tlnnk my eclncanon and work expenenoe would oomplement Fashlon's management tramee
program.
l have endosed a oopy of my oollege transcnpl. and a llst of references Lhat yon regnestecl.
Thank you agam for the opportnmty to rnterylew mm Fashion Department store. The rnterylew
seryecl to remtorpe my stmng mterest m becoming a part of your management team. l can be
reached at (540) 555—1111 or by emall at boles@vL.edI| should you need addluonal lnfoxmauon.
smoerely
lvlananne Boles
Figure 10: Sans Serif, Left-aligned text
Output in OCR:
October 25. 2006
Ms. Glenna wngnt
Human Resolute; Manager
Fashion Department store
2000 une Dnve
Fairfax. VA 22030
near us. wrrgnt:
I enjoyed rnterrnewrng wrtn you durrng your recruiting yrsrt to Inrgr Ia teen on October 25. the
management trarnee program you outIrned sounds both (hallenglng and rewa rdrng and I look
forward to your de(IsIon (on(emIng an on-site rnsrt.
As menuoned durrng the rntennew. I wI|| be graduatrng In De(em ber wrtn a Bachelors degree In
Fashion Mer(handIsI ng. nuougn my edI|(alIon and expenenre I-ye garned many skllls. as wen as
an understandrng of reta ng concepts and deaIrng wrtn the generaI pIIh|I(. I have worked seven
years In the retarI Industry In vanous posrtrons from Salesderk to Assrstant Department Manager I
lhlnk my eduoatron and work expenenre would (omplemenl rastuon-s management trarnee
program.
I have endosed a (opy of my (oIIege lrans(npI and a Irst of refereltes that you requested.
Thank you agarn rortne opportunrty to ntennew wrtn Fashion Department store. the rntennew
served to lelnfolte my strong Interest In be(omIng a parlof your management team. I can be
reached at Is4oI 5554111 or by emarI at bo|es@vl.edu should you need addrtronaI Information.
SIn(ere|y.
Mana nne aoles
In this example, Figure 10 (Sans Serif font) yields a better OCR result. Let's try another example.
Figure 11: Serif, Left-aligned text
Output in OCR:
Reflection:
I have realized that learning is not just a product of the
learners cognitive skills but it also a collaboration of the learner.
wacher (faciljhamr of learning) and the learning environment. For
eirective teaching m occur. wachers don't just mach what to
learn. rather teachers teach learners how to learn. Leamer is
mainly involve in the process so he/she must be intrinsically and
extrinsically motivated to learn. For that to happen, wacher has
no teach eirectively and student should respond enthusiastically
and the learning environment should be conducive enough for
learning.
The teacher has no apply waching principles for effective
learning m happen. Moreover the teacher should used different
waching strategies, instructional materials that will help motivate
learners to learn. In addition, the wacher has no begin with an
end in mind. meaning m say, he/she has to have a learning
objectives that he/she has no accomplish throughout the class
discussion and in order m measure how much the learners learn.
the teacher has to used no used appropriate assessment methods
and give feedbacks or follow—t.hmng}l afterwards.
Figure 12: Sans Serif, Left-aligned text
Output in OCR:
Rellectioll:
I have realized that learning is not just a product of the
learners cognitive skills but it also a collaboration of the learner,
teacher (facilitator of learning) and the learning environment. For
effective teaching to occur, teachers don't just teach what to
learn, rather teachers teach learners how to learn. Learner is
mainly involve in the process so he/she must be intrinsically and
extrinsically motivated to learn. For that to happen, teacher has to
teach effectively and student should respond enthusiastically and
the learning environment should be conducive enough for
learning.
The teacher has to apply teaching principles for effective
learning to happen. Moreoverthe teacher should used different
teaching strategies, instructional materials that will help motivate
learners to learn. in addition, the teacher has to pegin with an end
in mind, meaning to say, he/she has to have a learning objectives
that he/she has to accomplish throughout the class discussion and
in orderto measure how much the learners learn, the teacher has
to used to used appropriate assessment methods and give
feedbacks orfo||ow—through alterwards.
As you can see above, Figures 11 and 12 yield almost the same result, with only minor errors. For example, in Figure 12 Tesseract recognized "Reflection:" as "Rellectioll:". Overall, Figure 12 has fewer errors than Figure 11. Let's examine another example.
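When two outputs are this close, listing exactly which words differ is more reliable than scanning by eye. Python's standard difflib module can align two outputs word by word. The helper below is an illustrative sketch. It is not something Tesseract provides.

```python
import difflib


def word_differences(text_a, text_b):
    """Return (words_in_a, words_in_b) pairs for every span where two
    texts disagree, after aligning them word by word."""
    a_words, b_words = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            diffs.append((" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return diffs
```

For instance, word_differences("Reflection: I have realized", "Rellectioll: I have realized") returns [("Reflection:", "Rellectioll:")].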
Figure 13: Serif, Left-aligned text
Output in OCR:
1...»... r.....a.:..
1...m.g .5 . .......s.., .....:..:......a
......z; .5»... -—....m; ..5 .5...5........5
m-9.1. .....mm. ..5.5 ....a .,..:5
5......»........5ny ....a n...my n.. mm-q
5....u mt: mm.» 5.. ..n.n...nn.5 ....
...... ....m.g mm. ...5.. .n...... ...a
....... .m.....z. by hdnmq ..5 ....... u..
............5 am 5...»-an 3...... kn-nun!
...a .............. u.. _.a s... ....5...g
..........:5 .........L ...a p-u....5 wmk
...-n:.......mg u..5. um.-.u.5 -........5 .
.................z .. u... ....a .n..n. n .2»...
5...5 u... ...a ....nn 1.... ....
215...... ....:..g ......»... ......mn
..;..... ......x.an- um .......... ...a
...m an ....,..x..a.- ..m...... .... .......
am. ...a .n......... mm;
wn... .......».....a....m..m......u..
..........g .......;. m..n....5 u.. .........z A
.....zy .1 ma... ............5...5 ....
Ilia! 1......-.g Er .......u., m..a....5v
...n....: ....a g............; mchwumas
...n....... .....u.., 5. u.. IWYH.
dnuyhmry n..mm......5 1...: g........5 ..
..........y. w..u....5 .. .m...m ways. ...a
5....:...z5' pm ......:.age um). ..........
...a ............ -sue..5. 5...... .....
1...-mg Althm-gh ..........n .a.......ny
.....5.... .n .1 u..5. ............5...5
g.u.....g u.. ....5. ..;...... ..a............. .5
..ny.5 w55.m. .. .....5. ..:......-g ....a
...........u..; c. a. 5.. mm; a. 5.....5...
.... 4.» ...s..... ..,...5. .a.5.;.. L...
.a...5.....5 .;.,... ..u,..~....5 mm
....-me: :......n. an »..m ....I.... ma.
.ma...n..5 LL11. .a.....n....... ..:............
....5........m...5., ....a 4.; gm.
...5...........: ..a..m....5 LL11. xsngmmn
..m.. .....a s... ..a......: .........»
215...... ....n..g ......»... Ihnmnn u.
m... ..-1... ...m......... .1 ............
1......-.1 nu;-cu... ._......5, ....:
._..........: ..........
mu-.m.............a..u..5.-..:......5...5......
...u.....a...n....5...:............5.
1...»--1 .5 m. .45.... ....a ma...
;........g .5...........a ..y.... 4...... .5
...m......5 .n....:.... . .1... 5.. .1 1....»-g
..u,su..5.... u.. ...,..:...,.. ....a 5.1.5 um
....x.... g...:....5 .. .a............. by u..
....a .1. .....5.., m u.. ...5«...........;
.........5 Lu. .... m..a.5 :.:.5
.a5...55....5 ....n...5» 5mm ".5. ;......--g
..u,s....5 ny w.,...n--1 gm..........a
w..,.., ....a ..m.. .55.55.....5 Lu]. ...5.5
n-n..5. mm... 5.5. .»..-:...........5» mu...
w.u.—m..m.5 rm g........5 .. .a.......m....
...a w.......... ...,..:.a.,. ...a 5..n5
.n....:.....a .. u.. .m......5, ...a s...
...m.......5 .. nfler ....,.....a ram... um
.... am... am... ;...-.....g
25...... .........1 ......»... ............n
...u... ...-.......... .......n--.. :.......n
..u,....... ...a m....
n... .5 .....m.; .......... .. .1... .5
mm... .1 g...:....5 .....55 ..........
.:.5;......5 ....a ..... -mum .g-....
.5...-nu... Fm ........:., m... ...m.....5
...a.... ....y wmkrqxudy .....55 ......5.5.
.......5n......55.m. ...n.:........ .. .....
..,...5. ....u :. .....5.a...a .........g ..
....m... .5 ...5.:., g........5v ...~......5
....y ...n ......y. .,...5 m5 n.--1 .1... .;.....
....x .............5 ...a ....................g u.....
...:....:y :..n..5 m..n....5 1...-.. ...... ...a
nu.-1...... :....... .......:.....g .7... ;......--g
..u,su..5.... u.. ...,..:...,.. ....a 5.1.5 um
....x.... g...:....5 .. .a............. by u..
....a .1. ......5.» .;.-.5 m..a....5 ..:... urge!
.......:.......a....:.:.5u..................m....
w-uu.55 flung u.. ...y s.....:.n; mg
..m.... mm ......5. .m....5 Lu]. .. .;.55
nu...-n....... hymn ..5., ....a 1....
.55-;u---...m .. u.. syn.:...5 ...a .. .:.55
.n....5 ..5 .. ....».. .m.......5 urly ....a
.......5 .5 ..a.... ....n...5 ...a ..u5.....5 am
....y ...5. Anna... my .x.:...u...5 ..
. uni! ..n.a....... ;........g ................t s...
.n m..a....5
Figure 14: Sans Serif, Left-aligned text
Output in OCR:
aggml-u -uaaaugg
l’ea(vIvI1 as a mmrlu. maaaarasgagaa
asaaaaamaangaa aguaaaaaaq as as aasaaaamaas aaa
n-sag maanaug aasas am gnals
samaaaaaagaaasay am ihuhly mg muaaaaaaa-,1
smau mu pnwgam 5:! rflrmrrlrls gaaa
maag agaaaaag mm maag diam»: am
maaag gmgagax. ray rag:-na as gagaag mg
maaamaas ma! sax-nun mmgm agamam
am maaaamaag mg aamaa fwrcvlilll
makulals. (nmux. am riding wmag
Imrlanemnq mgsg rmnclrls mums a
(nmmnmux In aamg aaaaa gnaaag aaamgaa
sans aamgaaaaa gaaaaay uaa. aaa
Ensmwg ta-atnng Irmivs am-aaaaaq
rdmlan aaaamgagg mint saaaagaaas am
aasan; mag huwneflge m amaaam aaa. magsg
aagsaqaa aaaaa (lasmwn rgamaaa;
Wrul w:ma(h,w:dunm|\m\na(h!I:
(nmux, aaag agam saaaagags mg maggaaa A
aaaaagxy ti saaaagag maaasagaasaags (an angm
agamaaaq ma gaampng. 1\Iflux§'(ull\lal
aaaa qgagaaaaaaaaaa hadramwms amaaagaagg
haw mgy sgg mg vmvld. asganaaaavy
gack-aaaaaaaaaas agaaa smaagms m amaaaaam
naaagagms In rfllaux ways. and saaaaaaas
nmaa aaaaamngaa: mm. asgagaag ama
Ina((\lax< asnans) mans aagaaa agamam
ummam aaag gamam meumrly mgasagg
an fl mag maaasagaasaags, qamgaaaag mg
mun agagaaaaa amaaaaaaaamaaa as gany as
ngssagag aaamaaasg rlaulvln aaaaa maaaaaaaaaaa
mm saa mllllj mg sgaamga gaaa ¢a)
amaaam magsg mam Lu. auasagas aggaaz
ohxsmvgs. hating. gxamngs vaaaaaaau. mu
radra gxaaaaa maaagag rmfinlns ¢gaa,
aagmmgaaaaaaa fl mmmaa. masmaumaasx.
aaaa m gang aamaamaaaaa adatllannrs
4:41.“ lxlnmlnnu mag aaggaa fwafiunnal
hva( gg
Ensmwg ta-atnng aamaaaas align"; mg
mgg mam (wllrllvuns If Irdrudiuvlz
agamaaaq mlathvs, amsaagaaxs am
aasmamaaga aamwams
l’akIvI1 mg aamg aaa aaaa ms urflmm sans aamg
In mg gaaaa aaaaa agaaas aa a gguga magsg
Eaanna as maag dlemvg am ymrlux
agaaama as gamaaggaa aamgaa ¢a: wg. as
aasmgmas atnnlalt: agagaa sga amgaam-1
magmas (LL mg kaaaaaaagagg aaaaa saaaus ma!
aaag gamgga mnuns m aagmaaaasaaaag hy mg
gaaaa fl a maasgx. my mg aaasmamaaaaa
a(nvm6 Lu]. gasg maria, nars.
asgassaaaas. agaaaaaaas) mm mag agaaaam
magmgs hy namaam-1 walmvuullm
I~a(n(=. and m mg amgmgaxs Lu]. mas.
par-as ramhlun s:Is.rK1fuma(5)IIl\1v|\‘k
manna-aangs vaaamaaagms aa aaaaaaamaaag
and x-amggug aaaaaaaaaagmag aaaaa saaaus
amgaaaagaa aaa mg Whatnvs. am fnv
aasmgmas aaa amga aaaggagaa iumack man
(an Dime iwmu agaaaaaaq
Blgmwg rgmaaaaa aamaaags amnlahrll
smut gmxzamaas agaaaaaaaq ngaaaaaq
magmaas am paaaagaas
mgag as amazaaaaa vanannu In vma! as
srnstrai vi saaaagaaas aamss Amultan
(laslmms am gagaa wumn a azvgaa
risnrllv: ma gxaaaaaug. vma! (nvsnunfi
gaaaagagg my me qagany asm (01:55.
what as Dwmlsshl: muagaaaamaaa In aag
magsg mam gg masaaagagaa mgaaaaaag In
aaamga Asa agsaag mantras’ gmgnamaas
may aaaaamaaam mus mas ggaaq (lea: amam
aaaaa amgmamaaas and (wnmun(3nvI1 mgaa
srllmiy vans maaagms agam maaag am
ngmaam hgua Amulannq mu agaaaaaaa
magmas (LL mg kaaaaaaagagg aaaaa saaaus ma!
aaag gamgga mnuns m aagmaaaasaaaag hy mg
gaaaa ma magsgu gm saaaaagms agagaa aaaggg
aaa aam fwand mans man aaa maaaaaaaa mga
um-aagss aamg mg way smamy. raga-1
atrium agam maaaag minus Lu]. nu aaass
namgapaxaaaaa uanw nag, am aaag
aslmmmll In mg Syllabus am In snag
auaaaaas as aaa msdv: anggaaggs gaany am
agaaaas mrmute mmms aaaaa agaasaaaas that
may aaasg um-1<mga,ggaaq gzna 2 agaaas aa
a mag wnnutnve agaaaaaaaa umlmmem iv:
an saagagags
Oops! Figure 13 gives a worse result than Figure 14, perhaps because Tesseract has difficulty recognizing smaller fonts. Let's look at another example.
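If small type is the problem, one common remedy is to upscale the image before OCR. Tesseract's documentation on improving output quality notes that it works best on images of at least 300 DPI. The following minimal sketch only computes the target pixel size for a given scan resolution. The 300 DPI default follows that guidance. The actual resizing is assumed to happen in an imaging library of your choice, such as Pillow's Image.resize.

```python
def upscale_size(width, height, source_dpi, target_dpi=300):
    """Return the (width, height) in pixels needed to reach target_dpi.

    Images already at or above target_dpi are left unchanged; the 300 DPI
    default is an assumption taken from Tesseract's quality guidance.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = max(1.0, target_dpi / source_dpi)
    return round(width * scale), round(height * scale)
```

For example, a letter-size page scanned at 100 DPI (850 x 1100 pixels) would need to be resized to 2550 x 3300 pixels.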
■ Layout: Justified text (3 serif and 3 sans serif)
Figure 15: Serif, Justified text
Reflection:
I have realized that learning is not just a product of the
learners cognitive skills but it also a collaboration of the learner.
wacher (faciljmtor of learning) and the learning environment. For
eirective waching to occur, wachers don't just teach what to
learn. rather wachers teach learners how to learn. Learner is
mainly involve in the pmcess so he/she must be intrinsically and
extrinsically motivated m learn. For that m happen, wacher has
no teach eirectively and student should respond enthusiastically
and the learning environment should be conducive enough for
learning.
The teacher has to apply teaching principles for effective
learning m happen. Moreover the wacher should used different
waching strategies. instructional mawrials that will help motivate
learners to learn. In addition, the teacher has no begin with an
end in mind, meaning no say, he/she has m have a learning
objectives that he/she has to accomplish throughout the class
discussion and in order m measure how much the learners leam.
the teacher has no used no used appmpriaw assessment methods
and give feedbacks or followfllmngh afterwards.
Figure 16: Sans Serif, Justified text
Output in OCR:
Rellectioll:
I have realized that learning is not just a product of the
learners cognitive skills but it also a collaboration of the learner,
teacher (facilitator of learning) and the learning environment. For
effective teaching to occur, teachers don't just teach what to
learn, rather teachers teach learners how to learn. Learner is
mainly involve in the process so he/she must be intrinsically and
extrinsically motivated to learn. For that to happen, teacher has to
teach effectively and student should respond enthusiastically and
the learning environment should be conducive enough for
learning.
The teacher has to apply teaching principles for effective
learning to happen. Moreover the teacher should used different
teaching strategies, instructional materials that will help motivate
learners to learn. in addition, the teacher has to pegin with an end
in mind, meaning to say, he/she has to have a learning objectives
that he/she has to accomplish throughout the class discussion and
in order to measure how much the learners learn, the teacher has
to used to used appropriate assessment methods and give
feedbacks orfo||ow—through alterwards.
In this example, Figure 16 (Sans Serif, Justified text) yields a better OCR result than Figure 15. Let's try another example.
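A comparison like this one can also be made without a typed ground truth, by looking at how confident Tesseract itself is. Tesseract can write tab-separated output with one confidence value per word. For example, tesseract image.png output tsv writes output.tsv. The following sketch averages the word confidences found in that file's text. Keep in mind that this is the engine's own estimate, not a measured accuracy.

```python
import csv
import io


def mean_word_confidence(tsv_text):
    """Average the `conf` column over the word rows of Tesseract TSV output.

    Page/block/paragraph/line rows carry conf == -1 and no text, so they are
    skipped. Returns None if the output contains no words.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)  # OCR text may contain quotes
    confs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if conf >= 0 and text:
            confs.append(conf)
    return sum(confs) / len(confs) if confs else None
```

Running this on the TSV output for each figure gives one number per image. A higher average suggests, but does not prove, a cleaner recognition.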
Figure 17: Serif, Justified text
Output in OCR:
Skphm Cm:y’5 "Fig Racks” rm.“ 3 . m mm Thing mm mm: rmddlz al.: Vemm.u an um: rmuugemem Aemlh Covey m km mm m Thmp Fun mm. um “um .n um: Fm .q.m R:xhu\gundenh:uN: he pulled rmmhed g.n||m\J.u ma 1:nIm\Ih:uMz|\exIm.: p|.m¢A coveted wnh fm nzed Leda “How many ollhgm mm dn ymnhmk we mu gel mm: }.n'7“ A: uked mg mndmre Ana; mg uudeun made an gugua mg Vemm.u mm um “um um rm am “ He pm on: ma mm: ,.. u.¢..mu.. u.¢.....m.¢.,u.u.| M rum: Leda would 1“ mu A: mm ~~u mg ,.. mm" Evenyhody mum we um mm nsmeohhg mm would 1“ mlhgy mm “Ya “ “Nm mtm wm...um.1 Fmm um. mg um he um mu . mm olgmvel dumped H mm: ,.. ma mm H m g;.nve|Vhd mmmfllhg um Vpnu ma 5,» mg mg mm Gmuung mg Vemm.u mm mm mm: rum: ‘1m.¢,.. mm“ A um wuen By now me uudeun Aexpmuded ‘ mm M “ “Goad “ mg umchu mm mm .m.¢.n....2. me until: m hung up: kynrkglolmnd He umed dumpu\g mg mm m mg ,.. \vm|¢m¢ uudenu wma me mad filled m um um Vpxex ma 5,» u.mm....1 guvel om rum: A: mm 4: an mu ma mm “New u an ,.. mm“ “Na “ evenymn Vhmned ma “Good” mam Vemuun mm who um. gmhbd . pndlx mm. ma ksegmua pmn H mm: ,.. He gm mmeIhu\g|xk¢.:qu.u\ol\A'.n¢A mmlhnyn helmet: um “L-dm.mdg¢m\zm¢|\ mg ,.. u nww mu C.n\.n\ybndy ull me an mm“ yam mu mm [mm mm \Vh.m my pmm"“ ,\.mg¢. p.u\mp.u\l Vpnkeup “mu um.“ my mymu mam And ulywnnanlly wmklt n yam c.u\.n|w.:yV m rum: mm yum nu “
Figure 18: Sans Serif, Justified text
Output in OCR:
Slephen cevays “Big Racks"
men a— PM Fusmnngs Flrsl
In me rmdtfle ev a sermnar en nme managemem recaus ceyey m ms beak Fvsl Thmgs
Fvsl Vecmrersawd ‘Okay r1‘snmeVaraqmz“Reacmng under me name ne puued mem
ned gaHan ‘Ar and set A en me name nexx la a mailer covered wen Ms! swzed racks “How
many ev mese rem de yau mmkwe can gel m me ;ar7“ ne asked me aumenoe
Nler me smdems made men gueges me semmar Veader sawd ‘Okay Vex‘; nnd am “ H
e pm ene rack m me ‘Ar men anamer men anamemmm ne more new wamd M rnen
heasked ’lsme;aHuH7“
Everybody oamd see max nex ene more ev me racks wamd m se mey sawd "Ves
“No! se «as: “ ne caunaned From undermexame ne med am a bucket ev grave! dumpe
d u m me ‘Ar and shank u rnegrayex shd mm aH me hmespaoes veu ey me mg rem
Grmmng mesemmar Leader asked enee more “B me;armH7‘
A Mme wwser ey new me smdems responded “Pm eamy nex “
Gaad hexeacher sawd rnen ne reached under me vamem bnng upa bucket ev sand
He svaned dumpmg me sand m me an wnneme smdems walched me sand nued m m
e Mme spaces veu eyme new and grave! Once more ne Vaaked ax mecass and sawd “
Naw wsmew new
“Na “ eyeryenesnemed back
eeed- sad mesemmar Veader wne men grahbed a pndnerev water and began in pan
nu ma me ‘Ar Hegel samemmg hkea quart ev water mm that ;ar eevere ne sawd “Lame
sand germemen me ‘er \s newmu can anybody IeH me me xeswn yaucan Veamvam
W57 wnars my pawn?“
In this example, Figure 17 (Serif, Justified text) has a worse result than Figure 18. Let's examine another example.
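Layout also interacts with Tesseract's page segmentation mode (PSM), which tells the engine what kind of page to expect. In Tesseract 4 and later, tesseract --help-psm lists the modes and --psm N selects one. When a layout gives poor output, rerunning the same image under a few modes is a cheap experiment. The following sketch only builds the commands. The subset of modes, the output naming scheme and the file name figure17.png are all assumptions for illustration.

```python
# A subset of the modes listed by `tesseract --help-psm` (Tesseract 4+).
PSM_MODES = {
    3: "fully automatic page segmentation (default)",
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
    11: "sparse text, in no particular order",
}


def psm_commands(image, output_prefix="output"):
    """Build one tesseract command per PSM so the outputs can be compared.

    Each command writes <output_prefix>_psm<N>.txt (hypothetical naming).
    """
    commands = []
    for psm in PSM_MODES:
        out = f"{output_prefix}_psm{psm}"
        commands.append(["tesseract", str(image), out, "--psm", str(psm)])
    return commands
```

You can run each command with subprocess.run(cmd, check=True) and then compare the resulting .txt files, for instance by character error rate.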
Figure 19: Serif, Justified text
Output in OCR:
n...11-.. nu...-1... 1...1..1. 15 . ...111ne; ...1.1...... .....z1 .1... ._....1.1 .5 .5 ..........5 .. 11-gal. 1...1.111. 1.5.5 ...1 5.1.15 5....1......51. ...1 .1511». n.. ..11...11-1 5...11 1... 11....-...1 5.. .1 1.....1.1.5 .... ..... ....1.1.1. 1.... 1.... eight: ...1 1.... .15.... 1-1 1..11n-5 .5 ...... u.. ....1.....5 .1... 51-1111... m..1.... 1......-1 ...1 ........ u.. ....1 1.. ....51-1 ........15 ......... ...1 ...1....5 w...1. 1.-.1.......1. u..5. 1.....1.1.5 ..-11111.5 . ............ 1.. u... ....1 .1... 1. .1... 5...5 u... ...1 ....-111 1.... .. 21...... ....1... 1...»... ...11..1—11 ..1..... ......1.1.. .1... ....1..5 ... .1... .1... .....1.a1~ .. .1... .. .5... .—1.. ... .1....... ...1... wr=..1-..e1.-:1.-..a.1.1t:\-.1e1.-1.1.111. ......... .. ....1. 51.1.4.5 u.. ......... 11 ...1.z1 .1 ma... .............5 .... .1... 1....... 11.. ....1.1., ...1....5v ..1....1 ....1 g......¢.....1 ...1=1u111...5 ...11...... 1.... :1... 5. u.. ...11 ...1.n....1 1..15......5 1... .......5 .. .......1. 11...1...5 .. ..11...... ...y5 .. 5..1...5' 111... ......1.a.u um). ........ ...1 .......... .s.m51 51..11.5 ..... 1...-.111-1 .1u.n-.1. .. ....... .....11...11 ....5... .11 .1 ".5. .............5 ~.1.u....1. 11.. 1.... ..1..... ...1.......... .5 ...11 .5 1.55.1.1. .. ....5. 111...... ...1 ........1.1 .. .1. 5. .....1. u.. 5...... .... 4.1 1.1.... ....5. ..51.. «.11. ....5....5 .11.... ..,.....5 11...... ....1.1=. :......¢1.m1..111 ...1.... 5...... ..1...1...5 Lu]. ....m...... .1 ......... ..............51, ...1 4.1 .1... ............1 ....1.....5 Lad]. .s1-11...... .1... .....1 s.. .........1 1......» 21...... ....1.1-. ....1... .n.1-.1. 2.. u... .5... ...11.....»5 .1 ...5..... 1...... m1...... .......... ... ...5......1........ m1--1u...........1.5..1.....5...5... ..u..........1...5.............5. 1...... .5 1.... .11..... ...1 3...... 1........ 15 ..........1 ..1.... 4.1 ..., .5 ..5......5 .....1... . .1... 5.. .11....... .u1sc...5 .. 11.. .....1.1.. ...1 5.11. 
.1... .. an-nut 5..1....5 .. .......5.... 11. u.. ....1 .1 . ....5.1, An) .1. ..5.........1 .......5 Lu]. .... .....5 1.115 .1.5..55...5 .....1.51 swllhrt u..5. 1......-1 mush-.5 1-1 1111...»... 5..1.........1 111...... ...1 4.1.1.. .55.55.....5 Lu]. ...5 1:-1»..5, 11.1.1.1... 5.». 1...-1...........111m..1. wn1.—m...1..5 1.. .......5 .. .1.......... ....1 1...... .1. 1.....1.ag. ....1 5.115 .....1.... .. u.. .m.....5, ...1 :. ..5......5 .. .1... «.1g.m..1 1...... .1... .... 111.1. 1...... 1...-...1-1 211...... ....1... ....1... .....1.... ...11.. ...1..:..... .......1. 1...... .11,-..... ...1 11...... n.... .5 ......... ......... .. .1... .5 ......1 .1 51.1.4.5 ....55 11........ .1.55....5 .. ..... ....1... . .1... ..5.1111.... 1:. ......1., .1... ..........5 ...1.... 1.... .111... g...u1 ....5. ....5.5 .1... .5 11.....55.1. ..11.......... 1.. ... ....5. ...1. 1.. ....51...1 .1....... 1.. ....m... .5 . 1.5.1. 5......5v .....«.....5 ...1 Int 1...... ...5 1.1.5 1...... .1... ...... .. ....«.¢...5 .... ............... u.... ...11..111 1.1.. m.1...5 1.... .... .. 1>..-1..... 1...... .....1..... .1. 1...... .u1sc...5 .. 11.. .....1.1.. ...1 5.11. .1... .. an-nut 5..1....5 .. .......5.... 11. u.. ....1 .1.....5.1....5 ...1...5 . .1... (Air! ............1...1.1.5u..............1... 11m51.55 ...1. u.. ...1 s11111I-H; 1...: ...11... ...... ....5. 1..1....5 L:.n. .. .1.55 1:-n..1...... 1....» 1.5., ...1 1... .5.11uu....m 1.. u.. s.11...5 ...1 .. .1.55 .11...5 .5 .. ..5.1.. ..1........5 and] ...1 .....5 .. ...1.... ....11...5 ....1 .......5 .1... ...1 ...5. .1....u..., 1.11.1 ernhcn 1...5 .. . 1.... 11......... 1....... u............ s.. .11 .......5
Figure 20: Sans Serif, Justified text
Output in OCR:
nmn.-u -nmaua.
nmn.-u -nmaua.
rgamv... .5 a g.....pI=va m.n..ragg.g..
a(uvIIlY5 .mg.. ma....n.. ..5 a5 Immnnvs .a
n-sag mlnrl: .a5.g and gnals
5.m.n.a..g....s:y am ugamuy mg um...-1
5ma.. my. mmmgm 5g. m pvlnnrls gay.
mang .gam.-1 hm. m...g diam»: am
mmg dfiuux. my mum ..5 g.ga.g mg
g....m......5 ma! sun:-It a....m... .ga....-a
am m...m.zg mg .gga fw .gy.5..-1
ma.gy.a.5 tumult. am m.:.g.g5 wmg
Imrlanemnq ms hnnnnls .ga....5 a
(nmmnmux ... ..mg am dlwt. .. ac...
sans ..mgam mam um. um
Ensmwg rgmua Irmivs am.-um
mutant mwmeax am...» gunman; am
..5.n; that mmmgaae m .m...... ml gmmag
agsqy. am tlasmwn .gam...;
wngy. w:\u(h,w:dnm!|\m\ra(h!r:
(nmux, mg team 5...ag...5 mg (mg... A
yanay m5....ag... (hala(\n|§I(§ (an angu
ngammq Fur gxammg, 5...ag...5' nlmval
am .,x...ya......a. hadramwms ummuzg
haw mgy 5gg mg wand. risnrllnxy
macka.m.....15 ngaa 5..g....5 .a awmam
hmhluns ... mlaux W313. and 5....1gy..5'
hnnv mmmgame mm. a((waAr am
...am.a..g asnettsl 5:.amg5 ..gm uganma
umm.y. mg (anm a.a...a.ay mga5..g
an m mag (hala(\rn§I(§, qamg....q mg
mam .g.gya... Imwmannu a5 gany a5
nmaamg ... (mus: may.-v-.1 and (nmmm
m m. 5.. .1......; mg rnmrv (an .a)
.m...m g....5g my. (:41. .a:.5.....5 am».
..n.m5..yg5 naan... gxamng5 v...mau. ml
1.4:: gxaa... marten animus Lu.
..mm.«ga....... m (wnmnu m|s(nv(=rI‘mr§L
am 4:) q.-ng ...am..g..m.a. anamrammm
<:.n.n.a§...;...m. m mg ..gga my a.a..m.a.
hva( gg
Ensmwg rgaannq Irmtvs a...;...a mg
mgg mam (umrnnuts .1 Inimthuuz
.gam....; .2:-may.-5 asgmmu am
...ay..gmm.a.an.y.m.5
ran...-1 mg ..mg .a an m.5 urflmm sans ..mg
m mg gm. and .gam5 m a ham: gm..5g
Eaauna .5 m...g diam»: am marten
learn") .5 g..y.a..gga win. .a) we. a5
...5m.g.m5 a...g.na.g a (lean 5g: m .gam.....
.m.m5..ys us. mg mmmugage am 5...u5 max
mg awed 5....my..5 .a dannmwanr my mg
and m a (mag). my mg ...5m.g.....a.
a(nv|n5 4:41. gasg a..mg5 lahs.
m5g..5a.m.5. .ga.1.....5) war-rt mag uganma
.m.m5...¢ my mum.-1 wahwnuxad
r-agngg. am .4 mg asmmg...5 Lu, .ga5
harass. Kantian 5g5, memmmaymu rluwme
nnnnu.-....g5 my mmens .a nemnuwalr
and m-amgg mg mmmgage am 5...u5
amnlaam m mg mxnvs. am my
...5m.g.m5 .a mug. ralqnun vamagk that
gay. ~;...ag «mg. .ga.....q
Bvgmwg texnng Ivmivs ay..g.:az..-;
smut smgnamam -agayanz ugmmnq
.m.gmy.-5 am rmum-5.
mu: .5 amazu... yay.a...m. ... vmax .5
sperm‘! m 5...ag...5 anms Amultan
tlaslmms am am m.m... a 5......
m5g.n..g Fw gxammug. vmax g....a......5
gymgagg may may mgany anus g....5g5
max .5 m=.....55.m.g (rllamvannu ... mg
gm..5g mud mg gmmgu g..ga....... ...
aymgy As a ram. mamas‘ amgna......5
may m. man. ....5 n...5, hang (lea: ammm.
m... gxx-ma......5 and (wnmuntafinj m...
g.n.g..¢y hdns a....m...5 ngam mmg am
nammm mgug Amnlmnq m... uganma
.m.m5..ys us. mg mmmugage am 5...u5 max
mg awed 5....my..5 .a dannmwanr my mg
and m a gm...5=. ms a....m...5 a (lea: Kama
manmfwatnmahislianmmnvllnvliuv
nma.g55 am... mg may s.m..any. mam;
smut ammm gm..5g palms cm. In gm
Damupaunu. nanam ..5g. am .a..g
aslylmmll ... mg 5yuam.5 an. m (lag
a..mm5 ..5 .a lad»: rillumts gany am
.gy..a5 m .gm..gg gm..ug.5 am .gy5.m5 man
may ay.5g Alm9:!ha,harI1 Bmllu .ga..5 .a
a m...g wmucnvg leamlnj umlrnmull my
an 5...ag...5
As seen in the example above, the OCR results are poor because the given document is of low quality. The worst result comes from Figure 19 (Serif, Justified), in which characters are poorly recognized.
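For low-quality scans, preprocessing is often where improvements come from. Tesseract's documentation notes that it binarizes images internally with Otsu's method. It also notes that the result can be suboptimal when the page background is unevenly dark. The following sketch implements Otsu's global thresholding in plain Python over a flat list of 8-bit gray values, so you can inspect or tune the result before OCR. It assumes you extract those values from an image file yourself, for example with Pillow's Image.convert("L") and getdata().

```python
def otsu_threshold(pixels):
    """Return the gray level that best separates dark and light pixels
    (Otsu's method: maximize the between-class variance).

    pixels: iterable of integer gray values in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (weighted_total - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

A single global threshold is exactly what struggles with uneven backgrounds. Applying the same computation per image tile (adaptive thresholding) is one possible refinement.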
■ With figures (one serif, one sans serif)
Figure 21: Serif
.PERFDRMA FOR VERIFICATION OF NATIONALITY STATUS/ANTECEDENTS IPLEASETYPEOK m NT) FULL NAME: AL1AsEs, IF ANY: (a) F L NAME 01: F3)? L NAME 01: (C) FULL NAME 01: SPDUSE: DATE AND PLACE 01: (DD-MM-YYYY) (VILLAGE) PRESENT OCCUPATION: NATIONALITY: Whe|l1er holds dual na|ianali|y. (1? bo|l1 |l1e na|ionali|ie.~' k: be clearly memioned) PRESENT ADDRESS IN USA Telephone No.
Figure 22: Sans Serif |
Output in ocr:
PERFORMA FOR VERIFICATION OF NATIONALITY STATUS
TECEDENTS
IPLEASE TVPE on PHI
rm
. FULL NAME:
. ALIASES, IF ANY:
. (a) FULL NAME OF FATHE
(h) FULL NAME OF MOTHE
(C) FULL NAME OF SPOUS
. DATE AND PLACE OF BIRT
BDMM-WW)
. PRESENT OCCUPATION:
. NATIONALITY:
Whether halds dual natianalit
(If sa, bath the natianalities ta
be clearly mentianed)
A
Telephane Na. Email:
(VILLAGE)
AN
(DIS
. PRESENT ADDRESS IN US
In this example, Figure 22 yields a better result than Figure 21.
As the examples above show, Tesseract OCR recognizes more characters in most of the machine-printed documents set in a Sans Serif font.
2. Hand-written documents
■ One-paragraph description of myself
Figure 23: Handwritten |
Output in ocr:
1',“ 0 FheI9mG§:|C*SGn3M;n9 wanalcwf. W9 phdamo.-Ho
y‘; ma makes mo om irrhrvw-Vi’ wo‘Hn a Pence-\¢>v3n5,
\-b.,.\‘,.m._d fmrson 41:4,. M0s+ pzopto +h2nk
gicle 0
con6€'V0‘*;V"’s °md W“
hm‘, 1;“ qu‘m4-Q shy! of ‘Hm ctrious fince I'm Q‘/a\m,om¢\‘
stow *9 “ad, dug 40 being Fh¢l3mwRo- l3u+ once you've. ‘A TV
Pad. O? my ..,;,¢,\¢, of friends, ujoutfl dcscovtr my samauma Sa 9
.— «pkg? ¢\c;\'ua“y I'M 9u."’3D;n9 I mdVen+w’u6 ’ tag’ *0 39
Wm, and was ’ro wander and +0 ‘hrawd 4'0 °\“§Z\?"""“"
being w‘:+h -Gun and ofak-€m3;+1c f~eop\e . °F"*'v"*Wes«, I
“fess {M gi.\.ua{-:ons and ntqfiuvfia P¢OP‘€4 beowsc -HR}.
d~¢(2\c-fine ms’ mu-gsj. Painfins ha; been my eflhufi
Qiqlly whafl 1"“ g,n§9*;ynfi“Y ‘B5+h0"5‘ Or * I
[Ike moon , ‘hens, b;rI\.s, sunsfis an my «f§"""‘F"
pqin+ . Rcacux-My , I haw; bu-an ¢=<p¢.vi'mc.nh'n3.
gnsngd cu; 4-he usual , WI‘!‘n;'je:'o%or or ac§~yli't._,’ ___ T‘
{by my Pgirnhne ¢q.vff(I.3
gin and ol$-than ‘
It's hard for Tesseract OCR to recognize all the characters in a handwritten document, as the result above shows.
■ One-paragraph description of two other persons
Figure 24: Handwritten |
Output in ocr:
H’ I wdo +0 akscrfbc my mower , I wouid
bably say the is -Haw |nes‘\' . And I wish I
you bcauvor 415-0*;ph'on -Hna+s' reddly nu +hc.m is
95¢’; the \ocs«\' hugger, Hue bcslr cuddtcr, -we 1915+
ch“; ’ and wt 1,“? ;n;PIra+{m, Sh-t has +he, beg!" A ""“
Sm;‘¢, ‘HR. beck laugh and ‘H1: ‘O25? adV;C3‘-
In this example, the OCR result is poor because only a few characters are recognized. Let's try another example.
Figure 25: Handwritten |
Output in ocr:
4,-lr.=<T
4} .
(R59 ‘Hm .’
‘'5 OP?” 4-he OVPWW
,,..';¢ 0,,
9°” Na ‘W
. {°‘ , .
56“... ‘,o::5 ‘.o5_H~rI¢v1d
K a
an
ha
143
doll‘
o
comma" Vow T
" G 10 4
J °‘ 4*‘
¢“' W0 ma 5&0 has
“M ‘J
10""
O9
u-.6
. 0‘
std“
«w n =4"
#5 . M‘. a 4 pm5h:o”' “
. W A , In f‘ .
as d‘(‘ r6&‘“ +0 fiozflnabfi 0 I. _
955 4,. ‘ , W of
M’ M’ I 7'4" -..m\ “"”‘t».;w 5 FM
gas - , <1‘ V‘
yo” ,w 1"" ‘ q.o**”
_ 11*" a“
afkcr
NW‘
‘ht
e\’ “M
*h;flg‘ 7
“M
This is the worst result I have gotten so far, because not even individual characters are recognized.
CONCLUSION:
Tesseract OCR recognizes individual characters better in machine-printed documents than in handwritten documents, because Tesseract comes with trained data for languages. In our examples we used the English language data, and English characters are recognized. Handwritten documents give the worst results. Handwriting (that is, handwritten printed characters, not cursive) varies widely. For example, if I write the same sentence 5 times and then look at each page, no two characters will be exactly alike. Even if I trained Tesseract on multiple examples from the same font, it would still have to learn those variations. On top of that, handwriting is unique: each person's handwriting should be thought of as a different font, and there's no way to train for that.
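The language data mentioned above is chosen on the command line with Tesseract's `-l` option; `eng` selects the English data used in these experiments, and `tesseract --list-langs` shows which languages are installed. The following Python sketch wraps the `tesseract image.png output` command from the start of this post. It assumes the `tesseract` binary and the English traineddata are installed, and that the `--psm` spelling of the page-segmentation flag applies (Tesseract 4 or later); `build_tesseract_command` and `run_tesseract` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str,
                            lang: str = "eng",
                            psm: Optional[int] = None) -> List[str]:
    """Assemble the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if psm is not None:
        # Page segmentation mode; flag spelling assumes Tesseract 4 or later.
        cmd += ["--psm", str(psm)]
    return cmd


def run_tesseract(image: str, output_base: str, lang: str = "eng",
                  psm: Optional[int] = None) -> str:
    """Run the CLI and return the recognized text.
    Tesseract appends '.txt' to OUTPUTBASE itself."""
    subprocess.run(build_tesseract_command(image, output_base, lang, psm),
                   check=True)  # raises CalledProcessError on failure
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

For example, `run_tesseract("image.png", "output")` runs the same command as before with English selected explicitly and returns the contents of `output.txt`.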
Tesseract OCR is quite powerful, but does have the following limitations:
- Unlike some OCR engines, Tesseract is unable to recognize handwriting and is limited to about 64 fonts in total.
- Tesseract requires a bit of preprocessing to improve the OCR results; images need to be scaled appropriately, have as much image contrast as possible, and have horizontally-aligned text.
- Finally, Tesseract OCR only works on Linux, Windows, and Mac OS X.
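For the contrast point above, a common preprocessing step is to binarize the image, turning every pixel into either ink or paper. The sketch below uses Otsu's method, which picks the grey-level threshold that best separates the histogram into two classes (it maximizes the between-class variance). It is a dependency-free illustration working on a flat list of 8-bit grey values; loading those values from an image file would need an imaging library and is left out. `otsu_threshold` and `binarize` are hypothetical helper names.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0-255) that maximizes between-class variance.
    Pixels <= t form one class, pixels > t the other.
    Assumes every pixel is an int in the range 0-255."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels <= t
    sum0 = 0  # sum of grey values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no split at this t
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:  # keep the first t that reaches the maximum
            best_var, best_t = between, t
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Map dark pixels (<= threshold) to 0 (ink) and the rest to 255 (paper)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

Tesseract already applies its own Otsu thresholding internally, so it is worth comparing the OCR output with and without an explicit step like this, especially on scans with an uneven background.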