Character Spotting and Autonomous Tagging: Offline Handwriting Recognition for Bangla, Korean and Other Alphabetic Scripts
This paper demonstrates a framework for offline handwriting recognition using character spotting and autonomous tagging, which works for any alphabetic script. Character spotting builds on the idea of object detection to find character elements in unsegmented word images. An autonomous tagging approach is introduced that automates the production of a character image training set by estimating character locations in a word based on typical character size. Although scripts can vary widely from each other, our proposed approach provides a simple and powerful workflow for unconstrained offline recognition that should work for any alphabetic script with few adjustments. Here we demonstrate this approach with handwritten Bangla, obtaining a character recognition accuracy (CRA) of 94.8% and 91.12% with precision and autonomous tagging, respectively. Furthermore, we explain how character spotting and autonomous tagging can be implemented for other alphabetic scripts. We demonstrate this with handwritten Hangul/Korean, obtaining a Jamo recognition accuracy (JRA) of 93.16% using a tiny fraction of the PE92 training set. The combination of character spotting and autonomous tagging takes away one of the biggest frustrations, data annotation by hand, and thus we believe it has the potential to revolutionize the development of offline recognition.
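The idea of estimating character locations from typical character size can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `estimate_char_spans`, the `typical_width` table, and the proportional-split rule are all assumptions made here for illustration. The sketch divides a word image's width among its transcribed characters in proportion to each character's assumed typical width, and returns approximate horizontal spans that could serve as automatically generated tags.

```python
from typing import Dict, List, Tuple


def estimate_char_spans(
    word_width: int,
    chars: List[str],
    typical_width: Dict[str, float],
    default_width: float = 1.0,
) -> List[Tuple[str, int, int]]:
    """Split a word image's width among its characters.

    Hypothetical helper: each character receives a horizontal span
    proportional to its assumed typical (relative) width. Characters
    missing from the table fall back to `default_width`.
    """
    if not chars:
        return []

    # Relative width of every character in the transcription.
    weights = [typical_width.get(c, default_width) for c in chars]
    total = sum(weights)

    spans = []
    cumulative = 0.0
    for char, weight in zip(chars, weights):
        # Left edge: fraction of the word consumed so far.
        x0 = round(word_width * cumulative / total)
        cumulative += weight
        # Right edge: fraction consumed after this character.
        x1 = round(word_width * cumulative / total)
        spans.append((char, x0, x1))
    return spans


# Example: a 100-pixel-wide word with three characters, the middle
# one assumed to be twice as wide as the others.
example = estimate_char_spans(100, ["a", "b", "c"], {"a": 1, "b": 2, "c": 1})
```

In this example the middle character is assigned the central half of the word image. A real system would need to handle vertical extent, overlapping strokes, and script-specific composition rules, none of which this sketch attempts.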
Majid, Nishatul and Barney Smith, Elisa H. (2022). "Character Spotting and Autonomous Tagging: Offline Handwriting Recognition for Bangla, Korean and Other Alphabetic Scripts". International Journal on Document Analysis and Recognition, 25(4), 245-263. https://doi.org/10.1007/s10032-022-00410-x
Majid, Nishatul and Barney Smith, Elisa H. (2018). "Boise State Bangla Handwriting Dataset". [Data set]. Signal and Image Processing Lab, 1. https://scholarworks.boisestate.edu/saipl/1