Knowledge extraction from document images using pseudo-code expression and statistical analysis
Finished by March 31, 2012
Future University Hakodate, School of Systems Information Science, Associate Professor
In this study, we aim to extract knowledge from databases of scanned document images. Historical, handwritten, or damaged documents are considered hard to use because optical character recognition (OCR) does not work well on them. Instead of relying on OCR, we convert each document image into a pseudo-code expression using only appearance-based information. This pseudo-code expression makes it possible to develop fast algorithms for full-text search and to apply statistical knowledge extraction techniques such as text analysis and text mining.
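The idea can be illustrated with a minimal sketch. It is not the method used in the study, whose details are not given here. It assumes the page has already been segmented into small binary glyph images, and it uses a greedy nearest-prototype grouping with Hamming distance as a stand-in for whatever appearance-based matching the project employs. All names (`Glyph`, `hamming`, `encode`, `find_all`) and the toy 3x3 glyphs are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A tiny binary image: one string of '0'/'1' per row (hypothetical format).
Glyph = Tuple[str, ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Count differing pixels between two glyphs of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs: Sequence[Glyph], prototypes: List[Glyph],
           max_dist: int = 1) -> List[int]:
    """Map each glyph to the index of the first prototype within max_dist.

    A glyph that matches no prototype is appended as a new prototype, so the
    code book grows from appearance alone, without recognizing characters.
    """
    codes = []
    for g in glyphs:
        for idx, p in enumerate(prototypes):
            if hamming(g, p) <= max_dist:
                codes.append(idx)
                break
        else:  # no prototype was close enough
            prototypes.append(g)
            codes.append(len(prototypes) - 1)
    return codes


def find_all(haystack: Sequence[int], needle: Sequence[int]) -> List[int]:
    """Return every start position where needle occurs in haystack."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return []
    return [i for i in range(n - m + 1)
            if list(haystack[i:i + m]) == list(needle)]


# Toy glyphs standing in for segmented character images (assumed data).
A = ("010", "101", "111")
B = ("110", "111", "110")
C = ("111", "100", "111")
A_noisy = ("010", "101", "110")  # same shape as A with one damaged pixel

page = [A, B, A_noisy, C, A, B]
prototypes: List[Glyph] = []
codes = encode(page, prototypes)           # pseudo-code expression of the page
query = encode([A_noisy, B], prototypes)   # query given as an image snippet

print(codes)                    # [0, 1, 0, 2, 0, 1]
print(find_all(codes, query))   # [0, 4]
print(Counter(codes))           # code frequencies for statistical analysis
```

Once the page is reduced to a sequence of integer codes, search becomes ordinary sequence matching, and frequency counts or other text-mining statistics can be computed on the codes without ever knowing which characters they represent.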