Indexed by:
Abstract:
Correcting historical corpora in digital version is a crucial task for the historical research, however, scan quality, book layout, visual character similarity can affect the quality of the recognizing. OCR is at the forefront of digitization projects for cultural heritage preservation. The main task is to identify characters from their visual form into their textual representation. In this paper, we propose a model combining recurrent neutral network(RNN) and deep convolutional network(DCNN) to correct OCR transcription errors. The experiment on a historical book corpus in German language shows that the model is very robust in capturing diverse OCR transcription errors greatly. © Published under licence by IOP Publishing Ltd.
Keyword:
Reprint 's Address:
Email:
Source :
ISSN: 1742-6588
Year: 2021
Issue: 1
Volume: 1873
Language: English
Cited Count:
SCOPUS Cited Count:
ESI Highly Cited Papers on the List: 0 Unfold All
WanFang Cited Count:
Chinese Cited Count:
30 Days PV: 2
Affiliated Colleges: