Authors: Anant Khandelwal
This paper focuses on extracting clean text from stained documents. Stains can make a document very difficult to read, and previous work has shown that no single modelling technique, whether based on image processing or machine learning, performs well in all cases. Ensemble techniques combine several models and can achieve a much lower error than would be possible with a single model. However, the features used by the different models should be sparse or non-overlapping enough to guarantee that the models are independent of one another. XGBoost is one such ensemble technique; by comparison, gradient boosting machines are so slow that it is not practical to combine more than three models within a reasonable execution time. This work combines truncated convolutional autoencoders, which take sparsity into account, with machine learning and image processing models using XGBoost, so that the combined model yields a much lower error than the individual modelling techniques. Experiments are carried out on the public NoisyOffice dataset published in the UCI Machine Learning Repository. The dataset contains training, validation and test sets with a variety of noisy greyscale images, including documents with ink spots, coffee stains and creases. The evaluation metric is RMSE (Root Mean Squared Error), which is used to show the performance improvement on a variety of badly corrupted images.
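As a rough illustration of the evaluation metric named in the abstract, the sketch below computes the root mean squared error between a denoised image and its clean reference. It is not taken from the paper: the function name `rmse`, the nested-list image representation and the assumption that pixel intensities are scaled to [0, 1] are all choices made here for illustration.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two greyscale images.

    Both images are assumed (for this sketch) to be nested lists of
    pixel intensities in [0, 1] with the same number of pixels.
    """
    flat_pred = [p for row in predicted for p in row]
    flat_true = [t for row in target for t in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("images must have the same number of pixels")
    # Mean of the squared per-pixel differences, then the square root.
    squared = [(p - t) ** 2 for p, t in zip(flat_pred, flat_true)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 2x2 images: a clean reference and a denoiser's output.
clean = [[1.0, 1.0], [0.0, 1.0]]
denoised = [[1.0, 0.8], [0.0, 0.6]]
score = rmse(denoised, clean)
```

A lower score means the denoised output is closer to the clean reference; identical images give a score of zero.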
Comments: 14 Pages.
[v1] 2019-06-12 14:18:43
Vixra.org is a pre-print repository rather than a journal. Articles hosted may not yet have been verified by peer-review and should be treated as preliminary. In particular, anything that appears to include financial or legal advice or proposed medical treatments should be treated with due caution. Vixra.org will not be responsible for any consequences of actions that result from any form of use of any documents on this website.