Authors: Anirban Chatterjee, Smaranya Dey, Uddipto Dutta
Text classification is the task of automatically sorting a set of documents into predefined set of categories. This task has several applications including separating positive and negative product reviews by customers, automated indexing of scientific articles, spam filtering and many more. What lies at the core of this problem is to extract features from text data which can be used for classification. One of the common techniques to address this problem is to represent text data as low dimensional continuous vectors such that the semantically unrelated data are well separated from each other. However, sometimes the variability along various dimensions of these vectors is irrelevant as they are dominated by various global factors which are not specific to the classes we are interested in. This irrelevant variability often causes difficulty in classification. In this paper, we propose a technique which takes the initial vectorized representation of the text data through a process of transformation which amplifies relevant variability and suppresses irrelevant variability and then employs a classifier on the transformed data for the classification task. The results show that the same classifier exhibits better accuracy on the transformed data than the initial vectorized representation of text data.
Comments: 5 Pages.
Download: PDF
[v1] 2019-03-23 09:29:24
Unique-IP document downloads: 121 times
Vixra.org is a pre-print repository rather than a journal. Articles hosted may not yet have been verified by peer-review and should be treated as preliminary. In particular, anything that appears to include financial or legal advice or proposed medical treatments should be treated with due caution. Vixra.org will not be responsible for any consequences of actions that result from any form of use of any documents on this website.
Add your own feedback and questions here:
You are equally welcome to be positive or negative about any paper but please be polite. If you are being critical you must mention at least one specific error, otherwise your comment will be deleted as unhelpful.