May 2016 Issue Vol.6 No.5
Associate Professor, Department of Computer Science, Sri Parasakthi College for Women, Courtallam, Tirunelveli, Tamil Nadu, India
Associate Professor & Head, Department of Computer Science & Applications, Periyar Maniyammai University, Vallam, Thanjavur, Tamil Nadu, India
Abstract: Text mining has recently gained importance because of the increasing number of electronic documents available from a variety of sources. Text classification has accordingly become significant in the processing and retrieval of text. Owing to the rapid development of information technology, large numbers of documents are now available electronically on the internet rather than as hard copies. Therefore, proper classification of these resources and knowledge discovery from them are an important area of research. Automated document classification has become a key technology for handling and organizing huge volumes of documents, and it frees organizations from the need to organize document bases manually. The traditional approach to text categorization requires encoding documents into numerical vectors. This encoding causes two main problems: huge dimensionality and sparse distribution. This research proposes a new representation of documents in which string vectors, instead of numerical vectors, serve as input vectors, together with a self-organized neural network that takes this representation as its input. The proposed approach is evaluated experimentally on various documents and compared with existing methods, and it provides better results.
Keywords: Text Categorization, Support Vector Machine, Neural Networks, Part of Speech, BP, RBF, NN.
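
The two problems named in the abstract, huge dimensionality and sparse distribution, can be illustrated with a minimal sketch. The toy corpus, the function names, and in particular the `string_vector` construction (the most frequent words of a document, in rank order) are assumptions made for illustration only; the abstract does not specify how the string vectors are built, so this should not be read as the authors' method.

```python
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
docs = [
    "neural networks classify text documents",
    "support vector machines classify documents",
    "string vectors avoid sparse numerical vectors",
]

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

# The vocabulary is every distinct word in the corpus; a numerical
# vector needs one dimension per vocabulary word.
vocab = sorted({word for doc in docs for word in tokenize(doc)})

def numerical_vector(text):
    """Bag-of-words encoding: one term count per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

def sparsity(vector):
    """Fraction of entries in the vector that are zero."""
    return sum(1 for value in vector if value == 0) / len(vector)

def string_vector(text, size=3):
    """Assumed string-vector encoding: the `size` most frequent words,
    ordered by descending count and then alphabetically."""
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:size]

if __name__ == "__main__":
    vector = numerical_vector(docs[0])
    print("dimensions:", len(vector))
    print("sparsity:", round(sparsity(vector), 2))
    print("string vector:", string_vector(docs[2]))
```

Even with three short documents, the numerical vector has one dimension per distinct word and most of its entries are zero, while the string vector keeps a small, fixed number of entries regardless of vocabulary size. On a realistic corpus the vocabulary grows to tens of thousands of words, which makes both effects far more pronounced.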