April 2016 Issue Vol.6 No.4
A Novel Text Stream Clustering Technique for Web Pages using Sliding Window
https://ia601509.us.archive.org/0/items/vol6no0401_201808/vol6no0401.pdf
V.Kumuthavalli
Associate Professor, Department of Computer Science, Sri Parasakthi College for Women, Courtallam, Tirunelveli, Tamil Nadu, India
Dr.V.Vallimayil
Associate Professor & Head, Department of Computer Science & Applications, Periyar Maniyammai University, Vallam, Thanjavur, Tamil Nadu, India
Abstract: Text mining has gained importance recently because of the increasing number of electronic documents available from a variety of sources. In the current scenario, the processing of text data streams has become highly significant. Owing to the rapid development of information technology, large numbers of electronic documents are now available on the internet instead of as hard copies. To provide early guidance for decision making from information in social networks, a text stream clustering algorithm is proposed; the text stream is formed by a web crawler that continuously fetches web pages. A time sliding window splits the text stream into continuous segments of web page news, where each segment depends on the velocity of the stream and the size of the sliding window. A multilevel clustering method is then used to merge the clusters in each sliding window. In the experiments, 2750 web page news items collected by a web crawler were used to simulate a text stream, and the algorithm showed good execution efficiency and higher clustering quality in terms of precision and recall rate. The experiments were carried out on various documents and compared with existing methods, and the proposed method provides better results.
Keywords: Text Categorization, sliding window, data stream, text mining, clustering.
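The segmentation step described in the abstract can be illustrated with a minimal Python sketch. It assumes fixed-length, non-overlapping time windows and a stream of (timestamp, text) pairs; the function name `split_into_windows`, the `window_size` parameter, and the sample pages are hypothetical and are not taken from the paper. Under these assumptions, the number of pages in a segment grows with both the velocity of the stream and the size of the window.

```python
from collections import defaultdict


def split_into_windows(stream, window_size):
    """Group (timestamp, text) pairs into fixed-length time windows.

    stream: iterable of (timestamp_in_seconds, text) pairs.
    window_size: window length in seconds (hypothetical parameter).
    Returns a list of (window_index, [texts]) sorted by window index.
    """
    windows = defaultdict(list)
    for timestamp, text in stream:
        # Pages whose timestamps fall in the same interval share a segment.
        windows[int(timestamp // window_size)].append(text)
    return sorted(windows.items())


# Hypothetical pages fetched by a crawler, with arrival times in seconds.
pages = [(0, "a"), (30, "b"), (65, "c"), (130, "d")]
segments = split_into_windows(pages, window_size=60)
for index, texts in segments:
    print(index, texts)
```

Each resulting segment would then be clustered on its own, with the per-window clusters merged afterwards; that merging step is not sketched here because the abstract does not specify it.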