Saturday, January 16, 2010

Topic 9 - Text and Web Mining

This week's lecture is quite interesting as it involves something that most people have done before. Text mining is a method of uncovering information hidden in text by applying data mining to unstructured or less structured text files. It entails generating meaningful numerical indices from the unstructured text and then processing these indices with various data mining algorithms. It attempts to categorise textual data rather than understand its contents. Text mining is used for the automatic detection of email spam or phishing through analysis of the document content, and for the analysis of warranty claims, help desk calls and reports, and similar records to identify the most common problems and the relevant responses.
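To make the "numerical indices" idea concrete, here is a minimal sketch of the spam example. It assumes the indices are plain word counts and that the data mining algorithm is a small naive Bayes classifier with add-one smoothing; the lecture did not prescribe either choice, and the four training messages and the names `tokenize`, `train` and `classify` are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_docs):
    """Turn labelled text into word counts per label (the numerical indices)."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Start from the prior: how common this label is in the training set.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training messages, purely for illustration.
TRAINING = [
    ("win a free prize now", "spam"),
    ("claim your free money today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

model = train(TRAINING)
print(classify("free prize money", model))
print(classify("project meeting on monday", model))
```

Notice that the classifier never "reads" the message. It only compares word counts, which is exactly the point about categorising text rather than understanding it.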

There are three types of Web mining. Web content mining refers to the extraction of useful information from Web pages. Web structure mining refers to the development of useful information from the links included in Web documents. Web usage mining refers to the extraction of useful information from clickstream analysis of Web server logs, which contain details of webpage visits, transactions and so on. As mentioned above, Web usage mining can extract information such as a customer's visits to websites. From there, the website can make recommendations based on the customer's past visits and purchases.
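As a rough sketch of how Web usage mining could lead to recommendations, the snippet below counts which pages tend to be visited by the same visitor and suggests the most frequently co-visited pages that a visitor has not seen yet. The clickstream records, page names and function names are all hypothetical, and a real site would parse them out of its server logs and probably use purchases as well as visits.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical clickstream records: (visitor id, page visited).
CLICKSTREAM = [
    ("alice", "/laptops"), ("alice", "/laptop-bags"), ("alice", "/mice"),
    ("bob", "/laptops"), ("bob", "/laptop-bags"),
    ("carol", "/laptops"), ("carol", "/mice"), ("carol", "/keyboards"),
    ("dave", "/laptops"),
]


def pages_by_visitor(records):
    """Group the clickstream into the set of pages each visitor has seen."""
    visits = defaultdict(set)
    for visitor, page in records:
        visits[visitor].add(page)
    return visits


def co_visit_counts(visits):
    """Count, for each page, how many visitors also viewed each other page."""
    counts = defaultdict(Counter)
    for pages in visits.values():
        for page, other in permutations(pages, 2):
            counts[page][other] += 1
    return counts


def recommend(visitor, visits, counts, limit=2):
    """Suggest unseen pages most often co-visited with the visitor's pages."""
    seen = visits.get(visitor, set())
    scores = Counter()
    for page in seen:
        for other, n in counts.get(page, Counter()).items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:limit]]


visits = pages_by_visitor(CLICKSTREAM)
counts = co_visit_counts(visits)
print(recommend("dave", visits, counts))
print(recommend("bob", visits, counts))
```

Counting co-visits is only one simple way to do this; I picked it because it needs nothing beyond the visit records themselves.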
