Imagine that you own the top-trafficked site on the Web, Yahoo!, and millions of people come to it looking for information. Your job is to organize this information so that people can efficiently browse and search for what they need. So far, the best practice for organizing browsing is a taxonomy, i.e. placing the content to be searched into a classification tree, and Yahoo! Shopping is no exception. The taxonomy is the electronic counterpart of the departments, subdepartments and, finally, racks in a store. Go to shopping.yahoo.com and browse Apparel (department), then Shoes (subdepartment), then Running Shoes, and very soon you will land on the pair of Reebok or Adidas shoes of your dreams. Along the way you may have to specify that you want a certain brand (see above), color (e.g. white), size, and possibly even gender (male) and age (Kids).
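To make the "department, subdepartment, rack" picture concrete, here is a minimal sketch of a taxonomy as a tree of categories, where browsing walks a path from the root and attribute choices (brand, color, age, and so on) narrow down the products on the shelf you reached. This is only an illustration under my own assumptions: the `Category` and `Product` classes, the `browse`, `all_products` and `refine` helpers, and the sample products are all hypothetical and say nothing about how Yahoo! Shopping is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model: a product is a name plus a bag of attribute values.
@dataclass
class Product:
    name: str
    attributes: dict[str, str]


# Hypothetical model: a taxonomy node (department, subdepartment or rack)
# with named child nodes and the products placed directly on it.
@dataclass
class Category:
    name: str
    children: dict[str, Category] = field(default_factory=dict)
    products: list[Product] = field(default_factory=list)

    def add_path(self, path: list[str]) -> Category:
        """Create any missing nodes along `path` and return the last one."""
        node = self
        for name in path:
            node = node.children.setdefault(name, Category(name))
        return node


def browse(root: Category, path: list[str]) -> Category:
    """Walk down the tree one click at a time, e.g. Apparel -> Shoes -> ..."""
    node = root
    for name in path:
        node = node.children[name]
    return node


def all_products(node: Category):
    """Yield every product in the subtree rooted at `node`."""
    yield from node.products
    for child in node.children.values():
        yield from all_products(child)


def refine(products: list[Product], **wanted: str) -> list[Product]:
    """Keep only products whose attributes match every requested value."""
    return [
        p for p in products
        if all(p.attributes.get(key) == value for key, value in wanted.items())
    ]


# Illustrative data only; product names and attributes are made up.
root = Category("Shopping")
running = root.add_path(["Apparel", "Shoes", "Running Shoes"])
running.products.append(Product("Sneaker A", {"brand": "Reebok", "color": "white", "gender": "male", "age": "Kids"}))
running.products.append(Product("Sneaker B", {"brand": "Adidas", "color": "black", "gender": "female", "age": "Adult"}))

shelf = browse(root, ["Apparel", "Shoes", "Running Shoes"])
matches = refine(list(all_products(shelf)), color="white", age="Kids")
print([p.name for p in matches])  # ['Sneaker A']
```

The tree handles the coarse navigation, while the attribute filter handles the finer choices that do not fit naturally into a single hierarchy. Keeping the two separate avoids multiplying racks for every combination of color, size and age.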
To make sure you get exactly the shoes of your dreams, we have to go a long way. Read on for a popular description, or, for a more scientific version, see my recent paper:
D. Pavlov, R. Balasubramanyan, B. Dom, S. Kapur, J. Parikh.
Document preprocessing for Naive Bayes classification and clustering with mixture of multinomials.
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-04), 2004. Postscript.