Lab 10 - Classification Learning


Objectives

  1. Classification learning: 1R algorithm and Decision Tree algorithm
  2. Suggestions for the final project


The Dataset

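The table below contains contact-lens data: four attributes (age, spectacle prescription, astigmatism, tear production rate) and the type of contact lenses (none, soft, or hard).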

AGE             SPECTACLE-PRESCR  ASTIGMATISM  TEAR-PRODUCT-RATE  CONTACT-LENSES
young           myope             no           reduced            none
young           myope             no           normal             soft
young           myope             yes          reduced            none
young           myope             yes          reduced            hard
young           hypermetrope      no           reduced            none
young           hypermetrope      no           normal             soft
young           hypermetrope      yes          reduced            none
young           hypermetrope      yes          normal             hard
pre-presbyopic  myope             no           reduced            none
pre-presbyopic  myope             no           normal             soft
pre-presbyopic  myope             yes          reduced            none
pre-presbyopic  myope             yes          normal             hard
pre-presbyopic  hypermetrope      no           reduced            none
pre-presbyopic  hypermetrope      no           normal             soft
pre-presbyopic  hypermetrope      yes          reduced            none
pre-presbyopic  hypermetrope      yes          normal             none
presbyopic      myope             no           reduced            none
presbyopic      myope             no           normal             none
presbyopic      myope             yes          reduced            none
presbyopic      myope             yes          normal             hard
presbyopic      hypermetrope      no           reduced            none
presbyopic      hypermetrope      no           normal             soft
presbyopic      hypermetrope      yes          normal             none
presbyopic      hypermetrope      yes          normal             none


Classification Learning

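We want to classify the tear production rate (TEAR-PRODUCT-RATE). Which single attribute gives the best classifier according to 1R, that is, the one-attribute rule that makes the fewest errors on the table?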
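To check a hand calculation, 1R can be sketched in a few lines of Python. This is only a minimal sketch, not the lab's reference solution: the names `COLUMNS`, `ROWS`, and `one_r` are made up for illustration, the rows are copied from the table above exactly as printed, and ties between attributes are assumed to go to the first attribute encountered.

```python
from collections import Counter, defaultdict

# Lower-cased column headers of the table (illustrative names).
COLUMNS = ["age", "spectacle-prescr", "astigmatism",
           "tear-product-rate", "contact-lenses"]

# The 24 rows, copied in order from the table above exactly as printed.
RAW = """
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes reduced hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes normal none
presbyopic hypermetrope yes normal none
"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]


def one_r(rows, target):
    """Return (attribute, rule, errors) for the best one-attribute rule."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # For each value of attr, count how often each target class occurs.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class; all other rows are errors.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Keep the attribute with the fewest errors (first one wins ties).
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


attr, rule, errors = one_r(ROWS, target="tear-product-rate")
print(f"best attribute: {attr} ({errors}/{len(ROWS)} errors)")
for value, predicted in rule.items():
    print(f"  {attr} = {value} -> {predicted}")
```

Passing a different `target`, for example `"contact-lenses"`, runs the same procedure for another class attribute.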

Now, let's build a decision-tree classifier for the contact-lenses dataset using the algorithm discussed in lecture.
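The lecture's algorithm is not reproduced in this handout, so the sketch below rests on two assumptions: that the tree is grown ID3-style, splitting on the attribute with the highest information gain, and that CONTACT-LENSES is the class to predict. All names (`entropy`, `info_gain`, `build_tree`, `classify`) are illustrative, and the rows are again copied from the table above exactly as printed.

```python
import math
from collections import Counter

COLUMNS = ["age", "spectacle-prescr", "astigmatism",
           "tear-product-rate", "contact-lenses"]

# The 24 rows, copied in order from the table above exactly as printed.
RAW = """
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes reduced hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes normal none
presbyopic hypermetrope yes normal none
"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]


def entropy(rows, target):
    """Shannon entropy (in bits) of the target column."""
    n = len(rows)
    counts = Counter(row[target] for row in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on attr."""
    remainder = 0.0
    for value in sorted({row[attr] for row in rows}):
        subset = [row for row in rows if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder


def build_tree(rows, attrs, target):
    """Return a class label (leaf) or {attr: {value: subtree}} (inner node)."""
    classes = Counter(row[target] for row in rows)
    # Stop when the node is pure or no attribute is left; use the majority class.
    if len(classes) == 1 or not attrs:
        return classes.most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, rest, target)
    return {best: branches}


def classify(tree, row):
    """Follow the branches matching row until a leaf label is reached.

    Raises KeyError for an attribute value that never reached this node
    during training.
    """
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


TARGET = "contact-lenses"  # assumed class attribute
ATTRS = [c for c in COLUMNS if c != TARGET]
tree = build_tree(ROWS, ATTRS, TARGET)
print(tree)
```

The tree is a nested dictionary, so `classify(tree, ROWS[0])` walks it for the first row of the table. Rows that share all four attribute values but carry different labels cannot be separated by any split; this sketch resolves them by majority vote at the leaf.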


Some Suggestions For The Final Project

The final project has been posted here. It is an open project: you can choose the dataset(s) you are interested in. There are no constraints on the datasets you choose, but here are some thoughts and recommendations.

Choosing an appropriate dataset is the most important step. Unfortunately, some datasets are not suitable for mining meaningful patterns. For instance, consider the GDP growth data in UNData: World Development Indicators. Why is this dataset not well suited to data mining?

You may be able to come up with a model to describe GDP growth of a specific country, but intuitively, finding a model that fits multiple countries is quite hard. To obtain a model that can generalize to previously unseen countries, we would need additional attributes that describe the countries and that might be relevant to their GDP.

More generally, before looking for a specific dataset, you should have a well-defined problem.

To choose appropriate datasets from the thousands that are available online, a better approach is to have a question in mind first and then look around for information that seems relevant. For instance, you could explore the GDP growth data from different perspectives, in combination with other datasets: for example, the importance of agricultural and industrial growth in overall GDP growth, or the relationship between GDP growth and life expectancy. These are just my first thoughts; I am sure that you can find better topics of interest and produce meaningful results.

Maybe you are interested in predicting values in time series data, such as stock prices. But predicting values without complementary information is not easy, because prediction requires a good model. It is also difficult to evaluate your predictions if the time series is not long enough. An easier approach is to try to uncover relationships between the time series data and other information.


CS105