Here is a table of contact-lens data...
We want to classify the tear production rate. Which single attribute gives the best classifier according to 1R?
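As a reminder of how 1R picks its attribute, here is a minimal sketch in Python. It assumes the usual formulation: for each attribute, every value predicts its majority class, and the attribute with the fewest total errors wins. The rows below are a made-up toy sample, not the table referenced above, and the names `ROWS` and `one_r` are only illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows (NOT the contact-lens table from the exercise).
ROWS = [
    {"age": "young", "astigmatism": "no",  "lens": "soft", "tear_rate": "normal"},
    {"age": "young", "astigmatism": "yes", "lens": "hard", "tear_rate": "normal"},
    {"age": "young", "astigmatism": "no",  "lens": "none", "tear_rate": "reduced"},
    {"age": "young", "astigmatism": "yes", "lens": "none", "tear_rate": "reduced"},
    {"age": "old",   "astigmatism": "no",  "lens": "soft", "tear_rate": "normal"},
    {"age": "old",   "astigmatism": "yes", "lens": "none", "tear_rate": "normal"},
    {"age": "old",   "astigmatism": "no",  "lens": "none", "tear_rate": "reduced"},
    {"age": "old",   "astigmatism": "yes", "lens": "none", "tear_rate": "reduced"},
]


def one_r(rows, target):
    """Return (attribute, rules, errors) for the attribute with the fewest errors."""
    best = None
    for attr in (a for a in rows[0] if a != target):
        # Class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its majority class; everything else is an error.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


attr, rules, errors = one_r(ROWS, "tear_rate")
print(attr, rules, errors)
```

On this toy sample the `lens` attribute wins with a single misclassified row; on the real table you would tabulate the same counts by hand.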
Now, let's build a decision-tree classifier for the contact-lenses dataset using the algorithm discussed in lecture.
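The lecture algorithm is not restated here, so the sketch below assumes an ID3-style procedure: split on the attribute with the highest information gain and recurse until a node is pure or no attributes remain. The rows are again a made-up toy sample, and `entropy`, `information_gain`, and `build_tree` are illustrative names rather than anything from the course materials.

```python
import math
from collections import Counter

# Hypothetical toy rows (NOT the full contact-lenses dataset).
ROWS = [
    {"tear_rate": "reduced", "astigmatism": "no",  "lens": "none"},
    {"tear_rate": "reduced", "astigmatism": "yes", "lens": "none"},
    {"tear_rate": "normal",  "astigmatism": "no",  "lens": "soft"},
    {"tear_rate": "normal",  "astigmatism": "yes", "lens": "hard"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus its weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def build_tree(rows, attrs, target):
    """Return a nested dict {attribute: {value: subtree}} or a class label at a leaf."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {
        best: {
            value: build_tree([r for r in rows if r[best] == value], rest, target)
            for value in sorted({r[best] for r in rows})
        }
    }


tree = build_tree(ROWS, ["tear_rate", "astigmatism"], "lens")
print(tree)
```

On this toy sample the root split is `tear_rate` (gain 1.0 bit versus 0.5 for `astigmatism`). If the lecture used a different splitting criterion, only `information_gain` would need to change.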
The final project has been posted here. It is an open project: you can choose whichever dataset(s) interest you. While there are no constraints on the datasets you choose, here are some thoughts and recommendations.
Of course, choosing an appropriate dataset is the most important step. Unfortunately, some datasets are not suitable for mining meaningful patterns. For instance, consider the GDP growth data in the World Development Indicators. Why is this dataset not suited to data mining?
You may be able to come up with a model to describe GDP growth of a specific country, but intuitively, finding a model that
To choose appropriate datasets from the thousands available online, a better approach is to start with a question in mind and then look for information that seems relevant. For instance, you could explore the GDP growth data from different perspectives, in combination with other datasets: e.g. the contribution of agricultural and industrial growth to overall GDP growth, or the relationship between GDP growth and life expectancy. These are just my superficial thoughts; I am sure you can find better topics of interest and produce meaningful results.
Maybe you are interested in predicting values in time series data, such as stock prices. But predicting values without complementary information is not easy, because the series alone rarely gives you enough to build a good model. It is also difficult to evaluate your predictions if the time series is not long enough. An easier route is to try to uncover relationships between the time series and other information.