CS105
LAB 10 - 1R and WEKA


Exercise (SQL or Data Mining?)

1. Given records of hospital treatments, we need to find out how many of them took more than 2 days.

Answer: SQL! We are just counting the tuples whose attribute satisfies a specified condition.
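As a rough illustration of why this is a plain query, here is a minimal sketch using Python's built-in sqlite3 module. The table name treatments, the column duration_days, and the four rows are all made up for the example; the lab does not specify a schema.

```python
import sqlite3

# Hypothetical schema and made-up rows; the lab gives no table or column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE treatments (id INTEGER PRIMARY KEY, duration_days INTEGER)"
)
conn.executemany(
    "INSERT INTO treatments (duration_days) VALUES (?)",
    [(1,), (3,), (2,), (5,)],
)

# Count the tuples whose duration exceeds 2 days.
long_treatments = conn.execute(
    "SELECT COUNT(*) FROM treatments WHERE duration_days > 2"
).fetchone()[0]
print(long_treatments)
```

No model is learned here: the answer is read directly off the stored records.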

2. Given records of patients' check-ups, we need to predict when each patient will come in for a check-up over the next 12 months.

Answer: Data Mining! Here, we are trying to make an estimation based on a dataset.

3. Given our results for question 2, we need to find the day of the year on which the hospital will perform the most check-ups.

Answer: SQL! The data comes from question 2; here we just query that dataset to find the hospital's busiest day.
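Once the predictions exist, finding the busiest day is an ordinary aggregate query. The sketch below assumes the predictions were stored in a hypothetical table predicted_checkups with a day_of_year column; the table, the column names, and the rows are invented for illustration.

```python
import sqlite3

# Hypothetical table holding the predictions from question 2 (made-up rows).
checkup_db = sqlite3.connect(":memory:")
checkup_db.execute(
    "CREATE TABLE predicted_checkups (patient_id INTEGER, day_of_year INTEGER)"
)
checkup_db.executemany(
    "INSERT INTO predicted_checkups VALUES (?, ?)",
    [(1, 45), (2, 45), (3, 120), (4, 45), (5, 300)],
)

# Group the predicted check-ups by day and keep the day with the highest count.
busiest = checkup_db.execute(
    """
    SELECT day_of_year, COUNT(*) AS n
    FROM predicted_checkups
    GROUP BY day_of_year
    ORDER BY n DESC
    LIMIT 1
    """
).fetchone()
print(busiest)
```

The result is a (day_of_year, count) pair. If several days tie for the maximum, this query returns only one of them.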

4. We have micro-array expression data of various genes. We need to find groups of genes that perform similarly to each other.

Answer: Data Mining! We are trying to find groups of genes that have common properties, so we should use clustering. (Note the difference between classification and clustering: classification assigns instances to predefined classes, whereas clustering discovers the groups itself.)

5. We want to discover relationships between products sold by an e-store.

Answer: Data Mining! Discovering relationships between items is the task of association learning.

6. Based on a human's height, weight and month of birth, we want to predict her height at the age of 2.

Answer: Data Mining! We need numeric estimation.


Running the 1R algorithm

We want to classify the tear-production-rate! Which is the best one-attribute classifier according to 1R?

Answer: Use contact-lenses, since it gives 21/24 accuracy, which is higher than any other attribute achieves.
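The 21/24 figure can be checked with a minimal sketch of 1R, assuming the lab used the standard 24-instance contact-lenses dataset distributed with WEKA. The function name one_r and the way the rows are built are choices made for this example, not WEKA's implementation, which also handles numeric attributes and missing values.

```python
from collections import Counter, defaultdict
from itertools import product

ATTRS = ["age", "spectacle-prescrip", "astigmatism", "tear-prod-rate"]

# The first four attributes form a full 3 x 2 x 2 x 2 grid (24 rows).
grid = product(
    ["young", "pre-presbyopic", "presbyopic"],
    ["myope", "hypermetrope"],
    ["no", "yes"],
    ["reduced", "normal"],
)
# contact-lenses value for each grid row, in the same order
# (assumed to match the standard WEKA dataset).
lenses = (
    "none soft none hard none soft none hard "
    "none soft none hard none soft none none "
    "none none none hard none soft none none"
).split()
rows = [
    {**dict(zip(ATTRS, combo)), "contact-lenses": lens}
    for combo, lens in zip(grid, lenses)
]


def one_r(rows, target):
    """Return (attribute, rule, correct) for the best single-attribute rule."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Count the target classes seen with each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        # Keep the attribute with the most correct predictions (first wins ties).
        if best is None or correct > best[2]:
            best = (attr, rule, correct)
    return best


best_attr, best_rule, best_correct = one_r(rows, "tear-prod-rate")
print(best_attr, best_rule, f"{best_correct}/{len(rows)}")
```

Under that assumption, the other three attributes are perfectly balanced against tear-prod-rate and each scores only 12/24. The contact-lenses rule (soft or hard predicts normal, none predicts reduced) misclassifies just the three "none" instances with normal tear production.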

