Anton Denissov

CS440 Artificial Intelligence

Prof. Betke

 

NEURAL NETWORKS II

Choosing Automobiles

FILES:

Below is the list of files you may want to look at while grading:

CODE:   Modified train.cpp, Modified nn-helpers.cpp

DATA:   test.inp, data.inp

NETS:   1, 2, 3, 4

MISC:

 

PURPOSE:

            The purpose of this project was to identify an acceptable automobile given certain criteria. This specific project takes five inputs: buying price, maintenance price, passenger capacity, cargo space and overall safety. The result is the computer’s decision on whether the vehicle is acceptable. The importance of this project cannot be overstated. Although in its present form this project is quite rudimentary, a somewhat expanded version could prove extremely useful to the modern consumer. Presently, buying a car is an enormous hassle. What do you value more? Would you like to drive the hatchback Toyota that’s safe and efficient, or would you like to showboat in a 1995 Ford Mustang that has the elegance of a panther but is its owner’s worst nightmare maintenance-wise? This network eliminates the guesswork, as well as the difficult choices commonly associated with a car purchase. Furthermore, the applications of this network extend beyond simple consumer assistance. The weights on the edges can be changed to custom-tailor the computer’s selection to a particular application. For example, a school district could use this program to decide which buses to buy, appropriately stressing safety over all other factors. A farmer, on the other hand, might find safety secondary to cargo room. Regardless of the specific application, this neural net is useful to a wide range of professions.

 

METHODOLOGY:

This program uses a modified version of the “car” database from the MLRepository website. Each example contains five inputs and one output. The inputs are buying price, maintenance price, passenger room, cargo room and safety. The output rates the car as unacceptable, acceptable, good or very good; currently, our net only rates vehicles as unacceptable or acceptable. We modified the original database insofar as it contained a sixth input, the number of doors. General observations of the data suggested, however, that the number of doors is a negligible factor. Consequently, we removed it in order to simplify the network infrastructure, which can now be explained as follows:
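To make the reduction concrete, here is a minimal C++ sketch of how one line of the original database could be turned into a five-input, one-output example. It is not our readData(): it assumes the comma-separated layout of the original UCI file (buying, maint, doors, persons, lug_boot, safety, class), and the [0, 1] scaling, the hypothetical names parseCarLine, encodeCost and encodeThreeLevel, and the choice of 1 for unacceptable are all assumptions made for illustration.

```cpp
#include <array>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One example after preprocessing: five scaled inputs and a binary target.
struct CarExample {
    std::array<double, 5> inputs;  // buying, maint, persons, lug_boot, safety
    double target;                 // assumed: 1 = unacceptable, 0 = acceptable or better
};

// buying / maint: low < med < high < vhigh, scaled so 1.0 is most expensive.
double encodeCost(const std::string& v) {
    if (v == "low") return 0.0;
    if (v == "med") return 1.0 / 3.0;
    if (v == "high") return 2.0 / 3.0;
    if (v == "vhigh") return 1.0;
    throw std::invalid_argument("unknown cost level: " + v);
}

// persons (2, 4, more), lug_boot (small, med, big), safety (low, med, high),
// scaled so 1.0 is the most room or the highest safety.
double encodeThreeLevel(const std::string& v) {
    if (v == "2" || v == "small" || v == "low") return 0.0;
    if (v == "4" || v == "med") return 0.5;
    if (v == "more" || v == "big" || v == "high") return 1.0;
    throw std::invalid_argument("unknown level: " + v);
}

// Parses one UCI-style line, e.g. "vhigh,vhigh,2,2,small,low,unacc".
CarExample parseCarLine(const std::string& line) {
    std::vector<std::string> f;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) f.push_back(field);
    if (f.size() != 7) throw std::invalid_argument("expected 7 fields: " + line);
    CarExample ex;
    // f[2] is the number of doors, which is dropped.
    ex.inputs = {encodeCost(f[0]), encodeCost(f[1]), encodeThreeLevel(f[3]),
                 encodeThreeLevel(f[4]), encodeThreeLevel(f[5])};
    // The four original classes collapse to two.
    ex.target = (f[6] == "unacc") ? 1.0 : 0.0;
    return ex;
}
```

For example, parseCarLine("low,low,4,more,big,high,vgood") yields inputs {0, 0, 1, 1, 1} and target 0.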

Observations of the data indicated that cost is the dominant factor in a vehicle's rating. We also observed that passenger room is generally more important than cargo room. Our strategy, therefore, is to make the overall value of the net as low as possible for a desirable car. We accomplish this as follows. The initial net (please see picture below) is divided into three major categories. The first node (3) represents the combined monetary investment in the vehicle. We observed from the data that maintenance price has roughly three times the impact of the buying price, so we gave the maintenance-price edge a weight of 3. We also gave the node a threshold value of 3. We use the standard sigmoid function for our net, with sigma defined as in our previous assignment: the sum of input*weight, minus the node's threshold. Following our strategy, if the vehicle has a high monetary requirement, the sigmoid evaluates close to 1. If, on the other hand, the monetary requirement is low, sigma is negative, resulting in a very low value of the sigmoid (close to 0).
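The behavior of the money node can be sketched as a single sigmoid unit. This sketch assumes sigma = Σ(input*weight) minus the node's threshold, inputs scaled to [0, 1] with 1.0 meaning most expensive, and a buying-price weight of 1; only the maintenance weight (3) and the threshold (3) come from the description above. The names sigmoid, nodeOutput and moneyNode are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard sigmoid ("squash") function: maps any real sigma into (0, 1).
double sigmoid(double sigma) { return 1.0 / (1.0 + std::exp(-sigma)); }

// One node: sigma = sum of input*weight, minus the node's threshold.
double nodeOutput(const std::vector<double>& inputs,
                  const std::vector<double>& weights, double threshold) {
    assert(inputs.size() == weights.size());
    double sigma = -threshold;
    for (std::size_t i = 0; i < inputs.size(); ++i) sigma += inputs[i] * weights[i];
    return sigmoid(sigma);
}

// Money node: buying price (assumed weight 1) and maintenance price (weight 3),
// threshold 3. Inputs are assumed scaled so that 1.0 means "very high".
double moneyNode(double buying, double maint) {
    return nodeOutput({buying, maint}, {1.0, 3.0}, 3.0);
}
```

Under these assumptions moneyNode(1.0, 1.0) has sigma = 1 (output about 0.73) and moneyNode(0.0, 0.0) has sigma = -3 (output about 0.05): expensive cars push the node toward 1 and cheap ones toward 0.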

            The second accumulation node (-6) measures the impact of cargo room and passenger room. By the same reasoning as before, passenger room is valued more than cargo space (based on the data) and is therefore given a larger-magnitude edge weight. It is important to note that the edge weights, as well as the threshold for this node, are negative. This sign change represents a change in perspective: before, we were minimizing the monetary requirement, thus lowering the overall value of the net; now we are maximizing the space available, and we use negative numbers to represent that. As a result, if the car has a large amount of space, sigma will be negative and the value of the sigmoid will be low, just as we want. Lastly, we evaluate the safety factor (-5). As with spaciousness, we are trying to maximize safety, so we need negative values to invert it for the net's calculations. Finally, we observe that safety is the second most important consideration in a vehicle, after price and ahead of capacity. Accordingly, we compute the overall value of the net from the outputs of the three nodes described above, scaled to reflect this ranking. This gives us the first version of our neural net.
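A sketch of the complete first net follows. The thresholds 3, -6 and -5 and the maintenance weight of 3 come from the text; every other weight, the output threshold of 3, the [0, 1] input scaling, and the reading that an output near 1 means unacceptable are hypothetical values chosen only so that each node behaves as described (negative weights for space and safety, and output weights that rank price above safety above capacity). The name net1 is also hypothetical.

```cpp
#include <cmath>

double sigmoid(double sigma) { return 1.0 / (1.0 + std::exp(-sigma)); }

// All inputs assumed scaled to [0, 1]: for price, 1.0 = most expensive;
// for space and safety, 1.0 = most room / safest.
double net1(double buying, double maint, double persons, double luggage,
            double safety) {
    // Money node, threshold 3: high cost drives sigma up, output toward 1.
    double money = sigmoid(1.0 * buying + 3.0 * maint - 3.0);
    // Space node, threshold -6 (weights hypothetical, passenger > cargo):
    // a roomy car drives sigma negative and the output toward 0.
    double space = sigmoid(-8.0 * persons - 4.0 * luggage - (-6.0));
    // Safety node, threshold -5 (weight hypothetical).
    double safe = sigmoid(-10.0 * safety - (-5.0));
    // Output node: hypothetical weights rank price > safety > capacity.
    // A value near 1 is read as "unacceptable", near 0 as "acceptable".
    return sigmoid(3.0 * money + 2.0 * safe + 1.0 * space - 3.0);
}
```

With these values, a cheap, roomy, very safe car (0, 0, 1, 1, 1) scores about 0.06, and an expensive, cramped, unsafe one (1, 1, 0, 0, 0) about 0.90.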

            In the second version of the net, we attempted to capture the cumulative nature of the benefits and drawbacks of the different contributing elements. As you can see, the inputs and the first row of nodes remain the same, but we add six more nodes. The first column accounts for the combined effects of money and capacity, the second for money and safety, and the third for safety and capacity. The last row interleaves the results of these combinations and merges them into the overall rating of the vehicle (the result node). In effect, the second version of the net extends the merging idea introduced in the first one: we already know the individual effect of each variable on the outcome, and we expect that properly evaluating combinations of inputs will yield the desired result.
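The combination step of the second net could be sketched as below. Only the pairings (money with capacity, money with safety, safety with capacity) come from the text; the full net adds six nodes whose exact wiring appears only in the original figure, so this simplified sketch feeds the three pairwise nodes straight into one result node. PairNode, pairwiseLayer, resultNode and all weights and thresholds are hypothetical.

```cpp
#include <array>
#include <cmath>

double sigmoid(double sigma) { return 1.0 / (1.0 + std::exp(-sigma)); }

// A node with two inputs; sigma = w1*a + w2*b minus the threshold.
struct PairNode {
    double w1, w2, threshold;
    double eval(double a, double b) const {
        return sigmoid(w1 * a + w2 * b - threshold);
    }
};

// money, capacity, safety: outputs of the first-row category nodes.
// Returns [0] money+capacity, [1] money+safety, [2] safety+capacity.
std::array<double, 3> pairwiseLayer(double money, double capacity, double safety,
                                    const std::array<PairNode, 3>& nodes) {
    return {nodes[0].eval(money, capacity),
            nodes[1].eval(money, safety),
            nodes[2].eval(safety, capacity)};
}

// Result node merging the three pairwise outputs.
double resultNode(const std::array<double, 3>& pairs,
                  const std::array<double, 3>& weights, double threshold) {
    double sigma = -threshold;
    for (int i = 0; i < 3; ++i) sigma += weights[i] * pairs[i];
    return sigmoid(sigma);
}
```

The point of the pairwise layer is that each node sees two categories at once, so a weakness in one (for example, high price) can be weighed against a strength in another (high safety) before the final rating is formed.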

            In our third attempt, we tried to increase the integration between the different factors. We took the nodes from the second column of the second net, but instead of combining them, we wired each of them to every node in column 3. We see this as another way of measuring the combined effect of money, safety and capacity.

Lastly, we took a totally new approach: a fully connected net with two intermediate (hidden) layers. It accomplishes the same effect as the previous nets, in that it considers the effect of each input on the output, but it considers the inputs without grouping them into categories. Dropping the categories lets us check whether we made a mistake by placing some inputs into a category where they did not belong. The result is neural net 4.
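A fully connected net such as net 4 can be evaluated layer by layer, as in this sketch. It assumes the same sigmoid-with-threshold node described earlier; the report gives five inputs, two hidden layers and one output but not the hidden-layer sizes, so those sizes (and the names Layer, forwardLayer, forwardNet and uniformLayer) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double sigma) { return 1.0 / (1.0 + std::exp(-sigma)); }

// A fully connected layer: weights[j][i] connects input i to node j.
struct Layer {
    std::vector<std::vector<double>> weights;
    std::vector<double> thresholds;
};

// Every node in the layer sees every value coming from the previous layer.
std::vector<double> forwardLayer(const Layer& layer, const std::vector<double>& in) {
    assert(layer.weights.size() == layer.thresholds.size());
    std::vector<double> out(layer.weights.size());
    for (std::size_t j = 0; j < out.size(); ++j) {
        assert(layer.weights[j].size() == in.size());
        double sigma = -layer.thresholds[j];
        for (std::size_t i = 0; i < in.size(); ++i) sigma += layer.weights[j][i] * in[i];
        out[j] = sigmoid(sigma);
    }
    return out;
}

// Runs the inputs through each layer in turn; the last layer has one node.
double forwardNet(const std::vector<Layer>& layers, std::vector<double> activations) {
    for (const Layer& layer : layers) activations = forwardLayer(layer, activations);
    assert(activations.size() == 1);
    return activations[0];
}

// Builds a layer with every weight and threshold set to `value` (handy for testing).
Layer uniformLayer(std::size_t nodes, std::size_t inputs, double value) {
    return Layer{std::vector<std::vector<double>>(nodes, std::vector<double>(inputs, value)),
                 std::vector<double>(nodes, value)};
}
```

For example, a 5-4-3-1 net is a vector of three Layer objects whose weight matrices are 4×5, 3×4 and 1×3. Because no input is assigned to a category, any grouping has to be discovered by training rather than built in.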

 

TRAINING:

            We used the training algorithm kindly provided by Steve to train our neural nets. We made minor modifications to the eval() and backPropogate() procedures to control the number of iterations the algorithm goes through before it terminates, and we implemented our own version of readData() to accommodate a different data format. As explained in the previous section, we started with a simple model and added layers to increase the complexity and better capture the relations between inputs. We used the sigmoid (squash) function, the most common and robust choice: it maps every output into the range between 0 and 1, and it is differentiable, which the gradient-descent learning of backpropagation requires.

            It was unclear from the beginning which learning rate would be best for training our nets, so we adopted the following convention: we started with a high learning rate of 10 and stepped down by 1 until we reached 8. At that point we jumped to 6, and then to 3, 2 and 1. Finally, we tried 0.1, 0.01 and 0.001, as we felt that these would cover the broadest range of values. After choosing a learning rate, we trained the nets both incrementally and simultaneously, as described in the Methodology section.

            One problem we encountered during training was that the values failed to converge. We tried two ways to combat this. First, we trained on a very small dataset (about 20 examples). This, however, gave poor results: although the net converged during training (RMS < 0.06), its success rate on the evaluation data was low (about 51-54%). This forced us to resort to a second method, a termination condition: we altered backprop() so that once ITERATIONS reached a set limit (in our case 100,000), the net would consider itself trained and proceed to evaluation. Unfortunately, this second measure was necessary: had we tried to run training until convergence, we would have run out of our allocated CPU time on CSA and been forced to quit anyway.
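The termination condition can be illustrated with a deliberately small example. To keep it short, this sketch trains a single sigmoid unit by gradient descent on squared error (the one-node case of backpropagation) rather than a full multilayer net, and stops either when the RMS drops below a target or when a maxIterations cap (100,000 in our runs) is reached. The names train, rmsError, TrainResult and kLearningRates are illustrative, not the identifiers in train.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double sigma) { return 1.0 / (1.0 + std::exp(-sigma)); }

struct Example { std::vector<double> x; double target; };
struct TrainResult { double rms; long iterations; };

// Learning rates tried in the report, from highest to lowest.
const std::vector<double> kLearningRates = {10, 9, 8, 6, 3, 2, 1, 0.1, 0.01, 0.001};

// Root-mean-square error of the unit over a (non-empty) data set.
double rmsError(const std::vector<double>& w, double threshold,
                const std::vector<Example>& data) {
    double sum = 0.0;
    for (const Example& e : data) {
        double sigma = -threshold;
        for (std::size_t i = 0; i < w.size(); ++i) sigma += w[i] * e.x[i];
        double err = e.target - sigmoid(sigma);
        sum += err * err;
    }
    return std::sqrt(sum / data.size());
}

// Per-example gradient descent; one iteration = one pass over the data.
// Stops when RMS < targetRms or after maxIterations passes.
TrainResult train(std::vector<double>& w, double& threshold,
                  const std::vector<Example>& data, double rate,
                  double targetRms, long maxIterations) {
    long it = 0;
    double rms = rmsError(w, threshold, data);
    while (rms >= targetRms && it < maxIterations) {
        for (const Example& e : data) {
            double sigma = -threshold;
            for (std::size_t i = 0; i < w.size(); ++i) sigma += w[i] * e.x[i];
            double out = sigmoid(sigma);
            // delta = (target - out) * s'(sigma), where s'(sigma) = out * (1 - out)
            double delta = (e.target - out) * out * (1.0 - out);
            for (std::size_t i = 0; i < w.size(); ++i) w[i] += rate * delta * e.x[i];
            threshold -= rate * delta;  // the threshold acts as a weight on input -1
        }
        ++it;
        rms = rmsError(w, threshold, data);
    }
    return {rms, it};
}
```

To reproduce the sweep, one would call train once per entry of kLearningRates, each time from the same initial weights, with maxIterations = 100000. The cap guarantees termination even when a rate is too large for the RMS to settle.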

            Another problem worth noting is a lack of variety of outputs in the training data. In our particular example, if all the cars in the training set were unacceptable, we would end up with a very low recognition rate; furthermore, the net would not converge to a useful value (the best RMS achieved on any such trial was above 0.72). For this reason we had to pay careful attention to the composition of the training sets. Another interesting phenomenon we observed was oscillation: if the learning rate was too high for convergence but not high enough to cause divergence, the RMS would oscillate, decreasing and then increasing again.
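A simple guard against one-sided training sets is to check the class mix before training. This sketch assumes targets are encoded as 1 for unacceptable and 0 for acceptable; the minFraction cutoff and the function names are hypothetical, since the report does not say what proportion it considered adequate.

```cpp
#include <cstddef>
#include <vector>

// Fraction of targets that are "unacceptable" (assumed encoded as 1).
double unacceptableFraction(const std::vector<double>& targets) {
    if (targets.empty()) return 0.0;
    std::size_t count = 0;
    for (double t : targets)
        if (t > 0.5) ++count;
    return static_cast<double>(count) / targets.size();
}

// True if each class makes up at least minFraction of the targets.
bool isBalancedEnough(const std::vector<double>& targets, double minFraction) {
    double f = unacceptableFraction(targets);
    return !targets.empty() && f >= minFraction && (1.0 - f) >= minFraction;
}
```

For example, isBalancedEnough(targets, 0.1) rejects a set in which fewer than 10% of the cars belong to either class.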

 

DATA:

            As discussed in the Methodology section, we used an abridged version of the data set available from the MLRepository under the CAR directory. In the course of our research we tried several different datasets and found that results improved with the size of the dataset. In the end, we used 1/3 of the available data for training and 2/3 for evaluation; these are the proportions suggested by the donor of the data (Marko Bohanec). These are the abridged training and test sets that we used.
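A 1/3 to 2/3 split could be produced as below. The report does not say how examples were assigned to the two sets, so the seeded random shuffle here (and the function name splitOneThird) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Shuffles a copy of the examples with a fixed seed, then returns
// (first third for training, remaining two thirds for evaluation).
template <typename T>
std::pair<std::vector<T>, std::vector<T>> splitOneThird(std::vector<T> all,
                                                        unsigned seed) {
    std::mt19937 rng(seed);
    std::shuffle(all.begin(), all.end(), rng);
    auto cut = all.begin() + static_cast<std::ptrdiff_t>(all.size() / 3);
    return {std::vector<T>(all.begin(), cut), std::vector<T>(cut, all.end())};
}
```

Fixing the seed makes a run repeatable on the same platform. A stratified variant, which splits the unacceptable and acceptable cars separately, would also keep both classes present in the training third.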

 

 

RESULTS:

 

 

Learning    NN 1               NN 2               NN 3               NN 4
rate (r)    RMS       SUCCESS  RMS       SUCCESS  RMS       SUCCESS  RMS       SUCCESS
0.001       0.168892  84%      0.164536  89%      0.163555  87%      0.794984  30%
0.01        0.165050  88%      0.163083  89%      0.163435  89%      0.794984  30%
0.1         0.189913  87%      0.220090  62%      0.406031  33%      0.794984  30%
1           0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%
2           0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%
3           0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%
6           0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%
8           0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%
9           0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%
10          0.606630  70%      0.606630  70%      0.606630  70%      0.606630  70%

 

 

 

Evaluating the neural nets specified above, we obtained the results shown in the table. (We are looking for the smallest RMS and the highest success percentage.) Despite the increases in complexity, the elementary design of the first net proves the most robust: its success rate stays between 84% and 88% for learning rates from 0.001 to 0.1, while nets 2 and 3 peak slightly higher (89%) but fall to 62% and 33% at r = 0.1, and net 4 never exceeds 30%. To calculate the results (see the driver in train.cpp, listed under FILES), we used two counters, a numerator and a denominator. After a net had been trained, we ran each example through the eval() procedure and compared the desired output with the actual output. If the two matched to within the desired degree of closeness (EPS), we incremented the numerator; we incremented the denominator for every example. At the end, we divided the two to obtain a percentage. For smaller datasets, we also printed the actual and desired outputs and verified the correctness of the algorithm. Note that using a different learning rate on the same neural net and the same data yielded different recognition percentages.
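The numerator/denominator scoring described above, together with the RMS, can be sketched as follows. The EPS value is not given in the report, so it is a parameter here, and we assume a strict comparison (|desired - actual| < EPS); scoreNet and Score are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Score { double percent; double rms; };

// outputs[k] is the trained net's output for example k; targets[k] is the
// desired output. An example counts as correct when |target - output| < eps.
Score scoreNet(const std::vector<double>& outputs, const std::vector<double>& targets,
               double eps) {
    std::size_t numerator = 0, denominator = 0;
    double sumSq = 0.0;
    for (std::size_t k = 0; k < outputs.size() && k < targets.size(); ++k) {
        double diff = targets[k] - outputs[k];
        if (std::fabs(diff) < eps) ++numerator;  // close enough: a "success"
        ++denominator;                           // every example is counted
        sumSq += diff * diff;
    }
    if (denominator == 0) return {0.0, 0.0};
    return {100.0 * numerator / denominator, std::sqrt(sumSq / denominator)};
}
```

With EPS = 0.5 and 0/1 targets, an output counts as correct exactly when it falls on the same side of 0.5 as its target.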

As stated above, we ran into several issues training the nets, such as RMS oscillation and divergence (please see the Training section for discussion). Please also note that for every neural net, learning rates r of 1 and greater do not provide reliable data: at these values the net diverges, and the percentages in the SUCCESS column are not representative of the net's true decision-making power.

Aside from that, nets with a proper learning rate showed reasonably good performance (about 89% in the best case), meaning that almost 9 times out of 10 the computer rated the automobile correctly. We also found that serial training is faster than simultaneous training: it reduces the load on the CPU through a “preprocessing” of sorts, since part of the net is already optimized.

 

CONCLUSION:

In the end, we must admit that although our neural net is practical and useful in its present form, it suffers from some flaws. It makes complicated decisions and successfully analyzes how each component contributes to a vehicle's rating; however, it is correct only about 90% of the time. Furthermore, it analyzes only 5 inputs, whereas there could be more, and it can only distinguish between 1 and 0, unacceptable and acceptable, whereas the acceptable category actually spans acceptable, good and very good. Thus, our network is rudimentary; however, it is a step in the right direction and, with more time and resources, could be turned into a very valuable asset.