Last time I talked about exploring algorithms that can analyze a set of neural networks with different structures and then select the best one.

These algorithms are interesting because they choose the best neural network in terms of architecture (i.e. the number of hidden neurons or of hidden layers…). They fit the network structure to the problem it has to solve, keeping the number of parameters (i.e. the complexity) of the model as low as possible.

I find these algorithms very interesting because they would let me automate the process of manually trying different values for network parameters such as the number of hidden neurons. They also produce figures/scores that remain comparable even when the networks have different structures. These scores let me compare many neural networks numerically, see clearly which one performs best, keep that one and throw away the others.

So I decided to give this big piece of work a try. It is a real challenge because it involves lots of calculations that are not straightforward to me.

Fortunately, I had bought a very complete book called Apprentissage Statistique, by G. Dreyfus. Even though I don’t find it well structured, it gives a lot of good tips and tricks and describes in quite good detail each step one should follow to implement these algorithms.

So I followed the guidelines from that book and went through these steps:

- Understand enough of the algorithms as a whole to determine the class architecture I should use. This is something I always do, so that I start in the best conditions I can and don’t get trapped later on because I didn’t think about a particular case or didn’t understand how something actually worked. That’s why I think it is really important to put things into perspective before rushing headlong (because you can’t see the wall in front of you once you’re running)!
- Code the calculation routines: the number of parameters of a given model, its Jacobian matrix, the mean squared errors, the residuals, the leverages, the coefficients of determination, the Predicted Residual Sum of Squares (PRESS) score, etc.
- Put all these figures together to decide which model is the best in a batch.
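
To give an idea of what the leverage and PRESS routines from the second step can look like, here is a minimal sketch in plain Python. It is only an illustration, not the code from my project: the helper names (`solve`, `normal_matrix`, `leverages`, `press`, `fit_linear`) and the toy data are made up, and it assumes the Jacobian Z (one row per example, one column per parameter) has full column rank. For a model that is linear in its parameters, the leave-one-out residual is exactly r_i / (1 - h_ii); for a neural network the same formula is only an approximation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy of A
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def normal_matrix(Z):
    """Z^T Z for a Jacobian Z given as a list of rows."""
    q = len(Z[0])
    return [[sum(row[a] * row[b] for row in Z) for b in range(q)] for a in range(q)]

def leverages(Z):
    """h_ii = z_i^T (Z^T Z)^-1 z_i, the diagonal of the hat matrix."""
    A = normal_matrix(Z)
    return [sum(zi * ui for zi, ui in zip(z, solve(A, z))) for z in Z]

def press(residuals, h):
    """PRESS: sum of squared leave-one-out residuals r_i / (1 - h_ii)."""
    return sum((r / (1.0 - hi)) ** 2 for r, hi in zip(residuals, h))

def fit_linear(Z, y):
    """Least-squares parameters via the normal equations."""
    q = len(Z[0])
    rhs = [sum(row[a] * yi for row, yi in zip(Z, y)) for a in range(q)]
    return solve(normal_matrix(Z), rhs)

# Toy model y = a*x + b (made-up data); each Jacobian row is [x_i, 1].
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
Z = [[x, 1.0] for x in xs]
theta = fit_linear(Z, ys)
residuals = [yi - sum(t * z for t, z in zip(theta, row)) for row, yi in zip(Z, ys)]
h = leverages(Z)
score = press(residuals, h)
```

The leverages always sum to the number of parameters (2 here), which makes a handy sanity check. The point of the formula is that the score comes from a single fit instead of one fit per left-out example.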

While going through these steps, I hit a tricky problem I had already faced some time ago: how do I know when to stop my model’s learning loop?

This is a recurring problem when dealing with neural networks. The learning phase iteratively modifies the weights by very small amounts until the model finally converges or reaches an acceptable error threshold, for example 0.01%.

The problem with neural networks is that if you let your model learn for too few iterations, it does not perform well, but if you let it learn for too long, it becomes overfitted. This means it has become too specific to the examples it was trained on, so it does not perform well on new data either. So you have to find a middle point, which is hard because it depends on many things, such as the number of parameters of the model and its structure (number of neurons, number of hidden layers and so on).

Until then, I had always used the basic workaround of setting an empirical number of iterations that seemed to give good results. But that no longer works here: the selection algorithm has to run on several different model structures, and a single hand-tuned iteration count can’t suit them all…

I searched the web for a way to deal with this recurring problem and finally found a marvelous algorithm by Lutz Prechelt: “Early Stopping – but when?”, in Genevieve B. Orr and Klaus-Robert Müller: Neural Networks: Tricks of the Trade, volume 1524 of LNCS, Springer, 1997.

This paper looked great to me because I found it quite easy to code and to integrate into my current workflow. Once again, I tried to put it into perspective and decided how I should structure it in terms of code.

After some tests and some hours of intense reflection, I finally came up with a working early stopping algorithm that I could integrate into the model analyzer and have it decide for me when to stop learning.
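
For readers who want to see what such a stopping rule can look like, here is a minimal sketch of the simplest criterion in Prechelt’s paper, the generalization loss GL. After each epoch it compares the current validation error with the lowest one seen so far, GL = 100 * (E_va / E_opt - 1), and stops once GL exceeds a threshold alpha. This is not necessarily the variant I ended up with; the function name `early_stop_epoch`, the default `alpha=5.0` and the error values are assumptions chosen for illustration.

```python
def early_stop_epoch(val_errors, alpha=5.0):
    """Return (stop_epoch, best_epoch) under the GL_alpha criterion.

    val_errors: list of validation errors, one per epoch (assumed positive).
    """
    best_err = float("inf")
    best_epoch = -1
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch    # weights to restore later
        gl = 100.0 * (err / best_err - 1.0)       # generalization loss in percent
        if gl > alpha:
            return epoch, best_epoch              # validation error rose too much
    return len(val_errors) - 1, best_epoch        # criterion never triggered

# Made-up validation curve: it improves, then starts to degrade.
val = [1.0, 0.8, 0.6, 0.5, 0.51, 0.52, 0.55, 0.6]
stop, best = early_stop_epoch(val, alpha=5.0)
```

In a real training loop the validation error would be computed on the fly and the weights saved at each new best epoch, so that the model from `best_epoch` is the one that gets kept.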

In the end I have a working analysis and selection algorithm that can benchmark a given set of models and extract the best one for further use! This was really exciting, since I thought it would be out of my reach: it meant going through a lot of steps, each with its own issues and difficulties…
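
The final decision step can be sketched roughly as follows. This is a hypothetical illustration, not my actual implementation: the `Candidate` class, the `tolerance` parameter and the score values are invented, and it assumes a lower score is better and that each network has a single fully connected hidden layer with biases. The idea is simply that the lowest score wins, and when several scores are nearly equal, the model with the fewest parameters is preferred.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    n_hidden: int   # neurons in the single hidden layer
    score: float    # comparable score, lower is better (e.g. PRESS-based)

def n_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a fully connected net with one hidden layer."""
    return n_hidden * (n_inputs + 1) + n_outputs * (n_hidden + 1)

def select_best(candidates, n_inputs, tolerance=0.0):
    """Lowest score wins; within `tolerance` of it, fewest parameters wins."""
    best_score = min(c.score for c in candidates)
    close = [c for c in candidates if c.score <= best_score + tolerance]
    return min(close, key=lambda c: n_parameters(n_inputs, c.n_hidden))

# Made-up scores for networks with 1, 2, 3 and 5 hidden neurons.
candidates = [Candidate(1, 0.90), Candidate(2, 0.41),
              Candidate(3, 0.40), Candidate(5, 0.47)]
best = select_best(candidates, n_inputs=2, tolerance=0.02)
```

With these numbers the 3-neuron network has the lowest score, but the 2-neuron one is within the tolerance and has fewer parameters, so it is the one selected.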

I took a quick snapshot of the final results it gives me.

In the upper left corner there’s the GraphViz code for the architecture of the whole system (see my previous post). At the bottom, there are graphs that show the final results:

- The blue plot at the bottom represents the input data of the models.
- The blue plot at the top represents the desired output data (the data I measured directly on the system).
- The orange plot is the output of a linear model (ARX) that was trained at the same time as the neural networks. It is there to compare a linear model with the nonlinear models (i.e. the neural networks).
- The red one is the output of the neural network chosen by the selection algorithm, so this is the one that performs best.
- Finally, the orange bar plot represents the final scores of all the benchmarked models. The numbers on the X-axis are the numbers of neurons in the hidden layer.
- The graph with Frequency written on it was for test purposes, so there’s nothing on it…

OK, that’s it! I’ll have to test this more thoroughly to make sure everything works correctly, but the results are pretty encouraging!