Some of my disappointment with neural networks
(OP)
Hi there,
I just wanted to share some of my impressions after an extensive immersion in artificial neural networks in the context of regression analysis. It is true that neural nets are a powerful tool, but like any tool they have limitations. I feel I still have a lot to learn in this field. Nevertheless, I would like to outline some of the reservations and opinions I have come to on this subject. I would be happy if this triggers further discussion/debate.
- Neural networks as a "black box". It took me some time to understand what the reference to a black box truly means, and I have my own interpretation of it. A neural network is a black box because no one can truly predict what is going on in terms of network behavior when the topology parameters are adjusted; the whole thing is simply unpredictable. A neural net with 2 units in the hidden layer may do a good job when one with 3 units per hidden layer would overlearn your problem. Now draw a new random seed and the reverse happens: the 3-unit-per-hidden-layer net is the one that performs better. This is terrible.
- The learning method. It is incredible how many update methods exist for adjusting the neural network weights during training (I refer especially to feedforward / backpropagation nets). What is disappointing is that repeatability is very poor: solve a problem with quick propagation and again with batch backpropagation and you get different outcomes, ranging from small discrepancies to quite tangible differences, for the same problem setup. Move to Levenberg-Marquardt or second-order methods such as scaled conjugate gradient and each time there is yet another outcome. Change the tuning of the learning parameters (momentum, learning rate, etc.) and here again the results differ. Regularization techniques such as weight decay and the like do not produce the same results either, depending on the selected update method, and so on.
- Overfitting. Neural networks need to be kept to the strictly necessary number of layers/units so as not to overfit a problem. This is key. But try to find the right balance without sacrificing (sometimes seriously) the accuracy: good luck. Even hours of work on preconditioning the data / doing principal component analysis, for example, can be of limited help, while being time consuming and a resource drain. Overfitting is ugly. You can have all the calibration indicators looking fine (RMSE, max/min differences, etc.) and still get a disastrous prediction at some points that are WITHIN the training range, which immediately prompts an "OMG" type of reaction.
- Estimating confidence intervals. It is quite a complicated procedure to estimate these intervals, and in the end they only reflect the mathematical parametrization of the network and tell you nothing about the physics itself. They can be misleading too (e.g. false positives). In addition, splitting your data into training, validation and test sets can be costly if you do not have enough data at hand; sometimes it is a luxury, I would say. (One common empirical workaround, a bootstrap ensemble, is sketched at the end of this post.)
- Randomness. Take the same network architecture and apply it to train on and then predict the same problem / data set. Each time you get a different outcome because of the random initialization of the weights. If you set up a network/model once and never touch it again, maybe that is fine. But if you want the network integrated into a prediction tool, you will never get the same weight distribution twice, and as a consequence a different set of predicted parameters each time. Go explain that to an operator or to management who are ignorant of the peculiarities of neural nets; here too, good luck. (The sketch right after this list illustrates the point, and how pinning the random seed makes a run repeatable.)
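To make the seed/topology/update-method points above concrete, here is a minimal sketch. I am using scikit-learn's MLPRegressor purely as a stand-in (it only offers lbfgs/sgd/adam, not Levenberg-Marquardt or resilient backpropagation, and the data below is a toy function), but the run-to-run behavior is the same: the ranking of a 2-unit versus a 3-unit hidden layer can flip from seed to seed, the worst-case error can be ugly even when the RMSE looks fine, and only pinning random_state makes a configuration repeatable.

```python
# Toy illustration only: scikit-learn MLPRegressor as a stand-in for whatever
# package is actually used; the data is a made-up smooth function plus noise.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))                      # 3 input variables
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for units in (2, 3):                                            # the 2-unit vs 3-unit case
    for seed in range(5):                                       # 5 different weight initializations
        net = MLPRegressor(hidden_layer_sizes=(units,), activation="tanh",
                           solver="lbfgs", max_iter=5000, random_state=seed)
        net.fit(X_tr, y_tr)
        err = y_te - net.predict(X_te)
        rmse = np.sqrt(np.mean(err ** 2))
        print(f"units={units} seed={seed}: RMSE={rmse:.4f} max|err|={np.max(np.abs(err)):.4f}")

# Pinning random_state (plus the solver, scaling, etc.) makes a run repeatable:
net_a = MLPRegressor(hidden_layer_sizes=(3,), solver="lbfgs", max_iter=5000,
                     random_state=7).fit(X_tr, y_tr)
net_b = MLPRegressor(hidden_layer_sizes=(3,), solver="lbfgs", max_iter=5000,
                     random_state=7).fit(X_tr, y_tr)
print("identical predictions:", np.allclose(net_a.predict(X_te), net_b.predict(X_te)))
```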
I find neural networks useful for dealing with very, very complicated data structures, as long as they are used as a tool to aid engineers/scientists in finding patterns and understanding how systems work; that is great. I would refrain from relying on a neural network in an engineering system design whose failure could have consequences for safety, for the public (injury/harm) and/or for property, on the grounds that, as far as I can tell, it is truly a black box.
Sorry if I have offended the experts in the field; that is really not my intent. I am just sharing modest impressions (and yes, some are personal disappointments) as someone who continues to learn the method and tries to make the most out of it.
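As an afterthought on the confidence-interval point above: one common empirical workaround is a bootstrap ensemble, i.e. training many small nets on resampled copies of the training set and reading an interval off the spread of their predictions. Below is a rough sketch of that idea only; scikit-learn's MLPRegressor is again just a stand-in, the function name and settings are mine, and the band reflects model/initialization variability, not the physics.

```python
# Rough sketch of a bootstrap-ensemble prediction interval (stand-in code,
# not a reference implementation): the band reflects model variability only.
import numpy as np
from sklearn.neural_network import MLPRegressor

def bootstrap_interval(X_train, y_train, X_query, n_boot=50, alpha=0.10):
    """Train n_boot nets on bootstrap resamples; return (lower, median, upper)."""
    rng = np.random.default_rng(0)
    n = len(y_train)
    preds = np.empty((n_boot, len(X_query)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                     # resample with replacement
        net = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                           solver="lbfgs", max_iter=5000, random_state=b)
        net.fit(X_train[idx], y_train[idx])
        preds[b] = net.predict(X_query)
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, np.median(preds, axis=0), upper
```

It is slow (dozens of fits) and it does not remove the need for an honest held-out test set, but it at least puts a number on how much the predictions move around.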
RE: Some of my disappointment with neural networks
And, I'm not sure that "finding patterns and understanding how systems work" is necessarily true, either. Recent news articles demonstrate that the patterns people find are the ones that they wanted to find. Amazon's facial recognition algorithms developed racial bias because the training set was biased for white males. Because of the "black box" nature of the process, it's unclear whether we humans learn anything at all, except that the machine "found" a solution to the dataset.
Nevertheless, it's nice to finally get a thread that's talking about actual engineering in this forum.
TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! https://www.youtube.com/watch?v=BKorP55Aqvg
RE: Some of my disappointment with neural networks
True. My modest experience is that even with much less than what you mentioned, the risk of overfitting is serious unless the data set is of really good "grade". With data of poor quality (spread/variance, number of samples, noise, etc.), preventing overfitting means making the topology over-restrictive (units/layers/activation function) to the point where the accuracy may become worthless. If you have no a posteriori control over the data that will be fed into the network, there is no way you could put such a system into production, in my opinion.
For example, to work around some overfitting issues I encountered, I tried to design a system based on a parsimony approach (sketched in code below). This is how it works. First, an accuracy level is preset (say an RMSE). The method then starts from an architecture composed of one hidden layer with a single unit; this is ridiculously low, but intentional. The network is trained several times on the same training samples; the idea is to make sure that after N draws (read: random initializations of the weights) there is virtually not a single chance that the preset accuracy can be matched with the architecture considered. The hidden layer is then grown by one extra unit and the process is repeated, and so on until the preset accuracy is reached, thereby ensuring that ALL attempts to reach it with less complex topologies have failed. Two problems I am faced with:
- The procedure is not fast... and that is an understatement.
- How to determine the preset accuracy level that is "compatible" with the data set; in other words, what is the best accuracy a given data set can get you?
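For what it is worth, here is how I would write that growing procedure down in code; scikit-learn's MLPRegressor stands in for the actual tool, and the function name, restart count and training-RMSE criterion are just my reading of the description above, not a reference implementation.

```python
# Sketch of the parsimony / "grow one unit at a time" procedure described
# above (stand-in code using scikit-learn's MLPRegressor; names are mine).
import numpy as np
from sklearn.neural_network import MLPRegressor

def grow_until_accuracy(X, y, target_rmse, max_units=10, n_restarts=20):
    """Increase the hidden-layer size until some random restart meets target_rmse."""
    for units in range(1, max_units + 1):
        best = np.inf
        for seed in range(n_restarts):                       # N draws of random initial weights
            net = MLPRegressor(hidden_layer_sizes=(units,), activation="tanh",
                               solver="lbfgs", max_iter=5000, random_state=seed)
            net.fit(X, y)
            rmse = np.sqrt(np.mean((y - net.predict(X)) ** 2))
            best = min(best, rmse)
            if rmse <= target_rmse:
                return units, net                            # smallest topology that reached the target
        print(f"{units} unit(s): best RMSE over {n_restarts} restarts = {best:.4f}")
    return None, None                                        # target unreachable within max_units
```

Written out like this, both problems are plain to see: the nested loops are what make it slow, and target_rmse still has to come from somewhere.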
Another point I did not mention earlier and which I would like to add: extrapolation. Clearly, neural nets should not be misused for extrapolation purposes. But it turns out it is not so obvious to draw a sharp line between where the model is extrapolating and where it is not. You could look at the training data set and set the bounds accordingly, yet the situation is a bit more complex because of the combinations of the variables. There we come back to the confidence intervals which, as I said before, are not an easy matter (some software makes such a feature available, but it can be expensive to get access to).
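On where extrapolation starts: a per-variable min/max box check is the obvious first filter, and a convex-hull test catches query points that sit inside the box but outside the cloud of training combinations. A crude sketch of both checks (scipy assumed; the hull test is only practical for a handful of input variables, and the function name is mine):

```python
# Crude extrapolation flags: a min/max box check per variable, plus a convex-
# hull test for combinations of variables (scipy assumed; only practical for
# a few input dimensions). X_train and X_query are (n_samples, n_vars) arrays.
import numpy as np
from scipy.spatial import Delaunay

def extrapolation_flags(X_train, X_query):
    box_ok = np.all((X_query >= X_train.min(axis=0)) &
                    (X_query <= X_train.max(axis=0)), axis=1)
    hull = Delaunay(X_train)                        # triangulation of the training cloud
    hull_ok = hull.find_simplex(X_query) >= 0       # -1 means outside the convex hull
    return box_ok, hull_ok                          # False anywhere = treat the prediction with suspicion
```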
Also, if for any reason extrapolation does happen, neural networks are definitely a big no-no. At best the behavior is somewhere between random and the profile resulting from the asymptotic behavior of the units' chosen activation function. It is difficult to "force" the neural network to follow a predefined path outside the training boundaries if such an excursion does happen (despite the fact that it should not be allowed to); some approaches:
- select a special activation function that has the desired asymptotic behavior; the next question is: how tested and proven is such a function?
- adopt a "semi-physical model" which is a hybrid model, mix of a data-driven model and physical-driven model (knowledge model).
RE: Some of my disappointment with neural networks
I don't think that this would be remotely close to trivial except on cherry-picked data. Most of the data that I thought needed segmenting had non-obvious boundaries. And that sort of thing is likewise an issue with polynomial regressions: you can have something that's within certain error bounds as long as new data is bounded by the existing fit data, but new data outside those boundaries can wind up with a Looney Tunes answer.
But the danger of attempting an a posteriori "correction" is that it can muck up whatever benefit the NN gave you in the first place, specifically because what you see as rational demarcations for segmenting the data aren't necessarily what the NN saw. That's the sort of thing that got Amazon's facial recognition software into trouble.
TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! https://www.youtube.com/watch?v=BKorP55Aqvg
RE: Some of my disappointment with neural networks
Currently I have a neural net model that does a great job, provided I have control over the topology selection, the tuning of parameters, etc.
Now the challenge is to design a neural network system as a prediction tool that can be put in the hands of an average operator who is not knowledgeable about neural nets. The operator basically needs to interact in the simplest possible manner; the utopia can be described as follows: the user has one area to inject the data into, one big "RUN" button to click, and they collect the predicted data in the next form. They do not need to know what is going on inside the net; to some extent, if possible, they do not even need to know there is a neural net animal in there.

To help myself, I will write a specification for the data the user has to provide, so that the "quality" of the data is more or less under control; at least it is a safeguard. Then I must design the neural network so that it operates with parsimony. Typically I will use a one-hidden-layer feedforward net with resilient backpropagation as the update method, which I have found quite robust (learning rates adapted automatically, less tuning required).

I tried both (A) a single model with several outputs, so-called multi-task learning, and (B) a set of single-output nets that I merge at the end. There are pros and cons to both. (B) is good for intra-transfer (if any) of features across outputs and, surprisingly, is also good at preventing overfitting, but it does require up to 10 units in the hidden layer to arrive at a decent accuracy. (A) can overfit easily, and I have to choose between 2, 3 or even 4 units in the hidden layer, which is difficult to do automatically.

My remaining challenge is how to preset the accuracy upfront, case by case (based on the user data): given a user's input data, should the RMSE that can reasonably be targeted be 0.02, 0.01 or 0.005? I am exploring a method to base the accuracy on a PCA analysis (first and second components). So in short:
- A user specification that requires the input data to follow somewhat restrictive guidelines;
- A PCA analysis that sets the accuracy level based on the variance of the 1st and 2nd components (a rough sketch of where such a rule could plug in follows this list);
- A parsimony "marching" algorithm that selects the neural net topology with the strict minimum number of units needed to match the preset accuracy level.
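To be explicit about the PCA idea, here is the kind of hook I have in mind, nothing more than an illustration: scikit-learn is assumed, the function name is mine, and the mapping from explained variance to an RMSE target is a made-up placeholder, since choosing that mapping is exactly the open question.

```python
# Illustration only: a placeholder rule mapping the variance captured by the
# first two principal components of the user's inputs to an RMSE target.
# The 0.005 / 0.02 endpoints echo the values mentioned above; the linear rule
# itself is a made-up assumption, not a validated method.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_based_rmse_target(X_user, tight_rmse=0.005, loose_rmse=0.02):
    Xs = StandardScaler().fit_transform(X_user)
    var12 = PCA(n_components=2).fit(Xs).explained_variance_ratio_.sum()
    # Hypothetical rule: the more variance the first two components capture,
    # the "simpler" the data is taken to be and the tighter the target.
    return loose_rmse - (loose_rmse - tight_rmse) * var12
```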
May I ask your opinion/thoughts about this? Also, any experience with PCA analysis? Any shortcomings?
Thanks