(REALLY) Deep Recurrent Networks  


jfrjunio
(@jfrjunio)
New Member
Joined: 3 months ago
Posts: 2
15/04/2019 1:31 am  

Hi everybody. I have been working on a series prediction problem using recurrent units (GRU and LSTM).

My experiments have shown that the best performance comes with a single hidden layer. With only two layers I already start losing performance (prediction accuracy measured as recall), and the more layers I add, the worse the performance gets.

I already tried applying principles from residual nets (to the best of my understanding, sketched below), but with no success at all.
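
Roughly, this is the kind of skip connection I tried (a Keras sketch only; the layer sizes and input shape are placeholders, not my actual setup):

from tensorflow.keras import layers, Model

# Sketch of a residual (skip) connection over a stacked recurrent block.
# Shapes and sizes are placeholders, not the real configuration.
inputs = layers.Input(shape=(None, 32))            # (time steps, features)
x = layers.GRU(64, return_sequences=True)(inputs)
y = layers.GRU(64, return_sequences=True)(x)
x = layers.Add()([x, y])                           # skip connection around the second GRU
x = layers.GRU(64)(x)                              # last recurrent layer returns only the final state
outputs = layers.Dense(1)(x)
model = Model(inputs, outputs)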

 Any ideas?

Regards.

JR

Jose F Rodrigues-Jr
Associate Prof, CS Dept
University of Sao Paulo, Brazil


Mo Rebaie
(@mo-rebaie)
Eminent Member
Joined: 4 months ago
Posts: 46
16/04/2019 2:45 am  

Hello jfrjunio,

Adding more hidden layers will not always increase the performance of your model. Your issue may be a result of the vanishing gradient problem: the more hidden layers you add, the smaller the updates that reach the earlier layers, and performance decreases.
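
One quick way to check whether this is really what is happening is to compare gradient norms layer by layer (a sketch only; model, x_batch and y_batch are placeholders for your own Keras model and data):

import tensorflow as tf

# Sketch: inspect per-variable gradient norms for one batch.
# model, x_batch and y_batch are placeholders.
with tf.GradientTape() as tape:
    predictions = model(x_batch, training=True)
    loss = tf.reduce_mean(tf.keras.losses.mean_squared_error(y_batch, predictions))

gradients = tape.gradient(loss, model.trainable_variables)
for variable, gradient in zip(model.trainable_variables, gradients):
    print(variable.name, float(tf.norm(gradient)))

# If the norms shrink sharply for the earlier recurrent layers, the gradient is vanishing.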

 

Frequent causes of failure are:

1. Not having enough weights to adequately characterize the training data.
2. Training data that does not adequately characterize the salient features of non-training data because of measurement error, interference, noise, or insufficient sample size and variability.
3. An unlucky random weight initialization.
4. Fewer training equations than unknown weights.
 
 
Various techniques to mitigate these causes are:
 
1. Remove unnecessary data and eliminate outliers.
2. Use enough training data to sufficiently characterize non-training data.
3. Use enough weights to adequately characterize the training data.
4. Use more training equations than unknown weights (the stability of solutions with respect to noise and errors increases as the ratio increases).
5. Use the best of multiple random-initialization and data-division designs.
6. Use an appropriate optimizer.
7. Apply regularization (early stopping is one option); see the sketch after this list.
8. Apply k-fold cross-validation (choose a suitable value of k).
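
For points 5 to 7, a minimal sketch of what this could look like in Keras (the architecture, layer sizes and the x_train/y_train names are placeholders, not a recommendation for your exact problem):

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

# Sketch: dropout as regularization, an adaptive optimizer and early stopping.
# Layer sizes and the x_train/y_train placeholders are illustrative only.
model = models.Sequential([
    layers.Input(shape=(None, 32)),                      # (time steps, features)
    layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
model.fit(x_train, y_train, validation_split=0.2,
          epochs=200, callbacks=[early_stop])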

M.R


jfrjunio
(@jfrjunio)
New Member
Joined: 3 months ago
Posts: 2
20/04/2019 2:15 am  

Hi, thanks a lot for the reply. Yes, all those factors apply.

For recurrent networks, I found that the tanh activation causes a strong vanishing gradient; however, the network does not work well with other activations either. I am still researching.
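
For reference, this is all I mean by switching the activation (a Keras sketch; the layer size is a placeholder):

from tensorflow.keras import layers

# tanh is the default cell activation and saturates, which feeds the vanishing gradient;
# relu is unbounded, but in my experiments the network trained poorly with it.
gru_tanh = layers.GRU(64, activation="tanh")
gru_relu = layers.GRU(64, activation="relu")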

My best regards!

JR

Jose F Rodrigues-Jr
Associate Prof, CS Dept
University of Sao Paulo, Brazil


Mo Rebaie
(@mo-rebaie)
Eminent Member
Joined: 4 months ago
Posts: 46
29/04/2019 4:21 pm  

Hello jfrjunio,

A common problem in RNNs is the vanishing gradient problem.

The sigmoid activation function is used as the gating function for the three gates (input, output, forget) in an LSTM. Because it outputs a value between 0 and 1, it can let either no flow or complete flow of information through the gates.

The sigmoid function has all the fundamental properties of a good activation function.

To overcome the vanishing gradient problem, you need a function whose second derivative can sustain over a long range before going to zero, and in this case tanh may be a good activation function.

The tanh function is often found to converge faster in practice, and its gradient is less expensive to compute.

A good neuron unit should be bounded, easily differentiable, monotonic, and easy to handle. If you consider these qualities, then you can use ReLU instead of tanh, since they are good alternatives to each other.

The ReLU function does not saturate even for large values of z, and it has had much success in computer vision applications.
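
A quick numerical check of that saturation (a NumPy sketch; the sample values of z are arbitrary):

import numpy as np

z = np.array([0.0, 2.0, 5.0, 10.0])
sig = 1.0 / (1.0 + np.exp(-z))
print(sig * (1.0 - sig))        # sigmoid'(z): roughly 0.25, 0.105, 0.0066, 0.000045
print(1.0 - np.tanh(z) ** 2)    # tanh'(z):    roughly 1.0, 0.071, 0.00018, 8e-09
print((z > 0).astype(float))    # ReLU'(z):    0.0, 1.0, 1.0, 1.0 (no saturation for z > 0)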

M.R

