Natural Language Processing (2)
NLP note
RNN
The problem is that backpropagation through the recurrent connection is multiplicative: the same per-step gradient factor is applied at every time step, so a small factor (e.g. 0.001) shrinks exponentially toward zero (vanishing gradient), while a large factor (e.g. 1.5) grows exponentially (exploding gradient).
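For intuition, a minimal sketch (the factor values are illustrative, not from any real network) of how a per-step gradient factor compounds across time steps:

```python
# Backprop through T time steps multiplies T per-step gradient factors.
for factor in (0.001, 1.5):   # illustrative per-step gradient magnitudes
    product = 1.0
    for _ in range(20):       # 20 time steps
        product *= factor
    print(f"factor={factor}: gradient scale after 20 steps = {product:.3e}")
# factor=0.001 -> 1e-60 (vanishing), factor=1.5 -> ~3.3e+03 (exploding)
```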
In order not to destroy the internal state we have just computed, it is usually better to pad with '0' at the beginning of the embedded texts (not always).
When a sequence needs to be truncated, it is usually better to truncate at the end of the embedded texts, so as not to lose information essential for distinguishing the text's sentiment. An example is sketched below.
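A minimal sketch of both choices, assuming the Keras `pad_sequences` utility (the token ids here are made up for illustration):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[3, 7, 12], [5, 9, 2, 8, 4, 6, 1]]  # hypothetical token ids

# padding='pre' puts the zeros before the text, so the RNN's final
# internal state is computed from real tokens, not from padding;
# truncating='post' drops tokens from the end when maxlen is exceeded.
padded = pad_sequences(sequences, maxlen=5, padding='pre', truncating='post')
print(padded)
# [[ 0  0  3  7 12]
#  [ 5  9  2  8  4]]
```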
LSTM
It is based on an RNN, with a cell state and a few gates that let information flow through time without repeatedly passing through a non-linear activation function.
At each step, the cell state keeps important information and discards unnecessary information.
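A minimal sketch of one LSTM step in NumPy (the names and shapes are my own, not from the note); the key point is that the cell state `c` is updated additively, gated by `f` and `i`, instead of being pushed through a non-linearity at every step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4h, d) input weights, U: (4h, h) recurrent
    weights, b: (4h,) bias; gates stacked as [forget, input, output, cand]."""
    z = W @ x + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gate values in (0, 1)
    g = np.tanh(g)                                # candidate cell values
    c = f * c_prev + i * g   # additive update: gradients flow through c
    h = o * np.tanh(c)       # hidden state exposed to the next layer
    return h, c
```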
LSTM can overcome the vanishing gradient problem, but it is more difficult to train (it has more parameters than a plain RNN).