Setting initial hidden state of an LSTM with a dense layer

Views: 17 (last 30 days)
Yildirim Kocoglu on 7 Jan 2021
Edited: Asvin Kumar on 10 Feb 2021
I have been working on an LSTM seq2seq forecasting problem, and I want to set the "initial hidden state" of the network myself using static features (features that do not change over time).
Problem description:
I'm trying to have a stateless LSTM, since each time series in my case is independent of the others, so I use a mini-batch size of 1. I use the default sequence length, 'longest'.
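For reference, here is the setup I'm describing in trainingOptions form (a minimal sketch; the solver choice is just an example):

options = trainingOptions('adam', ...
    'MiniBatchSize',1, ...        % one independent series per mini-batch
    'SequenceLength','longest');  % pad to the longest sequence in the mini-batch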
Questions:
1) I'm not sure whether 'longest' in SequenceLength means the longest sequence across all mini-batches. For example, if I have 2 independent time series with different numbers of time steps (say, 10 time steps for batch #1 and 15 time steps for batch #2), does the 'longest' option change between batches (seqlength = 10 for batch #1 and seqlength = 15 for batch #2), or does it take the longest of both batches (seqlength = 15) for both, padding the 'missing' values in batch #1 with a default value?
2) The real problem I'm encountering is that the initial hidden state's dimensions have to be (num_hidden_units, 1), and I believe (though I'm not sure) that the same initial hidden state is used between batches. Is this "initial hidden state" automatically reset to the same value between batches during training when my mini-batch size is 1? I'm also not sure whether the required column dimension of 1 for the initial hidden state is due to choosing a mini-batch size of 1.
In my case, there are 7 static features available for each independent time series, so if I have 10 independent time series, I have a matrix of size (7, 10).
To set the initial state of the LSTM, I pass my static feature matrix of size (7, 10) through a dense layer that outputs the required size (num_hidden_units, 1) and assign the result as the initial hidden state. However, it does not make sense to me to use the same initial hidden state for all the batches (if it resets to the same value between batches), because then each series seems to lose its individual properties.
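To make the idea concrete, here is a minimal sketch of that dense-layer-to-initial-state mapping using the dlarray-based fullyconnect and lstm functions. All sizes are placeholders and the random parameters stand in for learnables that would normally live in a dlnetwork; this is not a working training setup:

numFeatures = 7;    % static features per series
numHidden   = 100;  % LSTM hidden units
inputSize   = 3;    % time-varying channels (assumed)
numSteps    = 10;

Wfc = dlarray(randn(numHidden,numFeatures));  % dense layer weights
bfc = dlarray(zeros(numHidden,1));            % dense layer bias
W   = dlarray(randn(4*numHidden,inputSize));  % LSTM input weights
R   = dlarray(randn(4*numHidden,numHidden));  % LSTM recurrent weights
b   = dlarray(zeros(4*numHidden,1));          % LSTM bias

staticX = dlarray(randn(numFeatures,1),'CB');         % one series' static features
X       = dlarray(randn(inputSize,1,numSteps),'CBT'); % its time series

H0 = fullyconnect(staticX,Wfc,bfc);    % per-series initial hidden state
C0 = dlarray(zeros(numHidden,1),'CB'); % initial cell state
[Y,Hend,Cend] = lstm(X,H0,C0,W,R,b);   % run the LSTM from the custom state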
I understand that the questions are different from each other (even if they all relate to one problem), and I'm not expecting answers to all of them, but any clarification would be appreciated.
Thank you.

Answers (1)

Asvin Kumar on 10 Feb 2021
Edited: Asvin Kumar on 10 Feb 2021
  1. Sequences are padded to the 'longest' sequence within each mini-batch, not across mini-batches. More on that here: Sequence Options
  2. The same initial value of the hidden state is used across all sequences and mini-batches. The column dimension is 1 because the hidden state is the same for all sequences; it is not related to the mini-batch size being 1 (see the sketch below). Refer to the docs here. This is also related to your follow-up comment on another question. The column dimension of the 'HiddenState' property would never be 2.
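A small sketch of point 2, assuming the R2020b-era lstmLayer whose 'HiddenState' and 'CellState' state parameters are settable:

numHiddenUnits = 100;
layer = lstmLayer(numHiddenUnits);
layer.HiddenState = zeros(numHiddenUnits,1); % one column, shared by every sequence
layer.CellState   = zeros(numHiddenUnits,1); % likewise numHiddenUnits-by-1

Whatever the mini-batch size, these stay numHiddenUnits-by-1.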
You mention this:
I'm trying to have a stateless LSTM, since each time series in my case is independent of the others, so I use a mini-batch size of 1.
and this:
It does not make sense to me to use the same initial hidden state [...] for all the batches, because then each series seems to lose its individual properties [...]
An LSTM network trains on a dataset of one particular kind. Its learnable parameters are 'InputWeights', 'RecurrentWeights' and 'Bias', as seen in this example here. Every LSTM has a fixed initial 'HiddenState' property. When you train the network, the input and recurrent weights are learned and adapt so as to minimize the error against the required targets. With relatively long sequences, the initial hidden state has only a small influence.
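For instance, after training you can inspect these parameters on the LSTM layer; here net is a hypothetical trained network whose second layer is the lstmLayer:

lstm = net.Layers(2);
size(lstm.InputWeights)     % 4*numHiddenUnits-by-inputSize
size(lstm.RecurrentWeights) % 4*numHiddenUnits-by-numHiddenUnits
size(lstm.Bias)             % 4*numHiddenUnits-by-1
size(lstm.HiddenState)      % numHiddenUnits-by-1, the fixed initial state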
So, to clear up some confusion: the individual properties of the sequences in a dataset are captured in the input and recurrent weights. Every LSTM has to be trained on some dataset to learn from it. I'm not sure what you mean by a stateless LSTM, or by each time series (sequence) being independent of the others.
Take the example of the Japanese Vowels dataset. All the sequences are separate from each other, but they also share similarities in the sense that they are all about vowels. Training an LSTM on such a dataset means the network captures their individual properties in its weights.
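As a rough sketch of what that looks like, following the shape of the MATLAB sequence-classification example (japaneseVowelsTrainData is the helper that ships with that example):

[XTrain,YTrain] = japaneseVowelsTrainData; % 270 sequences, 12 features, 9 speakers
layers = [
    sequenceInputLayer(12)
    lstmLayer(100,'OutputMode','last')
    fullyConnectedLayer(9)
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam','MaxEpochs',30, ...
    'SequenceLength','longest','Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);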
