Why the CNN model is not trained?

Question

Mohsen Abyani 2024년 1월 14일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/2069596-why-the-cnn-model-is-not-trained

댓글: Mohsen Abyani 2024년 2월 24일

Test_2.m

I am trying to train a CNN model in Matlab to predict the mean value of a random vector (the Matlab code named Test_2 is attached). To further clarify, I am generating a random vector with 10 components (using rand function) for 500 times. Correspondingly, the figure of each vector versus 1:10 is plotted and saved separately. Moreover, the mean value of each of the 500 randomly generated vectors are calculated and saved. Thereafter, the saved images are used as the input file (X) for training (70%), validating (15%) and testing (15%) a CNN model which is supposed to predict the mean value of the mentioned random vectors (Y). However, the RMSE of the model becomes too high. In other words, the model is not trained despite changing its options and parameters. I would be grateful if anyone could kindly advise.

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Maksym Tymchenko 2024년 2월 14일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/2069596-why-the-cnn-model-is-not-trained#answer_1408713

Edited_script.m

Hi @Mohsen Abyani,

The problem you are trying to solve is very interesting.

It seems that you are creating images that contain MATLAB figures of 10 random points in the range of [0,2] and then you are trying to teach a network to predict the correct mean of the 10 values plotted given the image of the plot as input.

You can certainly accomplish this using a deep learning network. I have looked at your code and everything seems to be set up correctly, the only thing that needed to be changed for the network to train was the architecture itself.

In your original architecture, you are using convolutional layers, but the network is not downsampling the spatial dimensions enough. In fact, the final fully connected layer receives activations of size 328x437x16x1 which results in a weights matrix of size 1x2293376:

This defeats the point of a convolutional architecture whose goal is to gradually downsample the spatial dimension while incresing the channel dimension.

I have created a more suitable architecture for this task by downsampling the activations gradually using max pooling layers:

This architecture results in a much better training behaviour where the loss goes down gradually.

I have chosen some initial training options in the attached script and achieved an RMSE of 0.25 with a 1 minute training loop.

However, I am confident that you can achieve a much better result with some tuning of the training options.

Please let me know if you have any further questions!

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Mohsen Abyani 2024년 2월 24일

@ Maksym Tymchenko

Dear Maksym,

Many thanks for your helpful answer. As you have correctly mentioned, this issue could be solved by downsampling the activations gradually using max pooling layers.

Kind Regards,

Mohsen

댓글을 달려면 로그인하십시오.

Why the CNN model is not trained?

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

Why the CNN model is not trained?

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기