Deep Learning : Network output a flattened image (help with theory and ideas)
조회 수: 3(최근 30일)
Thank you so much for taking time to read this post :)
So I have been playing around with simple nets to do classifiaction. I have experemented with networks for image classifiaction and have had succuss with those. However, I am interestred in building a network that takes an input image performs a transformation so that it looks like an output image. I believe this is called image-to-image regression. I have tried several types of networks so far and will list them below. I'm hoping that you can help guide me towards a better solution to my problems:
Input images are human eyes focused on the iris (I have 30k quality images). The output image is the iris extracted in a rolled out format (I have 30k of these target images).
Attempt 1: For my first attempt, I had an input image layer followed by many fully connected layers and an output regression layer (the target images were flattened). This had very noisey results that were not very good. I think because many of the eyes in my database are not exactly centered. Did some resarch and started playing with conv layers since these aren't location dependent (they slide over the entire input image). I was hoping this would minimize the varriations in the iris center.
Attempt 2: Started playing with conv layers at various depths including residual nets. my input images are 150 x 200 and the outputs are 32 by 720. This forced me to use transposed conv layers to match the output size and I kept getting either all 0's, all 1's or just NaN as ouputs. This discouraged me so I moved on to a hybrid:
Attempt 3: As a front end i have a residual convolutional nerual network. this is followed by a fully connected layer and then my output regession layer (again, the target images are flattened). This is my best results so far. I can vaugley see the desired output images. However it is pixilated column wise (very odd). I think this is an artifact of flattening the target images which is causing me to lose some spatial infomration.
So, here I am, not quite where I would like to be and not sure what to try next. Any advice would be greatly appreciated :)
Mahesh Taparia 2020년 11월 20일
It seems you are perfroming image regression. You may try with convolution autoencoder based networks like SegNet or any other custom network which can learn the spatial feature as well. It may improve the performance.