To solve this problem, you can implement a CNN encoder-decoder network.
This worked very well for large images when I tried it.
See this example:
In this example, they show how to train a variational autoencoder (VAE), but you don't have to do that.
You can put the encoder and decoder sections in the same network and train it like a regular autoencoder: several downsampling layers that reduce the image to a small vector, followed by several upsampling layers (transposed convolutions) until the output is the same size as the input images.
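To check that the decoder really ends at the input size, it can help to trace the spatial size through every layer before building anything. The sketch below is plain Python and independent of any deep learning framework. The kernel size 4, stride 2, and padding 1 are assumptions chosen for illustration (they halve and double even sizes exactly), and the function names are hypothetical.

```python
def conv_out(n, kernel=4, stride=2, pad=1):
    """Spatial size after a strided convolution (downsampling)."""
    return (n + 2 * pad - kernel) // stride + 1


def tconv_out(n, kernel=4, stride=2, pad=1):
    """Spatial size after a transposed convolution (upsampling)."""
    return (n - 1) * stride - 2 * pad + kernel


def trace_sizes(input_size, num_stages):
    """Return the spatial size after each layer of a symmetric encoder-decoder."""
    sizes = [input_size]
    for _ in range(num_stages):  # encoder: downsampling layers
        sizes.append(conv_out(sizes[-1]))
    for _ in range(num_stages):  # decoder: upsampling layers
        sizes.append(tconv_out(sizes[-1]))
    return sizes


# Assumed example: 256-pixel-wide images and 4 downsampling stages.
print(trace_sizes(256, 4))  # [256, 128, 64, 32, 16, 32, 64, 128, 256]
```

With these settings, an input size that is not divisible by 2^num_stages does not come back to its original size (for example, 100 with 3 stages ends at 96), so such images would need to be resized or padded first.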
If this doesn't work, try training the encoder-decoder layer pairs one step at a time, like an onion from the outside inwards.
First create a net with one downsampling and one upsampling layer.
Then, load it in the network designer app, add another pair of downsampling and upsampling layers between these two, and train again. Repeat this until you reach your desired downsampled vector length.
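The order in which the layers are added can be sketched as follows. This is only a plain-Python illustration of how the stack grows from the outside inwards; it does not build or train a network, and the layer names and the `grow` helper are hypothetical.

```python
def grow(layers):
    """Insert one new downsampling/upsampling pair in the middle of the stack."""
    mid = len(layers) // 2   # boundary between encoder and decoder halves
    depth = mid + 1          # index of the new, innermost pair
    return layers[:mid] + [f"down{depth}", f"up{depth}"] + layers[mid:]


# Stage 1: the outermost pair only.
layers = ["down1", "up1"]
stages = [layers]

# Each further stage adds one pair between the existing halves;
# the network would be trained again after every insertion.
while len(stages) < 3:       # assumed target: 3 downsampling layers
    layers = grow(layers)
    stages.append(layers)

for number, stack in enumerate(stages, start=1):
    print(number, stack)
```

Each stage keeps the previously trained outer layers in place and only inserts the new pair at the center.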
After that, you can either split the network into two networks (an encoder and a decoder), or use the "activations" function to read the output of the compressed layer, which gives you the encoding of the images.