All of these figures boil down to the number of learnable parameters versus the size of the training data. Regularization and similar choices do affect the loss, so yes, what you describe may well be the case. There are other possible reasons too. Are the losses in the graph you plotted optimal? Have all the hyperparameters been tuned properly? Prof. Andrew Ng talks about the case where optimality has been reached. Now suppose you increase the training data: with the same number of learnable parameters and more training data, the optimal loss will be higher, because the same capacity has to fit more points. It is a trade-off. The explanation given in the link you shared also makes sense; there is no denying that.
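To make the trade-off concrete, here is a minimal sketch of my own (a toy setup, not taken from the lecture or from the link you shared). It assumes that "loss" means the training loss, fits a polynomial with a fixed number of parameters by least squares, and averages the training error over many random datasets of each size. The function name `mean_train_mse`, the sine target, the degree and the noise level are all made up for illustration.

```python
import numpy as np

def mean_train_mse(n_samples, degree=3, noise_std=0.5, n_trials=200, seed=0):
    """Average training MSE of a fixed-capacity least-squares fit.

    Hypothetical toy setup: noisy samples of sin(3x) on [-1, 1],
    fitted with a polynomial of fixed degree (degree + 1 parameters).
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        x = rng.uniform(-1.0, 1.0, size=n_samples)
        y = np.sin(3.0 * x) + rng.normal(0.0, noise_std, size=n_samples)
        # Least-squares fit: this is the optimal training loss for this capacity.
        coeffs = np.polyfit(x, y, deg=degree)
        residuals = y - np.polyval(coeffs, x)
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))

sizes = [8, 32, 128, 512]
losses = {n: mean_train_mse(n) for n in sizes}
for n in sizes:
    print(f"n = {n:4d}  mean training MSE = {losses[n]:.4f}")
```

With only a few points the fixed set of parameters can chase the noise, so the training error is small. As the dataset grows, the same parameters can no longer do that and the training error climbs toward the noise level. Real networks are messier than this, so treat it only as an illustration of the direction of the effect.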
I hope this gives you enough insight to go on.