If your data includes instances from "unknown" classes that you do not care about, remove those instances from the training set before training a deep neural network.
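As a rough illustration of that filtering step, here is a minimal sketch in plain Python. It is not tied to any particular toolbox; the function name `drop_unknown`, the label string "unknown", and the sample data are all assumptions made up for the example.

```python
def drop_unknown(features, labels, unknown_label="unknown"):
    """Return features and labels with the unknown-class instances removed.

    The name and the default label string are hypothetical; adapt them
    to however the unknown class is marked in your own data.
    """
    # Keep only the (feature, label) pairs whose label is a known class.
    kept = [(x, y) for x, y in zip(features, labels) if y != unknown_label]
    kept_features = [x for x, _ in kept]
    kept_labels = [y for _, y in kept]
    return kept_features, kept_labels


# Made-up sample data: four instances, two of them labeled "unknown".
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
labels = ["cat", "unknown", "dog", "unknown"]

train_features, train_labels = drop_unknown(features, labels)
```

The filtered lists are what you would then pass to training, so the network never sees the unknown-class instances.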
The link mentioned in the question applies to classic neural networks, not deep neural networks. That page describes the situation where a network has multiple outputs but an instance has a label for only one of those outputs. In that case you can pass NaN as the label for the other outputs, and the instance will contribute to the loss only for the output that has a label. This situation does not yet arise in deep learning because we do not support deep networks with multiple outputs.
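To make the NaN behavior concrete, here is a small sketch in plain Python of a loss that skips outputs whose label is NaN. It only illustrates the idea and is not the actual implementation used by the classic networks; the function name `masked_squared_error` and the choice of mean squared error are assumptions.

```python
import math


def masked_squared_error(predictions, targets):
    """Mean squared error over the outputs whose target is not NaN.

    Hypothetical helper for illustration: outputs with a NaN target
    are skipped, so they contribute nothing to the loss.
    """
    total = 0.0
    count = 0
    for p, t in zip(predictions, targets):
        if math.isnan(t):
            continue  # no label for this output, so ignore it
        total += (p - t) ** 2
        count += 1
    # With no labeled outputs at all, the instance contributes zero loss.
    return total / count if count else 0.0


# One instance, three outputs, only the first output has a label.
predictions = [0.8, 0.3, 0.5]
targets = [1.0, float("nan"), float("nan")]

loss = masked_squared_error(predictions, targets)
```

Changing the predictions for the second or third output leaves `loss` unchanged, which is what "only contributes to the loss for one of the outputs" means in practice.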