Suppose I am trying to iteratively produce a completed image from a subset of it using a combination of convolutional/DNN methods. Which image norm is best?
The most natural norm (to me) is to treat the bitmap as a vector and take its L2 norm. If the input image is anime or something similar, its large areas of uniform color make this very likely to be a good fit in low dimension, that is, without overfitting.
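A minimal sketch of that L2 norm, assuming grayscale images stored as NumPy arrays of the same shape (the function name is hypothetical):

```python
import numpy as np

def l2_image_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Treat each bitmap as one long vector and return the L2 norm of the difference."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    diff = a.astype(np.float64).ravel() - b.astype(np.float64).ravel()
    return float(np.linalg.norm(diff))

# Two 8x8 flat-colored ("anime-like") patches that differ by a small uniform offset.
flat_a = np.full((8, 8), 0.50)
flat_b = np.full((8, 8), 0.55)
d = l2_image_distance(flat_a, flat_b)
print(d)
```

For flat regions like these, a small color error gives a small distance, which is why L2 behaves well on uniformly colored images.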
However, consider pictures of fur. Given a small square, an AI set to extrapolate more fur from that single image should be expected to get the pixels right next to the given subimage exactly right, but further away I want it to get the texture right, not the exact pixels. So if the AI shifts the distant fur sideways by just the right amount (say, half the spacing between strands), it could get an incredibly poor score even though the texture is correct.
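The fur problem can be reproduced with a synthetic texture. In this illustrative sketch, vertical sine stripes (a hypothetical stand-in for fur) are compared under L2 against the same texture shifted by half a period and against a flat image with no texture at all. Under L2, the correctly textured but shifted prediction scores twice as badly as the textureless one, which is one way to see why generators trained on plain L2 tend to produce blurry averages.

```python
import numpy as np

def l2_image_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Flatten both bitmaps and take the Euclidean norm of the difference.
    return float(np.linalg.norm(a.astype(np.float64).ravel() - b.astype(np.float64).ravel()))

# Hypothetical "fur": vertical sine stripes with an 8-pixel period on a 64x64 grid.
size, period = 64, 8
x = np.arange(size)
stripes = np.tile(np.sin(2 * np.pi * x / period), (size, 1))

# Same texture shifted left by half a period; np.roll wraps around the edge,
# so every crest lands where a trough used to be.
shifted = np.roll(stripes, -(period // 2), axis=1)

# A "hedging" prediction with no texture at all.
flat = np.zeros_like(stripes)

dist_shifted = l2_image_distance(stripes, shifted)
dist_flat = l2_image_distance(stripes, flat)
print(f"shifted texture: {dist_shifted:.2f}")
print(f"flat image:      {dist_flat:.2f}")  # the textureless guess wins under L2
```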
If I used the naive L2 norm directly, I would be guaranteed to overfit; you can see this in some of the image-generation demos around the web. The answer is probably to apply a Fourier or wavelet transform and take an Ln norm over the magnitudes of the transformed coefficients instead (the Fourier magnitudes, for instance, do not change when the image is circularly shifted). Correct me if I'm wrong.
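A sketch of the transform-domain idea using the Fourier transform, assuming NumPy. It compares only the magnitudes of the DFT coefficients: with n = 2, an Ln norm over the raw complex coefficients of a unitary transform is identical to the pixel L2 norm (Parseval's theorem), so the phase has to be dropped to gain anything. The function name and the default p = 2 are illustrative. Two caveats: the invariance is exact only for circular shifts, and much of the information about where edges are lives in the phase that this measure discards.

```python
import numpy as np

def fourier_magnitude_distance(a: np.ndarray, b: np.ndarray, p: float = 2.0) -> float:
    """Lp distance between the magnitudes of the 2-D DFTs of two images.

    norm="ortho" makes the transform unitary, so for p=2 the scale matches
    the pixel-space L2 norm. Discarding the phase makes the measure
    insensitive to circular translations of the image.
    """
    mag_a = np.abs(np.fft.fft2(a, norm="ortho"))
    mag_b = np.abs(np.fft.fft2(b, norm="ortho"))
    return float(np.sum(np.abs(mag_a - mag_b) ** p) ** (1.0 / p))

# Same synthetic stripe "fur" as before, shifted by half a period.
size, period = 64, 8
x = np.arange(size)
stripes = np.tile(np.sin(2 * np.pi * x / period), (size, 1))
shifted = np.roll(stripes, -(period // 2), axis=1)
flat = np.zeros_like(stripes)

d_shifted = fourier_magnitude_distance(stripes, shifted)
d_flat = fourier_magnitude_distance(stripes, flat)
print(d_shifted)  # essentially 0 (floating-point noise only)
print(d_flat)     # same value as the pixel-space L2 distance to the flat image
```

Here the ranking flips relative to plain L2: the shifted texture is now a near-perfect match, while the flat guess is penalized.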
However, this brings me to the hardest class: images containing several different textures. Here I have a problem: wavelet-type transforms don't handle sharp boundaries well, while pixel-by-pixel methods don't handle the textured parts of images well. Is there a good way to measure image similarity in these cases?
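One hypothetical way to combine the two ideas for mixed textures, assuming a training setup where the full target image is known: weight an exact pixel term toward the known region, and compare Fourier magnitudes patch by patch further away, so each texture region is judged on its own local spectrum. All names and the parameters `radius`, `patch` and `alpha` are illustrative assumptions, not an established method. It does not solve the boundary problem: a patch that straddles an edge still mixes two spectra, and smaller patches only confine the damage.

```python
import numpy as np

def distance_to_known(known_mask: np.ndarray) -> np.ndarray:
    """Euclidean distance from every pixel to the nearest known pixel.

    Brute force with O(pixels * known pixels) memory: fine for a small sketch only.
    """
    if not known_mask.any():
        raise ValueError("known_mask must contain at least one known pixel")
    h, w = known_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    known = pts[known_mask.ravel()]
    d = np.sqrt(((pts[:, None, :] - known[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1).reshape(h, w)


def hybrid_completion_loss(pred, target, known_mask, radius=4.0, patch=8, alpha=1.0):
    """Hypothetical loss: exact pixels near the known region, local texture far away."""
    near = np.exp(-distance_to_known(known_mask) / radius)  # 1 on known pixels, decays outward
    pixel_term = float(np.sum(near * (pred - target) ** 2))

    texture_term = 0.0
    h, w = pred.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            p = pred[y:y + patch, x:x + patch]
            t = target[y:y + patch, x:x + patch]
            # Unitary DFT so this term is on the same scale as the pixel term.
            spec_diff = np.abs(np.fft.fft2(p, norm="ortho")) - np.abs(np.fft.fft2(t, norm="ortho"))
            far = 1.0 - near[y:y + patch, x:x + patch].mean()  # ~0 near known pixels, ~1 far away
            texture_term += far * float(np.sum(spec_diff ** 2))
    return pixel_term + alpha * texture_term


# Toy check: 32x32 stripes, the left 8 columns are known.
size, period = 32, 8
target = np.tile(np.sin(2 * np.pi * np.arange(size) / period), (size, 1))
known_mask = np.zeros((size, size), dtype=bool)
known_mask[:, :8] = True

pred_shifted = target.copy()  # far region has the right texture, shifted half a period
pred_shifted[:, 16:] = np.roll(target, period // 2, axis=1)[:, 16:]
pred_flat = target.copy()     # far region left textureless
pred_flat[:, 16:] = 0.0

loss_shifted = hybrid_completion_loss(pred_shifted, target, known_mask)
loss_flat = hybrid_completion_loss(pred_flat, target, known_mask)
print(loss_shifted, loss_flat)
```

On this toy example, plain squared L2 prefers the flat guess by a factor of four, while the hybrid loss prefers the shifted texture, because the far patches are compared only through their local spectra.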
More philosophically, what is the mathematical notion of similarity that our eye picks up on? Any pointers or suggestions are appreciated. This is the second of two issues I have with a design I built for a sparse NN.