How can you have the img's src attribute point to a web page itself instead of an image?
Consider, for example, the strange case of this Reddit preview page:
If you inspect the primary <img> element on the page, you'll find that its src attribute points not to an image file but (behold!) to that very page link!
Through this mechanism, they've effectively hidden the direct link to the image, haven't they? How is this even possible? Is this a new phenomenon or technique in web development?
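The server can detect the client's intent (does it expect an image or a web page?) and give a different response accordingly. When a browser loads a URL through an <img> tag, it sends request headers saying that it wants an image, so the server can return a JPEG for that request while returning HTML when the same URL is opened as a page.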
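A minimal sketch of that server-side decision, assuming the server holds two representations of the same URL (an HTML page and a JPEG). The function names are hypothetical and this is not Reddit's actual implementation. It ranks representations by the q-values in the Accept header, as HTTP content negotiation (RFC 9110) specifies, rather than by their position in the header; ties go to the server's own preference order.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs.

    Parameters other than q (e.g. level=1) are ignored in this sketch.
    """
    result = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        result.append((media_range.lower(), q))
    return result


def quality_for(media_type: str, accepted: list[tuple[str, float]]) -> float:
    """Return the q-value of the most specific range matching media_type.

    Specificity: exact type > type/* > */*. Returns 0.0 if nothing matches.
    """
    main, _, _ = media_type.partition("/")
    best_specificity, best_q = -1, 0.0
    for media_range, q in accepted:
        r_main, _, r_sub = media_range.partition("/")
        if media_range == media_type:
            specificity = 2
        elif r_main == main and r_sub == "*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_representation(accept_header: str,
                          available=("text/html", "image/jpeg")) -> str:
    """Pick the available type with the highest q; earlier entries win ties.

    A real server might answer 406 Not Acceptable when every q is 0;
    this sketch simply falls back to the first available type.
    """
    accepted = parse_accept(accept_header or "*/*")
    return max(available, key=lambda t: quality_for(t, accepted))
```

With a navigation-style header such as "text/html,image/webp,*/*;q=0.8", text/html gets q=1 while image/jpeg only matches */* at q=0.8, so HTML is chosen; with an image-style header such as "image/webp,image/*,*/*;q=0.8", the preference flips and the JPEG is chosen.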
That's right. The Accept request header for the page itself starts with text/html and has a few image/* entries afterward, so the server serves HTML because text/html takes precedence. (Sometimes, depending on the server software, the server will ignore the text/html and always serve an image.) On the other hand, the <img> uses a different Accept request header that asks for image/* types only, so the server knows to send the image and not HTML.
Actually, the conventional and widely used way is to always serve an image for image URLs (ones that end with .JPG, .PNG, etc.). Won't this non-standard way of serving based on conditional header content break most archiving and crawling tools, such as HTTrack, archive.org, and search engine crawlers?
Right. I'm certainly not a fan of MIME types that don't match file extensions. Worse still is when a site also checks the User-Agent and serves an image to some browsers/devices and HTML to others.
Edit: To address the archiving/crawling issue: crawlers can probably set a narrower Accept header when a URL has an image file extension. The server must then serve the image and not HTML; otherwise the server would be considered broken. So I don't see why this would break crawlers.