Robots.txt governed the behavior of web crawlers for over thirty years; AI vendors are ignoring it or proliferating too fast to block
41
votes
Is anyone here familiar with crawling the web? I’m interested in broad crawling, rather than focusing on particular sites. I’d appreciate pretty much any information about how this is usually done, and things to watch out for if attempting it.