-
35 votes
-
Content Independence Day: No AI crawl without compensation!
14 votes -
Anubis works
35 votes -
Please stop externalizing your costs directly into my face
121 votes -
Trapping misbehaving bots in an AI Labyrinth
40 votes -
Block AI scrapers with Anubis
27 votes -
FOSS infrastructure is under attack by AI companies
39 votes -
LLM crawlers continue to DDoS SourceHut
11 votes -
Nepenthes: a tarpit intended to catch AI web crawlers
33 votes -
Websites are blocking the wrong AI scrapers (because AI companies keep making new ones)
18 votes -
Chrome/Firefox Plugin to locally scrape data from multiple URLs
As the title suggests, I am looking for a free Chrome or Firefox plugin that can locally scrape data from multiple URLs. To be a bit more precise, here is what I mean:
- A free Chrome or Firefox plugin
- Local scraping: it runs in the browser itself. No cloud computing or "credits" required to run
- Scrape data: Collects predefined data from certain data fields within a website such as https://www.dastelefonbuch.de/Suche/Test
- Infinite scroll: to load data that only loads once the browser scrolls down (kind of like in the page I linked above)
I am not looking to program my own scraper in Python or anything similar. I have found plugins that "kind of" do what I describe above, and about two weeks ago I found one that does it pretty much perfectly ("DataGrab"), but it starts asking me to buy credits after a few runs.
My own list:
- DataGrab: Excellent, apart from asking to buy credits after a while
- SimpleScraper: Excellent, but asks to buy credits pretty much immediately
- Easy Scraper: Works well for single pages, but no possibility to feed in multiple URLs to crawl
- Instant Data Scraper: Works well for single pages and infinite scroll pages, but no possibility to feed in multiple URLs to crawl
- "Data Scraper - Easy Web Scraping" / dataminer.io: Doesn't work well
- Scrapy.org: Too much programming, but looks quite neat and well documented
Any suggestions are highly welcome!
Edit: A locally run executable or command-line program would be fine too, as long as it only needs to be configured (e.g., by creating a list of URLs stored in a .txt or .csv file) rather than coded (e.g., coding an infinite scroll function from scratch).
8 votes -
Robots.txt governed the behavior of web crawlers for over thirty years; AI vendors are ignoring it or proliferating too fast to block
41 votes -
The creators of TikTok caused my website to shut down
12 votes -
Mastodon's dubious crawler exemption
4 votes -
Spiders
Is anyone here familiar with crawling the web? I’m interested in broad crawling, rather than focusing on particular sites. I’d appreciate pretty much any information about how this is usually done, and things to watch out for if attempting it.
10 votes -
XSS attacks on Googlebot allow search index manipulation
7 votes -
An analysis of Cloudflare's email address obfuscation
5 votes