-
The great LLM scrape
24 votes -
Pay up or stop scraping: Cloudflare program charges bots for each crawl
46 votes -
As consumers switch from Google Search to ChatGPT, a new kind of bot is scraping data for AI
28 votes -
Anubis works
35 votes -
Please stop externalizing your costs directly into my face
121 votes -
FOSS infrastructure is under attack by AI companies
39 votes -
LLM crawlers continue to DDoS SourceHut
11 votes -
Nepenthes: a tarpit intended to catch AI web crawlers
33 votes -
Websites are blocking the wrong AI scrapers (because AI companies keep making new ones)
18 votes -
Chrome/Firefox Plugin to locally scrape data from multiple URLs
As the title suggests, I am looking for a free Chrome or Firefox plugin that can locally scrape data from multiple URLs. To be a bit more precise, this is what I mean:
- A free Chrome or Firefox plugin
- Local scraping: it runs in the browser itself. No cloud computing or "credits" required to run
- Scrape data: Collects predefined data from certain data fields within a website such as https://www.dastelefonbuch.de/Suche/Test
- Infinite scroll: Loads data that only appears once the browser scrolls down (as on the page I linked above)
I am not looking to program my own scraper in Python or anything similar. I have found plugins that "kind of" do what I describe above, and about two weeks ago I found one that does it pretty much perfectly ("DataGrab"), but it starts asking me to buy credits after a few runs.
My own list:
- DataGrab: Excellent, apart from asking to buy credits after a while
- SimpleScraper: Excellent, but asks to buy credits pretty much immediately
- Easy Scraper: Works well for single pages, but no possibility to feed in multiple URLs to crawl
- Instant Data Scraper: Works well for single pages and infinite scroll pages, but no possibility to feed in multiple URLs to crawl
- "Data Scraper - Easy Web Scraping" / dataminer.io: Doesn't work well
- Scrapy.org: Too much programming, but looks quite neat and well documented
Any suggestions are highly welcome!
Edit: A locally run executable or command-line program would be fine too, as long as it only needs to be configured (e.g., by creating a list of URLs stored in a .txt or .csv file) rather than coded (e.g., by writing an infinite scroll function from scratch).
8 votes -
Web scraping for me, but not for thee
19 votes -
Report: Potential New York Times lawsuit could force OpenAI to wipe ChatGPT and start over
75 votes -
‘Not for machines to harvest’: Data revolts break out against AI
40 votes -
The shady world of Brave selling copyrighted data for AI training
59 votes -
Google updates its privacy policy to clarify it can use public data for training AI models
44 votes -
Web scraping doesn’t violate anti-hacking law, appeals court rules
12 votes