-
Looking for help scraping and deleting a Reddit account
I have a couple of old Reddit accounts I’d like to delete as fully as possible. However, one of them dates back to my teenage years, and it holds some of the only writing I have from that time. Any recommendations for good, simple ways to scrape all the comments from it and save them? And what’s the best way to completely erase a Reddit footprint these days?
Looking for as simple a solution as possible. I’m not tech illiterate by any means, but it’s also not a real strong suit for me.
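For anyone comfortable running a short script, saving the comments could look roughly like the sketch below. It assumes the comment pages have already been downloaded as JSON in Reddit's "Listing" shape (a `data.children` list of `t1` comment objects plus an `after` token for the next page); whether and how those pages can still be fetched for a given account is not covered here. The function names `format_comments` and `next_page` are made up for illustration.

```python
from datetime import datetime, timezone


def format_comments(listing):
    """Turn one page of a comment listing into plain-text entries.

    Assumes each child of kind "t1" carries "body", "created_utc"
    and "subreddit" fields; anything else is skipped.
    """
    entries = []
    for child in listing["data"]["children"]:
        if child.get("kind") != "t1":
            continue  # not a comment (e.g. a submission)
        comment = child["data"]
        when = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
        entries.append(
            f"[{when:%Y-%m-%d}] r/{comment['subreddit']}\n{comment['body']}\n"
        )
    return entries


def next_page(listing):
    """Return the token for the following page, or None on the last page."""
    return listing["data"].get("after")
```

Each returned entry can be appended to a text file, and `next_page` tells the caller whether another page is left to request.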
18 votes -
Chrome/Firefox Plugin to locally scrape data from multiple URLs
As the title suggests, I am looking for a free Chrome or Firefox plugin that can locally scrape data from multiple URLs. To be a bit more precise, here is what I mean:
- A free Chrome or Firefox plugin
- Local scraping: it runs in the browser itself. No cloud computing or "credits" required to run
- Scrape data: Collects predefined data from certain data fields within a website such as https://www.dastelefonbuch.de/Suche/Test
- Infinite scroll: to load data that only loads once the browser scrolls down (kind of like in the page I linked above)
I am not looking to program my own scraper using Python or anything similar. I have found plugins that "kind of" do what I am describing above, and about two weeks ago I found one that does pretty much exactly what is described ("DataGrab"), but it starts asking to buy credits after running it a few times.
My own list:
- DataGrab: Excellent, apart from asking to buy credits after a while
- SimpleScraper: Excellent, but asks to buy credits pretty much immediately
- Easy Scraper: Works well for single pages, but no possibility to feed in multiple URLs to crawl
- Instant Data Scraper: Works well for single pages and infinite scroll pages, but no possibility to feed in multiple URLs to crawl
- "Data Scraper - Easy Web Scraping" / dataminer.io: Doesn't work well
- Scrapy.org: Too much programming, but looks quite neat and well documented
Any suggestions are highly welcome!
Edit: A locally run executable or cmd-line based program would be fine too, as long as it just needs to be configured (e.g., creating a list of URLs stored in a .txt or .csv file) instead of coded (e.g., coding an infinite scroll function from scratch).
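To make the "configured instead of coded" idea concrete, here is a minimal sketch of a command-line style scraper driven by a plain list of URLs. It uses only the Python standard library. The tag and CSS class passed to `extract_fields` are placeholders that would have to be looked up in the target site's HTML, and the names `load_urls`, `extract_fields` and `run` are invented for this example. It does not handle infinite scroll, which generally needs a real browser to run the page's scripts.

```python
import csv
import sys
import urllib.request
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text of every element with a given tag and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text pieces of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_fields(html, tag, css_class):
    """Return the text of all <tag class="... css_class ..."> elements."""
    parser = FieldExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def load_urls(lines):
    """Read URLs from an iterable of lines, skipping blanks and # comments."""
    urls = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def run(url_file, tag, css_class, out=sys.stdout):
    """Fetch every URL listed in url_file and write matches as CSV rows."""
    with open(url_file, encoding="utf-8") as handle:
        urls = load_urls(handle)
    writer = csv.writer(out)
    writer.writerow(["url", "value"])
    for url in urls:
        with urllib.request.urlopen(url, timeout=30) as response:
            html = response.read().decode("utf-8", errors="replace")
        for value in extract_fields(html, tag, css_class):
            writer.writerow([url, value])
```

With a file `urls.txt` holding one URL per line, calling `run("urls.txt", "div", "name")` would print a CSV of every `div` with class `name` found on those pages; both arguments are stand-ins for whatever the real page uses.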
8 votes -
Mastodon's dubious crawler exemption
4 votes -
Web scraping doesn’t violate anti-hacking law, appeals court rules
12 votes -
Google open-sources their robots.txt parser and releases an RFC for formalizing the Robots Exclusion Protocol specification
10 votes -
Is it OK to scrape Tildes?
I wanted to keep the title---and the question, for that matter---generic, but my use case is that I want to make a backup of my posts on Tildes, and I'd fancy automating that with a script that curls my user page and periodically downloads fresh stuff from there. So for my personal case, the question is: is this an allowed / welcome practice?
The generic question is: is it welcome to scrape Tildes' public pages in general?
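A polite version of such a backup script might look like the sketch below. It checks the site's robots.txt rules with Python's `urllib.robotparser` before fetching, identifies itself with a User-Agent string, and only stores a snapshot when the page content has changed. The user agent, URLs and helper names (`is_allowed`, `snapshot_name`, `save_snapshot`, `fetch`) are placeholders for illustration, and passing a robots.txt check is not the same as having the site's permission.

```python
import hashlib
import pathlib
import urllib.request
import urllib.robotparser


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def snapshot_name(content):
    """Derive a file name from the page bytes, so identical pages collide."""
    return hashlib.sha256(content).hexdigest()[:16] + ".html"


def save_snapshot(content, directory):
    """Write content to directory unless an identical snapshot exists.

    Returns the new path, or None when nothing changed.
    """
    directory = pathlib.Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / snapshot_name(content)
    if path.exists():
        return None
    path.write_bytes(content)
    return path


def fetch(url, user_agent):
    """Download a page while sending an identifying User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

A scheduled job (cron, for example) could call `fetch` on the site's `/robots.txt`, pass the decoded text to `is_allowed` together with the user page URL, and only then call `fetch` and `save_snapshot` for the page itself. Running it a few times a day at most keeps the load negligible.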
19 votes -
Uber, statistics, and a chrome extension
5 votes -
Ryanair, Berlin, and Hamiltonian cycles - finding a travel route using graph theory
8 votes