How do you feel about social media archiving tools such as Pushshift?
On and off throughout the years, I have attempted to make my online footprint as small as possible, taking steps such as:
- using pseudonyms on social media
- creating a new account every year or so
- overwriting old posts with a new message blanking out my original post
- "deleting" posts after a few days if the account is more likely to be tied to my real life
On the last point, I put quotation marks around "deleted" because I understand that once I post something, it is never really deleted, but deleting it adds a barrier for anyone trying to dig into my personal life. Pushshift comes up because, try as I might, I seem to have difficulty getting accounts removed from their searches. Additionally, I think they allow you to download Reddit data in bulk, so even if I were able to get my name removed from the search results, the data could still exist on someone's hard drive, somewhere.
From your perspective, are services like Pushshift, which archive people's information without their explicit knowledge, ethical? On the one hand, I think of detestable content that users might post and then delete later to avoid accountability. On the other hand, I think of people like me who want to keep their data footprint as small as possible because of the crazies who might use this information to do harm.
Everything is being trawled and archived.
The only difference with the public archives you can see is that they're also available to you (and anyone else, not just the paying clients of archival service companies).
If someone wants a small online footprint, they have to participate only on closed platforms where content is hidden behind a robust login. Those areas are less trawled, but it's no guarantee.
The greatest value Pushshift and others have is that they make more people aware that they cannot delete things once they put them out on the web: almost anything you post could potentially be attributed back to you at some point.
This means we all have to think carefully before we submit anything online, and have this type of talk with others so they don't share information about us that then exists publicly forevermore.
I know the guy that runs Pushshift. He's an interesting fellow and we've had some discussions about how his tool can be both used and misused. There are definitely both pros and cons to it, such as enabling research projects as well as harassment campaigns. It's been a minute since I've talked to him about it though.
The strange thing is that in the grand scheme of surveillance capitalism, Pushshift is a very small operation compared to what tech companies and governments do. But because it's much more open to the public, it creates a unique threat. I don't trust Facebook with my data, but in practical terms I don't have a choice and was never given one. If you know enough about the internet, you can hide large parts of your online presence, but that gets harder every year, and it takes extreme discipline and a great deal of specialized knowledge that can only really be acquired by going on the internet in the first place. But even then, Facebook is less likely to be actively malicious than some reactionary asshole looking for someone to dogpile on.
It's complicated, and the internet as it exists today is basically a nightmare if you value privacy. But that's the result of billions of tiny decisions "we've" made over the last ~40 years. I certainly hope more efforts are made to roll back pervasive surveillance. But at least in my opinion Pushshift isn't a primary offender: it only archives a portion of what's publicly posted on specific sites, rather than doing the deeply creepy shit that other organizations do.
Edit: I'll put it this way. If I decided I wanted to evade Pushshift's tracking it would be relatively simple. If I wanted to evade Facebook's tracking, time travelling and stopping my younger self from making an account wouldn't be sufficient to stop them from tracking me and building a profile on me.
A bit off-topic, but it's worth noting that you can request to have your account data removed from Pushshift and excluded from their API: Pushshift Online Removal Request form
You can make a similar request of the Internet Archive's Wayback Machine service. If you control a domain, you can email them requesting that they exclude it from their services. See: FAQ (CTRL-F "remove")
Google also offers similar exclusion mechanisms for their web cache service through Google Search Console.
And most other reputable (<-keyword) search engine/cache/archival services offer similar exclusion mechanisms as well. You often just have to go hunting for them, or contact them directly to inquire about it.
So there actually are some practical steps you can take to eliminate/minimize your digital footprint, if you're willing to put the work in to do it. But overall, I still agree with the majority of people here that you realistically shouldn't expect much privacy when you're using a public facing website. At some point someone will cache/archive what you have written there, and there isn't always going to be a way to request those records be deleted, since not every service is run by people who care about ethics, privacy, or even legality.
At least on a practical level, I think there's nothing that can be done about it. The internet is written in ink just by how it works. Every client, be that a computer, phone, or bot, is entitled to do whatever it wants with the HTML document a server sends it, just as a recipient of snail mail is. No one is obligated to redact information in private archives, and only through mechanisms like copyright is there any issue with public archives.
On a purely ethical level, I can see both angles, but I would lean towards the user being ultimately responsible for what they put out there, even if users are not well trained currently for that responsibility.
In my opinion, you are responsible for anything and everything you publish on a public platform. And that means that if you do something bad, you should be held accountable for it. With that being said, I think social media archival is not only ethically acceptable, it's an important tool to hold people accountable for their actions.
As far as privacy goes, I consider that to be mostly your responsibility as well. There are some aspects that the platform should protect, such as what IP address I use to access them, my usage data, and a handful of explicitly private pieces of information (which ALWAYS includes passwords, and sometimes may also include things like addresses and real names). But the things you post for others to view are all your responsibility. You can't blame a company for the people harassing you if you are the one who gave them the ability to do so.
I have respect for the idea that we should have the ability to delete our public posts, but at the same time I think it's not a good thing to do. Public conversations have worth. By deleting your comments, you are defacing a public resource, and that's something that I think can be argued as being unethical. It's better if you take the time to consider what value your comment might have before you delete it; going on a campaign to remove all of your comments en masse feels like a much bigger problem.
In any case, I find that third-party archiving is far less problematic than some of the terms that people are already blindly agreeing to when they join these social media platforms. Every time you sign up, you agree to give them all the benefits of the content you generate without the liability. Some of these terms can be fairly extensive; the Facebook Terms of Use document links to 12 other documents that you may have inadvertently agreed to, and some may limit your ability to take legal action against them.