10 votes

Topic deleted by author

14 comments

  1. [2]
    synergy-unsterile
    Link
    I don't see why Lynne is singling out the Archive Team for "consent" when scraping publicly available data or their refusal to remove any such data, especially when it's done for public interest purposes (to preserve history). There are already plenty of unknown actors (state intelligence, private intelligence, internet hoarders, etc...) who do not advertise their scraping activities. While it's unfair to have one's data stored indefinitely and used for unknown purposes, it's not like the Fediverse was ever going to guarantee the privacy of their data in the first place. Fundamentally, it should be assumed that every single bit of data made available through federation will become public, as Lain pointed out in their blog post, and this expectation should be extended to any platform on the internet.

    12 votes
    1. [2]
      Comment deleted by author
      Link Parent
      1. ssgjrie
        (edited )
        Link Parent
        This is an old problem. Before the internet, what we did and said could be printed on a newspaper and stored in archives. Now it's saved on server farms.

        It's true that just because it's public, it doesn't mean that we want "our" content to be saved "forever". There's a big chance it will though, so act accordingly. I know that this comment is available for everyone to see and would never expect the Archive Team, Google, etc, to not cache/archive Tildes just because I posted a few comments on this site.

        I personally have a folder with some pages for offline access. One page is a tutorial from a forum that no longer exists and some of the comments were saved with it. Another is a Reddit page with some Android commands to enable the hotspot feature on carriers that block this functionality... and yes, that page had comments when I saved it.

        The Archive Team will not store this data forever. As far as I know most stuff is uploaded to the Internet Archive, which is then able to hide what people don't want to be publicly available.

        The Archive Team is a pain in the arse, but they would never archive anything if they respected all complaints and robots.txt. And it's not only people asking them to delete dumb comments, sometimes it's people trying to censor what happened (EU's "right to be forgotten" is sometimes used for this). They archive, the Internet Archive controls what's available on their "wayback machine".

        Unless someone has an alternative (that can't be abused) to the current "screw you, we'll save everything", this is the best way to preserve the internet.

        9 votes
  2. [6]
    zaarn
    Link
    The problem is that so many websites shut down with little to no warning that the Archive Team has almost no choice but to scrape whatever they can without being able to check what it is.

    Places like Sporum, Minecraft Forums or Tumblr shut down parts of their website or all of it with little warning. There is lots of content on them that is worth archiving, but there isn't enough time to moderate all of it and find out what should be archived and what not.

    Ideally, website owners should submit the appropriate data to the archive on their own, then the team won't have to archive anything. Or at least help them to back up only the relevant parts and not private data.

    Historical archiving is important and too much of the web is lost each day.

    9 votes
    1. [6]
      Comment deleted by author
      Link Parent
      1. [2]
        trobertson
        Link Parent
        The internet is public by default. You would expect people using the fediverse to understand that if they don't want their data out there, then they shouldn't put it out there in the first place. They are the ones who put their data into the public domain, not the Archive Team.

        6 votes
        1. [2]
          Comment deleted by author
          Link Parent
          1. ssgjrie
            Link Parent
            "Once you step outside, the world is also 'public by default'. But you still maintain basic expectations of privacy. Just because the internet is public doesn't mean all expectations for any privacy and basic decency (asking for permission) should fly out the window."

            True, but you know that anyone can record you saying or doing something silly. You also know that you can't do much about it.

            "A reminder that we're talking about children. Children make mistakes and frankly often aren't developed enough to make decisions of actual consequence like this."

            I was a bit surprised to see the admin saying "yep, there was a lot of children here...". Laws differ from place to place, but usually there are rules that prevent us from having kids as users. Snapchat, Instagram, etc, all have a "13 year old or over" rule for a reason.

            "What about people who have had their personal information published by others, without their consent? Fuck them because they're an edge case?"

            The Archive Team saves a copy of everything, then they upload the content to the Internet Archive (IA), which in turn makes it available through the Wayback Machine. Contact the IA, ask them to hide the page, and they'll do it.

            The post author seems to think that the Archive Team will host a copy of the site forever, which is not true. Their work is essentially to make full copies of everything that is available online. Permanent hosting is IA's responsibility.

      2. [3]
        zaarn
        Link Parent
        The amount of data is too much. When you're archiving thousands if not millions of documents a day, anonymizing the dataset is impossible without hundreds of volunteers working full time. The urgency of some projects compounds this and makes it even harder.

        1. [3]
          Comment deleted by author
          Link Parent
          1. [2]
            zaarn
            Link Parent
            "Then, simply said, maybe some conversations held between teenagers for a few weeks just isn't actually worth archiving."

            How do you filter out these conversations among a 1400TB dataset (Google+ in this case)?

            "They're the ones deciding to make that data permanent." is true to some extent, but I think it's more important to archive historical events than not, and on Google+ there is likely a lot of forgotten data that people cared about or that could be interesting to future generations.

            Personally I try to sort out personal data from datasets I obtain, but on the flip side, I can't do that for all datasets. Even a 300GB dataset would far exceed what I can handle, yet I have larger archives on my disk (for example a collection of Usenet and BBS texts that is largely no longer available anywhere).

            1. [2]
              Comment deleted by author
              Link Parent
              1. zaarn
                Link Parent
                Yes, but again, how do you effectively filter this, blacklisted or whitelisted? You can't, unless there is some magical AI that can differentiate between harmless teenager conversations and any data worth archiving for the next century.

  3. [3]
    unknown user
    Link
    "Making something public is not the same as consenting to having it stored for all eternity"

    No, it is exactly, totally, completely, inevitably, irrefutably the same. Or else, nothing is public, ever. To publish something is to put it out there for everybody to see. Something can't be partially public. If something is published, people will do whatever they want with it, unless they are caught doing so AND what they are doing is illegal, and even then it will essentially remain public because there is no way of wiping it from the entire world full of devices, backups, caches, archives, and whatnot. It will remain on hard disks as unassociated data waiting to be discovered. It'll be sitting on a pile of computer parts somewhere in Nigeria. It will have been seen and will sit in people's minds. This is the phenomenon as it is, not a social construct around it. You can't have something appear on other people's screens (including bots) and be sure it is not replicated and stored. A robots.txt cannot be more than a kind request. If a URI is accessible without authentication, the data is public. There is no way to enforce the inverse. There cannot be any grey areas between private and public, for that is not possible. You can't be forgotten, as it is up to other people, and not even really up to them in many cases (you can't deliberately forget things, and really deleting stuff on computers is hard). You may build legal frameworks around it, but they will achieve nothing. You will always run the very likely danger of being archived. And law can only help when those archives are published and you encounter them and you sue and you're lucky enough to win. Even then, there is no possible way to ensure that the data is got rid of forever.

    What I'm trying to convey is that, well, there may be a right to have public stuff be half-private---which I don't think there is, but let's brush my opinion aside for this---, but there is no way to enforce it. Which means, there is no reliable way in which this can be guaranteed, which in turn means that this is futile. If you want stuff to be private, keep it private. If you will publish stuff, think well, twice, thrice. In the olden days we did not have this problem because it was a hassle to publish stuff, but now if you sit wrong your phone in your arse pocket will publish stuff for you. And most of the blame is on apps that default to publicly-viewable posts. They should default to private instead.

    edit: I want to also add: this sort of suggestion is placing people in the peril of security/privacy theatre, which is really dangerous. Just like you can't un-eat things, you can't unsee things or make people unsee things.

    9 votes
    1. [3]
      Comment deleted by author
      Link Parent
      1. unknown user
        Link Parent
        So you one-up my hyperbole with hypocrisy. I really don't like doing this, but well, if putting words in my mouth is not immoral for you, what can I do...

        Being mugged and publishing something are completely different. Walking home alone is not consenting to it at all. Educating society, lighting streets and prosecuting crime will render the streets safer for everyone. That second blockquote is totally your words, I did not say any of it, and I'd be glad if you made it not be a blockquote so that it does not look like I said it. Or else I'll have to ask for the mods' help. I don't think you're doing it out of bad will, but it looks like those are my words.

        With publishing, everything changes. Making something public means making that something public. Telling people that they can trust that it is possible to enforce public things to not be recorded, also most possibly permanently, and possibly in a distributed manner, is putting them in peril, because preventing such recording and remembering of public stuff is impossible. Maybe in some instances you may succeed, but even then, how do you know? And I don't say it should be laissez faire, but it is not reasonable to depend on enforcing some sort of right to be forgotten. It is like shooting a fully loaded gun into your temple, and hoping that your right to life will save you; you'll at least get severely injured, and your right to life will not be able to revert anything. It is the nature of things: it is really really hard to survive unaffected with an exploded skull and brain. And the advice here is not much different than telling people "hey, there is no harm in shooting that gun into your temple, you have the right to live!"

        Rape, harassment, crime, diseases... these things are preventable, and hopefully will be prevented. I am one of those who aim to minimise these things, with my talking in my circles, educating my friends and my family, commenting around the internet, and whatever else is in my power as a random guy at a random place that knows him some good things thanks to good people that wrote books or blogposts or comments in forums like this one. But this public/private divide is completely different in nature, and advice that things can be made private again after having been publicised is dangerous, maleficent, perilous, hazardous, and whatever scary words you can come up with. And as a person with good know-how when it comes to computers and internet, I will never give that advice to anyone. I'm sure many people who wanted to make their SO a sexy surprise video/photo and ended up seeing themselves on some porn website will agree with me.

        There is also a scale problem here: on a street in the night, there are at most a few thousand people out there. With internet, the entire blinking world is out there to get you (well, not really, but public stuff is visible to the entire world, and something illegal here is totally legal there, and vice versa).

        How about you don't throw straw men at me and instead talk about some surefire solutions to salvage data unintentionally made public? I did read the comment you linked before you posted. It has no suggestions for a solution to that. You tell us that these parties should be lawful, obey the laws in some particular geographical area, and act in good will. Please tell me (1) how any of that can be ensured, and (2) how you can defend against bad actors. And don't forget anything public is public worldwide on the internet. Also keep in mind that the failure of your solutions might mean irreparable damage to the lives of some people who will depend on them.

        6 votes
      2. cge
        Link Parent
        "I do love me some good old-fashioned hyperbole."

        Without wishing to gain a reputation of being Cato severus qui venit in theatrum, I do have to express some dismay at this. Is such a tone necessary in what otherwise seems a reasonable conversation?

        2 votes
  4. kfwyre
    Link
    Genuine question here: what is there to be gained by archiving this instance?

    I say this not to be patronizing in the slightest, and this isn't a dig at the users of the instance nor the archivers themselves. I guess it's more a question about what merits archiving in the first place. I know there are people out there who believe that we should save everything simply because we can, but outside of a sort of loss aversion, what positive or constructive use case will anyone feasibly have for this data?

    I can think of a number of negative and destructive ones. Bad actors can plumb it for identifying information or ammunition for doxxing. I can also picture any number of companies who traffic in user data tossing this into their already robust data sets (though, to be honest, the major ones probably already had it anyway).

    I probably have a bit of a blind spot in this area, and I'm also probably a bit biased in that I deleted my own Mastodon account just this week, but I guess I don't understand why we would need to save the instance and its posts in the first place. The people on it who wanted to save their data probably already did (as I did with mine), and I can't see the data at large being terribly meaningful or important to someone that wasn't on the instance (just like I can't imagine what a stranger would want to do with my posts).

    4 votes
  5. cge
    Link
    To approach this from some different perspectives (and I should note that this is all my speculation, as I have not done any research on the matter).

    I think there is a question here of how technology, and the increasing prevalence of digital interactions, changes the availability of resources for historical research on the lives of the general population.

    With physical artifacts, while there are differences in the probabilities of different forms of texts surviving, we end up with some subset of quotidian texts preserved by chance, while many are lost. Some receipts, some private journals, some letters, some notes, will survive, often with little thought to preservation: enough, hopefully, to gain an understanding of an era and culture. Thus, for example, we have the graffiti of Pompeii, cuneiform receipts, and so on, or miscellaneous letters that have survived through a few generations.

    With many digital forms of expression, however, this chance of survival without regard to intentional preservation seems often to be replaced with something very different: more of a binary choice between everything being preserved, through comparatively minimal but conscious efforts of archiving, or everything being lost. Wanting to gain consent for this archiving would limit its usefulness as a record of culture at the time. Even having people opt out of the archiving would. Both would present a sanitized view that would present peoples and cultures only as they wished to be remembered. Yet we don't have the chance-based preservation to temper this record, and keep it from being overly invasive. Neither do we have the difficulty of access and searching that would make many problematic uses infeasible for physical records.

    There is, perhaps, a balance to be struck here between individual privacy and the pursuit of knowledge. One option that could provide a compromise, but would require very long-term planning and stable infrastructure and governance, would be to archive without regard to consent, but, absent opt-in consent, make material accessible only after long periods of time, perhaps 150 or 300 years, enough to ensure that the material cannot be used to directly invade the privacy of living people.

    This is, to some extent, what is often done with authors and others in the public eye, regardless of their wishes, where there are extant remnants of their communications and lives that are sufficiently sought after or cared for to exist in such detail. I am reminded of the recent TLS article on the continuing publication of TS Eliot's letters, something that the article points out was very much against his wishes, at least as they were expressed in 1938 (though a later letter to the editor points out some complexities here of consent and coherence): the TLS, for some odd reason, seems to contain a multitude of titles for the same article, and I would point to the "No posthumous privacy" title being quite apt. We often value these posthumous perspectives, though invasive, and I do think that there would be something lost without them.

    4 votes
  6. vakieh
    Link
    I have what I once thought to be a minority opinion but increasingly find it echoed by others.

    I don't believe the person posting info publicly owns that info anymore.

    Once you put something into a public space, then that becomes something that the audience you posted it to has a claim to. The person who posted it still has rights under this understanding of mine, right of attribution, right to commercialisation, to an extent rights over derivatives. But to delete? Fuck no on a fuck no sandwich.

    I hit this fairly regularly with my behaviour around certain game modding scenes. I do not believe in the 'right' of a mod developer to a) remove their mod from public access, or b) restrict subsequent modifications to that mod. So I quite often rehost these where ones I have are removed. I am a rather polarised figure in some of these communities as a result - certain mod creators don't like me at ALL, people who developed 'mods of mods' before finding their dependency removed and people who like playing removed mods like me a lot.

    I have also hit it in my own work. The left-pad incident impacted the company I worked for at the time, and in several cases there have been free but not properly FLOSS software packages that have stopped being updated where we had the physical but not legal ability to keep them updated and useful to us (and others!). So in at least 2 cases we just did it anyway and covered our tracks.

    This applies to things that aren't code just as much - my experience is with bug trackers and development forums that close down, especially of the mailing list variety. You'd better believe they get archived, and if an author asked to have their posts removed they would be flat out laughed at (never happened that I know of).

    If you haven't already, you should read The Cathedral and the Bazaar. It's focused on the open source software models, but it can be applied the same way to public discussion. The public OWNS the public space. Just as you can be filmed on the street without the expectation of privacy but not in your home, if you post on a public site you don't and shouldn't own jack.