51 votes

Google seems to be running OCR on photos in my Gmail. Is this happening to you too?

This morning I was asked to find an archived email with photos of some scientific equipment. I searched "Powerlab," the name of one of the instruments, in Gmail, and the email came right up. Great! But then I noticed that the word "powerlab" never appeared in the text of the email. I tried searching "ML206", an arbitrary character string from one of the photos in the email, and again the email appeared in the search, without the search phrase highlighted in the search result as it normally would be. I tried different phrases from JPGs in emails; not all yielded search results, but some did.

I'm not happy about this. I accept some compromises to privacy when using Gmail, but sending text as an image can be a way of specifically avoiding information being harvested. All I ask for is a way to turn it off.

Can anyone replicate this? Did anyone already know about this?

85 comments

  1. [57]
    takeda

    It does this. It also uses AI to categorize pictures; for example, if you search "dog" it will show pictures with a dog.

    I think it also asks you to assign names to faces and then automatically recognizes them.

    A lot of people say it is super convenient and helps with searching, but I also think it is creepy.

    46 votes
    1. [29]
      Aerio

      It should be an opt-in feature.

      29 votes
      1. [2]
        Matthias720

        Agreed. But knowing Google, they'd likely do it anyway, without consent.

        29 votes
        1. kru

          Nah, you proffered consent when you clicked a button marked "agree" next to a 500 page wall of text that only a tiny handful of people ever read.

          27 votes
      2. [14]
        Adys

        Why?

        "Privacy" is the generic answer, but people tend not to know what that actually means. If this is your answer, what are the implications, and how does it differ from /not/ doing it and just storing the photos etc as unencrypted blobs?

        10 votes
        1. [13]
          Aerio

          Why? Because privacy is a fundamental right that users should have control over.

          By offering users the choice to enable this feature, we empower them to exercise control over their personal data. It goes beyond simple encryption of stored photos and recognizes the broader implications. OCR technology raises concerns about the potential scrutiny of personal communications and the risk of misinterpretations. Opting in ensures transparency and informed consent, establishing a clear boundary that respects individuals' privacy. It strikes a balance between convenience and privacy, allowing users to evaluate the potential benefits of the feature while maintaining control over their data.

          12 votes
          1. Adys

            I am pro privacy, you missed my point. How does face recognition inside your photos and giving you access to them damage privacy any more than simply hosting your photos? Especially when you have the capability either way and it’s just a toggle.

            I don’t think people understand that it really doesn’t change anything. The only argument you can make is “what if there’s a data leak” but… let’s be real, this is google we are talking about and security is the one thing they cannot be faulted on.

            10 votes
          2. starchturrets

            Respect privacy? If it weren’t enabled, Google would still have your unencrypted photos anyways. I don’t see how it would change anything.

            7 votes
          3. [9]
            fantom1979

            I just think this is the new normal. I am sure there were people that were very upset when their name, address, and phone number were printed in a book that was mailed out to the entire city, and the white pages were also opt out, but at some point that invasion of privacy just became normal for people.

            Of course it sucks, but I honestly don't see a law coming anytime soon (in the US anyways), and the tech companies aren't going to restrict themselves.

            3 votes
            1. [8]
              Aerio

              That's not equivalent at all. This would be the same as the post office opening and reading all your mail, so that you could have the convenience of "just asking them".

              4 votes
              1. [7]
                Adys

                The post office in your comparison is a data processor which passes the data around without ever having access to it. It’s nothing like google photos.

                Google on the other hand HOSTS this data. For your comparison to make sense, you’d have to compare someone who opens your mail, reads it, and keeps it in a room for you, vs someone who does all this AND can tell you which ones are from Bob.

                4 votes
                1. [6]
                  itdepends

                  So it would be like a bank opening your safe deposit box, reading through your documents and categorizing everything in a database.

                  1 vote
                  1. [5]
                    Adys

                    No, because once again there's legal protections around safe deposit boxes, including even in the contract you sign with the bank.

                    This whole line of thinking presupposes that Google cannot read or touch the photos in the first place if they don't do face/object recognition but… what exactly do you think google photos does? They read, recompress, host, duplicate, convert, resize, they even occasionally edit and auto-generate albums.

                    You want to make a comparison, make one with your accountant: They know everything in the first place, they can do a bit of extra work to search into the files.

                    6 votes
                    1. mat

                      An even better comparison would be with accounting software. No person at Google knows anything about anyone or their photos. Some machines do, but that's different. I would even suggest that machines can't meaningfully violate privacy, that sort of has to be a human. Otherwise every email provider up to and including a self-hosted system isn't private. https isn't private. Nothing mediated by machines is.

                      2 votes
                    2. [3]
                      itdepends

                      Well, in all fairness, discussions of legality and morality are different issues; I was talking about the extent of the invasion of privacy. On the legal aspect, I'm sure Google's legal team has ensured we've all agreed to everything they might possibly try to do via the Terms and Conditions, so there's not much to be said there apart from "it's legal so, eh".

                      On the issue of invasion of privacy I'd say all the operations you described are different to pulling text out of your pictures and using that. Besides the benign (easier search) are you really comfortable knowing that Google might be adding data pulled from say, invoices you send or receive to your digital profile? Major partners and vendors? Scraping pictures for text to automatically add location information? Locating product brand names to target their ads better? Knowing that everything Google touches immediately becomes digitized, categorized, searchable and cross-referenceable?

                      Again, I'm sure this is completely legal but it is certainly way beyond the scope of what privacy the users expect to give up.

                      1. [2]
                        Adys

                        I wasn’t talking about legality, but it’s not possible to ignore it either. The two are deeply intertwined. For example:

                        > are you really comfortable knowing that Google might be adding data pulled from say, invoices you send or receive to your digital profile?

                        What are those invoices doing in google photos? It’s not an appropriate place for them in the first place and has different implications from Google Drive where they would belong.

                        Now, did you know Google Drive does the same thing? It runs OCR on most documents. BUT. The legal guarantees behind it are wiiiildly different. So while I wouldn’t be comfortable with my accountant putting my invoices in Google Photos, I would be comfortable with him putting them in Google Drive. And as you can see it has nothing to do with the OCR itself.

                        > beyond the scope of what privacy the users expect to give up.

                        So is the issue around expectations then? I suspect most people just don’t realise that all this data is there in the first place and they’re surprised when it shows up. It’s not so much that Google does it, but it’s that Google is the one telling them “did you know all this info is in your photos?”, in a similar way as I can post a nighttime photo of me on the internet and people can instantly localise me based on the constellations. And the first time they see it happen, people are like “huh, didn’t even know that was possible”.

                        I maintain my position that none of what google is doing has privacy implications any different from the data just living there in the first place. I’m not really sure how to explain it more clearly, but I feel like the burden of proof is on you to bring me a good reason why the violation of privacy is bigger than the mere cleartext storage of your photos.

                        1. itdepends

                          > What are those invoices doing in google photos?

                          I think I might have gotten confused. The initial user was discussing Gmail, and my reasoning is that if they run OCR on photos there is no reason to assume they don't run OCR on .pdfs, which is one of the most common formats for invoices. It is perfectly reasonable to expect .pdfs to be in Gmail: tons of small companies, freelancers etc. use it, not to mention exchanging family info, who paid what for which invoice etc.

                          Please correct me if I'm wrong on the above. If not then again it's not unreasonable to think that the average user does not expect that gmail will be scanning, OCRing and archiving all the info on their documents. Sure, it might be legal and we can trot out the old "it was in the license agreement" but then we're still discussing legality.

                          So let's agree that it's legal. There's still the discussion of whether it is moral or downright creepy and imho it's terrible, super-shady and will drive me to change my habits.

                          > I maintain my position that none of what google is doing has privacy implications any different from the data just living there in the first place. I’m not really sure how to explain it more clearly, but I feel like the burden of proof is on you to bring me a good reason why the violation of privacy is bigger than the mere cleartext storage of your photos.

                          I admit I cannot produce a detailed report based on technological findings but I feel that my assumptions for concluding it's over the line are not crazy.

                          • Google is in the business of building profiles to better direct ads.
                          • Google is running OCR on photos.
                          • I assume they are running OCR on pdfs.
                          • The results of the OCR are indexed in a database.
                          • I assume they use that OCR for enriching their profiles.
                          • Google has that level of knowledge about me.

                          at which point I'm left to choose between

                          • I trust that Google will not abuse that capability and keep my data as secure as if it never existed.

                          or

                          • I do not trust that Google will not abuse that capability and keep my data as secure as if it never existed.

          4. [2]
            Comment deleted by author
            1. Aerio

              > The average person doesn't understand the broader implications of this technology.

              As you say, even more reason for it to be opt-in.

              > The problem is, if you made this opt-in the average person would be less likely to enable it

              People seem to think making it opt-in means burying it in some obscure settings screen somewhere. They could easily put it front and center, even in a popup "New feature! Enable now! Much convenience"

              This seems to be a cultural divide. I'm European, and privacy is the default here.

              I find this commonplace fatalistic acceptance of the invasion of our privacy sad and worrying.

              2 votes
      3. [7]
        Octofox

        Use another mail provider if you want extreme privacy. I want my email to just work out of the box. I don’t want to spend an hour configuring my setup and syncing dot files in a git repo. It should just work.

        7 votes
        1. [6]
          Aerio

          It would "just work". Opting in to a feature like this would be as easy as clicking a checkbox, don't be obtuse.

          3 votes
          1. [4]
            Octofox

            I don't want to click a checkbox. It's also not just a checkbox, it's one of hundreds of checkboxes buried in a settings menu I won't look at. If I wanted to configure trivial implementation details, I'd host my own email where I can configure everything. Why on earth would I want to configure OCR scanning? The search box should just give me what I want, with whatever technology is available to do that best.

            7 votes
            1. [3]
              Aerio

              That's an incredibly silly take, imo. Just sign away all your right to privacy so you don't have to go through the extreme trouble of clicking a single checkbox.

              Do you also want the post office to just open all your mail and read it for you?

              2 votes
              1. [2]
                Octofox

                Gmail is not a privacy platform. It’s a general consumer platform built for convenience. If OCR scanning images is too much for you, you should get on a self hosted or privacy focused email host.

                For me, I am perfectly happy for Gmail to scan attachments. The only things in my email are invoices and random utility emails. All I care about is that I can easily find what I am looking for and not have to spend any extra thought managing email.

                5 votes
                1. Aerio

                  > Gmail is not a privacy platform. It’s a general consumer platform built for convenience.

                  Bullshit, they pretend they care about privacy: https://support.google.com/mail/answer/10434152?hl=en
                  And Gmail terms do not mention OCR as far as I can see.

                  > If OCR scanning images is too much for you, you should get on a self hosted or privacy focused email host.

                  The irony of clicking a checkbox being too much for you, but I should self host my email.

                  > For me, I am perfectly happy for Gmail to scan attachments. The only things in my email are invoices and random utility emails. All I care about is that I can easily find what I am looking for and not have to spend any extra thought managing email.

                  Good for you. Sounds like you'd be the perfect candidate to opt-in and forget about it forever.

                  This seems to be a cultural divide. I'm from the EU, and privacy is expected to be the default. I find this laissez-faire attitude and acceptance of the invasion of our privacy off-putting and worrying.

                  2 votes
          2. Carighan

            Well that's how it works. You clicked a checkbox when you signed up for GMail, didn't you? The text above it even explained the opt-in. And it was definitely opt-in, you had to manually sign up for GMail.

            2 votes
      4. [4]
        PuddleOfKittens

        "Opt-in" means 90% of people won't use it, because you opt-in by going to the settings and most people don't even think about touching the settings.

        I'm not necessarily disagreeing with you here, but I'm not surprised that devs don't like making things opt-in.

        4 votes
        1. Octofox

          And then those users all move to the other platform that has everything turned on by default.

          3 votes
        2. CptBluebear

          I just need to remember to go to the privacy settings once in a while. I do often disable options like that, but sometimes new options get added that are enabled by default, with no clear notice that they've been added.

          1 vote
        3. pete_the_paper_boat

          There's plenty of apps (google included) where you go through basic first set up.

          A couple of sliders that are enabled by default isn't going to be a problem.

      5. Fortner

        I don't trust Google enough to think that if I opted out of anything with them, that they would actually "turn it off". I've read too many articles about services that users have disabled that are still running, but just hidden from the user.

        1 vote
    2. [13]
      cfabbro
      (edited )

      Do you actually have any proof of this, or are you just speculating/relaying something you've been told by others? I've never encountered the behavior OP describes, or been asked to assign names to faces in images in my own gmail. And surely those features would be mentioned somewhere in their docs, right? But all I can find is references to an OCR feature on Enterprise, Education Standard, and Education Plus Google Workspace accounts:
      https://support.google.com/a/answer/6358855

      8 votes
      1. [9]
        Aerio

        I'd believe it, because it does the same in Google Photos. At least on Pixel phones.

        My dog

        7 votes
        1. [8]
          cfabbro
          (edited )

          They specifically mention those features in their docs for Google Photos:
          https://support.google.com/photos/answer/6128838

          But AFAICT similar features are not mentioned anywhere in the docs related to Gmail or Google Workspace. That's why I am asking for some proof, since I would prefer not to see a bunch of misinformation based on pure speculation spreading here on Tildes.

          19 votes
          1. [2]
            takeda

            Their privacy policy allows it, they have the technology to do that on a large scale (as shown in Photos), and like the majority of public companies they are driven by maximizing profit. So the only thing preventing them from doing it would be if that information was not valuable.

            3 votes
            1. Carighan

              Isn't the EU preventing this, because you're no longer allowed to mix sensitive data from separate services? Part of why Threads hasn't launched in the EU yet?

              We might be seeing replies from US customers mixed with EU customers here.

              3 votes
          2. [5]
            ku-fan

            Are you asking for proof that Google does OCR and catalogs text from images? They definitely do that.

            1 vote
            1. [4]
              cfabbro

              No, I was specifically asking for proof they do any of that in Gmail, which is what OP was asking about.

              7 votes
              1. [3]
                EnigmaNL

                I don't have hard proof but from experience I can confirm it definitely does this. It can pull up search results that don't include the words in the body of the e-mail. It can even show results from scanned documents in JPG format.

                4 votes
                1. [2]
                  cfabbro
                  (edited )

                  Yeah, based on your and other people's answers here, I am beginning to believe they do actually do it in gmail on attachment images... which is a little disconcerting, especially since I can't find mention of it anywhere in their official docs. Before I completely believe it, or jump to condemning google, I am going to have to test for it myself though.

                  3 votes
                  1. EnigmaNL

                    They don't publish a lot about how Gmail works, but their terms of service do cover this very broadly. They say Google can use your content to customize its services for you, including personalized search results (and many other things). The terms of service cover all current Google services (Search, Maps, Gmail etc.).

                    2 votes
      2. takeda

        Sorry, I was primarily referring to Google Photos. If you search, for example, for "dog", it will show you pictures that you took with dogs. Though I think that is a generally known thing.

        I did assume that if they do that with pictures on massive scale, then what's stopping them from applying the same thing to emails. Anyway I just tried it in Gmail and:

        • I searched a name that was in a PDF attachment and the email showed up (it didn't have that person's name in the text or recipient)
        • I searched for the word "pumpkin" and got an email with an image of a flyer for a fall festival that had pumpkins as decoration. There was no word "pumpkin" in the text or subject; they called it "fall festival". Though after looking in detail, there was pumpkin bread mentioned in another attachment.

        So I guess as far as email goes, the OCR definitely works and is available to the user. As for image recognition, they do have the technology and their privacy policy doesn't prevent them from doing it, so they could still use it to provide additional metadata for advertising. The only reason I could see them not doing it would be if the value gained was too small.

        3 votes
      3. [2]
        updawg

        > But all I can find is references to an OCR feature on Enterprise,

        Well this was for work so OP is almost certainly using the enterprise version. That would answer your questions, right?

        2 votes
        1. cfabbro
          (edited )

          Sure, but the OCR they're describing in Enterprise is meant only for content compliance and objectionable content filtering, and specifically has to be turned on. But AFAICT there is nothing about them using OCR on everything that goes through gmail to assist in search. I'm not looking for proof they can do OCR, that is a gimme, the question for me is what are they using OCR for exactly, and on which of their services.

          2 votes
    3. Akir

      I'm OK with this.

      It sounds creepy, but if you're using Gmail in the first place then you've already signed away your right to privacy. Google already reads all your mail and uses it for basically anything they want. Having them analyze your photos is just the next logical step, so I don't see any reason to protest this if you were OK with them having the text already (including text in attachments), which is more likely to contain sensitive information. More importantly, you actively benefit from it since you now have more data to search and find the content you are looking for. It's a very convenient feature.

      7 votes
    4. Carighan
      Link Parent
      I will say that, as much data-grabbing as this is (and at least we all knew this when we signed up for Gmail/Photos/etc., right? Or at least I did, and that was ~19 years ago!), it has big upsides....

      I will say that, as much data-grabbing as this is (and at least we all knew this when we signed up for Gmail/Photos/etc., right? Or at least I did, and that was ~19 years ago!), it has big upsides.

      I frequently had to find pictures where I only knew the description of the content. And hey, Photos can find me that! Which is frankly amazing! I would have never found those pictures again in the sea I have otherwise.

      So long as people are aware when they opt in (in regard to /u/Aerio's point, it is opt-in: you opt out by not using Gmail, for example; it's basically how you pay for the service), I don't truly see the problem. This was known from the start, after all. In fact it was why I wanted to use Photos in the first place.

      3 votes
    5. [5]
      teaearlgraycold
      Link Parent
      You should switch off of Gmail. I did so a couple of years ago and use my own domain name. It makes me look professional and I can more easily switch providers because I own the full email address.

      You should switch off of Gmail. I did so a couple of years ago and use my own domain name. It makes me look professional and I can more easily switch providers because I own the full email address.

      1 vote
      1. [4]
        takeda
        Link Parent
        I have actually run my own mail server for two decades. This is another thing I have beef with Google about. Despite my running it for so long, having an IP address that didn't change, implementing SPF, DKIM and DMARC...

        I have actually run my own mail server for two decades. This is another thing I have beef with Google about.

        Despite my running it for so long, having an IP address that didn't change, implementing SPF, DKIM and DMARC, and never sending spam or even appearing on any blacklist, they all of a sudden started labeling my email as spam.

        And because it is Google and I'm essentially a nobody, I can't do anything about it. It is extremely frustrating. And it's not like their spam filtering improved in the last 20 years.
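
        For anyone unfamiliar with the acronyms: SPF, DKIM and DMARC are all published as DNS TXT records. Below is a rough Python sketch of what such records look like and how they break down. The domain, IP address, key and mailbox are placeholders (example.com and a documentation-range IP), not anyone's real setup, and `parse_spf` and `parse_tag_list` are made-up helper names.

```python
# Hypothetical TXT records for example.com, for illustration only.
SPF = "v=spf1 mx ip4:203.0.113.25 -all"
DKIM = "v=DKIM1; k=rsa; p=BASE64PUBLICKEY=="
DMARC = "v=DMARC1; p=reject; rua=mailto:postmaster@example.com"


def parse_spf(record):
    """Return the SPF terms that follow the version tag."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    return terms[1:]


def parse_tag_list(record):
    """Parse the 'tag=value; tag=value' layout used by DKIM and DMARC records."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only, so base64 padding in values survives.
        key, _, value = part.partition("=")
        tags[key.strip()] = value.strip()
    return tags


if __name__ == "__main__":
    print(parse_spf(SPF))
    print(parse_tag_list(DKIM))
    print(parse_tag_list(DMARC))
```

        Roughly: SPF lists who may send for the domain, DKIM publishes the key that signs outgoing mail, and DMARC tells receivers what to do when the other two fail. Getting all three right is what's supposed to keep you out of the spam folder.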

        7 votes
        1. teaearlgraycold
          Link Parent
          I just use ProtonMail. I wouldn’t want to bother with self hosting and their spam rating is good.

          I just use ProtonMail. I wouldn’t want to bother with self hosting and their spam rating is good.

          3 votes
        2. [2]
          Octofox
          Link Parent
          Could be because your ISP or VPS host has had spam issues on their IP block.

          Could be because your ISP or VPS host has had spam issues on their IP block.

          1 vote
          1. takeda
            Link Parent
            I use a business plan; also, I only get these issues with Gmail, not other email services. Also, I'm not the only one. If you look at discussions on this (for example, there was a thread on HN a while ago), many...

            I use a business plan; also, I only get these issues with Gmail, not other email services.

            Also, I'm not the only one. If you look at discussions on this (for example, there was a thread on HN a while ago), many people and small companies have stopped self-hosting and started using other providers because Google makes it a nightmare.

            IIRC someone shared a solution that worked for his company. He got his legal department involved. They managed to contact a human and the issue was quickly resolved, although that's not something I can do.

            3 votes
    6. [4]
      superphly
      Link Parent
      Would you find it creepy if you had your own photos on your own hard drive and you had a program that could do this as well? What I'm asking is, do you find the creepiness in that someone else is...

      Would you find it creepy if you had your own photos on your own hard drive and you had a program that could do this as well? What I'm asking is, do you find the creepiness in that someone else is doing it or that there's a program that can actually discern a dog from a cat?

      1. [3]
        takeda
        Link Parent
        Maybe it should be an opt in feature? I used dog as an easy example that most people would have, because if I would mention my uncle you likely wouldn't have pictures of him.

        Maybe it should be an opt in feature?

        I used dog as an easy example that most people would have, because if I would mention my uncle you likely wouldn't have pictures of him.

        1. [2]
          fantom1979
          Link Parent
          I honestly think at this point it is an opt-in feature. Nothing privacy related should be a surprise at this point. If you still use Google after 2010 or so, you should kind of have known what you...

          I honestly think at this point it is an opt-in feature. Nothing privacy related should be a surprise at this point. If you still use Google after 2010 or so, you should kind of have known what you were signing up for. Google is an advertising company that happens to make technology.

          Posted from the Chrome browser, on my Android phone, on a website that I used Gmail to sign up with.

          1. takeda
            Link Parent
            Honestly, this kind of apathy is what makes it so hard for anyone who doesn't want to opt in. I, for example, don't use Gmail, but because you (and many others) do, I am at their mercy when I...

            Posted from the Chrome browser, on my Android phone, on a website that I used Gmail to sign up with.

            Honestly, this kind of apathy is what makes it so hard for anyone who doesn't want to opt in.

            I, for example, don't use Gmail, but because you (and many others) do, I am at their mercy when I send email to you. And they do break things, and I see many people who run their own email servers stop doing it exactly because of Google.

            I see you joined Tildes when reddit API protest started. We are doing the same thing here, but we are giving Google control over the internet. And unlike reddit, it will be much harder to escape that.

            4 votes
    7. [3]
      Delayed_Apex
      Link Parent
      I'm currently in the process of setting up a Raspberry-Pi-powered little home "server", which I will also use to back up my pictures instead of using Google Photos. There are browser-based gallery...

      I'm currently in the process of setting up a Raspberry-Pi-powered little home "server", which I will also use to back up my pictures instead of using Google Photos. There are browser-based gallery "apps" you can use on your own pics in your own storage which will also do that kind of AI - but locally, so no data is sent to any server anywhere. I think in that context I really like the idea. After all, machine learning based computer vision is an amazing tool and it does not have to be used exclusively for creepy or even evil purposes.

      1. [2]
        fantom1979
        Link Parent
        I've been looking for local programs that can do that kind of AI, but have struck out. Do you have any suggestions?

        I've been looking for local programs that can do that kind of AI, but have struck out. Do you have any suggestions?

        1 vote
        1. Delayed_Apex
          Link Parent
          I think "PhotoPrism" might be something to check out in that context. I'm still in the process of figuring it all out myself - at least my Raspberry Pi 4 has finally shipped now...

          I think "PhotoPrism" might be something to check out in that context. I'm still in the process of figuring it all out myself - at least my Raspberry Pi 4 has finally shipped now...

          1 vote
  2. [5]
    skybrian
    Link
    Asking around in forums can result in a fair bit of noise because people often jump to conclusions about technology (as we’ve already seen here). There are other possible explanations. Photos have...
    • Exemplary

    Asking around in forums can result in a fair bit of noise because people often jump to conclusions about technology (as we’ve already seen here). There are other possible explanations. Photos have file names and often have hidden metadata that might be indexed. Whoever sent you the images might have added a description. Email has headers that aren’t normally shown. There may be other ways.

    You could test this yourself. To do a clean test, I’d make some images, each with a different random word, send them to myself, and then see if I can search for them. (Make sure you don’t put the random word anywhere else.)
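
    Here's a minimal Python sketch of the bookkeeping for that test. It only generates the random words and neutral file names; drawing each word into an image is left to whatever image editor you like. The function names and the `test-image-N.png` naming are my own inventions for illustration, and random letter strings may be harder for OCR than real words, so a miss doesn't prove much.

```python
import secrets
import string


def make_test_plan(count=3, token_length=10):
    """Build (neutral_filename, random_token) pairs, one per test image."""
    alphabet = string.ascii_lowercase
    plan = []
    for index in range(count):
        token = "".join(secrets.choice(alphabet) for _ in range(token_length))
        # The filename is a plain counter so the token cannot leak through it.
        plan.append((f"test-image-{index + 1}.png", token))
    return plan


def token_leaks(token, *visible_fields):
    """Return True if the token shows up in any text field (subject, body, filename)."""
    return any(token.lower() in field.lower() for field in visible_fields)


if __name__ == "__main__":
    for filename, token in make_test_plan():
        print(f"Draw the word '{token}' into {filename}, email it, then search for it.")
```

    Before sending, run `token_leaks(token, subject, body, filename)` on each email and strip any metadata from the image, so that a search hit can only have come from the pixels.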

    25 votes
    1. [4]
      goose
      Link Parent
      I verified this behavior by searching for a phrase in a printed document that someone is holding and sent me a photo of. There's no subject or body message, and no exif data attached to the image...

      I verified this behavior by searching for a phrase in a printed document that someone is holding and sent me a photo of. There's no subject or body message, and no exif data attached to the image with that phrase. It's also not a scan quality image, it's very much a hand holding a piece of paper at a slight angle away from the camera.

      18 votes
      1. [3]
        skybrian
        Link Parent
        Oh, that’s good data. What kind of Google account do you have?

        Oh, that’s good data. What kind of Google account do you have?

        5 votes
        1. [2]
          goose
          Link Parent
          A standard account founded on the faraway date of June 16, 2007. Two more years til my Google account is old enough to vote! I pay for extra storage (Google One), but that's it, it's not an...

          A standard account founded on the faraway date of June 16, 2007. Two more years til my Google account is old enough to vote!

          I pay for extra storage (Google One), but that's it, it's not an Enterprise/Workspace/Education account.

          7 votes
          1. Moody
            Link Parent
            Same here, standard account. I searched for ginger and got a hit on a photo of a bottle with the text ginger beer on the label.

            Same here, standard account.

            I searched for ginger and got a hit on a photo of a bottle with the text ginger beer on the label.

            3 votes
  3. [2]
    goose
    Link
    Just verified the same behavior on my standard (non-enterprise/education/workspace) account. I have mixed feelings on this. On the one hand, I wish they were more upfront about this, and it were...

    Just verified the same behavior on my standard (non-enterprise/education/workspace) account.

    I have mixed feelings on this. On the one hand, I wish they were more upfront about this, and it were opt in. On the other hand, it's definitely a useful feature.

    14 votes
    1. TheBeacon
      Link Parent
      I don't have mixed feelings. It's search working as intended. Attachments are part of the email, they are in no way a "sealed" part that should for some reason be kept separate. Consider PDF...

      I don't have mixed feelings. It's search working as intended. Attachments are part of the email, they are in no way a "sealed" part that should for some reason be kept separate.

      Consider PDF attachments, some are created using PDF "printer" software that makes the contents an image or are scanned documents. They should show up on search like any other text document, but because of the way they were made that's only possible if you run OCR on them. So Google does this for good UX. Similarly you have documents sent as images, maybe photographed. So you need to run OCR on images too.

      If there's any sensitive information it should be kept encrypted. This is especially important for email considering some servers are still sending them out in the clear i.e. without even transport encryption.

      Sending text as images is not encryption, nor an indication the contents are to be walled off. I'm baffled some people treated them as such.

      7 votes
  4. [13]
    Brekkjern
    Link
    Why is it an issue that images are OCR scanned? To me this sounds like a feature, and an excellent way to prevent spammers from trying to get around spam filters by just putting the incriminating...

    Why is it an issue that images are OCR scanned? To me this sounds like a feature, and an excellent way to prevent spammers from trying to get around spam filters by just putting the incriminating text in the image.

    11 votes
    1. [8]
      FluffyKittens
      Link Parent
      IMO it's problematic because it's a new feature that users can't toggle easily. Many people and orgs haven't historically been managing data with OCR'd images in mind, so adding OCR after the fact...

      IMO it's problematic because it's a new feature that users can't toggle easily. Many people and orgs haven't historically been managing data with OCR'd images in mind, so adding OCR after the fact makes a hacker's job literally 10x easier.

      Consider how many private individuals and small businesses have sent e.g. scanned tax forms or drivers license pictures unencrypted. That's not a huge deal if the hacker has to download the full mailbox and batch OCR every image/PDF, then run a scraping script. That data export process is intensive and slow, and more likely to set off alarm bells when Google notices the same sketchy IP logging into accounts connected to ten different orgs and pulling down dozens of gigs each time.

      Instead, now an attacker can add "drivers license" or "form 1040" to their IMAP-search exfil script and they're done in a few seconds.

      5 votes
      1. [2]
        sparksbet
        Link Parent
        To be fair, they would probably already be doing that, right? To catch emails that mention those docs are attached or attachments where the file is named that.

        Instead, now an attacker can add "drivers license" or "form 1040" to their IMAP-search exfil script and they're done in a few seconds.

        To be fair, they would probably already be doing that, right? To catch emails that mention those docs are attached or attachments where the file is named that.

        6 votes
        1. FluffyKittens
          Link Parent
          Yeah, absolutely. They'll just be catching ~95% of those emails instead of ~40%, and that matters at scale.

          Yeah, absolutely. They'll just be catching ~95% of those emails instead of ~40%, and that matters at scale.

          2 votes
      2. [5]
        Brekkjern
        Link Parent
        Fixed that for you. This is what Google will trigger on. Not the volume of data. Security is by preventing access to the data in the first place. This is why Google is pushing 2FA solutions so...

        Many people and orgs haven't historically been managing data with security in mind

        Fixed that for you.

        when Google notices the same sketchy IP logging into accounts connected to ten different orgs

        This is what Google will trigger on. Not the volume of data.

        Security is by preventing access to the data in the first place. This is why Google is pushing 2FA solutions so hard. It is also why they would want to OCR incoming email to uncover phishing, which would likely prevent more data leakage than the extra time it would take for an attacker to do their own OCR scans on the downloaded data.

        5 votes
        1. [4]
          FluffyKittens
          Link Parent
          A one-off OCR scan upon receipt to check for phishing, where the data is ephemerally-stored and not exposed to end users, would give all the security benefits with none of the downside of their...

          A one-off OCR scan upon receipt to check for phishing, where the data is ephemerally-stored and not exposed to end users, would give all the security benefits with none of the downside of their current implementation, no?

          I agree that people suck at security, which is why I think Google shouldn't OCR without a user explicitly enabling the feature: it's an insecure default.

          I would be extremely surprised if major traffic anomalies weren't tracked in some point of Google's security stack, but similarly, having to dump full-on sets of breach data is a big logistical hurdle for attackers even if it's not the key heuristic that gets their attacks blocked.

          1 vote
          1. [3]
            Brekkjern
            Link Parent
            What security benefits? You are still arguing like obscurity is security. It is not. And the upside of the current implementation is that it is a feature where users can search for information...

            would give all the security benefits with none of the downside of their current implementation, no?

            What security benefits? You are still arguing like obscurity is security. It is not. And the upside of the current implementation is that it is a feature where users can search for information that is in images. This is not a downside. That is the point.

            I agree that people suck at security, which is why I think Google shouldn't OCR without a user explicitly enabling the feature: it's an insecure default.

            It's not insecure at all. You have to be properly authenticated and authorized to view the data. Denying this feature could at worst mean that people would give their mailbox content to a third party to have OCR of their content for better search, increasing the attack surface on their data by the new provider. Protecting users from dumb decisions like this is likely going to be a larger benefit than detection through volume of extraction.

            10 votes
            1. [2]
              FluffyKittens
              Link Parent
              Could you drop the snarky/aggro "FTFY" attitude, please? I'm giving a good faith reply to a question you asked. I'm not saying that users shouldn't be able to do that. I'm saying it's a feature...

              Could you drop the snarky/aggro "FTFY" attitude, please? I'm giving a good faith reply to a question you asked.

              And the upside of the current implementation is that it is a feature where users can search for information that is in images.

              I'm not saying that users shouldn't be able to do that. I'm saying it's a feature they should be in control of.

              Denying this feature could at worst mean that people would give their mailbox content to a third party to have OCR of their content for better search, increasing the attack surface on their data by the new provider.

              Again, I'm not saying that users shouldn't have built-in OCR. Why would someone export to a third-party instead of flipping a switch to turn on OCR in their email settings?

              Serious question: do you have experience analyzing large breach data? If not, you should give it a whirl if you live in a jurisdiction where it's legal, because it'll give you a lot of insight into the process of blackhat data harvesting. The vast majority of bad actors are going for the lowest-hanging fruit and leaving the rest. Unless they have a specific target, the furthest they'll go in terms of harvesting email attachments is pumping PDFs and images through Tesseract and pumping that raw text through a secrets scanner.

              Security-through-obscurity is the wrong framing for this: we're not talking about infrastructure or a transmission protocol. It's a matter of defense-in-depth, i.e. not serving your data up on a silver platter for retrieval.
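
              To make the OCR-then-scan step concrete, here is a rough Python sketch you could point at your own exported attachments to see what is findable. It assumes the `tesseract` command-line tool is installed; the three patterns are toy examples I made up, nowhere near a real scanner's rule set.

```python
import re
import shutil
import subprocess

# Hypothetical patterns for illustration; real scanners ship far larger rule sets.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "form_1040": re.compile(r"\bform\s+1040\b", re.IGNORECASE),
    "drivers_license": re.compile(r"\bdriver'?s?\s+licen[sc]e\b", re.IGNORECASE),
}


def ocr_image(path):
    """Run the Tesseract CLI on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        ["tesseract", path, "stdout"],  # "stdout" sends the text to standard output
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def scan_text(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

              Usage would be something like `scan_text(ocr_image("scan.png"))` over each file in an export. That's the whole pipeline, and it's the work a server-side OCR index has already done for whoever is logged in.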

              4 votes
              1. Brekkjern
                Link Parent
                Because users are more comfortable going through an OAuth flow to allow a third party access to their accounts than to discover features in a settings page to toggle them on or off. It is...

                Again, I'm not saying that users shouldn't have built-in OCR. Why would someone export to a third-party instead of flipping a switch to turn on OCR in their email settings?

                Because users are more comfortable going through an OAuth flow to allow a third party access to their accounts than to discover features in a settings page to toggle them on or off.

                Security-through-obscurity is the wrong framing for this: we're not talking about infrastructure or a transmission protocol. It's a matter of defense-in-depth, i.e. not serving your data up on a silver platter for retrieval.

                It is absolutely not the wrong framing here. You agree that the data is available in the mailbox, but your argument is that just removing it from the search index will suffice for security. It will not. If we were to actually do security in depth, then Google should scan for such documents and require an extra authentication step to open them (enter your password, or touch your 2FA). That would actually secure the information instead of attempting to hide it from an attacker.

                3 votes
    2. [4]
      slashtab
      Link Parent
      You are more worried about a spam mail than google reading sensitive data?

      You are more worried about a spam mail than google reading sensitive data?

      3 votes
      1. [3]
        Brekkjern
        Link Parent
        If the data is sensitive and you don't trust the processor of the data, then why would you have the data sent through a party you don't trust? This is what encryption is for. Not obfuscation.

        If the data is sensitive and you don't trust the processor of the data, then why would you have the data sent through a party you don't trust? This is what encryption is for. Not obfuscation.

        21 votes
        1. [2]
          slashtab
          Link Parent
          Didn't OP mention that Google is doing this without any permission?

          Didn't OP mention that Google is doing this without any permission?

          1. fantom1979
            Link Parent
            The OP didn't give permission when they signed up for Gmail? I would assume this is covered in the TOS. I am sure it sounds like lawyer speak but legally they probably do have your permission....

            The OP didn't give permission when they signed up for Gmail? I would assume this is covered in the TOS. I am sure it sounds like lawyer speak but legally they probably do have your permission.

            These days I would assume anything you do online that is not encrypted is public.

            3 votes
  5. skybrian
    Link
    It seems like this discussion has gone on for a long time without anyone noticing that this is a feature for searching your own email. The results are surprising, but the search is being done on...

    It seems like this discussion has gone on for a long time without anyone noticing that this is a feature for searching your own email. The results are surprising, but the search is being done on your behalf. Google doesn't need to ask your permission to enable you to read your own email, or look at images in your own email, or to search your own email either. Searching your own email was a big selling point of Gmail since it launched. Now it works better.

    It's true that Google needs to implement this feature on their servers. Gmail is a server-side app. People are speculating that they're showing the results to someone other than the person they belong to ("harvesting" them), but we don't have actual evidence of that. As far as we know, there's no privacy violation.

    Marketing people would sometimes talk about "surprise and delight" where you make things work better than expected in surprising ways. Nowadays, all surprises are bad, it seems? We're getting so jumpy about what tech companies might be doing and how it might work that they can't do something cool without a lot of reassurance that a new feature just does what it looks like and nothing else. We can't even see what it does without imagining other things, too.

    Possibly, some of us might be reassured if this were an email app that you ran on your computer at home. However, I do know elderly people who are often surprised and spooked by things their own tablet or phone does.

    Maybe technology has gotten a bit too magical and people have lost all sense of control over it. And that's true! You don't have any control over what a software update does.

    5 votes
  6. cfabbro
    Link
    Do you have an Enterprise; Education Standard or Education Plus Google Workspace account? If so, this might explain what's happening: https://support.google.com/a/answer/6358855

    Do you have an Enterprise; Education Standard or Education Plus Google Workspace account? If so, this might explain what's happening:

    https://support.google.com/a/answer/6358855

    4 votes
  7. [2]
    AgentBoxer
    Link
    iPhone does this as well. Try taking a screenshot of something with words then go back to Home Screen and pull down on it to bring up the global search and type a word present in the screenshot....

    iPhone does this as well. Try taking a screenshot of something with words then go back to Home Screen and pull down on it to bring up the global search and type a word present in the screenshot. If you want to disable this go to iPhone settings>Siri and search>photos>show content in search and disable.

    You can also select and copy and paste words directly from images as well

    3 votes
    1. ButteredToast
      Link Parent
      Apple platforms do their OCR locally though (which is why it works even in the live feed of the camera app's viewfinder), which seems a bit less creepy even if there's still potential privacy issues.

      Apple platforms do their OCR locally though (which is why it works even in the live feed of the camera app's viewfinder), which seems a bit less creepy even if there's still potential privacy issues.

      3 votes
  8. tvix
    Link
    I've had the same thing with GDrive. A few years ago now I was looking for where I put some cool NASA pictures of Jupiter. So I searched Jupiter, mainly hoping for a folder I had lost that had the...

    I've had the same thing with GDrive. A few years ago now I was looking for where I put some cool NASA pictures of Jupiter.

    So I searched Jupiter, mainly hoping for a folder I had lost that had the name with it. I did, and found the pictures I wanted. Cool story bro, right?

    What also popped up when I searched my GDrive for "Jupiter" was a picture of a statue of Jupiter from some museum somewhere I took years ago. No name tag on the statue, no direct data tag from my Android phone (yes EXIF, but that should only put me roughly in a multi-floor museum - the file was from an unorganized SD card dump).

    I was pretty shocked. Not that it could identify the statue, but more that it did from inside my "private" GDrive.

    2 votes
  9. jaylittle
    Link
    If you value your privacy, you need to stop using gmail. There is no other correct answer.

    If you value your privacy, you need to stop using gmail. There is no other correct answer.

    1 vote
  10. Pistos
    Link
    Ah, Google. The best part about this: Even if it's somehow opt in for the Gmail account holder, they'll still happily slurp up data of other people that send email and attachments to you.

    Ah, Google. The best part about this: Even if it's somehow opt in for the Gmail account holder, they'll still happily slurp up data of other people that send email and attachments to you.

  11. [2]
    Comment removed by site admin
    Link
    1. mat
      Link Parent
      For what it's worth, Google internally talks about having two customers. First is the user, who buys services like email and search and pays with their eyeballs on ads; second is the ad buyers, who...

      For what it's worth, Google internally talks about having two customers. First is the user, who buys services like email and search and pays with their eyeballs on ads; second is the ad buyers, who buy access to eyeballs and pay with money. Sure, they have things like YouTube Premium where you buy services directly with money, but you're right that the majority of their business is showing ads to people.

      They're not selling "you", they're selling a very limited form of access to you. If you set up an advertising account with Google you can see how much you don't have access to. It's basically nothing: you set parameters for what demographics you want to see your ads and that's about it. Google mediates everything to keep user information out of advertisers' hands, because that dataset is incredibly valuable. It's never sold.

      6 votes