11 votes

Analysis by a computer science professor shows that "Google Phone" and "Google Messages" continuously send data to Google servers, unprompted and without the user's knowledge

12 comments

  1. vord
    Don't take this comment as a defense of Google... They are a massive ad network, and thus assuming the worst is probably prudent. The mitigations for users and recommendations to Google are all solid. However...

    Let's keep in mind that all of this data is also almost certainly being aggregated by your cellphone provider, who also knows who you are, and by extension by governments in most of the "Western" world. They're also probably selling this data to whomever they like as an additional revenue stream.

    In-app telemetry has been justified by seemingly every developer and their mother. If we don't like Google doing it... then maybe we should all reconsider the consequences of telemetry for the convenience of developers.

    7 votes
  2.
    teaearlgraycold
    I don’t understand why Google needs this information for text messages. Or why you can’t opt out.

    However, the argument against the collection of incoming numbers for filtering doesn’t seem sound to me:

    When, in addition, the “See caller and spam ID” option is enabled in the Google Dialer app (which is the default) the app sends the phone number of each incoming call to Google, together with the time of the call. By combining data from handsets exchanging phone calls the phone numbers of both are therefore revealed. We note that sending of incoming phone numbers to Google is not necessary for call screening. For example, Google’s Safe Browsing anti-phishing service for web browsers achieves URL screening while only uploading partial URL hashes to Google servers.

    Phone numbers don’t have nearly enough entropy for a hashing approach to be at all useful.
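
    To put rough numbers on the entropy point: a 10-digit number (North American format is assumed here) has at most 10^10 possible values, about 33 bits, which is small enough to enumerate. The sketch below is hypothetical and only illustrates the idea; it does not reflect how Google's caller ID service works. It hashes a made-up number with SHA-256 and recovers it by guess-and-check, assuming an attacker already knows the first six digits.

```python
import hashlib
import math
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def entropy_bits(num_values: int) -> float:
    """Bits of entropy for a value drawn uniformly from num_values options."""
    return math.log2(num_values)


def recover_number(target_hash: str, known_prefix: str, unknown_digits: int) -> Optional[str]:
    """Brute-force the remaining digits of a phone number from its hash.

    known_prefix and unknown_digits are illustrative assumptions, e.g. an
    attacker who already knows the area code and exchange.
    """
    for n in range(10 ** unknown_digits):
        candidate = f"{known_prefix}{n:0{unknown_digits}d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    print(f"{entropy_bits(10 ** 10):.1f} bits")  # 33.2 bits
    leaked = sha256_hex("5551234567")  # hypothetical number
    print(recover_number(leaked, "555123", 4))  # 5551234567
```

    Even without a known prefix, 10^10 candidates is well within reach of ordinary hardware for a fast hash like SHA-256, so hashing alone does not hide a low-entropy input.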

    4 votes
    1.
      drannex
      I don’t understand why Google needs this information for text messages

      Ads.

      Edit: and government "anti-terrorism" measures.

      1 vote
      1. skybrian
        There is no need to speculate. The paper has Google's reasons for this logging:

        Google also provided clarification on the purposes of some of the data collection observed. Namely:

        1. The message hash is collected for detecting message sequencing bugs.
        2. Phone numbers are collected to improve regex pattern matching for automatic recognition of one-time passwords sent over RCS. Messages automatically recognizes incoming One-Time Password (OTP) codes to avoid the user having to fill them in. This can be a frequent point of failure and the phone number data is used to improve recognition by providing ground-truth based on known OTP sender numbers.
        3. The ICCID data is used to support Google Fi.
        4. Firebase Analytics logging of events (not including phone numbers) is used to measure the effectiveness of app download promotions (for Messages and Dialer specifically). Namely, to measure not only whether the app was downloaded but also whether it was used once downloaded.

        I don't think Google did a good job at minimizing the data they collect, but as you see there are lots of reasons why a product might have logging that have nothing to do with ads.
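
        Point 2 in the quoted list mentions regex pattern matching for one-time passwords. As a rough illustration only (the patterns Google Messages actually uses are not described in the excerpt), a hypothetical OTP extractor might look like this; its blind spot shows why ground-truth data about known OTP senders could help.

```python
import re
from typing import Optional

# Hypothetical pattern: a 4-8 digit code appearing shortly after a keyword
# such as "code" or "OTP". Real-world patterns are an unknown here.
OTP_PATTERN = re.compile(
    r"(?:code|otp|passcode|verification)\D{0,20}?(\d{4,8})\b",
    re.IGNORECASE,
)


def extract_otp(message: str) -> Optional[str]:
    """Return the first code-like number that follows an OTP keyword, or None."""
    match = OTP_PATTERN.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_otp("Your verification code is 482913."))  # 482913
    print(extract_otp("482913 is your login code"))  # None: code comes first
```

        The second call is a false negative: a fixed pattern misses a sender who puts the code before the keyword, which is the kind of recognition failure the quoted explanation says sender data is used to catch.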

        6 votes
      2.
        teaearlgraycold
        How does knowing who texted whom (and not what they said) help with advertising?

        2 votes
        1.
          drannex
          (edited)
          Network dynamics: the people they message reveal who they respond to. You can figure out whether contacts are family, friends, work associates, etc. Cross-reference that with location data to figure out whether there are events they are going to and what those events are.

          Plus, they send out a hash of the message content, which with Google powers would be an easy enough method for deciphering when 'needed'. Likely there are two different hashes for the same message content for both receiver and sender, and you can extrapolate and compare the similarities to figure the content (if they even need that), as who knows who can access the key to hash (Google also likely stores that information anyway). They don't salt the hash, so they can get that information incredibly easily. See the comment below from @petrichor.

          Edit: Deleted my other comment, I just don't have the interest to keep this conversation going.

          3 votes
          1.
            petrichor
            Plus, they send out a hash of the message content, which with Google powers would be an easy enough method for deciphering when 'needed'.

            That's not how hashes work. SHA256 is a well-established cryptographic standard.

            you can extrapolate and compare the similarities to figure the content

            This is impossible. Hash functions are designed to produce catastrophically different outputs from the smallest of changes. For a demonstration of what this looks like, see this (tongue-in-cheek) Wordle clone.
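
            The avalanche behaviour described here is easy to check with Python's standard hashlib: change one character and count how many of SHA-256's 256 output bits flip. For a good hash roughly half of them (around 128) should change, and the flipped bits carry no usable "similarity" back to the input. The two sample messages are arbitrary.

```python
import hashlib


def sha256_bits(text: str) -> int:
    """SHA-256 digest of a UTF-8 string, as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


if __name__ == "__main__":
    # One character apart in the input; expect roughly half the bits to differ.
    print(differing_bits("see you at 6", "see you at 7"))
```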

            who knows who can access the key to hash

            SHA256 doesn't use secret keys. If I hash something with SHA256 and you hash the same something, it'll produce the same output. In order to reverse an arbitrary hash, you would need to mount a preimage attack (guess-and-check the message content). Preimage attacks are impractically hard to pull off for arbitrary messages. Besides, they truncate the hash to around 1/4 of its original length, which, again, makes it even more functionally impossible.
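
            A small sketch of the two properties above, assuming (from the comment, not verified against the paper) that the truncation keeps about a quarter of a SHA-256 digest, i.e. 64 of 256 bits. Anyone hashing the same text gets the same value with no secret involved, and the only way back is to guess candidate messages and compare. The function names are illustrative.

```python
import hashlib
from typing import Iterable, List

# 16 hex characters = 64 bits, roughly 1/4 of SHA-256 (assumed truncation).
TRUNCATED_HEX_CHARS = 16


def truncated_sha256(message: str) -> str:
    """Keyless SHA-256 of a message, cut down to its first 64 bits."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()[:TRUNCATED_HEX_CHARS]


def guess_and_check(target: str, candidates: Iterable[str]) -> List[str]:
    """Brute-force preimage search: return every candidate whose hash matches."""
    return [c for c in candidates if truncated_sha256(c) == target]


if __name__ == "__main__":
    observed = truncated_sha256("running late")
    print(guess_and_check(observed, ["ok", "running late", "yes"]))  # ['running late']
```

            This only works when the right message is among the guesses; for a long, unpredictable message there is no practical candidate list to try.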

            I'm not a fan of Google telemetry by any means, but there isn't really anything malicious about using partial message hashes.

            9 votes
            1.
              teaearlgraycold
              Granted some double digit percentage of messages could be broken with rainbow tables - “Yes” “No” “Hey”, strings of emoji
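
              This point can be sketched with a plain precomputed dictionary from hash to message, built once and reused against every observed hash. Strictly speaking this is a simple lookup table rather than a true rainbow table (which trades memory for computation using hash chains), the message list is a made-up stand-in for "common messages", and the 64-bit truncation is again an assumption.

```python
import hashlib
from typing import Dict, Iterable, Optional

# Hypothetical list of very common short messages.
COMMON_MESSAGES = ["Yes", "No", "Hey", "yup", "what's up", "ok", "👍"]


def truncated_sha256(message: str, hex_chars: int = 16) -> str:
    """SHA-256 of a message truncated to hex_chars hex digits (64 bits by default)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()[:hex_chars]


def build_lookup_table(messages: Iterable[str]) -> Dict[str, str]:
    """Precompute hash -> message once, so each later lookup is a dict access."""
    return {truncated_sha256(m): m for m in messages}


def lookup(table: Dict[str, str], observed_hash: str) -> Optional[str]:
    """Return the original message if it was in the precomputed list."""
    return table.get(observed_hash)


TABLE = build_lookup_table(COMMON_MESSAGES)

if __name__ == "__main__":
    print(lookup(TABLE, truncated_sha256("yup")))  # yup
    print(lookup(TABLE, truncated_sha256("meet me by the north gate at 7:42")))  # None
```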

              5 votes
              1. petrichor
                Very true, haha! "what's up" and "yup" for me...

                1 vote
            2.
              Comment deleted by author
              1.
                Adys
                Hashing and encryption are two different and distinct things. Hashing is non-reversible.

                This is computer science 101, so it’s difficult to listen to anything you’re saying if you aren’t credible at all there.

                4 votes
                1. teaearlgraycold
                  Hashing: give me an identifier for this thing, but destroy any indication of what it was

                  Encryption: divide this thing into two pieces (usually a password & encrypted blob), such that each piece has no indication of the original information
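
                  The "two pieces" description of encryption matches a one-time pad almost exactly, which makes for a compact contrast using only the Python standard library. This is a toy for illustration, not how messaging apps encrypt: the hash has no inverse function at all, while XOR-ing the two pad pieces back together restores the message.

```python
import hashlib
import secrets
from typing import Tuple


def hash_message(message: bytes) -> str:
    """One-way: no function turns this digest back into the message."""
    return hashlib.sha256(message).hexdigest()


def split_with_one_time_pad(message: bytes) -> Tuple[bytes, bytes]:
    """Encrypt with a random pad; the key and the ciphertext each look random alone."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def recombine(key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt: XOR the two pieces together to recover the original."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ciphertext = split_with_one_time_pad(b"see you at 6")
    print(recombine(key, ciphertext))  # b'see you at 6'
    print(hash_message(b"see you at 6"))  # nothing to recombine this with
```

                  A one-time pad is only secure if the key is truly random, as long as the message, and never reused.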

                  2 votes
                2.
                  Comment deleted by author
                  1. stu2b50
                    There are reasons why in encryption it's called hash and salting, when you use a key to add an additional element of security.

                    It's not really a "key", though. A key implies that it's secret information. When you add a salt to something before you hash, you actually just keep the salt next to the hash. There's no reason to keep it secret; it's not a secret. It's just used to avoid rainbow table attacks. It doesn't prevent brute force attacks because the only thing that can prevent that is logistics.

                    Whether or not you need a salt really depends on the circumstances, but it doesn't really change anything in this circumstance. There's no way, theoretically or practically, to prevent a brute force attack like "let's just try all the common messages and see if they match", salting or not.
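
                    A minimal sketch of the salting behaviour described above. The salt is random, stored in the clear next to the digest, and not secret; a precomputed table of unsalted hashes no longer matches, but anyone holding the salt can still guess-and-check common messages one record at a time. The 16-byte salt length and the (salt, digest) record layout are illustrative choices.

```python
import hashlib
import secrets
from typing import Iterable, Optional, Tuple


def salted_hash(message: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Hash salt + message and return the salt alongside the digest (it is public)."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + message.encode("utf-8")).hexdigest()
    return salt, digest


def unsalted_hash(message: str) -> str:
    """Plain SHA-256, the kind a precomputed table would be built from."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def brute_force(record: Tuple[bytes, str], guesses: Iterable[str]) -> Optional[str]:
    """The salt is stored with the hash, so common guesses can still be tried."""
    salt, digest = record
    for guess in guesses:
        if salted_hash(guess, salt)[1] == digest:
            return guess
    return None


if __name__ == "__main__":
    record = salted_hash("yup")
    precomputed = {unsalted_hash(m): m for m in ["yes", "no", "yup"]}
    print(record[1] in precomputed)  # False: the table is useless against salts
    print(brute_force(record, ["yes", "no", "yup"]))  # yup
```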

                    2 votes