10 votes

reCAPTCHA: Is there method in monotony?

What started out as a little facetious in my own head leads me now to a serious question. Is there some meaningful reason why Google has to use such a narrow subset of images for reCAPTCHA? I really dislike having to do this and at the very least would appreciate some variation.

  • Traffic Lights
  • Buses
  • Bicycles
  • Cars
  • Crosswalks

Is there something special about these things in this context? Is the visual noise they're usually associated with what makes them good candidates? Are Google just really into urban planning? Who knows...I'm hoping some Tilder smarter than I can help me out.

30 comments

  1. [25]
    aphoenix
    In short: it's used to help train self driving vehicles.
    17 votes
    1. HoolaBoola
      I don't have any issue with this other than that the company is Google. They've got so big a hold on the internet, and have so much data on just about everything. This by itself doesn't concern me that much, but it also provides Google an irreplaceable edge over any competition. In short, no other company has these kinds of resources, and I'd prefer the competition was roughly at the same level.

      9 votes
    2. [23]
      milkbones_4_bigelow
      Ah ha! Free labour, I should have known. Appreciate the link. Sinister stuff, but sadly, unsurprising. Out of curiosity, are those data sets then made available or locked up? I'm guessing the latter, but who knows.

      5 votes
      1. [20]
        skybrian
        The "free labor" thing is a common complaint but I find it strange. It's a trivial amount of effort and they would still need to show captchas if the results were thrown away.

        An analogy: if someone invented a device to harvest a bit of energy from people's footsteps when walking down a sidewalk then I would consider that clever rather than sinister.

        9 votes
        1. [13]
          milkbones_4_bigelow
          (edited )
          Hey skybrian, thanks for your reply and for making me think a little more deeply about my gut reaction. That said, I think there are still issues here, at least for me. The core of this being transparency and intent.

          Transparency: It wasn't clear to me what this information is being used for. Nobody likes a masquerade.

          Intent: If footsteps were somehow harnessed to produce power (firstly, this would be awesome) which was then freely distributed to the poorest, most in-need individuals, I'd be happy to walk (or click pictures of traffic lights all day). That said, I imagine the primary purpose of any power culminating from this data set is to benefit Google and its stakeholders alone. Naturally this point is less important should the data be made available in the public domain (that it isn't is speculation on my part), thus contributing to road safety for everyone. What do you think?

          6 votes
          1. [5]
            cfabbro
            (edited )
            Google is providing, for free, an invaluable service used by millions of websites to combat spam, but it isn't free for them to develop, run and maintain, so why should they not get some return value out of the process? And if they used pictures not tied to any of their other projects, all that categorization effort would simply go to waste.

            And just FYI, they don't hide the fact they do this, and already are transparent about it:
            https://www.google.com/recaptcha/intro/v3.html

            Hundreds of millions of CAPTCHAs are solved by people every day. reCAPTCHA makes positive use of this human effort by channeling the time spent solving CAPTCHAs into annotating images and building machine learning datasets. This in turn helps improve maps and solve hard AI problems.

            https://www.google.com/recaptcha/intro/invisible.html?ref=producthunt

            reCAPTCHA improves our knowledge of the physical world by creating CAPTCHAs out of text visible on Street View imagery. As people verify the text in these CAPTCHAs, this information is used to make Google Maps more precise and complete. So if you're a Google Maps user, your experience (and everyone else's) will be even better.

            reCAPTCHA helps solve hard problems in Artificial Intelligence. High quality human labelled images are compiled into datasets that can be used to train Machine Learning systems. Research communities benefit from such efforts that help build the next generation of groundbreaking Artificial Intelligence solutions.

            reCAPTCHA digitizes books by turning words that cannot be read by computers into CAPTCHAs for people to solve. Word by word, a book is digitized and preserved online for people to find and read.

            https://support.google.com/recaptcha/?hl=en

            reCAPTCHA also makes positive use of the human effort spent in solving CAPTCHAs by using the solutions to digitize text, annotate images, and build machine-learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems.

            etc. etc. etc.

            9 votes
            1. [3]
              milkbones_4_bigelow
              (edited )
              hey cfabbro, I appreciate you taking the time to put this together. I hadn't seen those initially. I only hope the data is used positively and is for the benefit of all people. Thanks again.

              2 votes
              1. [2]
                cfabbro
                (edited )
                NP. p.s. Don't get me wrong, I wish everything google (and every mega-corporation) did was altruistic and benefited all mankind as well, instead of just their owners/shareholders to the detriment of everyone else (as seems to be the norm). However, the unfortunate reality is that they are a for-profit corporation, and since we're not in a post-scarcity world yet, IMO the best we can really hope for from them is a tiny bit of transparency and some mutual benefit... which I think reCAPTCHA at least seems to achieve. There are plenty of reasons to dislike and distrust google, but I don't think this particular project of theirs is one of them.

                1. milkbones_4_bigelow
                  There are plenty of reasons to dislike and distrust google, but I don't think this particular project of theirs is one of them.

                  That's fair enough. Glad everyone got a chance to express their opinion.

                  1 vote
            2. FlippantGod
              (edited )
              I have a few issues. For one thing, websites opt in, not users. But the big issue is that users visiting sites on Firefox receive more captchas than those browsing on Chrome. There are studies showing that reducing page load times increases sales.

              What about taking five seconds for an image to slowly fade in so you can click it and then do that for half a minute?

              Google is generating value for their own computer vision products but also their browser, at the expense of Firefox users whose only options are to go along with this, switch to Chrome, or deny themselves access to websites not even affiliated with Google.

              Furthermore, iirc, Google developed software to distinguish the cursor behaviors of human users vs bots, so they could have done away with the image captchas entirely as I understand it.

              Edit: additionally, the quotes you provided make no claim that the datasets are ever shared with the research community, just that such efforts help improve machine learning. And THEN, even the digitized books are not necessarily made publicly available; the digitized text is indeed preserved as they say, by Google, who uses the materials as another dataset I'm sure. In all likelihood, only a minuscule fraction of digitized texts are ever publicly viewable due to copyright and intellectual property laws. Instead Google seems to provide one or two pages of a work as a preview. Usually the table of contents in my experience...

              So out of those claims, the only real one was that it improves Google Maps, thus bringing more value to their products over the competition.

              I am certain that captcha is not intended for any altruistic purpose.

              1 vote
          2. [7]
            skybrian
            Regarding transparency, I believe in explaining how things work, and also that many users don't really care about how things work most of the time, until one day they decide to ask. Probably there should be a link to an explanation as part of the reCaptcha. I wonder if there's one already and how does it work? (I don't have any easy way to trigger it.)

            As far as intent goes, it seems rather unlikely that the data will benefit Google alone. If they build it into a product then won't users benefit from it too?

            It doesn't look like Google publishes the results from classifying these images and I'd guess they won't do it because it would help the spammers who are trying to crack reCaptcha. But Google does publish an Open Images Dataset for machine learning researchers to use.

            Disclaimer: I worked for Google for almost a dozen years, but not in this area, and I'm out of touch these days.

            7 votes
            1. [6]
              milkbones_4_bigelow
              (edited )
              I just took a look but there's no link funnily enough. That could be a nice addition though.

              To your point about benefit, I'm a little unsure. I guess at the core of it, the services Google provide are not in the interests of the greater good, though I could be wrong here of course. It's my understanding Google is an advertising company that uses its users' data as a commodity which is of much greater value than the service(s) they provide. I'd love to spend some time digging into it in more detail.

              Out of curiosity, how was your experience working there? I'm certain there were/are a ton of decent, super smart, non nefarious folks there. I don't mean to make generalisations :) Thanks for continuing the discussion skybrian :)

              1 vote
              1. [5]
                skybrian
                Google Search and Maps and YouTube are useful services that billions of people use every day, for free. There are alternatives, sure, and people outside the company don't get to decide how these services work. But still, it seems very egalitarian? No matter how rich or poor you are, and for most countries and languages, if you can get online, you can use their services, and isn't universal free service a "greater good?" But I guess a lot of Internet services are like that. We take so much for granted.

                I don't think there's a way to rigorously compare the value of advertising data to the value of the services Google provides. Economists know how to compare things that have prices, and one side of the comparison isn't priced. It's hard to compare something given away for free to billions of people to something that makes billions of dollars.

                But, just as one datapoint, I find it hard to imagine how advertisers could have gained much from all those years of watching me ignore their ads. I don't always have an ad blocker installed, but ad blindness is a thing and my guess is that Wirecutter has had more influence on my buying decisions than all of Google's advertisers combined.

                Since many others are the same way, I find Google's way of making money to be rather mysterious. I know it comes from ads, but still, something doesn't add up. For a long time I suspected that online advertising was in some kind of bubble and that eventually advertisers would wise up and revenues would crash. But it didn't happen. Ad spending keeps growing, so it must somehow be working for them?


                Regarding working there, the pay and benefits were excellent, enough to eventually retire and putter around. So, I can't really complain because they were very good to me!

                Despite this, I wouldn't say my morale was all that great when I was there. It always felt like some other teams were doing amazing things and we'd hear about them at TGIF, but for various reasons my particular team wasn't going to move the needle. I often thought of leaving Google, but the money and benefits seemed too good, so I ended up changing teams in pursuit of something meaningful.

                The overall mood changed dramatically over the years. At the beginning, there was a big whiteboard with a whimsical "world domination plan" drawn on it and we all thought it was pretty funny, self-deprecating humor. Also, there was definitely a sense that we felt like we were all in it together.

                There were always weird political fights on mailing lists (like whether the microkitchens should have water bottles) but they seemed inconsequential back then. The Google+ launch was sort of the beginning of the end where it seemed like it was clueless management versus activists. (Remember the "real name" debate?) It got worse after Trump's election and now it seems trust largely evaporated even internally.

                Part of the reason I think it went south is that in a big company that employees identify with, when some distant part of the company screws up in a public way, lots of people get upset, since they put a lot of self-worth in the company's reputation. This results in a lot of angst over things that don't have much to do with your job. I think smaller companies have it better in this respect: most of the things that go wrong in the world cannot possibly be your company's fault.

                2 votes
                1. [2]
                  milkbones_4_bigelow
                  (edited )
                  Hey skybrian,

                  I appreciate your detailed reply. With that said, I'm afraid I still disagree, at least with the last part of your premise :)

                  Google Search and Maps and YouTube are useful services that billions of people use every day, for free

                  As the old adage states, there's no such thing as a free lunch. I believe there is a cost, whether the bill is paid now or (in a more likely scenario) in the future. I can however back-pedal slightly on using reCAPTCHA as the poster child in this case. It's probably not the best example. That said, much like eating Vegan at McDonald's, I still feel icky about lining the pockets of a company I fundamentally disagree with.

                  In terms of egalitarianism, I just don't see it. "Egalitarian doctrines are generally characterised by the idea that all humans are equal in fundamental worth or moral status." I do not think Google's business model can make that claim. I refer you to the excellent book by Virginia Eubanks - Automating Inequality in which she makes the claim that "While we all live under this new regime of data analytics, the most invasive and punitive systems are aimed at the poor".

                  Interesting to hear about your experience at Google, thanks for sharing. I previously worked for one of the larger tech shops too and had a very similar time of it. I'm much happier having jumped ship to work on a project I consider to be more sustainable and responsible in the long term.

                  1 vote
                  1. skybrian
                    One of the reasons I'm a bit weary of tech criticism is that there is a vague mood of skepticism and mistrust that ignores important distinctions. For example, just saying "big tech" ignores differences between companies that often have rather different policies and business models. In the old days a lot of people hated Micro$oft but at least they didn't confuse what different companies did or treat all tech startups the same? (Or worse, blame everything on "capitalism".) My simplistic over-generalization is that I think we are generalizing too much these days. It's a bad habit, hard to break, because being specific often requires research.

                    I haven't read Automating Inequality but, based on a quick skim of a couple of reviews, it seems to be about the government misuse of computer systems? It's certainly true that the flaws in government bureaucratic systems tend to affect the poor (who can lose benefits and whose finances are precarious) more than the rich.

                    I think something more like the opposite is true of advertising-supported services, though. Advertising is targeted at people who spend money, the more the better. The most favorable demographic category is probably younger people in rich countries with lots of disposable income. But Google isn't attempting to exclude people from using these services if they don't fit the demographic. A lot of people just end up seeing advertising that's irrelevant to them. The costs are so low that free riders can be included as part of the cost of doing business.

                    It's true that valuable services aren't provided for free but that's only true statistically. There is no "free lunch" but not everyone who got a lunch paid for it. The value to you of a web search can be anything from dangerously misleading to a complete waste of time to life-changing information, and this has nothing to do with the cost.

                    But here I am talking about the consumer side of things. It's obviously not true for content producers; websites get widely different amounts of traffic depending on search ranking, and there is a tournament system where people who are trying to make money on YouTube get widely varying amounts of traffic depending on fame and chance.

                    If you want to make money or attract an audience on the Internet then it's a very unequal place. But if you're just using it to learn things, it seems a lot fairer?

                2. [2]
                  FlippantGod
                  Keep in mind that Google Search and YouTube are absolutely not, and have never claimed to be, platforms for free speech. Google itself removes content, and sometimes DMCA abuse removes content, and sometimes countries censor content (albeit independently from Google). For Google Search, Google can prevent sites from appearing in searches (usually due to the DMCA), and while it ceased operation in China in 2010 after a decision to stop censoring search results for the Chinese government, it has at least been working on a search engine that will meet the government's censorship requirements.

                  1. skybrian
                    Yes, when I said "egalitarian" I was thinking more about the consumer side of things. An analogy would be a store that everyone can use, but doesn't carry every product. For some things you need to go elsewhere.

                    2 votes
        2. [6]
          2942
          It's not "just" free labour; it's free labour for one of the richest corporations on Earth whose entire business model is based on spying on me, and the free labour is to help them spy on me better. No thanks.

          5 votes
          1. [3]
            elcuello
            Word. The amount of trust people here give these mega corporations is baffling. This subject might not be the worst, and as I understand it, it actually does some good for society, but how people just take what these corporations say at face value is disheartening.

            4 votes
            1. [2]
              aphoenix
              I think you have to understand that it's not really a level of trust in mega corporations, but rather a level of trust in other computer scientists who are working on these projects, and who have published what they're doing and why.

              There is a lot of information out there about why Google does reCaptcha like this, how it's implemented, and how it helps to train self driving cars. Many of us have watched multiple talks on the matter; some of those talks have been given by acquaintances or friends, and some of us have actually looked at doing similar things for projects that we've built.

              In short, I don't trust the higher ups at Google, but there's a long list of things that I don't mistrust that leads to me believing that this is about teaching self driving cars, and not about spying. Also, as it solves a huge problem it's really easy to understand and appreciate why we go through it.

              By the way, if you want to complain about reCaptcha, there are some super relevant things to complain about - like how it takes longer to do on browsers that aren't Chrome - and that's something that does come from Google being untrustworthy. Simply trusting that the reCaptcha does what it says it does, though, isn't baffling; it's very reasonable.

              3 votes
              1. elcuello
                I have no doubt that there are good people working for Google doing cool and important stuff, and just to be clear, reCaptcha is actually one of the better projects IMO. The problem for me is that whatever comes from Google (high up or on the ground level) has a built-in mistrust by default. I can't differentiate between these two and honestly I don't think other people should either just because they have friends working there. If you continue to work for a company that behaves the way Google does, you must come to terms with the fact that a lot of people won't automatically trust your intentions.

          2. skybrian
            Yep, just like in the Truman Show :-)

            2 votes
          3. aphoenix
            and the free labour is to help them spy on me better. No thanks.

            How does training self driving cars help google to spy on you better?

      2. [2]
        DataWraith
        The concept is referred to as "Human Computation" by Luis von Ahn, the original founder of reCAPTCHA and co-founder of Duolingo (which also makes use of human computation). He gave an interesting Google Tech Talk about it in 2006. It's basically about how to motivate people to do work for free, such as by packaging it as a game.

        3 votes
        1. milkbones_4_bigelow
          Super interesting, thanks for the link. Bookmarked!

          1 vote
  2. [5]
    Moonchild
    Not an answer to your question, but I generally find the audio captchas to be easier and take less time. If nothing else, adding some to the mix would help with the monotony.

    4 votes
    1. [3]
      Keegan
      Audio captchas also seem to work more frequently. On Firefox image captchas are often very slow. For the ones where you keep selecting images until no more _____ are seen, it takes forever for a new image to appear. Also I frequently get told I answered wrong and have to restart the process.

      1 vote
      1. [2]
        OGWhales
        The "keep answering until none are left" can be terrible. I did it until exactly zero traffic lights were left and it simply wouldn't let me confirm. So I just spam clicked a bunch of random tiles as I didn't know what else to do and that worked for whatever reason...

        1 vote
        1. balooga
          When you read tomorrow's headlines about autonomous vehicles plowing through busy intersections, remember that it's all your fault for polluting the self-driving learning model with junk data.

          I'm only kidding! But what a world we're living in now...

          2 votes
    2. milkbones_4_bigelow
      That's a nice tip, thank you, I'll try that out next time :)

      1 vote