10 votes

reCAPTCHA: Is there method in monotony?

Posted January 6, 2020 by milkbones_4_bigelow

Tags: security, google, ask.experts, recaptcha, artificial intelligence

What started out as a slightly facetious thought has led me to a serious question. Is there some meaningful reason why Google has to use such a narrow subset of images for reCAPTCHA? I really dislike having to do this and at the very least would appreciate some variation.

Traffic Lights
Buses
Bicycles
Cars
Crosswalks

Is there something special about these things in this context? Is the visual noise they're usually associated with what makes them good candidates? Are Google just really into urban planning? Who knows...I'm hoping some Tilder smarter than I can help me out.

28 comments

[24]
aphoenix
January 6, 2020
In short: it's used to help train self driving vehicles.

17 votes
1. [23]
  milkbones_4_bigelow (OP)
  January 6, 2020
  Ah ha! Free labour, I should have known. Appreciate the link. Sinister stuff, but sadly, unsurprising. Out of curiosity, are those data sets then made available or locked up? I'm guessing the latter, but who knows.
  
  5 votes
  1. [20]
    skybrian
    January 6, 2020
    The "free labor" thing is a common complaint but I find it strange. It's a trivial amount of effort and they would still need to show captchas if the results were thrown away.
    
    An analogy: if someone invented a device to harvest a bit of energy from people's footsteps when walking down a sidewalk then I would consider that clever rather than sinister.
    
    9 votes
    
    [14]
    milkbones_4_bigelow (OP)
    January 6, 2020 (edited January 6, 2020)
    Hey skybrian, thanks for your reply and for making me think a little more deeply about my gut reaction. That said, I think there are still issues here, at least for me. The core of it is transparency and intent.
    
    Transparency: It wasn't clear to me what this information is being used for. Nobody likes a masquerade.
    
    Intent: If footsteps were somehow harnessed to produce power (firstly, this would be awesome) which was then freely distributed to the poorest, most in-need individuals, I'd be happy to walk (or click pictures of traffic lights all day). That said, I imagine the primary purpose of any power culminating from this data set is to benefit Google and its stakeholders alone. Naturally this point is less important should the data be made available in the public domain (its not being so is speculation on my part), thus contributing to road safety for everyone. What do you think?
    
    6 votes
    
    [5]
    cfabbro
    January 6, 2020 (edited January 6, 2020)
    Google is providing, for free, an invaluable service used by millions of websites to combat spam, but it isn't free for them to develop, run and maintain, so why should they not get some return value out of the process? And if they used pictures not tied to any of their other projects, all that categorization effort would simply go to waste.
    
    And just FYI, they don't hide the fact they do this, and already are transparent about it:
    https://www.google.com/recaptcha/intro/v3.html
    
    Hundreds of millions of CAPTCHAs are solved by people every day. reCAPTCHA makes positive use of this human effort by channeling the time spent solving CAPTCHAs into annotating images and building machine learning datasets. This in turn helps improve maps and solve hard AI problems.
    
    https://www.google.com/recaptcha/intro/invisible.html?ref=producthunt
    
    reCAPTCHA improves our knowledge of the physical world by creating CAPTCHAs out of text visible on Street View imagery. As people verify the text in these CAPTCHAs, this information is used to make Google Maps more precise and complete. So if you're a Google Maps user, your experience (and everyone else's) will be even better.
    
    reCAPTCHA helps solve hard problems in Artificial Intelligence. High quality human labelled images are compiled into datasets that can be used to train Machine Learning systems. Research communities benefit from such efforts that help build the next generation of groundbreaking Artificial Intelligence solutions.
    
    reCAPTCHA digitizes books by turning words that cannot be read by computers into CAPTCHAs for people to solve. Word by word, a book is digitized and preserved online for people to find and read.
    
    https://support.google.com/recaptcha/?hl=en
    
    reCAPTCHA also makes positive use of the human effort spent in solving CAPTCHAs by using the solutions to digitize text, annotate images, and build machine-learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems.
    
    etc. etc. etc.
    
    9 votes
    
    [3]
    milkbones_4_bigelow (OP)
    January 7, 2020 (edited January 7, 2020)
    hey cfabbro, I appreciate you taking the time to put this together. I hadn't seen those initially. I only hope the data is used positively and is for the benefit of all people. Thanks again.
    
    2 votes
    
    [2]
    cfabbro
    January 7, 2020 (edited January 7, 2020)
    NP. p.s. Don't get me wrong, I wish everything google (and every mega-corporation) did was altruistic and benefited all mankind as well, instead of just their owners/shareholders to the detriment of everyone else (as seems to be the norm). However, the unfortunate reality is that they are a for-profit corporation, and since we're not in a post-scarcity world yet, IMO the best we can really hope for from them is a tiny bit of transparency and some mutual benefit... which I think reCAPTCHA at least seems to achieve. There are plenty of reasons to dislike and distrust google, but I don't think this particular project of theirs is one of them.
    
    milkbones_4_bigelow (OP)
    January 9, 2020
    There are plenty of reasons to dislike and distrust google, but I don't think this particular project of theirs is one of them.
    
    That's fair enough. Glad everyone got a chance to express their opinion.
    
    1 vote
    
    FlippantGod
    January 9, 2020 (edited January 9, 2020)
    I have a few issues. For one thing, websites opt in, not users. But the big issue is that users visiting sites on Firefox receive more captchas than those browsing on Chrome. There are studies showing that reducing page load times increases sales.
    
    What about taking five seconds for an image to slowly fade in so you can click it and then do that for half a minute?
    
    Google is generating value for their own computer vision products but also their browser, at the expense of Firefox users whose only options are to go along with this, switch to Chrome, or deny themselves access to websites not even affiliated with Google.
    
    Furthermore, iirc, Google developed software to distinguish the cursor behaviors of human users vs bots, so they could have done away with the image captchas entirely as I understand it.
    
    Edit: additionally, the quotes you provided make no claim that the datasets are ever shared with the research community, just that such efforts help improve machine learning. And THEN, even the digitized books are not necessarily made publicly available; the digitized text is indeed preserved as they say, by Google, who uses the materials as another dataset I'm sure. In all likelihood, only a minuscule fraction of digitized texts are ever publicly viewable due to copyright and intellectual property laws. Instead Google seems to provide one or two pages of a work as a preview. Usually the table of contents in my experience...
    
    So out of those claims, the only real one was that it improves Google Maps, thus bringing more value to their products over the competition.
    
    I am certain that captcha is not intended for any altruistic purpose.
    
    1 vote
    
    [8]
    skybrian
    January 6, 2020
    Regarding transparency, I believe in explaining how things work, and also that many users don't really care about how things work most of the time, until one day they decide to ask. Probably there should be a link to an explanation as part of the reCaptcha. I wonder if there's one already and how does it work? (I don't have any easy way to trigger it.)
    
    As far as intent goes, it seems rather unlikely that the data will benefit Google alone. If they build it into a product then won't users benefit from it too?
    
    It doesn't look like Google publishes the results from classifying these images and I'd guess they won't do it because it would help the spammers who are trying to crack reCaptcha. But Google does publish an Open Images Dataset for machine learning researchers to use.
    
    Disclaimer: I worked for Google for almost a dozen years, but not in this area, and I'm out of touch these days.
    
    7 votes
    
    [7]
    milkbones_4_bigelow (OP)
    January 7, 2020 (edited January 7, 2020)
    I just took a look but there's no link funnily enough. That could be a nice addition though.
    
    To your point about benefit, I'm a little unsure. I guess at the core of it, the services Google provides are not in the interests of the greater good, though I could be wrong here of course. It's my understanding Google is an advertising company that uses its users' data as a commodity which is of much greater value than the service(s) they provide. I'd love to spend some time digging into it in more detail.
    
    Out of curiosity, how was your experience working there? I'm certain there were/are a ton of decent, super smart, non nefarious folks there. I don't mean to make generalisations :) Thanks for continuing the discussion skybrian :)
    
    1 vote
    
    [6]
    skybrian
    January 8, 2020
    Google Search and Maps and YouTube are useful services that billions of people use every day, for free. There are alternatives, sure, and people outside the company don't get to decide how these services work. But still, it seems very egalitarian? No matter how rich or poor you are, and for most countries and languages, if you can get online, you can use their services, and isn't universal free service a "greater good?" But I guess a lot of Internet services are like that. We take so much for granted.
    
    I don't think there's a way to rigorously compare the value of advertising data to the value of the services Google provides. Economists know how to compare things that have prices, and one side of the comparison isn't priced. It's hard to compare something given away for free to billions of people to something that makes billions of dollars.
    
    But, just as one datapoint, I find it hard to imagine how advertisers could have gained much from all those years of watching me ignore their ads. I don't always have an ad blocker installed, but ad blindness is a thing and my guess is that Wirecutter has had more influence on my buying decisions than all of Google's advertisers combined.
    
    Since many others are the same way, I find Google's way of making money to be rather mysterious. I know it comes from ads, but still, something doesn't add up. For a long time I suspected that online advertising was in some kind of bubble and that eventually advertisers would wise up and revenues would crash. But it didn't happen. Ad spending keeps growing, so it must somehow be working for them?
    
    Regarding working there, the pay and benefits were excellent, enough to eventually retire and putter around. So, I can't really complain because they were very good to me!
    
    Despite this, I wouldn't say my morale was all that great when I was there. It always felt like some other teams were doing amazing things and we'd hear about them at TGIF, but for various reasons my particular team wasn't going to move the needle. I often thought of leaving Google, but the money and benefits seemed too good, so I ended up changing teams in pursuit of something meaningful.
    
    The overall mood changed dramatically over the years. At the beginning, there was a big whiteboard with a whimsical "world domination plan" drawn on it and we all thought it was pretty funny, self-deprecating humor. Also, there was definitely a sense that we felt like we were all in it together.
    
    There were always weird political fights on mailing lists (like whether the microkitchens should have water bottles) but they seemed inconsequential back then. The Google+ launch was sort of the beginning of the end where it seemed like it was clueless management versus activists. (Remember the "real name" debate?) It got worse after Trump's election and now it seems trust largely evaporated even internally.
    
    Part of the reason I think it went south is that in a big company that employees identify with, when some distant part of the company screws up in a public way, lots of people get upset, since they put a lot of self-worth in the company's reputation. This results in a lot of angst over things that don't have much to do with your job. I think smaller companies have it better in this respect: most of the things that go wrong in the world cannot possibly be your company's fault.
    
    2 votes
    
    [3]
    milkbones_4_bigelow (OP)
    January 9, 2020 (edited January 9, 2020)
    Hey skybrian,
    
    I appreciate your detailed reply. With that said, I'm afraid I still disagree, at least with the last part of your premise :)
    
    Google Search and Maps and YouTube are useful services that billions of people use every day, for free
    
    As the old adage states, there's no such thing as a free lunch. I believe there is a cost, whether the bill is paid now or (in a more likely scenario) in the future. I can however back-pedal slightly on using reCAPTCHA as the poster child in this case. It's probably not the best example. That said, much like eating Vegan at McDonald's, I still feel icky about lining the pockets of a company I fundamentally disagree with.
    
    In terms of egalitarianism, I just don't see it. "Egalitarian doctrines are generally characterised by the idea that all humans are equal in fundamental worth or moral status." I do not think Google's business model can make that claim. I refer you to the excellent book Automating Inequality by Virginia Eubanks, in which she makes the claim that "While we all live under this new regime of data analytics, the most invasive and punitive systems are aimed at the poor".
    
    Interesting to hear about your experience at Google, thanks for sharing. I previously worked for one of the larger tech shops too and had a very similar time of it. I'm much happier having jumped ship to work on a project I consider to be more sustainable and responsible in the long term.
    
    1 vote
    
    [2]
    skybrian
    January 9, 2020
    One of the reasons I'm a bit weary of tech criticism is that there is a vague mood of skepticism and mistrust that ignores important distinctions. For example, just saying "big tech" ignores differences between companies that often have rather different policies and business models. In the old days a lot of people hated Micro$oft but at least they didn't confuse what different companies did or treat all tech startups the same? (Or worse, blame everything on "capitalism".) My simplistic over-generalization is that I think we are generalizing too much these days. It's a bad habit, hard to break, because being specific often requires research.
    
    I haven't read Automating Inequality but, based on a quick skim of a couple of reviews, it seems to be about the government misuse of computer systems? It's certainly true that the flaws in government bureaucratic systems tend to affect the poor (who can lose benefits and whose finances are precarious) more than the rich.
    
    I think something more like the opposite is true of advertising-supported services, though. Advertising is targeted at people who spend money, the more the better. The most favorable demographic category is probably younger people in rich countries with lots of disposable income. But Google isn't attempting to exclude people from using these services if they don't fit the demographic. A lot of people just end up seeing advertising that's irrelevant to them. The costs are so low that free riders can be included as part of the cost of doing business.
    
    It's true that valuable services aren't provided for free but that's only true statistically. There is no "free lunch" but not everyone who got a lunch paid for it. The value to you of a web search can be anything from dangerously misleading to a complete waste of time to life-changing information, and this has nothing to do with the cost.
    
    But here I am talking about the consumer side of things. It's obviously not true for content producers; websites get widely different amounts of traffic depending on search ranking, and there is a tournament system where people who are trying to make money on YouTube get widely varying amounts of traffic depending on fame and chance.
    
    If you want to make money or attract an audience on the Internet then it's a very unequal place. But if you're just using it to learn things, it seems a lot fairer?
    
    milkbones_4_bigelow (OP)
    February 2, 2020 (edited February 3, 2020)
    Apologies for the delay in responding skybrian.
    
    I agree, ignoring important distinctions is a problem. I could do with being more mindful of this. That said, sticking with Google, it's difficult for me to quantify the net positive/negative considering their reach and given the story hasn't yet fully played out. I can acknowledge my feelings towards Google are just that.
    
    Re: "free riders/not everyone who got a lunch paid for it". I still do not believe the ride is free. Even if people see advertisements that are irrelevant to them on the grounds of economic exclusion they still pay with their data that is mined and that has the potential of being abused.
    
    From Permanent Record
    
    "Ultimately, saying that you don't care about privacy because you have nothing to hide is no different from saying you don't care about freedom of speech because you have nothing to say. Or that you don't care about freedom of the press because you don't like to read. Or that you don't care about freedom of religion because you don't believe in God. Or that you don't care about the freedom to peaceably assemble because you're a lazy, antisocial agoraphobe."
    
    I appreciate your point about making money and the difficulty that entails for say, creating content on YouTube but perhaps the question we ought to be asking is what are more ethical/sustainable alternatives? I am not interested for example in methods to increase income for the dairy industry just by virtue of it being one of the more ubiquitous methods of agriculture. It's brutal and unnecessary and I would rather seek alternatives that generate less harm.
    
    Whatever the case, it seems this thread is going on a bit. Just to close from my side, thanks skybrian for your thoughtful responses and for making me think more deeply about this.
    
    1 vote
    
    [2]
    FlippantGod
    January 9, 2020
    Keep in mind that Google Search and YouTube are absolutely not, and have never claimed to be, platforms for free speech. Google itself removes content, and sometimes DMCA abuse removes content, and sometimes countries censor content (albeit independently from Google). For Google Search, Google can prevent sites from appearing in searches (usually due to the DMCA), and while it ceased operation in China in 2010 after a decision to stop censoring search results for the Chinese government, it has at least been working on a search engine that will meet the government's censorship requirements.
    
    skybrian
    January 9, 2020
    Yes, when I said "egalitarian" I was thinking more about the consumer side of things. An analogy would be a store that everyone can use, but doesn't carry every product. For some things you need to go elsewhere.
    
    2 votes
    
    [6]
    
    Comment deleted by author
    Link Parent
    
    [3]
    elcuello
    January 7, 2020
    Word. The amount of trust people here give these mega corporations is baffling. This subject might not be the worst and as I understand it it actually does some good to society but how people just take what these corporations say at face value is disheartening.
    
    4 votes
    
    [2]
    aphoenix
    January 7, 2020
    I think you have to understand that it's not really a level of trust in mega corporations, but rather a level of trust in other computer scientists who are working on these projects, and who have published what they're doing and why.
    
    There is a lot of information out there about why Google does reCaptcha like this, how it's implemented, and how it helps to train self driving cars. Many of us have watched multiple talks on the matter; some of those talks have been given by acquaintances or friends, and some of us have actually looked at doing similar things for projects that we've built.
    
    In short, I don't trust the higher ups at Google, but there's a long list of things that I don't mistrust that leads to me believing that this is about teaching self driving cars, and not about spying. Also, as it solves a huge problem it's really easy to understand and appreciate why we go through it.
    
    By the way, if you want to complain about reCaptcha, there's some super relevant things to complain about - like how it takes longer to do on browsers that aren't Chrome - and that's something that does come from Google being untrustworthy. Simply trusting that the reCaptcha does what it says it does, though, isn't baffling; it's very reasonable.
    
    3 votes
    
    elcuello
    January 8, 2020
    I have no doubt that there are good people working for Google doing cool and important stuff, and just to be clear, reCaptcha is actually one of the better projects IMO. The problem for me is that whatever comes from Google (high up or on the ground level) has a built-in mistrust by default. I can't differentiate between these two and honestly I don't think other people should either just because they have friends working there. If you continue to work for a company that behaves the way Google does, you must come to terms with the fact that a lot of people won't automatically trust your intentions.
    
    skybrian
    January 7, 2020
    Yep, just like in the Truman Show :-)
    
    2 votes
    
    aphoenix
    January 7, 2020
    and the free labour is to help them spy on me better. No thanks.
    
    How does training self driving cars help google to spy on you better?
  2. [2]
    DataWraith
    January 7, 2020
    The concept is referred to as "Human Computation" by Luis von Ahn, the original founder of reCAPTCHA and co-founder of Duolingo (which also makes use of human computation). He gave an interesting Google Tech Talk about it in 2006. It's basically about how to motivate people to do work for free, such as by packaging it as a game.
    
    3 votes
    
    milkbones_4_bigelow (OP)
    January 7, 2020
    Super interesting, thanks for the link. Bookmarked!
    
    1 vote
[4]
Moonchild
January 7, 2020
Not an answer to your question, but I generally find the audio captchas to be easier and take less time. If nothing else, adding some to the mix would help with the monotony.

4 votes
1. [3]
  
  Comment deleted by author
  Link Parent
  1. [2]
    OGWhales
    January 7, 2020
    The "keep answering until none are left" can be terrible. I did it until exactly zero traffic lights were left and it simply wouldn't let me confirm. So I just spam clicked a bunch of random tiles as I didn't know what else to do and that worked for whatever reason...
    
    1 vote
    
    balooga
    January 7, 2020
    When you read tomorrow's headlines about autonomous vehicles plowing through busy intersections, remember that it's all your fault for polluting the self-driving learning model with junk data.
    
    I'm only kidding! But what a world we're living in now...
    
    2 votes
2. milkbones_4_bigelow (OP)
  January 7, 2020
  That's a nice tip, thank you, I'll try that out next time :)
  
  1 vote