30 votes

Vibe Check - Let AI find you the best things

21 comments

  1. [6]
    kwyjibo

    Moxie Marlinspike shared his LLM experiment the other day. It's a recommendation engine that you can talk to. It can give you specific recommendations since it gets its data from appropriate subreddits. I tested it on things I'm knowledgeable on and its recommendations were quite good. It may be of interest to some here.

    Here's how Moxie defines it:

    https://x.com/moxie/status/1783932933717561486

    I made this last weekend to experiment w/ building an app end to end on LLMs: https://vibecheck.market/

    It's like Wirecutter, but uses an LLM to recommend product choices based on reddit conversations and reviews, so you don't have to spend 20-30min reading reddit

    My experience: I'm late to the game, but this is the first time I've tried building an app end to end around an LLM.

    1. It's very fast to build something that's 90% of a solution. The problem is that the last 10% of building something is usually the hard part, and the part that really matters, and with a black box at the center of the product, it feels much more difficult to me to nail that remaining 10%. With vibecheck, most of the time the results to my queries are great; some percentage of the time they aren't. Closing that gap with gen AI feels much more fickle to me than a normal engineering problem. It could be that I'm unfamiliar with it, but I also wonder if some classes of generative AI based products are just doomed to mediocrity as a result.

    2. On the other hand, I think the discomfort I feel with having an unpredictable black box at the center of a product which can fail in very creative ways some percentage of the time might actually be a competitive advantage for startups. I think Wirecutter, especially branded as NY Times, would have a much harder time tolerating that unpredictability and unreliability than an app like this (or an actual startup), which can set that product expectation with users to begin with. It might be that the current problems with generative AI are actually the things that create an innovator's dilemma and give startups an advantage to slip under the incumbents. It seems like the ideal spot right now would be an app where 90% "done" is a mostly great experience, but the failings are still somehow not stomach-able by incumbents.

    3. I do not understand how the economics of LLMs pencil out. When I look at the per concurrent user costs associated with inference, they seem orders of magnitude higher than per concurrent user costs of previous internet technologies. It seems to me that if previous apps like webmail, messengers, etc had costs as high, they would not have been viable products. This is something I want to learn more about.

    21 votes
    1. [4]
      conception

      Point 3 is so relevant. Every product has added AI, but they're all adding it as an add-on charge, and that charge is always bonkers. Copilot for business is $30/user/month and you have to pay for the full year, Slack's is $10/user/month… it's basically 50-100%+ of the cost of whatever service you use, for middling results.

      11 votes
      1. [2]
        Grumble4681
        (edited)

        To me, point 3 almost seems tied in with the other points: the substandard results are tolerated even by incumbents. Google literally added the AI overview to its search results, and people subsequently got a result at the top of their Google search telling them to add glue to their pizza. And mind you, this wasn't an aberrant result or something hard to predict; that type of result is very predictable for LLMs, but Google did it anyhow. Damn near every company in every industry is doing it, incumbent or not. Dealerships are adding LLM chatbot widgets to their websites that give totally wrong information, etc.

        Combine this with point 3 and, to me, it looks like the economics of various Web 2.0 ventures and other tech of that era, with YouTube potentially the most emblematic of all of them: services developed with little to no thought about how to monetize them sustainably, with the goal of just growing and gaining users and taking venture capital money to keep growing, and behind all of that, seemingly, an expectation of massive returns somewhere. I think YouTube is considered one of the services of that era with the greatest costs, and it's grown to such an extent that seemingly no other service could absorb or handle the kinds of things YouTube does, because the costs would be astronomically high with almost no possibility of ever making that type of service profitable.

        To me LLMs seem similar, though as mentioned in point 3, the costs are orders of magnitude higher; to me that just says the expected returns are orders of magnitude higher. I think those putting money into this expect that whoever nails the right LLM products (if anyone does) will be sitting on a fucking goldmine the likes of which we've rarely, if ever, seen in the history of human innovation. I also think there's some expectation that those who don't invest, and subsequently don't end up on top of the goldmine, will be holding interest in companies whose value will plummet to nothing if they're competing with companies that invested in LLMs and managed to make a product or service that consumers chose over the ones that didn't. Whether it ever fully develops that way, I don't know, but that's my impression of what drives point 3, as well as the other points.

        Edit: Just wanted to add something I came across that I didn't intend to make such a parallel with the comment I already had but apparently it does exist.

        https://en.wikipedia.org/wiki/Semantic_Web#Web_3.0

        As the URL indicates, it's describing the "Semantic Web", but it also notes that this is sometimes referred to as "Web 3.0".

        Berners-Lee originally expressed his vision of the Semantic Web in 1999 as follows:

        I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize.

        So I used Web 2.0 to illustrate some aspects that are similar to how I think LLM growth is being approached, and it's interesting that in 1999 Berners-Lee described this 'Semantic Web', also called Web 3.0, in terms that sound a lot like what we're seeing with the growth of LLMs. It further makes me think the playbook of Web 2.0 is just being extended to Web 3.0.

        11 votes
        1. ebonGavia

          Web 3.0

          It annoyed the pupper-lovin' heck out of me when all the NFT bros started calling their stuff "Web3". I'm like... Dude, you realize that's already a thing, right? Name your shit something else.

          3 votes
      2. teaearlgraycold

        This is why Apple is going for local inference. Long after the VC money has run out, the players that make their customers own the inference hardware will keep chugging along.

        3 votes
    2. Apex

      This is amazing, thank you - I need to look into products for my upcoming newborn and this is so helpful.

      2 votes
  2. [6]
    Grumble4681

    So I just tried this with something I recently bought, an electric kettle. I did my own research of course and then picked what I thought was best, and I generally do a fair amount of research even if I'm buying something fairly cheap.

    It presented me with three kettles, two of which were within my budget (it asked if I preferred the price to be under $50, which I said yes to). The second one within budget was the kettle I had actually purchased based on my own research (before I even knew this tool existed). On that front I was impressed: I could have spared myself a couple hours of research, picked one of the kettles this thing had chosen, and arrived at the same result.

    However, I just got the kettle today and tested it out earlier, and there was one parameter I didn't give enough consideration to at the time of purchase, one that also wasn't really accounted for in the questions the website guided me through: the kettle I bought is a little too tall for the sink I planned on using it with, which makes it difficult to fill. I noticed the site lets you add additional parameters for it to consider beyond the guided questions, so I tried telling it, with various phrasings, that I wanted something under 1.7 liters of capacity, but it would never show me anything lower than 1.7 liters. That capacity limit is what I would have specified if I had accounted for the extra parameter before purchasing. Strictly speaking I'd have constrained the height of the kettle, but capacity correlates strongly with height, is more prominently displayed, and is more likely to be accurate than any listed dimensions. I also presume capacity is more likely to be discussed in reddit posts, so I figured it was the better parameter to give this 'AI' tool.

    One other note on that site: I just went back and tried it again with "electric kettle", and it asked me different questions than it did the first time. Two of the recommendations were the same, but the third was different; it had replaced the recommendation of the kettle I actually purchased with a different electric kettle, based on the parameters from the new questions. Again I tried to add custom parameters, and it didn't seem to work.

    There's also an error on the website saying the servers are overloaded and to try again later, but it still produced results so I don't know if that means it's working properly or not.

    14 votes
    1. [5]
      ThrowdoBaggins

      so I tried to use that by telling it I wanted something less than 1.7 liter capacity with various phrasings, and it would never show me anything lower than 1.7 liter capacity

      My first question is how many kettles on the market, and talked about online, are less than 1.7L. (After all, it seems like it pulls results from discussions, not from the infinite, unparsable wider internet.)

      My second comment is that if this is built on a language model, there's nothing in language itself that can tell you what "less" or "more" or "1.7L" actually mean; that requires translating language into real-world quantities and comparing them. I'm not surprised the bot failed to provide an answer, unless the dataset it's pulling from also contains that question and answer often enough to pick up on.

      1 vote
      1. [4]
        Grumble4681

        I mean, it definitely seems like 1.7L is more common. I looked some up after I discovered the one I got was perhaps a little too tall: AmazonBasics had one, and a few other known brands had some around 1-1.2L, and there are probably more I haven't found yet. But yeah, I suspected it wasn't as common a concern for other kettle buyers; it only is for me because I'm dealing with a more unusual living situation.

        I also considered that second point. I know LLMs are bad with numbers, or just don't really comprehend them the way we want, so I also phrased it as 'lower capacity' without using numbers, and tried other phrasings too, but I suspect none of that would matter if it doesn't really know what 1.7L is. I'm surprised they aren't built to understand from context that the user is attempting some kind of math, and to have a 'math mode' that doesn't run those specific pieces through the language model.

        1. [3]
          ThrowdoBaggins

          I'm surprised they aren't made in such a way to understand through context that the user is attempting to do some kind of math or utilize numbers in a way that isn't just speaking a language and have some kind of 'math mode' to not utilize it's language model on those specific pieces.

          I suspect that’s less an issue with whichever model it’s using under the hood, and more to do with the fact that language itself doesn’t have physical dimensions encoded into it, and you as a person just understand things because your language has the fallback of lived experience.

          My favourite example of how language doesn’t encode for physical reality:

          “The trophy couldn’t fit in the suitcase because it was too big”

          What does “it” refer to in this sentence? And do you know that because of language, or because of lived experience?
          Now consider:

          “The trophy couldn’t fit in the suitcase because it was too small”

          What does “it” refer to in this sentence? Is it the same as in the first sentence? Why/why not? And how could an infinitely intelligent language model parse these sentences into reality in order to get a concept of “big” or “small” or even “in”?

          3 votes
          1. [2]
            em-dash

            I mean, those are easy without lived experience, given just the additional fact "a thing will only fit inside another thing if the inner thing is smaller than the outer thing". That seems within the abilities of current-day AI.

            ChatGPT:

            What does "it" refer to in “The trophy couldn’t fit in the suitcase because it was too big”?

            In the sentence, "it" refers to "the trophy." The phrase "it was too big" indicates that the reason the trophy couldn't fit in the suitcase is because the trophy itself was too large in size. Therefore, "it" directly relates to the trophy mentioned earlier in the sentence.

            and then in a new chat, to avoid context clues from the first example:

            What does "it" refer to in “The trophy couldn’t fit in the suitcase because it was too small”?

            In the sentence "The trophy couldn’t fit in the suitcase because it was too small," the pronoun "it" refers to the suitcase, not the trophy. The meaning of the sentence is that the trophy couldn't fit into the suitcase because the suitcase itself was too small, not the trophy.

            (this is literally the only case where I don't feel extremely weird about responding to someone with "I asked ChatGPT and here's what it said")

            1 vote
            1. ThrowdoBaggins
              (edited)

              Edit: after writing and posting, I don’t think I have anything that actually contributes to the conversation, so I’d like to retract this comment. Leaving it up so I don’t accidentally mess with notifications.

              I’m not surprised that ChatGPT solved that, because the trophy-suitcase problem is one I’d come across on the internet long before ChatGPT was available.

              For what it’s worth, ChatGPT has absolutely no confidence in the solution because simply telling it “incorrect” is enough for it to change its answer and give a nonsensical reply.

              That doesn’t change the fact that the words don’t carry any information from the real world, so ChatGPT is also happy to have a go at the inverse:

              In the sentence "The suitcase couldn’t fit into the trophy because it was too small," "it" refers to:
              1. The suitcase couldn’t fit into the trophy.
              2. Because the trophy was too small.
              So, "it" refers to the trophy being too small to accommodate the suitcase.

              In the sentence "The suitcase couldn’t fit into the trophy because it was too big," "it" refers to:
              1. The suitcase couldn’t fit into the trophy.
              2. Because the suitcase was too big.
              So, "it" refers to the suitcase being too big to fit into the trophy.

              1 vote
  3. zoroa
    (edited)

    I gave it a try, and it gave me the same vibes as a lot of LLM powered tools I've tried. I spend more energy scrutinizing the output of the tool, than I would've spent just doing the task on my own.

    • There were a couple times where it'd ask me for a price range, then give me 3 options that were all over my limit.
    • It once made a recommendation off of a Reddit comment it misinterpreted to be in my price range:
      • The reddit comment was essentially saying "You can't buy the item individually, but you can split a 2-pack that costs $310 with a friend, so it comes out to $155 for each person." I had read that comment before trying the LLM and dismissed it because that didn't work for me. The LLM later presented that same comment as "You can buy 2 for $310, which comes out to $155 for 1, which is under your budget".
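    That $310 / $155 misread is exactly the kind of numeric claim that's cheap to recheck outside the model before surfacing a recommendation. A minimal sketch (the $200 budget figure and the function name are made up for illustration; the source comment doesn't state an actual budget):

```python
# Recompute a price claim deterministically instead of trusting the
# model's paraphrase. A "2 for $310" listing only works out to $155
# per person if the buyer can actually split the pack; the checkout
# price is still $310.

def fits_budget(pack_price: float, pack_size: int, budget: float,
                can_split: bool = False) -> bool:
    """True if the buyer's real out-of-pocket cost is within budget."""
    if can_split:
        return pack_price / pack_size <= budget
    return pack_price <= budget  # must buy the whole pack

# The comment the LLM misread: a 2-pack for $310, $155 each IF split.
assert not fits_budget(310, 2, budget=200)              # buying alone: over budget
assert fits_budget(310, 2, budget=200, can_split=True)  # splitting with a friend: fits
```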

    edit: grammar

    5 votes
  4. [2]
    Sassanix

    This works well with core products, but it can easily be gamed by fake reviews.

    If you want recommendations on accessories for a core product, it doesn’t do too well.

    4 votes
    1. kwyjibo

      I'm sure it's very susceptible to giving ridiculous recommendations, but I've not encountered that myself yet. (Small sample size, though.)

      I've recently gotten into fountain pens and all the other things related to it, which are obscure enough. Every single recommendation it gave me was accurate. For example, Tomoe River paper is often recommended as the best paper for fountain pens (if not for every pen) but it's not cheap. I wanted it to give me a cheaper option and it recommended me Clairefontaine, a brand that's not well known outside of France. I personally use their paper and they're really good. Same with the ink. It could've recommended me an ink from a better known brand like Parker or Sailor, which wouldn't be an inaccurate recommendation but it'd have been a lazy one, so instead it recommended me Diamine, which are known for their price to performance ratio within that community.

      As with every LLM based product, you just have to keep in mind that you're interacting with a bullshit machine that can often be wrong. But they can be handy, too.

      4 votes
  5. creesch

    It's a somewhat interesting tool, but the way it presents recommendations doesn't do it for me. As others have stated, I'd like it to actually source its claims. Not just for me to verify the claims, but also because discussions on communities like reddit often contain other information I might not have considered.

    There is also the fact that the information it gives is somewhat limited. Some bullet points clearly are based on the context of a conversation as well. Which, to be fair, is in the name. A "vibe check" is not the same as a comprehensive breakdown.

    I suppose there is value here, though. Not necessarily for me, because I am fairly confident I can do my own research online and arrive at similar, if not better, results. But a lot of people don't spend as much time online as I do and don't have the same affinity for researching products. I feel like this tool gives better results than the tons of listicles people come across when naively googling for things.

    4 votes
  6. em-dash

    I would love to see it cite sources, because my weird requirements for things are often the sorts of things manufacturers don't like to advertise.

    I asked it for a "robot lawnmower without cloud services". All three of its recommendations were mowers I had previously rejected because they looked like they might depend on cloud services, and I hadn't been able to get a straight answer from the manufacturers (in one case I even contacted them directly and got a confused "why would you want that" answer back). If it could point me to something, even a confident-sounding reddit comment from someone saying "I have this exact one and it works without cloud services", I would consider it much more useful.

    2 votes
  7. [2]
    patience_limited

    I spent hours searching for a reliable bike lock that's not easy to break or cut, to go with the new e-bike. My rural community is blessed with premier trails and biking amenities. There are enough high-end bikes in town that thieves with angle grinders are a problem.

    Vibe Check came up with the lock I got (Litelok X) as the number two cut-resistant pick after a Hiplok that's more expensive. It's relatively new on the market, so I'm surprised it came up at all.

    Aside from the "Bummer! Servers are currently overloaded. Please try your query later" messages, I'll definitely keep trying this tool.

    1 vote
    1. goryramsy

      Off-topic comment here. If it's thieves with angle grinders you're trying to deter, you can use a soft cloth (think: stockings/women's leggings/socks) and a bit of glue around the chain/ring to break any angle grinders. Also, if it gets cold there, it'll help your hands.

      2 votes
  8. Lapbunny

    I asked it for coffee and it gave me Nguyen Coffee Supply. I may try it?

    Then again, I asked it for the best roadster under $10,000 that isn't a Miata. It then gave me - by searching r/stocks, r/digitalnomad, r/BestofRedditorUpdates, r/KansasCityChiefs, r/Acura, and r/leagueoflegends - the Miata, the BRZ, and a Civic Si.

    1 vote
  9. slampisko

    I don't live in the US and I'm not in the market for any products anyway, so I tried to have it recommend a band (specific genre) and a podcast (specific type) and the recs were actually pretty good!