2 votes

Chat Jimmy - A nearly instantaneous AI chatbot

3 comments

  1. [3]
    gco

    This is a very interesting showcase; more details are available at https://taalas.com/the-path-to-ubiquitous-ai/
    A potential positive here is that this can be used in specific cases where models are not required to complete complex, ambiguous tasks, hopefully alleviating the current pressure on electronic components like RAM and GPUs.

    1 vote
    1. creesch

      Man, they could have made that much easier to read. The tl;dr seems to be that instead of using general-purpose hardware, their offering is hardware built specifically for individual models. So far the only model they are offering is Llama 3.1 8B.
      That's a relatively light model to run to begin with, but to their credit, a quick test does seem to validate their speed claims:

      • Their claimed speed in chat: "Generated in 0.018s • 15,623 tok/s"
      • Running Llama 3.1 8B at Q4 in llama.cpp: 239 tokens in 2.2 s (108.61 tok/s)
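
      The ratio between those two figures works out to roughly a 140x speedup. A quick sketch of the arithmetic, using only the numbers quoted above (back-of-the-envelope, not a controlled benchmark):

        # Comparing the claimed chat throughput against the local llama.cpp run.
        # Plain arithmetic on the two figures quoted above, nothing more.
        taalas_tok_s = 15_623    # claimed speed shown in their chat UI
        local_tok_s = 108.61     # local llama.cpp run of Llama 3.1 8B at Q4

        speedup = taalas_tok_s / local_tok_s
        print(f"Roughly {speedup:.0f}x faster than the local run")  # ~144x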

      If this approach scales up to more competent models it could be genuinely interesting, depending on a variety of factors like the actual hardware cost involved. Considering they give absolutely no relevant details about the hardware (close to zero, zip, zilch, nada) other than what basically comes down to "we designed a custom SoC", I suspect there might be some caveats and/or gotchas involved here.

      electronic components like RAM and GPUs.

      They still need RAM; they nicely talk around it in their marketing with stuff like this:

      Taalas eliminates this boundary. By unifying storage and compute on a single chip, at DRAM-level density, our architecture far surpasses what was previously possible.

      But that just seems to be describing an SoC, like Apple silicon. Which, yes, gives speed benefits, but at the same time it still pretty much requires all the other hardware in order to run properly.
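
      To put a number on why they would want model weights on the die at all, here is a rough estimate of the memory bandwidth their speed claim implies. The ~4.5 GB figure for Llama 3.1 8B at Q4 and the assumption that each generated token streams all weights once are simplifications of mine, not published specs:

        # Effective weight-read bandwidth implied by the claimed throughput.
        # Assumes ~4.5 GB of Q4 weights and one full weight pass per token;
        # both are simplifying assumptions, not published specs.
        weights_gb = 4.5          # assumed size of Llama 3.1 8B at Q4
        claimed_tok_s = 15_623    # throughput shown in their chat UI

        bandwidth_tb_s = weights_gb * claimed_tok_s / 1000
        print(f"Implied weight bandwidth: ~{bandwidth_tb_s:.0f} TB/s")  # ~70 TB/s

      No external DRAM interface comes anywhere near that, which is presumably the real point behind "unifying storage and compute on a single chip".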

      In fact, having typed it all out, I feel like they just reinvented the NPU.

      tl;dr: it remains to be seen what they're actually offering. This is pure marketing aimed at attracting more investors, imho.

      1 vote
    2. skybrian
      I'm not going to even try this one, but their planned releases seem promising:

      Our second model, still based on Taalas’ first-generation silicon platform (HC1), will be a mid-sized reasoning LLM. It is expected in our labs this spring and will be integrated into our inference service shortly thereafter.

      Following this, a frontier LLM will be fabricated using our second-generation silicon platform (HC2). HC2 offers considerably higher density and even faster execution. Deployment is planned for winter.
