8 votes

Jina AI releases first open source 8k embedding model

3 comments

  1. [3]
    Comment deleted by author
    Link
    1. sparksbet
      Link Parent

      I haven't read the papers so I don't know about any interesting technical details of this particular embeddings model, but the newsworthy part of this model is that it's been released open-source, with all the added freedom and benefits that entails. Good embeddings models these days usually aren't, since they're expensive to train and thus usually trained and designed by big companies.

      I'm not sure where your baseline of knowledge is about AI and embeddings but if there's something specific you want a simple explanation of, I can do my best!

      3 votes
    2. unkz
      Link Parent

      I would think the main benefit would be for RAG (retrieval-augmented generation). Essentially, one can use an embedding model to summarize documents into a vector of numbers so that they can be searched rapidly. This model handles larger documents than other open-source competitors, which would let you embed fairly large documents in their entirety instead of chunking them and potentially ending up with no single chunk that covers all the details relevant to a particular query.

      Here’s a pretty accessible article on what embeddings are all about:

      https://pub.aimind.so/llm-embeddings-explained-simply-f7536d3d0e4b
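      To make the retrieval step concrete, here's a toy sketch. The vectors below are invented for readability; a real pipeline would get them from an embedding model (such as the one discussed here), and the filenames are just placeholders.

```python
import math

# Toy document "embeddings" -- a real embedding model would produce
# these vectors (typically hundreds of dimensions); values here are
# invented so the ranking logic is easy to follow.
docs = {
    "intro.txt":   [0.9, 0.1, 0.0],
    "pricing.txt": [0.1, 0.8, 0.2],
    "api.txt":     [0.0, 0.2, 0.9],
}
query = [0.1, 0.9, 0.1]  # embedding of the user's question

def cosine_similarity(a, b):
    """Angle-based closeness of two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Retrieval step of RAG: rank documents by closeness to the query,
# then feed the top hits to the LLM as context.
ranked = sorted(docs, key=lambda name: cosine_similarity(docs[name], query),
                reverse=True)
print(ranked[0])  # the document most relevant to the query
```

      With an 8k-token context, a whole document can get one vector in this table instead of dozens of chunk vectors, which is the advantage being described above.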

      2 votes
  2. ubr
    Link

    The model ranks 17th on the Massive Text Embedding Benchmark (MTEB) Leaderboard, making it a great option for those looking for a FOSS alternative to ada-002 that can handle just as many tokens. Naturally, for those dealing with smaller context windows, BGE still reigns supreme.

    5 votes