Show Tildes: we built the world's first legal AI API

[2]

Wes

March 10

Link

I was wondering if we'd see an update on your earlier concept work. Interesting to see you've decided to form a company for this work. It's not of any particular use to me, but I'm sure lawyer-types will be curious to try it.

For a product, it does seem to be very technical still. Requiring prompts via API/command line, and even using a DSL. I suspect a frontend UI - be it web-based or otherwise - would really make things more accessible for non-programmers. Maybe with a micro-LLM for converting English prompts to your DSL. Is that on the roadmap? Making an API first will surely make that kind of interface a lot easier to build, though.

10 votes

ubr (OP)
March 10
Link Parent

For a product, it does seem to be very technical still. Requiring prompts via API/command line, and even using a DSL. I suspect a frontend UI - be it web-based or otherwise - would really make things more accessible for non-programmers.

You're right and indeed we're planning on releasing an API playground next, hopefully followed by, later in the year, some apps.

We thought we'd start with the API first because, right now, our team doesn't have a lot of frontend expertise (though we've learned quite a bit through building our platform ourselves).

We're hoping to use the API to quickly ship more models and thereby generate enough traction to expand our team.

Maybe with a micro-LLM for converting English prompts to your DSL. Is that on the roadmap?

Bingo! We're currently exploring how we can use LLMs to both perform prompt optimization (instead of us having to tune them ourselves by hand) as well as translate natural language queries into our DSL for less technical users.

We're also working on sharing a prompt that users can feed to their own agents to then have those agents be able to write their own queries, which should still save them money, time and often improve accuracy when running through large volumes of lengthy legal documents.

5 votes

ubr (OP)

March 9

Link

Hey ~tech,

Over the past couple months, we, a team of Aussie legal and AI experts, have been working on building a new type of legal AI company — a company that, instead of trying to automate legal jobs, is trying to automate legal tasks.

We want to make lawyers’ lives easier, not replace them.

We’ve been laser-focused on building small and efficient yet still highly accurate, specialized models for some of the most time-consuming and mundane legal tasks lawyers have to perform. Stuff like running through a thousand contracts just to locate any clauses that would allow you to get out early.

We just finished training our first set of models, focused on document and clause classification, probably the most common problem we see come up. Our benchmarks show our models to be far more accurate and almost more efficient than their closest general-purpose competitors.

Today, we’re making those models publicly available via the Isaacus API, the world’s first legal AI API.

Our models don’t require any finetuning because they’re zero-shot classifiers — you give them a description of what you’re looking for (for example, This is a confidentiality clause.) and they pop out a classification score.
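To make the zero-shot idea concrete, here is a minimal sketch of the "description in, score out" workflow. It does not call the Isaacus API or any real model: `toy_score` is a hypothetical stand-in that only counts keyword overlap, and `filter_clauses` and the 0.5 threshold are illustrative assumptions as well.

```python
from typing import Callable, List

# Words ignored when extracting keywords from a description (illustrative list).
STOP_WORDS = {"this", "is", "a", "an", "the"}


def toy_score(description: str, text: str) -> float:
    """Hypothetical stand-in for a zero-shot classifier.

    Returns the fraction of the description's keywords that occur in the text.
    A real model would return a learned probability instead.
    """
    words = [w.strip(".,").lower() for w in description.split()]
    keywords = [w for w in words if w and w not in STOP_WORDS]
    if not keywords:
        return 0.0
    text_lower = text.lower()
    hits = sum(1 for w in keywords if w in text_lower)
    return hits / len(keywords)


def filter_clauses(
    clauses: List[str],
    description: str,
    scorer: Callable[[str, str], float] = toy_score,
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the clauses whose score meets the threshold."""
    return [c for c in clauses if scorer(description, c) >= threshold]


clauses = [
    "This confidentiality clause binds both parties.",
    "This agreement is governed by the laws of Victoria.",
]
matches = filter_clauses(clauses, "This is a confidentiality clause.")
```

The point of the shape is that no training step appears anywhere: swapping `toy_score` for a call to an actual classifier would leave the filtering logic unchanged.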

Because our models are so small, which they have to be to be able to process reams of legal data at scale, they can sometimes be a bit sensitive to prompts. To help with that, however, we’ve preoptimized an entire library of prompts, including what we call universal templates, which let you plug in your own arbitrary descriptions of what you’re looking for.

We’ve made our prompt library available via the Isaacus Query Language or IQL. Another world first — it’s a brand-new AI query language designed specifically for using AI models to analyze documents.

You can invoke query templates using the format {IS <query_template_name>}. You can also chain queries together using Boolean and mathematical operators, like so: {This is a confidentiality clause.} AND {IS unilateral clause}.
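As a rough illustration of that syntax, the sketch below assembles IQL strings in Python. The helper names (`statement`, `template`, `all_of`, `combine_and`) are hypothetical and not part of any Isaacus SDK. The post doesn't say how chained scores are combined, so `combine_and` assumes a fuzzy-logic reading in which AND takes the lowest score; treat that as a guess rather than documented behavior.

```python
def statement(description: str) -> str:
    """Wrap a free-text description in braces, e.g. {This is a confidentiality clause.}"""
    return "{" + description + "}"


def template(name: str) -> str:
    """Invoke a named query template using the {IS <query_template_name>} format."""
    return "{IS " + name + "}"


def all_of(*queries: str) -> str:
    """Chain queries together with the AND operator."""
    return " AND ".join(queries)


def combine_and(*scores: float) -> float:
    """ASSUMPTION: AND yields the minimum of its operands' scores."""
    return min(scores)


query = all_of(
    statement("This is a confidentiality clause."),
    template("unilateral clause"),
)
print(query)
```

Under the assumed semantics, a clause scoring 0.9 on the first query and 0.4 on the second would get `combine_and(0.9, 0.4)`, that is, 0.4 overall.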

We think our API is pretty neat and we hope you will too.

This is just the beginning for us — over the course of this year, we’re planning on releasing text extraction and embedding models as well as a second generation of our Kanon legal foundational model.

Here are some quick links for your convenience:

8 votes

[2]

carsonc

March 10

Link

This looks really great. For my own part, I am happy to see anyone on Tildes posting on what they are doing, so long as it's interesting.

I have a lot of questions about this.

How can you assure privacy, confidentiality, and/or security with uploaded contracts? I can imagine that such documents might be sensitive and a client might want to know that use of the tool won't constitute a liability.

Should users anonymize or scrub their documents before uploading?

I read the statement at the bottom of the linked webpage and thought you might want to contact Chris Kavelin, as it would seem that your interests might dovetail.

I work in scientific research and would love to have a screening tool for scientific papers where I can do something similar to, say, grabbing a ton of papers on ArXiv and then querying them for specific information using something like IQL. Is this a pipe-dream or an existing capability?

8 votes

ubr (OP)
March 10 (edited March 10)
Link Parent

This looks really great. For my own part, I am happy to see anyone on Tildes posting on what they are doing, so long as it's interesting.

Thank you, much appreciated :)

How can you assure privacy, confidentiality, and/or security with uploaded contracts? I can imagine that such documents might be sensitive and a client might want to know that use of the tool won't constitute a liability.

Should users anonymize or scrub their documents before uploading?

Some good questions. In terms of security, we use industry-standard security practices like encrypting and hashing data, rotating API keys, following the principle of least privilege, etc. to protect our services.

In terms of privacy and confidentiality, our policy is to retain data only for as long as necessary to ensure there is no abuse of our services going on before we delete it. We also don't use private user data sent to our API to train our models without users' consent, except for, in the case of abuse of our services, improving our internal abuse detection capabilities.

For highly sensitive workloads, though, we also offer self-hosting, where users can run our models on their own hardware, including inside air-gapped environments, and ensure that no data ever leaves their hands.

I read the statement at the bottom of the linked webpage and thought you might want to contact Chris Kavelin, as it would seem that your interests might dovetail.

Thanks for the tip!

I work in scientific research and would love to have a screening tool for scientific papers where I can do something similar to, say, grabbing a ton of papers on ArXiv and then querying them for specific information using something like IQL. Is this a pipe-dream or an existing capability?

Actually yes, that would be possible. Theoretically, IQL can be applied to the extraction and classification of any arbitrary data using any arbitrary text extraction or text classification model.

For the short- and medium-term at least though, our focus is on building specifically legal AI models.

The Kanon Universal Classifier could be used for what you're looking for but it may not give you the best accuracy possible.

We are open to the possibility of licensing our technology to others to build their own domain-equivalent tools.

3 votes

zestier

March 10 (edited March 10)

Link

This information may already be present somewhere if I dig deeper, but I'm curious both about why you chose the legal domain specifically and how specialized your tools are to that domain. The reason I'm curious is that a lot of AI tools end up tailored to code and because they're written by people that write code it makes sense that they'd already be interested in that domain, but law is pretty specific so I'm guessing there's some story there.

It's also a bit unclear to me if this is specialized to Australian law. And if it isn't, how does it deal with differences between jurisdictions?

3 votes

RNG

March 10

Link

Seems like this comment from a year and a half ago is still relevant: https://tildes.net/~comp/1baf/teaching_llms_to_divide_and_conquer_problems_with_hierarchical_question_decomposition#comment-awdu

3 votes

[7]

bl4kers

March 10

Link

It seems like all your posts are self-promotion

28 votes

[4]
ubr (OP)
March 10 (edited March 10)
Link Parent
Sorry about that! I tend to be more of a lurker on Tildes. If the mods think it's too much, I'm happy to take this down for now. Certainly did not intend to cause offence!

7 votes
1. [3]
  MimicSquid
  March 10
  Link Parent
  I'm not a mod, (especially because that's not the sort of moderation structure we have around here) but your entire comment history is talking about LLMs, and 90% of it is in threads you started. The question we all want to ask is why we're posting something here. This is a discussion site, and so a lot of things are posted because the OP wants to talk about it. That's fine and reasonable. But as in conversations in general, it's in poor taste to only ever want to talk about your interests. If you only ever want to talk about yourself it makes me wonder why you're here, and whether your reasons for being here are because you want to talk or because you want to advertise. As this is a discussion site and not an advertising platform, it behooves all of us to think about whether we're being a good conversational partner.
  
  24 votes
  1. zestier
    March 10
    Link Parent
    Our comments turned out pretty similar, but the one thing I'd disagree with a bit is that I think it's fine if someone wants to stick to just their interests. But, that means they should be part of the community surrounding that interest even when not the OP. As a concrete example here, that would mean participating in really any of the quite frequent posts related to LLMs (breakthrough articles, people seeking model recommendations, etc.).
    
    12 votes
  2. ubr (OP)
    March 10
    Link Parent
    I take your point (and that of @zestier) and will increase my engagement in the future. Apologies again.
    
    6 votes
[2]
Queresote
March 10
Link Parent
I upvoted your comment because it has brought up this question I've been battling myself (amongst others):

How much should I actually share about myself (such as recent accomplishments, personal updates, anecdotes, etc.) in regards to the stream?

I've been holding onto this TIME article pertaining to the topic, which has served as an excellent starting point.

The article discusses self-promotion (including examples of when it may be appropriate to self-promote) and links to valuable resources as well.

What do you believe a proper amount of self-promotion is? Is there a ratio that speaks to you? What other considerations do you think we should be employing here?

7 votes
1. zestier
  March 10 (edited March 10)
  Link Parent
  Personally I think that trying to stick to any sort of ratio is not helpful, and potentially actively detrimental to the goals of the self promotion rules here, as it almost feels analogous to interacting to maximize engagement farming. Tildes isn't really a place for building a brand and gathering followers. Tildes is more of a community.
  
  The way I think of it is like this: if you went to a party and your only contributions to conversation were basically you networking by talking about yourself, you'd probably find not-great vibes in those conversations. That doesn't make it bad to talk about what you've got going on, but do it naturally as part of the community rather than to an audience. People want to hear about your cool new stuff, but they want to be talked with, not at.
  
  As an example specific to the OP of this topic, he(?) seems pretty knowledgeable about AI and LLMs and these topics are pretty frequent. Surely there has been room for some of this expertise to shine in discussions that others had.
  
  16 votes

14 comments