Hey Tildes,
After months of hard work, I am excited to share the first ever semantic map of Australian law.
My map represents the first attempt to map Australian laws, cases and regulations across the Commonwealth, States and Territories semantically, that is, by their underlying meaning.
Each point on the map is a unique document in the Open Australian Legal Corpus, the largest open database of Australian law (which, full disclosure, I created). The closer any two points are on the map, the more similar they are in underlying meaning.
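To give a rough sense of the mechanics (the article walks through my real pipeline in detail), here's a minimal sketch of how a map like this can be built: embed each document with a sentence embedding model, then project the vectors down to two dimensions with a reducer such as UMAP. The embedding model below is the one I actually used; everything else, including the toy documents and tiny UMAP settings, is illustrative rather than my production code:

```python
# Minimal sketch (illustrative, not my production code): embed documents and
# project them to 2D so that semantically similar documents sit close together.
from sentence_transformers import SentenceTransformer
import umap

# A handful of toy stand-ins for corpus documents.
documents = [
    "An Act to regulate the taking of fish in Commonwealth waters.",
    "Reasons for judgment on review of a migration decision.",
    "Regulations prescribing fees for development applications.",
    "Judgment on sentencing for an offence of armed robbery.",
]

# Encode each document into a dense vector representing its meaning.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
embeddings = model.encode(documents, normalize_embeddings=True)

# Project the high-dimensional vectors onto two dimensions for plotting.
# n_neighbors=2 only because the toy corpus is tiny; real corpora use
# larger values (UMAP's default is 15).
coords = umap.UMAP(n_components=2, n_neighbors=2, metric="cosine").fit_transform(embeddings)
print(coords)  # one (x, y) point per document
```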
As I cover in my article, there’s a lot you can learn by mapping Australian law. Some of the most interesting insights to come out of this initiative are that:
⦁ Migration, family and substantive criminal law are the most isolated branches of case law on the map;
⦁ Migration, family and substantive criminal law are the most distant branches of case law from legislation on the map;
⦁ Development law is the closest branch of case law to legislation on the map;
⦁ Case law is more of a continuum than a rigidly defined structure and the borders between branches of case law can often be quite porous; and
⦁ The map does not reveal any noticeable distinctions between Australian state and federal law, whether it be in style, principles of interpretation or general jurisprudence.
If you’re interested in learning more about what the map has to teach us about Australian law, or if you’d like to find out how to create semantic maps of your own, check out the full article on my blog. It provides a detailed analysis of the map and covers the finer details of how I built it, with code examples offered along the way.
Very cool, though mostly over my head.
It's kind of funny that you're using BERT for topic modelling. It's still a very new technique by most definitions, yet it feels like LLMs have largely displaced these "legacy" approaches already. But it clearly worked, and it's likely well suited to tasks that aren't strictly about language patterns.
I don't have too much to add on the topic of semantic mapping itself, but I appreciate all the work you've put into this and thank you for sharing it.
I actually didn't use BERT for topic modelling; the embeddings were created by a far more advanced specialised model called BAAI/bge-small-en-v1.5, although it is still a transformer. A more rudimentary algorithm, HDBSCAN, was used for clustering, and I then manually reviewed all 507 clusters and reduced them to 19 branches of law. I do mention that I modelled my process on BERTopic, which is probably where the confusion came from. I'm not too sure why it's still called that when it doesn't even use BERT anymore 😅.
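If it helps make the distinction concrete, the clustering step looks roughly like this. This is a simplified sketch, not my actual code; the file name and `min_cluster_size` are made up for illustration:

```python
# Simplified sketch of the clustering step: HDBSCAN groups the document
# embeddings by density and labels points that fit no cluster as -1 (noise).
import hdbscan
import numpy as np

# Hypothetical file of precomputed bge-small-en-v1.5 vectors, one row per document.
embeddings = np.load("corpus_embeddings.npy")

clusterer = hdbscan.HDBSCAN(min_cluster_size=50, metric="euclidean")
labels = clusterer.fit_predict(embeddings)

# Cluster labels run 0..k-1, with -1 reserved for noise, so the count is
# max label + 1. In my case this stage produced 507 clusters, which I then
# manually merged into 19 branches of law.
print("clusters found:", labels.max() + 1)
```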
Thank you for the positive feedback; it is much appreciated. And apologies if the article seems a little too complex. The underlying process is 'pretty simple', but my implementation may be more involved because working with over 200k long-form documents meant having to build much more efficient code from scratch than what is available in the BERTopic Python library.
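To illustrate the sort of thing I mean by more efficient code, here's a toy version of batched embedding that streams the corpus from disk instead of holding all the texts and vectors in memory at once. The file layout, field name and batch size are invented for the example:

```python
# Toy illustration of memory-bounded embedding: read the corpus one document
# at a time and encode it in fixed-size batches.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def iter_texts(path):
    """Yield document texts one at a time from a JSON Lines corpus file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)["text"]

def embed_in_batches(path, batch_size=256):
    """Encode the corpus in batches so only one batch of texts is in memory."""
    batch, parts = [], []
    for text in iter_texts(path):
        batch.append(text)
        if len(batch) == batch_size:
            parts.append(model.encode(batch, normalize_embeddings=True))
            batch = []
    if batch:  # flush the final partial batch
        parts.append(model.encode(batch, normalize_embeddings=True))
    return np.vstack(parts)

embeddings = embed_in_batches("corpus.jsonl")
np.save("corpus_embeddings.npy", embeddings)
```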
I see, thanks for explaining further. And yeah, I assumed BERTopic used BERT so that's on me for not reading closely enough.
No worries about the complexity. I'm sure it's the perfect resource for somebody at the same experience level who is trying to solve similar problems. It's rare to find a blog post that fits so perfectly, but when you do, it's a magical experience. So I hope your post reaches those who need this information most. Cheers!