Much of the innovation in natural language processing comes from the US, resulting in an English-language bias – Finland decided to change the game with a collective approach
Title: Europe's fastest supercomputer trains large language models in Finland | Computer Weekly
Natural language processing is absolutely biased towards English, but I think attributing this solely to the innovation coming from the US is not entirely accurate. As a rule, institutions in other countries doing NLP still focus principally on English, with even other major languages taking a back seat. A lot of this ends up being an awful feedback loop: all the good data and benchmarks are in English, so you limit your research to English, so any good data or benchmarks you come up with are in English, and so on. I applaud research institutions working to improve coverage of non-English languages across the board. The article itself rightly points out the major difficulty of getting sufficient data in a language like Finnish, but even large global languages like Spanish and Arabic are woefully underrepresented in NLP.
Of course, the computing costs are also a huge issue, but they're an issue within English NLP too, since they make it very difficult for anyone but the biggest for-profit companies to develop these very large models.