12 votes

Much of the innovation in natural language processing comes from the US, resulting in an English language bias – Finland decided to change the game with a collective approach

1 comment

  1. sparksbet
    Natural language processing is absolutely biased towards English, but I think attributing this solely to the innovation coming from the US is not entirely accurate. Institutions in other countries doing NLP still focus principally on English as a rule, with even other major languages taking a back seat. A lot of this ends up being an awful feedback loop, where all the good data and benchmarks are in English, so you limit your research to English, so any good data or benchmarks you come up with are in English, and so on. I applaud research institutions working to improve coverage of non-English languages across the board. The article itself rightly points out the major difficulty in getting sufficient data in a language like Finnish, but even large global languages like Spanish and Arabic are woefully underrepresented in NLP.

    Of course the computing costs are also a huge issue, but I think they're also an issue within English NLP, since they make it very difficult for anyone but the biggest for-profit companies to develop these super large models.

    6 votes