6 votes

The Stack - permissively licensed code for large language models

Posted November 1, 2022 by skybrian

Tags: machine learning, bigcode, the stack, licensing.software, language models.large

https://www.bigcode-project.org/docs/about/the-stack/

Link information

This data is scraped automatically and may be incorrect.

Title: The Stack
Published: Nov 16 2020
Word count: 956 words

1 comment

skybrian (OP)
November 1, 2022
From the web page:

Our first step to that end was to select code whose original license was compatible with training an LLM, specifically, open-source licenses without copyleft. You can find the list of selected open-source licenses below. In addition, we are giving developers the ability to have their code removed from the dataset upon request. The process for submitting and enacting removal requests will keep evolving throughout the project as we receive feedback and build up more data governance tools. The following FAQ presents the current state of this process, as well as the planned next steps.

1 vote