6 votes

The Stack - permissively licensed code for large language models

1 comment

  1. skybrian
    Link
    From the web page:

    From the web page:

    Our first step to that end was to select code whose original license was compatible with training an LLM, specifically, open-source licenses without copyleft. You can find the list of selected open-source licenses below. In addition, we are giving developers the ability to have their code removed from the dataset upon request. The process for submitting and enacting removal requests will keep evolving throughout the project as we receive feedback and build up more data governance tools. The following FAQ presents the current state of this process, as well as the planned next steps.

    1 vote