## Background
As of today, Codeberg's [Terms of Use](https://codeberg.org/Codeberg/org/src/branch/main/TermsOfUse.md) (ToU) require repository contents to be licensed under a license approved by the [Free Software Foundation](https://www.fsf.org/) (FSF) or the [Open Source Initiative](https://opensource.org/) (OSI), as stated in the following two clauses (bolding mine):
>§ 1 (2) Our service is open for all projects covered by a free software or open source licence, as defined by either the Free Software Foundation (FSF) or the **Open Source Initiative (OSI)**.
and
>§ 2 (1) Repository content shall be licensed under an open-source license approved by the Free Software Foundation ([see list of the FSF](https://www.gnu.org/licenses/license-list.html)) or the **Open Source Initiative** ([see list of the OSI](https://opensource.org/licenses/)).
>(...)
The OSI is currently working on a new definition of open source to be applied to "artificial intelligence" systems, which it calls the "[Open Source AI Definition](https://opensource.org/deepdive)" (OSAID) and intends to announce in late October. However, the OSAID appears to differ significantly in spirit from the OSI's original [Open Source Definition](https://opensource.org/osd) (OSD): it allows the dataset used to train a system (and thus generally necessary to replicate it) to remain proprietary. The [checklist](https://opensource.org/deepdive/drafts/the-open-source-ai-definition-checklist-draft-v-0-0-9) attached to the latest OSAID draft permits publishing a research paper, a technical report, or a data card instead of the dataset itself.
The OSI is explicit about its intention not to require training datasets to be open, having [stated](https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513) that (bolding mine):
>The role of training data is one of the most hotly debated parts of the definition. After long deliberation and co-design sessions we have concluded that defining **training data as a benefit, not a requirement, is the best way to go**.
You may learn more about this controversy from a recent [opinion piece](https://www.theregister.com/2024/09/14/opinion_column_osi/) by Steven J. Vaughan-Nichols published in The Register.
## Request for comments (RfC)
In response to this news, I would like to ask the Codeberg community the following question:
**Should Codeberg update its ToU if the OSI publishes the OSAID without a requirement for training datasets to be open? If so, what should be changed?**