Codeberg/Community

Allow non-AI licenses #2089

Closed
opened 2025-08-16 17:13:34 +02:00 by cpresser · 8 comments

Hi,

With the current wording of the TOU, it is not possible to add 'non-AI' to a license. Additional restrictions make the license non-free and not compliant with either the FSF or OSI definitions.
Ideally, I would like the exceptions in §2 of the TOU to be amended to include the non-AI use-case.

I know that this is a controversial topic, so consider this a wish only.


Usually my projects have this license-text:

NoAI: This project may not be used in datasets for, in the development of, or as inputs to generative AI programs.
§44b: Dies ist ein in maschinenlesbarer Form vorliegender Nutzungsvorbehalt entsprechend §44b UrhG [English: This is a reservation of use, provided in machine-readable form, pursuant to §44b UrhG]
License: CERN-OHL-S
2025 cpresser and friends

It would be nice if I could host projects with that license on Codeberg.
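
As a rough illustration of what "machine-readable" could mean for a header like the one above, here is a minimal Python sketch that reads the `Tag: text` lines and checks for a `§44b` reservation. The `Tag: text` convention and the helper names `parse_header` and `has_tdm_reservation` are assumptions made for this example only; nothing in the law or in the Codeberg TOU prescribes this format.

```python
import re

# Header lines as quoted in the post (German line left untranslated).
HEADER = """\
NoAI: This project may not be used in datasets for, in the development of, or as inputs to generative AI programs.
§44b: Dies ist ein in maschinenlesbarer Form vorliegender Nutzungsvorbehalt entsprechend §44b UrhG
License: CERN-OHL-S
2025 cpresser and friends
"""

# Assumed convention: a tagged line has the shape "Tag: free text".
TAG_LINE = re.compile(r"^(?P<tag>[^:\s]+):\s*(?P<value>.+)$")


def parse_header(text: str) -> dict[str, str]:
    """Collect 'Tag: value' lines; lines without a tag are skipped."""
    fields = {}
    for line in text.splitlines():
        m = TAG_LINE.match(line.strip())
        if m:
            fields[m.group("tag")] = m.group("value")
    return fields


def has_tdm_reservation(text: str) -> bool:
    """True if the header carries a line tagged '§44b'."""
    return "§44b" in parse_header(text)


fields = parse_header(HEADER)
print(sorted(fields))
print(has_tdm_reservation(HEADER))
```

Whether a crawler would actually honor such a line is a separate question; the sketch only shows that the statement can be located mechanically.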


"§ 2 Allowed Content & Usage"

Reasonable exceptions are to a very limited extent considered acceptable. For example, releasing single logo image files of a FLOSS project under no licence or a separate non-free licence that requires derivative works to use their own logo that is clearly distinguishable from the original work even in absence of trademark registration.


I like the idea of non-AI licenses a lot, but there are a couple of issues with them, both legally as well as practically - changing existing licenses or writing new ones is always a bit of a legal gamble and often unclear in both directions, especially when clauses are not clearly adjusted for the underlying license. That's the reason why we preferred to stick to a relatively fixed set of proven licenses in the past.

If we were to do this, it would have to be a set of specific licenses, like e.g. all or some from https://github.com/non-ai-licenses/non-ai-licenses/tree/main. However:

  • Some of them can completely break with this addition (e.g. the NON-AI-Unlicense tries to release content into the public domain, whilst trying to simultaneously keep control over it, which could make the whole license legally inapplicable in many countries).
  • Just adding a clause to a license might also conflict with the license content, e.g. the EUPL specifically allows sublicensing under MIT, which would then break a non-AI clause.
  • In your clause I already see a couple of practical issues, e.g. "may not be used in inputs to generative AI programs" means that all users and developers of the software would have to have Microsoft Recall disabled (which would be a good outcome IMO but probably unenforceable 😉) and can't take screenshots of it with most modern smartphones.

TL;DR: we want to accept legally clear and generally proven licenses to avoid ambiguity, and I'm in favor of adding non-AI licenses that fulfil these criteria.


It is pretty clear that 'non-AI' licenses are effectively unenforceable. IMHO this is beyond the scope of this issue.

In the end, those additions are mostly 'virtue signaling'. If and how I would win a court-case against someone putting my project into a learning dataset vastly depends on the jurisdiction and the amount of money I have for lawyers. It is quite unlikely that as creators we have any real leverage.


I have spent plenty of time researching this topic. Related for the German-speaking crowd: https://media.ccc.de/v/gpn23-207-opensource-lizenzen-und-ki-wie-passt-das-zusammen-

There will be another, broader talk about this issue at KiCon2024 soon.


To simplify this, I suggest limiting the discussion in this issue to "§44b UrhG", which is the German implementation of the DSM Directive (EU) 2019/790.
It is a legally binding addition which, to my knowledge, does not conflict per se with existing copyleft licenses.
But it does make them non-free, because it limits the scope of usage. It feels like it already violates the spirit of the Codeberg TOU.


Clarification edit: My below voiced support is not for changing an existing vote, but for allowing "non-free" licenses on Codeberg when the "non-free" applies to forbidding use as training data or input to generative systems.

I absolutely support this and have been meaning to ask about this myself - I also have been subscribed to the linked github repository for a while now for that reason.

I believe something like the following could be done to clarify precedence of rules:

No-machine-learning XYZ license

< exclusion clause for machine learning >

For all other purposes, and where not in contradiction with <exclusion clause>, the XYZ license applies in the below quoted form.

This should take care of "collisions" with clauses in the XYZ license: If there could be a collision in the first place, then XYZ doesn't even get to be applied, because the exclusion-clause is applicable.
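
The precedence idea can be sketched in Python as a small helper that always places the exclusion clause ahead of the base license text. The function name `compose_license` and the exact wording of the precedence sentence are hypothetical, chosen only for illustration; this is a way of assembling the text, not a reviewed license.

```python
def compose_license(exclusion_clause: str, base_name: str, base_text: str) -> str:
    """Assemble a license text in which the exclusion clause comes first.

    The base license is appended unchanged and only applies where it does
    not contradict the exclusion clause (hypothetical wording).
    """
    title = f"No-machine-learning {base_name} license"
    precedence = (
        "For all other purposes, and where not in contradiction with the "
        f"exclusion clause above, the {base_name} license applies in the "
        "below quoted form."
    )
    parts = [title, exclusion_clause.strip(), precedence, base_text.strip()]
    return "\n\n".join(parts)


# Placeholder inputs; a real project would pass the full license text.
text = compose_license(
    exclusion_clause="This work may not be used as training data for machine learning.",
    base_name="XYZ",
    base_text="<full text of the XYZ license>",
)
print(text)
```

Because the clause is stated before the base license and the precedence sentence refers back to it, a reader meets the restriction first and the base license second.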

Sidenote: I do not at all like the idea of using the wrong term "AI" for machine learning, both because large language models etc have nothing in common with AI, and because the focus here is on being used as training data, something that actual AI wouldn't need (or just many orders of magnitude less than LLMs).

Adopting "AI" as a term in a license against machine learning, written by people who know better, that is a bit like non-racist people adopting the word "race" - and as that contributed to hundreds of millions of people believing that "race" is a thing, so will the use of the false term "AI" contribute to even more people believing that nowadays machine learning systems are "AI".


@momar

> I like the idea of non-AI licenses a lot, but there are a couple of issues with them, both legally as well as practically

I agree, as in practice, the current Zeitgeist/political topics often move too fast for licenses to catch up.

> In your clause I already see a couple of practical issues, e.g. "may not be used in inputs to generative AI programs" means that all users and developers of the software would have to have Microsoft Recall disabled (which would be a good outcome IMO but probably unenforceable 😉) and can't take screenshots of it with most modern smartphones.

Are you sure that local processing really applies to this?

@cpresser

> It feels like it already violates the spirit of the Codeberg TOU.

What do you mean by that?

As I am living in Germany: regarding the opt-out of "§44b UrhG": is there any group of people that says this is how the opt-out should be done?


> As I am living in Germany: regarding the opt-out of "§44b UrhG": is there any group of people that says this is how the opt-out should be done?

There is no clear answer to that, even if you read the full text of the law and the related documents on the Bundestag server. I am quite confident that the text in the README does suffice.
Currently, I am not aware of a real-world case where this question has been clarified in a court decision.
An interesting court-case relating to that is https://openjur.de/u/2495651.html

>> It feels like it already violates the spirit of the Codeberg TOU.
> What do you mean by that?

The §44b addition restricts the usage of a project. It's a change to the license. A project with this addition is not "free" anymore. The effective license (which is the original license plus the exception) is not approved by FSF or OSI.
This is not compatible with the TOU of Codeberg.

If you want to read more about the TDM exceptions, there is quite a good document by Creative Commons on the topic: https://creativecommons.org/wp-content/uploads/2021/12/CC-Statement-on-the-TDM-Exception-Art-4-DSM-Final.pdf


I sympathize with the sentiment (just like other ethical licenses that exclude mass murderers or predatory companies or ...). The law does not, unfortunately, make it possible to reconcile this with the goal of a Free Software license. It would not be a good idea to allow such non-free licenses on Codeberg or any other forge dedicated to Free Software.


I talked to some lawyers in the meantime and also read a few more documents regarding TDM.
Some of my previous statements are not correct. In particular:

> It's a change to the license. A project with this addition is not "free" anymore. The effective license (which is the original license plus the exception) is not approved by FSF or OSI.

Example: As the person (licensee) who wants to use a project for AI learning, you can choose between two paths:

  1. Respect the license that the licensor has given. That might mean respecting attribution and redistribution clauses.
  2. Say you are doing TDM according to the DSM Directive and disregard the license and its terms.

If the rightsholder (licensor) opts out of TDM, that still leaves option 1 for using the licensed work, which is the "status quo ante bellum" (before the DSM Directive). And that does not discriminate against fields of endeavor.
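
The two paths above can be written down as a tiny Python model. It is a deliberately simplified sketch of the argument in this comment, not a statement of how any court would rule; the names `Path` and `available_paths` are made up for the example.

```python
from enum import Enum


class Path(Enum):
    LICENSE = "comply with the license terms"
    TDM_EXCEPTION = "rely on the statutory TDM exception"


def available_paths(rightsholder_opted_out: bool) -> set[Path]:
    """Return the paths open to someone who wants to use the work for TDM.

    Simplified assumption: the license path is always available, and an
    opt-out only removes the statutory TDM exception.
    """
    paths = {Path.LICENSE}
    if not rightsholder_opted_out:
        paths.add(Path.TDM_EXCEPTION)
    return paths


for opted_out in (False, True):
    names = sorted(p.name for p in available_paths(opted_out))
    print(f"opt-out={opted_out}: {names}")
```

In this model the opt-out never removes `Path.LICENSE`, which is the point of the argument: the license itself grants exactly what it granted before.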

The TDM-exception part of this issue is tracked in #2094.
That leaves the 'No-AI' part, which I understand will not work on Codeberg. Thus closing this issue.


> That leaves the 'No-AI' part, which I understand will not work on Codeberg. Thus closing this issue.

Up until this sentence, I thought your post was saying the opposite? I am confused. I thought you had just argued why a No-machine-learning clause (I refuse to call that garbage "AI") is NOT a problem on Codeberg?

Honestly, I did not think this would become a problem - I fully planned to eventually prefix my licenses with a "do not use for machine learning" statement, because machine learning disregards licenses AND authorship. And it does not matter what any court interpreting "fair use" legislation says about this now, or whether I can enforce it. What matters is that I have explicitly expressed in my license that I consider use for machine learning to be an unlicensed use. Without this, I cannot even dream of starting a class action lawsuit at any time in the future.
