Citable Code / Service integration for DOI's #295

Open
opened 2020-09-26 15:19:30 +02:00 by lhinderberger · 22 comments

In Codeberg/Documentation#56 [the idea was thrown around](https://codeberg.org/Codeberg/Documentation/issues/56#issuecomment-82244) to create an integration between Codeberg and a DOI provider, to make code hosted on Codeberg easier to cite in scientific writing.

Maybe someone is interested in developing such a tool? :)

This issue exists to track efforts to create such an integration and will be linked to from the relevant place in the Codeberg Documentation.
lhinderberger changed title from "Write a service integration for DOI's" to "Citable Code / Service integration for DOI's" 2020-09-26 15:24:23 +02:00
@lhinderberger would https://github.com/go-gitea/gitea/pull/19999 solve this issue?

@6543 Sorry, I merely relayed this issue back when I was moderating the Documentation issue tracker, so I don't know whether the linked PR is a sufficient solution for the original poster's problem. CC @ivan-paleo

Oh, OK. I personally don't know much about "citable code", so I was just trying to gauge whether I can treat this pull request as the upstream solution or not.

I have not read the whole linked PR, but I do not think it is what I was referring to. Being able to cite (Bibtex) a repository is nice, but what is more important (at least for scientists) is to have a DOI for the repo.
Currently, the only way is to download the ZIP archive of a given release of the repo, upload it to an online repository (e.g. Zenodo), and get the DOI there. That's fine, but it could be even better: a GitHub repo can be linked to a Zenodo account, so that every new release on GitHub is automatically uploaded to Zenodo and gets assigned a DOI. The only thing the user has to do is link the two accounts (GitHub and Zenodo); the rest is done automatically. That's what I was referring to.
This is explained in detail [here](https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_2.md).

My suggestion was therefore to get in touch with the Zenodo team in order to create such a link with Codeberg repos too.

Does that make sense?

Is there any progress with this feature? 🙂

How about this as a Woodpecker plugin?
https://github.com/jhpoelen/zenodo-upload

Zenodo API documentation:
https://developers.zenodo.org
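
For anyone picking this up: as far as I can tell from the Zenodo API documentation linked above, minting a DOI for a release comes down to four REST calls (create a deposition, upload the archive to its file bucket, attach metadata, publish). Below is a minimal Python sketch of that sequence, not a finished plugin. `archive_release` and the `send` transport callable are hypothetical names of mine, the sandbox host is an assumption so that experiments do not mint real DOIs, and the response fields (`links.bucket`, `id`, `doi`) should be checked against the current API docs.

```python
import json
from typing import Callable, Optional

# Assumption: the Zenodo sandbox, so that experiments do not mint real DOIs.
ZENODO_API = "https://sandbox.zenodo.org/api"

# Hypothetical transport: send(method, url, body, headers) -> decoded JSON dict.
# Injecting it keeps the sketch independent of any particular HTTP library.
Send = Callable[[str, str, Optional[bytes], dict], dict]


def archive_release(send: Send, token: str, filename: str,
                    archive: bytes, metadata: dict) -> str:
    """Upload one release archive and return the DOI reported on publish."""
    auth = {"Authorization": f"Bearer {token}"}
    as_json = {**auth, "Content-Type": "application/json"}
    base = f"{ZENODO_API}/deposit/depositions"

    # 1. Create an empty deposition (a draft record).
    deposition = send("POST", base, b"{}", as_json)

    # 2. Upload the release archive into the deposition's file bucket.
    bucket = deposition["links"]["bucket"]
    send("PUT", f"{bucket}/{filename}", archive, auth)

    # 3. Attach the descriptive metadata (title, creators, upload type, ...).
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    send("PUT", f"{base}/{deposition['id']}", body, as_json)

    # 4. Publish; the published record is expected to carry the DOI.
    published = send("POST", f"{base}/{deposition['id']}/actions/publish",
                     None, auth)
    return published["doi"]
```

A Woodpecker step could call something like this on every tag event, with `send` backed by whatever HTTP client the plugin image ships and the token taken from a CI secret.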

The expected value, which I understand to be a permanent ID/URL pair in a commonly agreed-upon format, might actually be offered by a third party, from the outside in.

Do you know about the [Software Heritage initiative](https://www.softwareheritage.org/)?

Gitea is one of the Git forge software packages that SWH is able to ingest out of the box: one may thus submit a full Gitea instance as well as any individual Git repository. As a matter of fact, [`codeberg.org` is already among the currently watched instances](https://archive.softwareheritage.org/browse/search/?q=codeberg.org&visit_type=git&with_content=true&with_visit=true).

See the `permalink` widget on the right side of any location in any repo (e.g. [Gitea on Codeberg](https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://codeberg.org/Codeberg/gitea)). SWH offers permanent ["SWHIDs"](https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html) and companion URLs for a directory, a revision of the software, or an SWH snapshot of it.

Procedure: https://www.softwareheritage.org/howto-archive-and-reference-your-code/.

Thanks @jeffix2000 for the information. I was indeed not aware of it.
But from what I've now understood, the permalink is only for files, right? So I'd have to look at SWH in more detail.
Nevertheless, the procedure to link a GitHub account with a Zenodo account in order to automatically get a DOI for every release of every repository is super useful and fast, and this is what I was looking for. It's nice to have a workaround, but a direct way would be even better ;)

There are SWHIDs for the full repo at a given revision (commit), release, or SWH-initiated snapshot. The IDs are hash-derived from the content itself, so as long as the repo (or its hosting Git*** instance) is enlisted at SWH, one could compute the existing or upcoming SWHID of any of the supported objects directly from the forge. They provide libs to do that, which could "merely" be UI/API-integrated in Gitea.

This mechanism would require only the initial enlistment at SWH (manually or by API) then no upload would be needed from the repo, only computing the ID would be enough to be able to publish a recognized permalink somewhere.

Just an idea of frugal feature achievement ;)
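
To make the "compute the ID locally" idea concrete, here is a small Python sketch. It assumes, as I understand the SWHID specification, that a content (file) identifier reuses Git's blob hash and that a revision identifier reuses the Git commit hash. The function names are mine, and SWH's own libraries would be the authoritative way to do this.

```python
import hashlib
import re

_SHA1_HEX = re.compile(r"^[0-9a-f]{40}$")


def swhid_for_content(data: bytes) -> str:
    """SWHID of a file's bytes, hashed the way Git hashes a blob."""
    header = b"blob %d\0" % len(data)  # Git object header: type, size, NUL
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def swhid_for_revision(commit_sha1: str) -> str:
    """SWHID of a commit, assuming the Git commit hash is reused as-is."""
    commit_sha1 = commit_sha1.lower()
    if not _SHA1_HEX.match(commit_sha1):
        raise ValueError("expected a full 40-character hexadecimal commit hash")
    return "swh:1:rev:" + commit_sha1


# The empty file has the well-known Git blob hash e69de29b...
print(swhid_for_content(b""))
# → swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since a forge already knows every commit hash, showing a revision SWHID next to a commit would need no network call at all. Only the archival itself has to happen on the SWH side.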

I had a look at the procedure for SWH. There is still one thing that is not clear to me: what is archived? The whole repo, releases, ...?
For my use, it is important that I can cite a specific release (or snapshot at a given date or anything in that direction). I couldn't understand whether this is possible at SWH or not.

Yes: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#core-identifiers

For instance, here are [the permalink to this very repository's first commit](https://archive.softwareheritage.org/browse/revision/b3a11ce64542a0c06af20d7f8f9c730e45772d39/?origin_url=https://codeberg.org/Codeberg/Community.git&snapshot=2fea86b173e91d11c5278ad370b3e3817e108cc7) and [the permalink to the Forgejo stable release 1.17.3 tag](https://archive.softwareheritage.org/browse/release/9e6d524040ce5377db90f778eabfb89720ae3b82/?origin_url=https://codeberg.org/forgejo/forgejo&release=v1.17.3&snapshot=4a0d799547ed6b10e30cfc4242ac067197659864).

Some additional docs about citing:

- https://www.softwareheritage.org/save-and-reference-research-software/
- https://www.softwareheritage.org/howto-archive-and-reference-your-code/
- https://www.softwareheritage.org/2020/01/13/the-swh-badges-are-here/

A light option to consider: given an "SWH archival" option in the instance-wide or per-repo administration settings, Gitea could display a locally computed SWH ID/URL/badge code on any SWH-supported object.

(Disclaimer: I'm not part of the SWH project. As a member of the French public administration, I just got to know about them since they are led and supported by prominent French and European research organizations, among others, and are an established actor in the open source community.)

(Side note: aside from archival and research use cases, there are interesting cybersecurity ones around software supply chain integrity.)
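
For illustration, the first-commit permalink above can also be written as a qualified SWHID. The Python sketch below assumes the qualifier syntax from the persistent-identifiers documentation (`origin` and `visit` appended with `;`, with `%` and `;` in the origin URL percent-encoded) and assumes that the archive resolves a SWHID appended to `https://archive.softwareheritage.org/`. The helper names are hypothetical.

```python
import re
from typing import Optional

_CORE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


def qualify(core: str, origin: Optional[str] = None,
            visit: Optional[str] = None) -> str:
    """Append the optional origin and visit qualifiers to a core SWHID."""
    if not _CORE.match(core):
        raise ValueError(f"not a core SWHID: {core!r}")
    parts = [core]
    if origin:
        # ';' separates qualifiers, so it must not appear raw in the URL.
        parts.append("origin=" + origin.replace("%", "%25").replace(";", "%3B"))
    if visit:
        if not visit.startswith("swh:1:snp:") or not _CORE.match(visit):
            raise ValueError("visit must be the core SWHID of a snapshot")
        parts.append("visit=" + visit)
    return ";".join(parts)


def resolver_url(swhid: str) -> str:
    """Assumed resolver form: the SWHID appended to the archive's base URL."""
    return "https://archive.softwareheritage.org/" + swhid


first_commit = qualify(
    "swh:1:rev:b3a11ce64542a0c06af20d7f8f9c730e45772d39",
    origin="https://codeberg.org/Codeberg/Community.git",
    visit="swh:1:snp:2fea86b173e91d11c5278ad370b3e3817e108cc7",
)
print(first_commit)
print(resolver_url(first_commit))
```

The hashes and the origin URL are taken from the permalink above. A badge or "cite this commit" widget in the forge would only need to fill in these three values.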

I suspect that developers who like codeberg.org (and not github.com) will prefer using SWHIDs rather than DOIs. Analogous to vendor lock-in, a DOI has registrar lock-in. By choosing a DOI to identify a specific version of software, one is effectively locking in a single DOI registrar (e.g. Zenodo). In contrast, a SWHID identifies files, directories and git commits of software that multiple archives can resolve. One such example is softwareheritage.org, but there is no lock-in to that particular archive.

@castedo developers might indeed prefer SWHIDs but scientists do prefer DOIs, so both options are useful.

Just to chime in - this is the first time I have heard of SWHIDs, while I agree they seem cool/interesting, as a scientist/researcher myself, DOIs are fundamental to our workflows and not going anywhere anytime soon (for better or worse!).

@kirkpsmith It is not clear to me what workflows you are referring to. We have many workflows each with different advantages and disadvantages. Let's call the workflow that seems to be the focus in this issue #295 the "code-on-Zenodo" workflow.

The code-on-Zenodo workflow is not fundamental to our workflows. The research software that I've worked on, which is used by a significant number of researchers in genetics, never used and has no plans to use the code-on-Zenodo workflow.

I agree that DOIs are fundamental to the workflows involving citation of published ARTICLES that present research software to a research community. Here is an example for the research software that I worked on:

http://doi.org/10.1093/genetics/iyab229

But this fundamental use of a DOI is not the code-on-Zenodo workflow.

Which workflows are the best ones is a big conversation, but I disagree with characterizations that having a DOI to code on Zenodo is what scientists have to do, or that it is the best way to document to other scientists what software to use to replicate scientific results.

I am thinking more generally of the workflows where researchers are preparing a communication (manuscript, preprint, presentation, etc.) and need to cite some software or dataset (and potentially a specific version of actively developed software). Typically that has looked like the principal investigator telling whoever is actually writing the communication to "get this [thing on a git repo] a DOI and cite it in the paper", before the communication is ever submitted. This is my experience where software is not the main point of the communication, but possibly a key part of enabling reproducibility.

IMO, in trying to push academics away from GitHub and towards efforts like Codeberg/Forgejo, it would be better to have a comfortable transition. If I were quickly scrolling through a "citable code" section aimed at academics, I would expect to _first_ see something on DOIs, not a relatively unknown, emerging (yet potentially superior) citation system. Then, as the software is developed or datasets are extended, versioned DOIs allow future communications to specify the versions of code/data used, while a DOI for the whole repo allows citation of the entire repository's body of work.

Then there are the software tools that support importing references solely by DOI, such as the Zotero magic wand. Until such tools support SWHIDs, I feel that DOIs are more versatile:

![image](/attachments/44beec84-6072-4ee3-bb57-234e49aca65c)

As @ivan-paleo said:

> Being able to cite (Bibtex) a repository is nice, but what is more important (at least for scientists) is to have a DOI for the repo.

@kirkpsmith Thanks for sharing details on your workflow.

I agree on one of the SWHID limitations you mention. Namely, a Zenodo DOI is a better choice than a SWHID for identifying an actively changing software project. A SWHID is great for a frozen snapshot and the history leading up to a specific release of the software.

A SWHID is not helpful for a software project which might currently be on GitHub but the maintainer might move it to Codeberg later and then maybe somewhere else. A SWHID can't track such a project whereas a Zenodo DOI can.

In the Open Science community, which tries to establish data and code as first-class scientific output, there is agreement that such publications need a PID, a *persistent identifier*. In essence, that is a reliable service that redirects a permanent URL/identifier to a possibly changing URL where the material is actually published.

DOI is just one PID, but the one with the best marketing, because it is connected to the big publishing houses.

There are others, equally well suited and community maintained, that are also free or far cheaper (CERN pays for your Zenodo DOIs). E.g.

+ [PURL](https://purl.archive.org/help)
+ [ARK](https://arks.org/)
+ [Handle](https://handle.net/)
+ [W3ID](https://w3id.org/)
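
The redirect idea behind every PID system fits in a few lines. The Python sketch below is a toy in-memory resolver and not how any of the services above is implemented. The identifier and URLs are made up for illustration.

```python
class PidResolver:
    """Toy resolver: a stable identifier mapped to a target URL that may change."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._targets:
            raise ValueError(f"{pid} is already registered")
        self._targets[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        # The identifier stays the same; only the redirect target changes.
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._targets[pid]


resolver = PidResolver()
resolver.register("demo:1234", "https://github.example/lab/tool")
resolver.move("demo:1234", "https://codeberg.example/lab/tool")
print(resolver.resolve("demo:1234"))  # → https://codeberg.example/lab/tool
```

A citation that uses `demo:1234` keeps working after the project moves, which is the property that a content-derived identifier such as a SWHID does not aim to provide.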

Thanks @harald.vonwaldow, I had not heard of those tools. It would be nice to have a table comparing the tools you just mentioned alongside DOI, SWHID, etc., in terms of workflow, scalability, cost, governing body, number of works currently hosted, difficulty of integrating with Codeberg/Forgejo, etc.

Indeed, such an overview would be nice. I don't have one, but thanks for the headings of the table that could hold it! Also, I haven't looked at the SWHID in detail. It's not very well known in my circles, but it looks quite powerful and reliable on the face of it.

ARKs and Handles are useful, but they require subscriptions to membership agencies. DOI minting often does as well, particularly when organizations work with Crossref or DataCite. But Zenodo's DOI-minting service (as noted by others) is gratis for end users, including those who use the integration with GitHub. It would be ideal if Codeberg could develop the same integration in partnership with Zenodo, since the integration is bi-directional.

If there is sufficient and sustained interest in the Codeberg community (as it appears there is), then I'd recommend Codeberg's leadership reach out to Zenodo developers to ask what next steps would be. It may also be that DH infrastructure orgs such as LIBER Europe would have an interest in sponsoring and/or publicizing a development sprint. GitHub (purchased/owned by Microsoft) shouldn't be the only large-scale DOI-minting public code repository and archiving service out there.

@mgbilby wrote in [#295 (comment)](https://codeberg.org/Codeberg/Community/issues/295#issuecomment-9541859):

> If there is sufficient and sustained interest in the Codeberg community (as it appears there is), then I'd recommend Codeberg's leadership reach out to Zenodo developers to ask what next steps would be. It may also be that DH infrastructure orgs such as LIBER Europe would have an interest in sponsoring and/or publicizing a development sprint. GitHub (purchased/owned by Microsoft) shouldn't be the only large-scale DOI-minting public code repository and archiving service out there.

I think that's the right way of looking at it. Many researchers currently use GitHub for their openly licensed research software, and they also use the GH-to-Zenodo integration to provide long-term archives of that software (for various reasons: GH doesn't guarantee that links won't break at some point, GH (or researchers) might delete their GH repos, archiving provides a guaranteed stable software version, ...). As they already release openly licensed software, these researchers would be a potential key target audience for Codeberg, but the lack of an archiving feature might make it less attractive.

Also, Zenodo itself runs as a non-commercial infrastructure for knowledge products and is based on open source software, so there's arguably a high level of value alignment with Zenodo too.

Since it was my idea originally, I would love to see that connection between Zenodo and Codeberg.
BTW, Zenodo now connects automatically to SWH too: https://blog.zenodo.org/2024/10/21/2024-10-21-swh/
