Codeberg/Community

Citable Code / Service integration for DOI's #295

Open

opened 2020-09-26 15:19:30 +02:00 by lhinderberger · 22 comments

lhinderberger commented

2020-09-26 15:19:30 +02:00

In Codeberg/Documentation#56 [the idea was raised](https://codeberg.org/Codeberg/Documentation/issues/56#issuecomment-82244) of creating an integration between Codeberg and a DOI provider, to make code hosted on Codeberg easier to cite in scientific writing.

Maybe someone is interested in developing such a tool? :)

This issue exists to track efforts to create such an integration and will be linked to from the relevant section of the Codeberg Documentation.

👍 15

lhinderberger added the enhancement and contributions welcome labels 2020-09-26 15:19:30 +02:00

lhinderberger referenced this issue from Codeberg/Documentation (Citable Code #81) 2020-09-26 15:20:31 +02:00

lhinderberger changed title from ~~Write a service integration for DOI's~~ to Citable Code / Service integration for DOI's 2020-09-26 15:24:23 +02:00

6543 commented

2022-08-09 17:52:56 +02:00

@lhinderberger would https://github.com/go-gitea/gitea/pull/19999 solve this issue?

lhinderberger commented

2022-08-09 23:32:45 +02:00

Author

@6543 Sorry, I only relayed this issue back when I was moderating the Documentation issue tracker; I don't know whether the linked PR is a sufficient solution for the original poster's problem. CC @ivan-paleo

6543 commented

2022-08-10 01:16:28 +02:00

Oh, OK. I personally don't have much knowledge about "citable code", so I'd just like to find out whether I can regard this pull request as an upstream solution or not.

ivan-paleo commented

2022-08-11 10:23:07 +02:00

I have not read the whole linked PR, but I do not think it is what I was referring to. Being able to cite (Bibtex) a repository is nice, but what is more important (at least for scientists) is to have a DOI for the repo.
Currently, the only way is to download the ZIP archive of a given release of the repo, upload it to an online repository (e.g. Zenodo), and get the DOI there. That's fine, but it could be even better: a GitHub repo can be linked to a Zenodo account, so that every new release on GitHub is automatically uploaded to Zenodo and gets assigned a DOI. The only thing the user has to do is link the two accounts (GitHub and Zenodo); the rest is done automatically. That's what I was referring to.
This is explained in detail [here](https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_2.md).

My suggestion was therefore to get in touch with the Zenodo team in order to create such a link with Codeberg repos too.

Does that make sense?

👍 5
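
The first step of the manual workflow described above (fetching the ZIP archive of a release) could be sketched roughly as follows. This is only an illustration: the `/archive/<tag>.zip` URL pattern is an assumption about how a Forgejo/Gitea instance such as Codeberg exposes tag archives, and both function names are hypothetical.

```python
from urllib.parse import quote
import urllib.request


def release_archive_url(owner: str, repo: str, tag: str,
                        base: str = "https://codeberg.org") -> str:
    """Return the ZIP download URL for a tag (URL pattern is an assumption)."""
    # Percent-encode each part so unusual owner, repo or tag names stay valid.
    return f"{base}/{quote(owner)}/{quote(repo)}/archive/{quote(tag)}.zip"


def download_release(owner: str, repo: str, tag: str, dest: str) -> str:
    """Download the release archive to `dest` and return the local path."""
    url = release_archive_url(owner, repo, tag)
    path, _headers = urllib.request.urlretrieve(url, dest)
    return path
```

The downloaded file is what would then be uploaded by hand to an archive such as Zenodo to obtain a DOI.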

vivi90 commented

2022-09-10 01:34:52 +02:00

Is there any progress with this feature? 🙂

How about this as a Woodpecker plugin?
https://github.com/jhpoelen/zenodo-upload

Zenodo API documentation:
https://developers.zenodo.org

👍 1
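
As a rough idea of what such a plugin would have to do, here is a minimal sketch of the request sequence in the Zenodo deposit API (create a deposition, upload a file to its bucket, set metadata, publish). It only builds the requests and sends nothing. The function names are hypothetical, the token is a placeholder, and the endpoint paths and metadata fields are written from the public Zenodo API documentation as I understand it, so they should be checked against it before use.

```python
import json
import urllib.request

# Assumption: production endpoint; https://sandbox.zenodo.org/api is meant for testing.
ZENODO_API = "https://zenodo.org/api"


def _auth(token: str) -> dict:
    # Zenodo accepts a personal access token as a Bearer token.
    return {"Authorization": f"Bearer {token}"}


def deposition_metadata(title: str, version: str, creators: list, description: str) -> dict:
    """Metadata payload for a software deposit."""
    return {"metadata": {
        "title": title,
        "upload_type": "software",
        "version": version,
        "description": description,
        "creators": [{"name": name} for name in creators],
    }}


def create_deposition_request(token: str, api: str = ZENODO_API) -> urllib.request.Request:
    """Step 1: create an empty deposition; the response carries its id and bucket link."""
    headers = {"Content-Type": "application/json", **_auth(token)}
    return urllib.request.Request(f"{api}/deposit/depositions", data=b"{}",
                                  headers=headers, method="POST")


def upload_request(bucket_url: str, filename: str, payload: bytes,
                   token: str) -> urllib.request.Request:
    """Step 2: upload the release archive into the deposition's file bucket."""
    headers = {"Content-Type": "application/octet-stream", **_auth(token)}
    return urllib.request.Request(f"{bucket_url}/{filename}", data=payload,
                                  headers=headers, method="PUT")


def metadata_request(deposition_id: int, metadata: dict, token: str,
                     api: str = ZENODO_API) -> urllib.request.Request:
    """Step 3: attach the metadata to the deposition."""
    headers = {"Content-Type": "application/json", **_auth(token)}
    return urllib.request.Request(f"{api}/deposit/depositions/{deposition_id}",
                                  data=json.dumps(metadata).encode("utf-8"),
                                  headers=headers, method="PUT")


def publish_request(deposition_id: int, token: str,
                    api: str = ZENODO_API) -> urllib.request.Request:
    """Step 4: publish, which is the point at which the DOI is registered."""
    return urllib.request.Request(
        f"{api}/deposit/depositions/{deposition_id}/actions/publish",
        headers=_auth(token), method="POST")
```

Each request would be sent with `urllib.request.urlopen`, feeding the `id` and bucket link from the first response into the later steps. A CI step triggered on a tag event could run exactly this sequence.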

jeffix2000 commented

2023-03-07 20:29:03 +01:00

The expected value, which I understand to be a permanent ID-URL pair in a commonly agreed-upon format, might actually be offered by a third party from the outside in.

Do you know about the [Software Heritage initiative](https://www.softwareheritage.org/)?

Gitea is one of the Git hosting applications that SWH is able to ingest out of the box: one may thus submit a full Gitea instance as well as any individual Git repository. As a matter of fact, [`codeberg.org` is among the currently watched instances](https://archive.softwareheritage.org/browse/search/?q=codeberg.org&visit_type=git&with_content=true&with_visit=true) already.

See the `permalink` widget on the right side of any location in any repo (e.g. [Gitea on Codeberg](https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://codeberg.org/Codeberg/gitea)). SWH offers permanent ["SWHIDs"](https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html) and companion URLs for a directory, a revision of the software or a SWH snapshot of it.

Procedure: https://www.softwareheritage.org/howto-archive-and-reference-your-code/.

ivan-paleo commented

2023-03-08 08:56:53 +01:00

Thanks @jeffix2000 for the information. I was indeed not aware of it.
But from what I've now understood, the permalink is only for files, right? So I'd have to look at SWH in more detail.
Nevertheless, the procedure to link a GitHub account with a Zenodo account in order to automatically get a DOI for every release of every repository is super useful and fast, and this is what I was looking for. It's nice to have a workaround, but a direct way would be even better ;)

jeffix2000 commented

2023-03-08 22:46:46 +01:00

There are SWHIDs for the full repo at a given revision (commit), release, or SWH-initiated snapshot. The IDs are hash-derived from the content itself, so as long as the repo (or its hosting Git*** instance) is enlisted at SWH, one could compute the existing-or-upcoming SWHID of any of the supported objects directly from the forge. They provide libs to do that, which could "merely" be UI/API-integrated in Gitea.

This mechanism would require only the initial enlistment at SWH (manually or by API); after that, _no upload would be needed_ from the repo: computing the ID would be enough to publish a recognized permalink somewhere.

Just an idea for achieving the feature frugally ;)
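
To illustrate what "hash-derived from the content itself" means, here is a minimal sketch that computes a content SWHID locally and formats a revision SWHID from a Git commit id. It assumes, as the SWHID documentation describes, that content and revision identifiers are compatible with Git blob and commit ids; the function names are hypothetical and this is not the official SWH library.

```python
import hashlib

# Object types defined for core SWHIDs: content, directory, revision, release, snapshot.
SWHID_TYPES = {"cnt", "dir", "rev", "rel", "snp"}


def git_blob_sha1(data: bytes) -> str:
    """Hash file content the way Git hashes a blob object."""
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def swhid(object_type: str, hex_id: str) -> str:
    """Format a core SWHID of the form swh:1:<type>:<40 hex digits>."""
    if object_type not in SWHID_TYPES:
        raise ValueError(f"unknown SWHID object type: {object_type!r}")
    if len(hex_id) != 40 or any(c not in "0123456789abcdef" for c in hex_id):
        raise ValueError("identifier must be 40 lowercase hex digits")
    return f"swh:1:{object_type}:{hex_id}"


def content_swhid(data: bytes) -> str:
    """SWHID of a single file's content, computed without contacting the archive."""
    return swhid("cnt", git_blob_sha1(data))


def revision_swhid(git_commit_id: str) -> str:
    """SWHID of a revision, assuming it reuses the Git commit id."""
    return swhid("rev", git_commit_id)
```

For example, `content_swhid(b"")` gives `swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known Git id of the empty blob. A forge already knows the commit id of every revision, so displaying the revision SWHID would need no extra hashing at all.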

ivan-paleo commented

2023-03-10 07:53:26 +01:00

I had a look at the procedure for SWH. There is still one thing that is not clear to me: what is archived? The whole repo, releases, ...?
For my use, it is important that I can cite a specific release (or a snapshot at a given date, or anything in that direction). I couldn't work out whether this is possible at SWH or not.

jeffix2000 commented

2023-03-10 12:07:30 +01:00

Yes: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#core-identifiers

For instance, here are [the permalink to this very repository's first commit](https://archive.softwareheritage.org/browse/revision/b3a11ce64542a0c06af20d7f8f9c730e45772d39/?origin_url=https://codeberg.org/Codeberg/Community.git&snapshot=2fea86b173e91d11c5278ad370b3e3817e108cc7) and [the permalink to the Forgejo stable release 1.17.3 tag](https://archive.softwareheritage.org/browse/release/9e6d524040ce5377db90f778eabfb89720ae3b82/?origin_url=https://codeberg.org/forgejo/forgejo&release=v1.17.3&snapshot=4a0d799547ed6b10e30cfc4242ac067197659864).

Some additional docs about citing:

- https://www.softwareheritage.org/save-and-reference-research-software/
- https://www.softwareheritage.org/howto-archive-and-reference-your-code/
- https://www.softwareheritage.org/2020/01/13/the-swh-badges-are-here/

A light option to consider would be that, given an "SWH archival" option in the instance-wide or repo administration settings, Gitea could display a locally computed SWH ID/URL/badge code for any SWH-supported object.

(Disclaimer: I'm not part of the SWH project; as a member of the French public administration, I just got to know about them since they are led/supported by prominent French & European research organizations and more, and are an established actor in the open-source community.

Side note: aside from archival and research use cases, there are interesting cybersecurity ones around software supply chain integrity.)

👍 1
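
The "light option" of showing a locally computed SWH URL could look roughly like the sketch below: validate a core SWHID and turn it into a link on the archive, optionally with an `origin` qualifier. The helper names are hypothetical, and the resolver URL form (archive host followed by the SWHID) is my understanding of how the archive resolves identifiers, so treat it as an assumption.

```python
import re
from typing import Optional, Tuple

# Core SWHID: scheme, version 1, object type, 40 lowercase hex digits.
SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")
RESOLVER = "https://archive.softwareheritage.org"  # assumed resolver base URL


def parse_core_swhid(value: str) -> Tuple[str, str]:
    """Split a core SWHID into (object_type, hex_id), or raise ValueError."""
    match = SWHID_RE.match(value)
    if match is None:
        raise ValueError(f"not a core SWHID: {value!r}")
    return match.group(1), match.group(2)


def permalink(core_swhid: str, origin: Optional[str] = None) -> str:
    """Build an archive link, optionally qualified with the origin repository URL."""
    parse_core_swhid(core_swhid)  # validation only
    qualified = core_swhid if origin is None else f"{core_swhid};origin={origin}"
    return f"{RESOLVER}/{qualified}"
```

With the release id from the Forgejo permalink above, `permalink("swh:1:rel:9e6d524040ce5377db90f778eabfb89720ae3b82", origin="https://codeberg.org/forgejo/forgejo")` yields a link that cites one specific release, which is the use case asked about.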

castedo commented

2023-08-25 14:54:57 +02:00

I suspect that developers who like codeberg.org (and not github.com) will prefer using SWHIDs rather than DOIs. Analogous to vendor lock-in, a DOI has registrar lock-in. By choosing a DOI to identify a specific version of software, one is effectively locking in a single DOI registrar (e.g. Zenodo). In contrast, a SWHID identifies files, directories and Git commits of software that multiple archives can resolve. One such example is softwareheritage.org, but there is no lock-in to that particular archive.

👍 2

ivan-paleo commented

2023-08-28 09:50:39 +02:00

@castedo developers might indeed prefer SWHIDs but scientists do prefer DOIs, so both options are useful.

kirkpsmith commented

2023-11-06 07:08:21 +01:00

Just to chime in: this is the first time I have heard of SWHIDs. While I agree they seem cool/interesting, as a scientist/researcher myself I can say that DOIs are fundamental to our workflows and not going anywhere anytime soon (for better or worse!).

castedo commented

2023-11-06 14:31:43 +01:00

@kirkpsmith It is not clear to me what workflows you are referring to. We have many workflows, each with different advantages and disadvantages. Let's call the workflow that seems to be the focus of this issue #295 the "code-on-Zenodo" workflow.

The code-on-Zenodo workflow is not fundamental to our workflows. The research software that I've worked on, which is used by a significant number of researchers in genetics, has never used and has no plans to use the code-on-Zenodo workflow.

I agree that DOIs are fundamental to the workflows involving citation of published _ARTICLES_ that present research software to a research community. Here is an example for the research software that I worked on:

http://doi.org/10.1093/genetics/iyab229

But this fundamental use of a DOI is not the code-on-Zenodo workflow.

Which workflows are the best ones is a big conversation, but I disagree with characterizations that having a DOI for code on Zenodo is what scientists have to do, or that it is the best way to document to other scientists what software to use to replicate scientific results.

kirkpsmith commented

2023-11-10 11:09:31 +01:00

I am thinking more generally of the workflows where researchers are preparing a communication (manuscript, preprint, presentation, etc.) and need to cite some software or dataset (and potentially a version of an actively developed software). Typically that has looked like the principal investigator telling whoever is actually writing the communication to "get this [thing on a git repo] a DOI and cite it in the paper", before the communication is ever submitted. This is my experience where software is not the main point of the communication, but possibly a key part of enabling reproducibility.

IMO, in trying to push academics away from GitHub and towards efforts like Codeberg/Forgejo, it would be better to have a comfortable transition. If I were quickly scrolling through a "citable code" section aimed at academics, I would expect to _first_ see something on DOIs, not a relatively unknown, emerging (yet potentially superior) citation system. Then, as the software is developed or datasets are extended, versioned DOIs allow future communications to specify the versions of code/data used, while a DOI for the whole repo allows citation of the entire repository's body of work.

Then there are the software tools that support importing references solely by DOI, such as the Zotero magic wand. Until such tools support SWHIDs, I feel like DOIs are more versatile:
![image](/attachments/44beec84-6072-4ee3-bb57-234e49aca65c)

as @ivan-paleo said

> Being able to cite (Bibtex) a repository is nice, but what is more important (at least for scientists) is to have a DOI for the repo.
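
One reason DOI-based import is easy for tools to support is that the `doi.org` resolver offers content negotiation: asking for a BibTeX media type returns a citation record instead of the landing page. Below is a minimal sketch of such a request, using the article DOI quoted earlier in this thread. The function names are hypothetical, and whether any particular reference manager uses exactly this mechanism is not something this sketch claims.

```python
import urllib.request


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a request asking the DOI resolver for a BibTeX record."""
    doi = doi.strip()
    # Accept a bare DOI as well as the common URL and "doi:" forms.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return urllib.request.Request(f"https://doi.org/{doi}",
                                  headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Perform the request (network access needed) and return the BibTeX text."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_bibtex("10.1093/genetics/iyab229")` should return a BibTeX entry for that article, provided the registration agency behind the DOI supports this media type.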

castedo commented

2023-11-23 23:46:04 +01:00

@kirkpsmith Thanks for sharing details on your workflow.

I agree on one of the SWHID limitations you mention. Namely, a Zenodo DOI is a better choice than a SWHID for identifying an actively changing software project. A SWHID is great for a frozen snapshot and the history relative to a specific release of the software.

A SWHID is not helpful for a software project that might currently be on GitHub but that the maintainer might later move to Codeberg and then maybe somewhere else. A SWHID can't track such a project, whereas a Zenodo DOI can.

👍 1

harald.vonwaldow commented

2024-12-01 06:44:19 +01:00

In the Open Science community, which tries to establish data & code as first-class scientific output, there is agreement that such publications need a PID, a *persistent identifier*. In essence, that is a reliable service that redirects a permanent URL/identifier to a possibly changing URL where the material is actually published.

DOI is just one PID, but the one with the best marketing, because it is connected to the big publishing houses.

There are others that are equally well suited and community-maintained, and also free or much cheaper (CERN pays for your Zenodo DOIs). E.g.

+ [PURL](https://purl.archive.org/help)
+ [ARK](https://arks.org/)
+ [Handle](https://handle.net/)
+ [W3ID](https://w3id.org/)

👍 1
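
The redirect idea behind a PID can be shown with a toy in-memory resolver: the identifier never changes, while the target URL it points to can be updated when the project moves. This is purely illustrative, with a hypothetical class and made-up identifiers; it does not model any real PID service.

```python
class PidResolver:
    """Toy resolver: permanent identifier -> current location."""

    def __init__(self) -> None:
        self._targets: dict = {}

    def register(self, pid: str, url: str) -> None:
        """Mint a new identifier; an existing identifier is never reassigned."""
        if pid in self._targets:
            raise ValueError(f"identifier already registered: {pid}")
        self._targets[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        """Point an existing identifier at a new location."""
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid: str) -> str:
        """Return the current location for the identifier."""
        return self._targets[pid]


resolver = PidResolver()
resolver.register("pid:example/tool", "https://github.com/example/tool")
# The project migrates; citations that use the identifier keep working.
resolver.move("pid:example/tool", "https://codeberg.org/example/tool")
```

This is also the difference discussed earlier in the thread: a redirecting PID can follow a project from one forge to another, whereas a content-derived identifier such as a SWHID stays tied to one frozen state of the code.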

kirkpsmith commented

2024-12-01 21:51:51 +01:00

Thanks @harald.vonwaldow, I had not heard of those tools. It would be nice to have a table comparing the tools you just mentioned alongside DOI, SWHID, etc., in terms of workflow, scalability, cost, governing body, number of works currently hosted, difficulty of integrating with Codeberg/Forgejo, etc.

harald.vonwaldow commented

2024-12-01 23:16:57 +01:00

Indeed, such an overview would be nice. I don't have one, but thanks for the headings of the table that could hold it! Also, I haven't looked at SWHIDs in detail. They are not very well known in my circles but look quite powerful and reliable on the face of it.

mgbilby commented

2026-01-04 20:12:51 +01:00

ARKs and Handles are useful, but they require subscriptions to membership agencies. DOI minting often does as well, particularly when organizations work with Crossref or DataCite. But Zenodo's DOI-minting service (as noted by others) is gratis for end users, including those who use the integration with GitHub. It would be ideal if Codeberg could develop the same integration in partnership with Zenodo, since the integration is bi-directional.

If there is sufficient and sustained interest in the Codeberg community (as it appears there is), then I'd recommend Codeberg's leadership reach out to Zenodo developers to ask what next steps would be. It may also be that DH infrastructure orgs such as LIBER Europe would have an interest in sponsoring and/or publicizing a development sprint. GitHub (purchased/owned by Microsoft) shouldn't be the only large-scale DOI-minting public code repository and archiving service out there.

👍 1

gedankenstuecke commented

2026-01-04 23:54:15 +01:00

Member

@mgbilby wrote in [#295 (comment)](https://codeberg.org/Codeberg/Community/issues/295#issuecomment-9541859):

> If there is sufficient and sustained interest in the Codeberg community (as it appears there is), then I'd recommend Codeberg's leadership reach out to Zenodo developers to ask what next steps would be. It may also be that DH infrastructure orgs such as LIBER Europe would have an interest in sponsoring and/or publicizing a development sprint. GitHub (purchased/owned by Microsoft) shouldn't be the only large-scale DOI-minting public code repository and archiving service out there.

I think that's the right way of looking at it. Many researchers currently use GitHub for their openly licensed research software, and they also use the GH-to-Zenodo integration to provide long-term archives of that software (for various reasons: GH doesn't guarantee that links won't break at some point, GH (or researchers) might delete their GH repos, archiving provides a guaranteed stable software version, ...). As they already release openly licensed software, these researchers would be a potential key target audience for Codeberg, but the lack of an archiving feature might make it less attractive.

Also, Zenodo itself runs as a non-commercial infrastructure for knowledge products and is based on open source software, so there's arguably a high level of value alignment with Zenodo too.

ivan-paleo commented

2026-01-08 10:05:47 +01:00

Since it was my idea originally, I would love to see that connection between Zenodo and Codeberg.
BTW, Zenodo now connects automatically to SWH too: https://blog.zenodo.org/2024/10/21/2024-10-21-swh/
