Codeberg/Community

How to render jupyter notebook on Codeberg properly? #671

Open
opened 2022-07-30 20:59:07 +02:00 by penguinsfly · 28 comments

How can Jupyter notebooks be rendered properly instead of being shown as raw JSON?

I was recommended to look at [this blog post](https://blog.gitea.io/2022/04/how-to-render-jupyter-notebooks-on-gitea/). From my naive understanding, this is configured on the [Gitea](https://docs.gitea.io/en-us/external-renderers/#example-jupyter-notebook) backend (the `app.ini` file) rather than per user or per repository. Is that correct?

Thanks!

Hi @penguinsfly!

You're right that this needs to be configured in `app.ini` and cannot be achieved through repository configuration.

Looking at the [blog post](https://blog.gitea.io/2022/04/how-to-render-jupyter-notebooks-on-gitea/), the approach is a *bit hacky* and requires installing Jupyter (and its dependencies) on the server, which is quite a lot. Unfortunately, there doesn't seem to be any other software that can convert `.ipynb` -> `html`.

Thank you for your reply. I agree that `jupyter` can sometimes be quite a lot.

Just for reference, I created a `venv` environment (~20MB initially) and ran `pip install nbconvert`, after which it was ~50MB, so I'm guessing `nbconvert` adds around 30MB.

There's also [`nbviewer`](https://nbviewer.org/) (I think it uses `nbconvert` as its backend), which renders a notebook from the raw link (e.g., see [here](https://nbviewer.org/urls/codeberg.org/lcsrr/jupyter_notebooks/raw/branch/main/Univesp/Introdu%C3%A7%C3%A3o%20a%20Ci%C3%AAncia%20de%20Dados/Pandas/Pandas_intro.ipynb); I just grabbed a random [link](https://codeberg.org/lcsrr/jupyter_notebooks/raw/branch/main/Univesp/Introdu%C3%A7%C3%A3o%20a%20Ci%C3%AAncia%20de%20Dados/Pandas/Pandas_intro.ipynb) on Codeberg). So I wonder whether it is possible to fetch the `html` within the body from that, or to automatically attach a link pointing to the `nbviewer` site with the raw link appended.

Another solution is `pandoc`, which might be [lighter](https://github.com/jgm/pandoc/releases/tag/2.18). I usually use it for `md` conversion, but I believe it can also convert notebooks, though one might need a few tweaks/configurations to get it to look right.

Do you think any of the above suggestions might be possible? Or do you anticipate a solution any time soon?
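
To make the "link to nbviewer with the raw link appended" idea concrete, here is a minimal Python sketch. It only relies on the URL pattern visible in the example above (`https://nbviewer.org/urls/<host>/<path>` for an `https` source); the function name `nbviewer_link` is made up for illustration, and the sketch deliberately refuses anything that is not an `https` URL.

```python
from urllib.parse import urlsplit

NBVIEWER = "https://nbviewer.org"


def nbviewer_link(raw_url: str) -> str:
    """Build an nbviewer link for a notebook served at an https raw URL.

    Hypothetical helper: it mirrors the /urls/<host>/<path> pattern seen in
    the example link from this thread and nothing more.
    """
    parts = urlsplit(raw_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected an absolute https URL")
    # The path is kept exactly as given, including percent-encoding.
    return f"{NBVIEWER}/urls/{parts.netloc}{parts.path}"


raw = (
    "https://codeberg.org/lcsrr/jupyter_notebooks/raw/branch/main/Univesp/"
    "Introdu%C3%A7%C3%A3o%20a%20Ci%C3%AAncia%20de%20Dados/Pandas/Pandas_intro.ipynb"
)
print(nbviewer_link(raw))
```

Whether nbviewer can actually fetch a given raw URL depends on the server hosting it, so this only covers the link-building half of the idea.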

I bookmarked that blog post *quite a while ago*. I haven't wanted to fiddle with the setup yet.

What would be optimal in my opinion: have (Docker?) containers where everything is installed, and instead of calling `pandoc` or whatever, we would call `containertool run in-container-xy pandoc` or something like that. Anyone interested to work on that with us?
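
One way to picture the "call the tool through a container" idea is a small wrapper that turns a plain tool invocation into a container invocation. The Python sketch below is only an illustration under assumptions: the image name `example/pandoc-renderer` and the function name `containerized` are hypothetical, the image is assumed to have no entrypoint of its own, and `docker` stands in for whatever container tool would really be used.

```python
from typing import List, Sequence


def containerized(tool_argv: Sequence[str], image: str, runtime: str = "docker") -> List[str]:
    """Wrap a tool command line so that it runs inside a throwaway container.

    --rm removes the container afterwards, -i keeps stdin open so the
    document can be piped in, and --network none cuts off network access.
    """
    if not tool_argv:
        raise ValueError("tool_argv must not be empty")
    return [runtime, "run", "--rm", "-i", "--network", "none", image, *tool_argv]


# Hypothetical image name; pandoc reads the notebook on stdin and writes HTML to stdout.
command = containerized(["pandoc", "--from", "ipynb", "--to", "html"], "example/pandoc-renderer")
print(" ".join(command))
```

The resulting list could be handed to `subprocess.run`, or joined into the render command string configured for the forge.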

> Anyone interested to work on that with us?

Happy to take a stab at it (adding it to my backlog). Using something like a pandoc container seems useful in the long term, as it also provides a lot of other transformations.

I noticed that Codeberg already seems to be using pandoc for `.rst` files: https://codeberg.org/Codeberg-Infrastructure/build-deploy-gitea/src/commit/77a0e2828f8a78df42493da8f155b412cc7e71cf/etc/gitea/conf/app.ini#L183

So it seems that only a small configuration change is needed to add support for this.

Yes, but we actually don't want to use pandoc directly on the server, but use containers for that instead.


> Yes, but we actually don't want to use pandoc directly on the server, but use containers for that instead.

Hmm okay. Either way, I did some fiddling with how pandoc converts notebooks to HTML. It seems that it doesn't add any syntax-highlighting classes or language annotations when the output is rendered "directly"; they are only added when you ask for self-contained output (`--self-contained`). Likewise, images are only converted to `data:...` URIs when self-containment is specified. I then looked into simply using an `<iframe>` via Gitea, but it became clear that unless you fully trust the output not to contain malicious scripts, using the `<iframe>` is not possible.

So I played around with Docker images for a bit and found that `nbconvert`, though not generalizable to other formats, might not be so bad (relatively). This is my first time playing around with Docker, so excuse my naive approach.

For `pandoc`, [`pandoc/minimal`](https://hub.docker.com/r/pandoc/minimal) seems to be the smallest image available (?), at 79.3MB. Which `pandoc` container source is currently used for Gitea/Codeberg?

Anyway, like @Gusted said, it needs additional configuration and more tinkering to get things to look right. And I haven't found a source for a template yet.

With `nbconvert` (i.e. `dev:nbconvert-alpine`), I just used `python:3.8-alpine` and then installed `nbconvert`; the image came to around 83.9MB (from 46.8MB initially), so not that far from `pandoc/minimal`. I tested with this [demo file on nbviewer](https://nbviewer.org/github/jrjohansson/qutip-lectures/blob/master/Lecture-1-Jaynes-Cumming-model.ipynb), which even without an additional installation of `pandoc` somehow renders the math parts quite nicely (unsure how).

Here's what I used:

```Dockerfile
# Dockerfile
FROM python:3.8-alpine
RUN pip install --no-cache-dir -U nbconvert
WORKDIR /data
ENTRYPOINT ["jupyter", "nbconvert"]
```

```bash
# build
docker build -t dev:nbconvert-alpine -f Dockerfile docker-images
# run with a specific file or multiple ones
# (the volume mount and --user options follow what pandoc did)
docker run --rm -v "$(pwd):/data" --user "$(id -u):$(id -g)" \
  dev:nbconvert-alpine --to html <path/to/ipynb-file or path/to/*.ipynb>
```

Here's the `docker images` output:

```
REPOSITORY       TAG                SIZE
dev              nbconvert-alpine   83.9MB
pandoc/minimal   latest             79.3MB
pandoc/core      latest             371MB
python           3.8-alpine         46.8MB
python           3.8-slim           124MB
```

As for timing, with the same file, `pandoc/minimal` took a bit less than a second while `nbconvert` took around 3 seconds. Of course, one would probably have to set a file-size limit and a time limit for conversion to prevent `nbconvert` from taking too long or using too much space.
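
The file-size and time limits mentioned at the end could look roughly like the Python sketch below. It is not how Gitea or Codeberg enforce limits; the function name `run_limited` and the default limits are invented for illustration, and a tiny Python one-liner stands in for the real converter so that the example runs without `nbconvert` or Docker installed.

```python
import subprocess
import sys
from typing import Sequence


def run_limited(
    argv: Sequence[str],
    data: bytes,
    max_input_bytes: int = 5 * 1024 * 1024,
    max_output_bytes: int = 20 * 1024 * 1024,
    timeout_s: float = 10.0,
) -> bytes:
    """Pipe data through a converter command with size and time limits.

    Raises ValueError when the input or output is too large, and
    subprocess.TimeoutExpired when the command runs longer than timeout_s.
    """
    if len(data) > max_input_bytes:
        raise ValueError("input exceeds the size limit")
    # subprocess.run kills the child process if the timeout expires.
    result = subprocess.run(
        list(argv), input=data, capture_output=True, timeout=timeout_s, check=True
    )
    if len(result.stdout) > max_output_bytes:
        raise ValueError("output exceeds the size limit")
    return result.stdout


# Stand-in "converter": upper-cases stdin. A real setup would pass the
# nbconvert or pandoc command line here instead.
FAKE_CONVERTER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_limited(FAKE_CONVERTER, b"notebook bytes"))
```

Note that the output limit here is only checked after the command has finished, so it bounds what gets served, not what the converter may temporarily produce.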

FWIW, we will need [this PR](https://github.com/go-gitea/gitea/pull/20180). Otherwise we can't use an iframe to render the HTML.

What about letting the client browser render the preview using nbviewer.js? This way no backend is needed. Perhaps not even an iframe.
https://github.com/kokes/nbviewer.js/


@Gusted it seems the PR you mentioned above was closed about a month ago due to being stale :( Do you think it's still possible without it, or is there maybe some alternative? Would what @bshtmichielsen suggested work?

A lot has changed since I first looked at it; I am not sure whether that PR is still needed. Maybe it's possible to avoid it.

Hi all, some additional info. I am considering moving all the materials that I use for my machine learning classes at Fontys University of Applied Sciences from other git providers to Codeberg. As a result, many students will visit this site, and I am inclined to believe this also means publicity for Codeberg, as well as awareness of EU hosting and principles among many Dutch and international students. Currently, the only thing keeping me from moving my course materials is this issue (preview of Jupyter notebooks), because during classes I preview my notebooks for my students when explaining things. Hence my interest in this issue and my suggestion for a possible fix. Are there any plans or outlooks on this issue?

@bshtmichielsen wrote in https://codeberg.org/Codeberg/Community/issues/671#issuecomment-7150345:

> Are there any plans or outlooks on this issue?

Not really. Rendering it as HTML is not trivial. Pandoc's output is no better than converting the notebook to Markdown and rendering that, and `jupyter nbconvert` seems to allow user-specified HTML and thus would need to be isolated in some way, which is not really an available option.

There's the option to convert it to Markdown and render that, or to convert it to PDF and show that.

@bshtmichielsen you can try using Jupyter Book with CI so the notebooks can be displayed as a static page. I did this for the majority of my notebook-related repos; it takes a bit of time to set up but is very doable.

Alternatively, there is also https://ipynb.js.org/, which one can feed raw ipynb URLs for rendering!


Never mind: unfortunately, it seems that the Codeberg spam protection rules currently block fetching those raw URLs. I'm not sure if unblocking those would be an option, @Gusted?

The API URL can be used instead, e.g. https://codeberg.org/api/v1/repos/sergedroz/Jupyter_Training/raw/pandas/Lesson_01.ipynb

This is because CORS is not enabled for the web routes, mostly for security reasons. But it is enabled for the API routes :)
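
For anyone who wants to generate such links, here is a small Python sketch that builds the API raw URL from its parts, following the shape of the example above. The helper name `api_raw_url` is made up; the optional `ref` query parameter for selecting a branch, tag or commit is my understanding of the Gitea/Forgejo API and should be treated as an assumption to check against the instance's API documentation.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def api_raw_url(
    owner: str,
    repo: str,
    filepath: str,
    ref: Optional[str] = None,
    base: str = "https://codeberg.org",
) -> str:
    """Build the API URL that serves a file's raw contents."""
    # quote() keeps "/" intact, so nested paths stay readable.
    url = f"{base}/api/v1/repos/{quote(owner)}/{quote(repo)}/raw/{quote(filepath)}"
    if ref is not None:
        # Assumed query parameter; without it the default branch is used.
        url += "?" + urlencode({"ref": ref})
    return url


print(api_raw_url("sergedroz", "Jupyter_Training", "pandas/Lesson_01.ipynb"))
```

The printed URL can then be pasted into a client-side viewer in place of the web raw link.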

Ah, perfect, then that could be a good interim solution (at least for my use cases!)


It seems that [jirutka/ipynb2html](https://github.com/jirutka/ipynb2html) has an MIT license. Possibly there is a way to integrate it with the current Codeberg UI such that if one clicks an `.ipynb` file, the preview shows the rendered output rather than the currently shown JSON? In a similar way as the UI now shows rendered Markdown.

@bshtmichielsen wrote in https://codeberg.org/Codeberg/Community/issues/671#issuecomment-7177567:

> It seems that [jirutka/ipynb2html](https://github.com/jirutka/ipynb2html) has an MIT license. Possibly there is a way to integrate it with the current Codeberg UI such that if one clicks an `.ipynb` file, the preview shows the rendered output rather than the currently shown JSON? In a similar way as the UI now shows rendered Markdown.

Seems like we cannot use it: https://github.com/jirutka/ipynb2html/issues/40

Which is understandable, yet highly unfortunate, because now I feel like I need to stay with GitHub, which does have this preview function.

The structure of the ipynb file is very simple, at least for basic outputs. Writing a custom converter should not be difficult. We can sanitize HTML parts, or disallow them altogether (render them as text), if needed.

I previously wrote [a converter](https://github.com/hellman/sito/blob/master/modules/ipynb.py) for my personal static blog engine. Note that there I did not sanitize HTML, since for me `.ipynb` is just another format for a web page. Here is an [example html](https://affine.group/writeup/2024-08-idekCTF-summertime) and the [ipynb on github](https://gist.github.com/hellman/6e2b2e417fd93716da399e3e5aec0753).
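
To illustrate how little is needed for basic outputs, here is a minimal Python sketch of the "render everything as text" variant. It is not the converter linked above; the function name and CSS class names are made up, it assumes the nbformat 4 layout (a top-level `cells` list, `source` stored as a string or a list of lines, code-cell `outputs` with an `output_type`), and it ignores images and rich HTML outputs entirely.

```python
import html
from typing import Any, Dict, List


def _text(value: Any) -> str:
    """nbformat stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else str(value)


def notebook_to_html(nb: Dict[str, Any]) -> str:
    """Render cells as escaped <pre> blocks, so embedded HTML shows up as text."""
    out: List[str] = []
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        source = html.escape(_text(cell.get("source", "")))
        if kind != "code":
            out.append(f'<pre class="nb-{html.escape(str(kind))}">{source}</pre>')
            continue
        out.append(f'<pre class="nb-input">{source}</pre>')
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                text = _text(output.get("text", ""))
            elif output.get("output_type") == "error":
                text = f'{output.get("ename", "")}: {output.get("evalue", "")}'
            else:  # execute_result / display_data: keep only the plain-text form
                text = _text(output.get("data", {}).get("text/plain", ""))
            if text:
                out.append(f'<pre class="nb-output">{html.escape(text)}</pre>')
    return "\n".join(out)


example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "<script>alert(1)</script>"]},
        {
            "cell_type": "code",
            "source": "print(1 < 2)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["True\n"]}],
        },
    ]
}
print(notebook_to_html(example))
```

A real converter would render Markdown cells properly and handle image outputs; the point here is only that escaping every piece of user-controlled text sidesteps the script-injection concern.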

@bshtmichielsen wrote in https://codeberg.org/Codeberg/Community/issues/671#issuecomment-7389799:

> Which is understandable, yet highly unfortunate, because now I feel like I need to stay with GitHub, which does have this preview function.

As a workaround, I have a simple [script](https://codeberg.org/jalbiero/top_prog_lang_salaries_arg/src/branch/main/scripts/make-notebook-preview.sh) that generates the preview (it must be run before each commit). All notebook links in my [README.md](https://codeberg.org/jalbiero/top_prog_lang_salaries_arg/src/branch/main/README.md) point to the preview file, which can be located alongside the notebook or referenced in a dedicated [README.md](https://codeberg.org/jalbiero/top_prog_lang_salaries_arg/src/branch/main/src) (the latter is the solution I chose in the example project).

Perhaps there is a JavaScript library that can convert ipynb files to Markdown on the client side (similar to the HTML converter that we mentioned earlier), so that the Markdown previewer in Codeberg can then display it? Roughly what @jalbiero does manually. That would also remove the security concerns of the converter that outputs HTML, because the Markdown output would have no executable code.
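
The conversion itself is small. The sketch below shows the idea in Python rather than the client-side JavaScript proposed above, purely as an illustration: the function names are hypothetical, outputs are left out, and the notebook is assumed to follow the nbformat 4 layout. It also assumes that the forge's Markdown renderer sanitizes raw HTML inside Markdown cells, since Markdown text itself may contain HTML.

```python
import re
from typing import Any, Dict, List


def _text(value: Any) -> str:
    """nbformat stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else str(value)


def _fence(code: str) -> str:
    """Pick a backtick fence longer than any backtick run inside the code."""
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    return "`" * max(3, longest + 1)


def notebook_to_markdown(nb: Dict[str, Any], language: str = "python") -> str:
    """Pass Markdown cells through and wrap code cells in fenced blocks."""
    parts: List[str] = []
    for cell in nb.get("cells", []):
        source = _text(cell.get("source", ""))
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            fence = _fence(source)
            parts.append(f"{fence}{language}\n{source}\n{fence}")
    return "\n\n".join(parts)


example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Hi"]},
        {"cell_type": "code", "source": "x = 1", "outputs": []},
    ]
}
print(notebook_to_markdown(example))
```

The fence-length check matters because a code cell may itself contain triple backticks, which would otherwise end the fenced block early.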

Is there any news on this? I just migrated from GitHub to Codeberg. From the discussion it seems like a server resources issue...?

I don't see why isolating user-defined HTML would be an issue on the server side, unless there are active objects or scripts. GitHub's Markdown rendering rejects active scripts inside user HTML and Markdown. Using nbviewer in some way should solve this.

If this helps anyone, I made a browser extension which will somewhat render the notebooks from a repository. :)

https://codeberg.org/rashomon/jupyter-berg
