
add fast inference tutorial #1948


Open · yiheng-wang-nv wants to merge 15 commits into Project-MONAI:main from yiheng-wang-nv:add-infer-accelerate-tutorial

Conversation

yiheng-wang-nv (Contributor) commented Feb 28, 2025 (edited)

Part of #1865.

Description

A few sentences describing the changes proposed in this pull request.

Checks

  • Avoid including large-size files in the PR.
  • Clean up long text outputs from code cells in the notebook.
  • For security purposes, please check the contents and remove any sensitive info such as user names and private keys.
  • Ensure that (1) hyperlinks and markdown anchors work, (2) relative paths are used for tutorial repo files, and (3) figures and graphs are placed in the ./figure folder.
  • The notebook runs automatically via ./runner.sh -t <path to .ipynb file>.

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
ReviewNB (bot) commented:

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

ericspod (Member) commented Mar 2, 2025:

This addresses #1865, I assume.

yiheng-wang-nv marked this pull request as ready for review on March 7, 2025.
yiheng-wang-nv (Contributor, Author) commented:

Hi @ericspod @Nic-Ma @KumoLiu, the tutorial is almost ready, could you help review the layout first? I will add analysis and visualization results later.

yiheng-wang-nv (Contributor, Author) commented:

> Hi @ericspod @Nic-Ma @KumoLiu, the tutorial is almost ready, could you help review the layout first? I will add analysis and visualization results later.

The tutorial is ready.

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
KumoLiu (Contributor) commented Mar 19, 2025 (edited), via ReviewNB:

Do you think we can also include the .nii.gz benchmark result in the notebook, since the original data is in the .nii.gz format?



yiheng-wang-nv (Contributor, Author) commented Mar 24, 2025:

Hi @KumoLiu, thanks for the suggestion. .nii.gz files have to be decompressed on the CPU, so using GDS may not provide acceleration for them. I added a section that introduces the limitations of each feature; could you help review the updates? Thanks!
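As context for this thread, a minimal sketch of the GDS loading path under discussion, assuming MONAI's GDSDataset API (check your MONAI version; the file path and cache directory below are hypothetical):

```python
from monai.data import GDSDataset
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged

# GDSDataset caches transformed tensors and reads them back straight into GPU
# memory via GPUDirect Storage. A .nii.gz source is still gzip-decompressed on
# the CPU during the first (caching) pass, which is why GDS gives little
# benefit for compressed inputs.
transform = Compose([LoadImaged(keys="image"), EnsureChannelFirstd(keys="image")])
dataset = GDSDataset(
    data=[{"image": "example.nii.gz"}],  # hypothetical path
    transform=transform,
    cache_dir="./gds_cache",
    device=0,  # GPU index used for the direct reads
)
item = dataset[0]  # first access populates the cache; later reads use GDS
```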

yiheng-wang-nv and others added 3 commits March 20, 2025 11:41
...pynb
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com>
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
Nic-Ma (Contributor) left a review comment:

Thanks for adding the detailed tutorial; overall it looks good to me.
Do you plan to add INT8/INT4 quantization in this PR, or in a separate PR later?

Thanks.

yiheng-wang-nv (Contributor, Author) commented on acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb:

Hi @Nic-Ma, thanks for the suggestion. I think we can consider adding quantization in a separate PR. Before adding it, it may need some time to:

  1. prove it's faster;
  2. prove there won't be too much accuracy loss (a rough harness for both checks is sketched below).
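A hypothetical harness for those two checks, with FP16 standing in for INT8/INT4 since low-bit support depends on the chosen deployment path (the network configuration below is illustrative, not the tutorial's):

```python
import copy
import time

import torch
from monai.networks.nets import UNet

# Illustrative model; any segmentation network would do for this comparison.
model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
).cuda().eval()
model_low = copy.deepcopy(model).half()  # reduced-precision copy

x = torch.randn(1, 1, 96, 96, 96, device="cuda")
with torch.no_grad():
    torch.cuda.synchronize()
    t0 = time.time()
    ref = model(x)
    torch.cuda.synchronize()
    t1 = time.time()
    out = model_low(x.half())
    torch.cuda.synchronize()
    t2 = time.time()

# No warm-up pass here; real benchmarking should include one.
print(f"fp32: {t1 - t0:.3f}s  fp16: {t2 - t1:.3f}s  "
      f"max abs diff: {(ref - out.float()).abs().max().item():.4f}")
```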

Nic-Ma (Contributor) commented Mar 26, 2025:

Plan sounds good to me.

Thanks.

Comment on lines +399 to +402:

```bash
for benchmark_type in "original" "trt" "trt_gpu_transforms" "trt_gds_gpu_transforms"; do
    python run_benchmark.py --benchmark_type "$benchmark_type"
done
```
ericspod (Member) left a review comment:

You could instead put this into a cell with %%bash at the top so users can run the command, or you could do it more directly with Python for those who don't have bash:

```python
for benchmark_type in ("original", "trt", "trt_gpu_transforms", "trt_gds_gpu_transforms"):
    !python run_benchmark.py --benchmark_type {benchmark_type}
```
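A plain-Python variant of the same loop (a sketch; it also runs outside Jupyter, without the ! magic):

```python
import subprocess
import sys

# Run each benchmark variant as a separate process; no bash or IPython needed.
for benchmark_type in ("original", "trt", "trt_gpu_transforms", "trt_gds_gpu_transforms"):
    subprocess.run(
        [sys.executable, "run_benchmark.py", "--benchmark_type", benchmark_type],
        check=True,  # raise if any benchmark run fails
    )
```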

ericspod (Member) left a review comment:

You should also state here that the script contains the same code as this notebook and that running it will generate a CSV with the results for each type; if users want to run the benchmark in this notebook instead, they can run the following cell with the commented lines uncommented.
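For illustration, collecting those per-type CSVs might look like the following sketch (the file names and the total_time column are hypothetical; check run_benchmark.py for the actual ones):

```python
import pandas as pd

types = ["original", "trt", "trt_gpu_transforms", "trt_gds_gpu_transforms"]
# Tag each file's rows with its benchmark type, then combine for comparison.
frames = [pd.read_csv(f"benchmark_{t}.csv").assign(benchmark_type=t) for t in types]
results = pd.concat(frames, ignore_index=True)
print(results.groupby("benchmark_type")["total_time"].describe())
```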

ericspod (Member) commented:

I've looked at the tutorial and it all looks good to me; however, I am wondering about what the results show. It seems that GDS has the most impact, so the example is simply IO-bound, and using TRT or not makes little difference. This is good for demonstrating how to overcome such issues, but the model is so small that the inference time itself is barely relevant to the benchmarks you're showing. If you used a much larger model with many more parameters, the actual inference time would be significant. Since the inference results aren't evaluated, you could just use a randomly initialised model, so you wouldn't need to load pre-trained weights. Thoughts?

yiheng-wang-nv (Contributor, Author) commented:

Thanks @ericspod for the suggestions. I will use a more suitable model to show these features and then update the PR.
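A sketch of the randomly initialised, larger-model idea (the architecture and sizes below are illustrative, not the tutorial's final choice):

```python
import torch
from monai.networks.nets import SegResNet

# Randomly initialised weights are fine here: the benchmark only times the
# forward pass and discards the outputs. init_filters scales the parameter
# count so that inference, not IO, dominates the measurement.
model = SegResNet(spatial_dims=3, init_filters=32, in_channels=1, out_channels=2).cuda().eval()
with torch.no_grad():
    x = torch.randn(1, 1, 96, 96, 96, device="cuda")
    _ = model(x)  # timing-only forward pass
```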

ericspod (Member) commented:

Hi @yiheng-wang-nv, we'd like to get this tutorial through; do we have any progress on using a different model to demonstrate the speedup better? Thanks!

yiheng-wang-nv (Contributor, Author) commented:

Hi @ericspod, thanks for the reminder, and sorry for the late reply. I will make some updates later.


Reviewers

ericspod left review comments
KumoLiu left review comments
Nic-Ma approved these changes

