Description
🚀 Describe the improvement or the new tutorial
The current tutorial inference example uses an input tensor of shape (N, 28, 28), which is correct for the shown MLP model because it applies nn.Flatten(start_dim=1).
This can confuse users later when they transition to CNN-based models (nn.Conv2d), which require inputs in (N, C, H, W) format (e.g., (1, 1, 28, 28) for grayscale images).
The example is technically correct as written, but without an explicit clarification, learners may incorrectly assume that (N, 28, 28) is a general input requirement for vision models.
A short explanatory comment near the inference snippet could help distinguish:
- MLP-style models that flatten inputs
- CNN-style models that preserve spatial and channel dimensions
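To make the distinction concrete, here is a minimal sketch (the two models are illustrative toy architectures, not the tutorial's exact model):

```python
import torch
from torch import nn

# MLP-style model: nn.Flatten(start_dim=1) collapses the 28x28 image,
# so an (N, 28, 28) input works without an explicit channel dimension.
mlp = nn.Sequential(
    nn.Flatten(start_dim=1),          # (N, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 10),
)
x_mlp = torch.randn(1, 28, 28)        # no channel dimension needed
print(mlp(x_mlp).shape)               # torch.Size([1, 10])

# CNN-style model: nn.Conv2d preserves spatial and channel dimensions,
# so it expects (N, C, H, W) -- e.g. (1, 1, 28, 28) for one grayscale image.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.Flatten(start_dim=1),
    nn.Linear(8 * 28 * 28, 10),
)
x_cnn = x_mlp.unsqueeze(1)            # (1, 28, 28) -> (1, 1, 28, 28)
print(cnn(x_cnn).shape)               # torch.Size([1, 10])
```

Passing `x_mlp` directly to `cnn` would raise a shape error, which is exactly the confusion the proposed note aims to prevent.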
The proposed clarification would be documentation-only:
- No API changes
- No behavior changes
- No modification to the existing example code
The intent is to improve conceptual understanding for beginners, especially those progressing from fully connected networks to convolutional networks.
I would appreciate feedback on whether adding a brief note clarifying this distinction would be acceptable, and if so, whether the suggested wording aligns with tutorial style guidelines.
If this clarification sounds reasonable, I’d be happy to submit a small documentation PR incorporating it.
Existing tutorials on this topic
No response
Additional context
No response