fix(builder): strip UTF-8 BOM from .ino sources before preprocessing #2983


Merged
cmaglie merged 3 commits into arduino:master from ritesh006:fix/strip-bom-ino
Sep 25, 2025

Conversation

@ritesh006 ritesh006 commented Aug 24, 2025

Arduino CLI: Strip UTF‐8 BOM from .ino before preprocessing

Summary

When a sketch .ino is saved as UTF-8 with BOM, the three BOM bytes (EF BB BF) reach the compiler and cause:

stray '\357' in program
stray '\273' in program
stray '\277' in program

This PR strips the BOM at read-time so the merged .cpp and any copied sources are clean.

Refs: #3015


Please check if the PR fulfills these requirements

See how to contribute

  • The PR has no duplicates (please search among the Pull Requests before creating one)
  • The PR follows our contributing guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • UPGRADING.md has been updated with a migration guide (for breaking changes)
  • configuration.schema.json updated if new parameters are added.

What kind of change does this PR introduce?

Bug fix — make the CLI robust to UTF-8 BOM at the start of .ino and additional files.


What is the current behavior?

  • If a .ino is saved as UTF-8 with BOM, the BOM bytes are preserved into the merged .cpp, leading to compiler errors (stray '\357' / '\273' / '\277').
  • This matches IDE issue arduino/arduino-ide#2752 and appears "random" to users because some editors silently add a BOM; a blank line after an initial block comment makes it easy to reproduce.

What is the new behavior?

  • On reading sketch sources:
    • Strip a leading UTF-8 BOM before merging .ino files.
    • Strip a leading UTF-8 BOM when copying additional files.
  • Result: BOM-prefixed sketches compile successfully. No behavior change for normal UTF-8 (no BOM) files.

Implementation notes

  • Added helper:
func stripUTF8BOM(b []byte) []byte {
	if len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF {
		return b[3:]
	}
	return b
}
  • Applied in:
    • internal/arduino/builder/sketch.go: sketchMergeSources() (via getSource(...))
    • internal/arduino/builder/sketch.go: sketchCopyAdditionalFiles(...)

Test plan (manual)

  1. Create a minimal sketch:
/* test */
int x = 42;
void setup(){ Serial.begin(9600); }
void loop(){ Serial.println(x); delay(1000); }
  2. Save with BOM (VS Code → Save with Encoding → UTF-8 with BOM).
  3. Compile:
arduino-cli compile -b arduino:avr:uno <sketch-folder>

Before this patch: fails with:

stray '\357' in program
stray '\273' in program
stray '\277' in program

After this patch: succeeds.

Control: Save as UTF-8 (no BOM) → succeeds (unchanged).

(Optional follow-up): add an automated test by placing a BOM-prefixed .ino in testdata and asserting the merged output compiles.


Does this PR introduce a breaking change?

No. The change only strips a BOM if present; no impact on existing UTF-8 (no BOM) files or other encodings.


Other information

  • The issue was reported in the IDE repo, but the root cause is in the CLI merge/preprocess path. Fixing it here resolves the problem for the IDE once it bundles a CLI containing this patch.
  • Performance/overhead is negligible (constant-time 3-byte check per file).

When a sketch .ino is saved as UTF-8 *with BOM*, the BOM bytes (EF BB BF)
reach the compiler and cause:
 stray '\357' in program
 stray '\273' in program
 stray '\277' in program
This strips the BOM at read-time so the merged .cpp and copied sources are clean.
Refs: arduino/arduino-ide#2752

CLAassistant commented Aug 24, 2025

CLA assistant check
All committers have signed the CLA.


codecov bot commented Aug 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.36%. Comparing base (08ff7e2) to head (5a27541).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #2983    +/-   ##
=======================================
  Coverage   68.35%   68.36%
=======================================
  Files         241      241
  Lines       22724    22731     +7
=======================================
+ Hits        15534    15541     +7
  Misses       5992     5992
  Partials     1198     1198

Flag   Coverage Δ
unit   68.36% <100.00%> (+<0.01%) ⬆️


@per1234 per1234 added type: enhancement Proposed improvement topic: code Related to content of the project itself topic: build-process Related to the sketch build process labels Aug 24, 2025
ritesh006 commented Aug 26, 2025

Hi @per1234
I’ve added unit tests for the new stripUTF8BOM function, and all checks are passing.
This PR should now be ready for review. Could you please take a look? Thanks!

cmaglie reacted with thumbs up emoji

@cmaglie cmaglie merged commit 20e315c into arduino:master Sep 25, 2025
101 checks passed
@cmaglie cmaglie added the conclusion: resolved Issue was resolved label Sep 25, 2025
@per1234 per1234 added type: imperfection Perceived defect in any part of project and removed type: enhancement Proposed improvement labels Sep 25, 2025
Reviewers

@cmaglie cmaglie cmaglie approved these changes

Labels
conclusion: resolved Issue was resolved topic: build-process Related to the sketch build process topic: code Related to content of the project itself type: imperfection Perceived defect in any part of project
Development

Successfully merging this pull request may close these issues.

Spurious compilation failure when sketch code file has "UTF-8 with BOM" encoding
