Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

EddiG wants to merge 2 commits into msgpack:main from EddiG:main

Closed

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

EddiG wants to merge 2 commits into msgpack:main from EddiG:main

Conversation

EddiG

Copy link

Contributor

@EddiG EddiG commented Aug 23, 2024

This update introduces a solution to handle data alignment in custom extensions, ensuring that encoded data is aligned in memory according to specified requirements. The main benefit of this approach is enabling zero-copy deserialization, which significantly enhances performance when working with data types like Float32Array that have strict alignment needs.

Key Steps:

Alignment Handling in Encoding:
- In the Encoder class, the code checks if an alignment requirement (align) is specified for the extension before encoding it.
- The necessary padding is calculated to align the data properly in memory. Padding bytes (0xc1, known as noop bytes) are inserted before the extension type to ensure the data starts at an aligned memory position.
Padding Skipping in Decoding:
- The Decoder class is enhanced to recognize and skip padding bytes (0xc1) when decoding an extension. This ensures the actual extension type is correctly identified, regardless of any alignment padding added during encoding.
Storing Alignment Information:
- The ExtensionCodec class is modified to store the alignment requirement for each registered extension. This alignment information is utilized during encoding to ensure that data is correctly aligned in memory.
Modification of the ExtData Class:
- The ExtData class is extended to include an optional align property. This property stores the alignment requirement for each extension, allowing the encoder to handle data alignment appropriately.

@EddiG


 Support data alignment in the custom extensions

2796aef

@EddiG EddiG force-pushed the main branch from 5c88f17 to 2796aef Compare

August 23, 2024 10:56

@gfx


 Merge branch 'main' into main

c7e0d03

@codecov-commenter

Copy link

codecov-commenter commented Jan 26, 2025 •

edited

Loading

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.10%. Comparing base (3b6ef80) to head (c7e0d03).
Report is 27 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@ Coverage Diff @@
## main #245 +/- ##
==========================================
+ Coverage 98.07% 98.10% +0.02% 
==========================================
 Files 16 16 
 Lines 1092 1106 +14 
 Branches 249 251 +2 
==========================================
+ Hits 1071 1085 +14 
 Misses 21 21

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gfx

Copy link

Member

gfx commented Jan 26, 2025

Hi, @EddiG. Thank you for your contribution.

This PR could boot performance in particular cases, but extending the MessagePack format might break compatibility with other MessagePack processors.

Padding is okay, but changing Decoder might not be a good idea. Could you please re-consider the format that does not affect Decoder in order not to break the compatibility?

@EddiG

Copy link

Contributor Author

EddiG commented Jan 26, 2025

Hi @gfx

I agree that breaking compatibility is possible in the current state.

Let me go ahead and write down my considerations so far.

There are two places to add the padding I see.

Before the type, so that the final format is [padding] + [extType] + [data]
After the type, so that the final format is [extType] + [padding] + [data]

In the first case, we must be sure that the padding byte doesn't intersect with the other extension type bytes.
The byte 0xc1 was chosen due to its special meaning in the msgpack format. I understand though that I could misinterpret the meaning of that byte and de facto it could be used broadly as an extension type. In that case, users who use that particular byte to register their extension codec will be affected by changes introduced in that PR.
To protect them from the compatibility issue we can extend the DecoderOptions class with the useAlignment boolean option. That option is false by default and does not change the current behaviour of the decoder. To enable the alignment logic in the decoder user must enable it explicitly like in the code below.

 const decoded = decode(encoded, { extensionCodec, useAlignment: true });

In the second case, we must be sure that we do not mess up the padding and data. So far I see only one way to do that, it is by adding an extra byte that tells us the number of the padding bytes we should offset to reach the actual data. That solution means that we add an extra byte for any data encoded with the extension codec with enabled alignment even if the data is already aligned.
The implementation of that solution could be done in different ways. I can think of two of them.
One implementation could allow utilisation of the encode/decode functions of the extension codec to make the encoding/decoding with the alignment. That solution requires having access to the current position of the final buffer that is filling in (const dataPos = this.pos + 1;) not only in the Encoder but in the encode function of the extension codec. Due to the current implementation of the Encoder, ExtData and ExtensionCodec it involves significant change.
Another implementation could be done as a continuation of the proposed PR. It still requires extra options to protect the current users from compatibility issues.

What do you think?

@EddiG

Copy link

Contributor Author

EddiG commented Jan 27, 2025

For me, the ideal solution would be to provide the current position in the filling buffer to the encode function of the extension codec. In that case, I can calculate how many padding bytes I should add to my data to keep it aligned. Then in the decode function, I skip the padding bytes and build a typed array.
I like it as the API is still clean, with no need for extra parameters such as align and realignment. It gives me more freedom to choose the way of the data alignment, such as the padding byte, format, and possible optimizations.

I'll give it a try to verify that it is conceptually doable.

@EddiG EddiG mentioned this pull request

Jan 27, 2025

Allow the data alignment to support zero-copy decoding #248

Merged

@EddiG

Copy link

Contributor Author

EddiG commented Jan 27, 2025

I've proposed another solution here #248
Please, take a look.

@gfx

Copy link

Member

gfx commented Feb 3, 2025 •

edited

Loading

Yes, we should conentrait on #248

@gfx gfx closed this

Feb 3, 2025

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Uh oh!

Conversation

@EddiG EddiG commented Aug 23, 2024

Key Steps:

Uh oh!

codecov-commenter commented Jan 26, 2025 •

edited

Loading

Uh oh!

Codecov Report

Uh oh!

gfx commented Jan 26, 2025

Uh oh!

EddiG commented Jan 26, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

gfx commented Feb 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Implement Alignment Support for Custom Extensions in Encoder/Decoder #245

Uh oh!

Conversation

@EddiG EddiG commented Aug 23, 2024

Key Steps:

Uh oh!

codecov-commenter commented Jan 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gfx commented Jan 26, 2025

Uh oh!

EddiG commented Jan 26, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

EddiG commented Jan 27, 2025

Uh oh!

gfx commented Feb 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Jan 26, 2025 •

edited

Loading

gfx commented Feb 3, 2025 •

edited

Loading