-
Notifications
You must be signed in to change notification settings - Fork 171
Implement Alignment Support for Custom Extensions in Encoder/Decoder #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅
Project coverage is 98.10%. Comparing base (
3b6ef80
) to head (c7e0d03
).
Report is 27 commits behind head on main.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@ ## main #245 +/- ## ========================================== + Coverage 98.07% 98.10% +0.02% ========================================== Files 16 16 Lines 1092 1106 +14 Branches 249 251 +2 ========================================== + Hits 1071 1085 +14 Misses 21 21
☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
Hi, @EddiG. Thank you for your contribution.
This PR could boot performance in particular cases, but extending the MessagePack format might break compatibility with other MessagePack processors.
Padding is okay, but changing Decoder might not be a good idea. Could you please re-consider the format that does not affect Decoder in order not to break the compatibility?
Hi @gfx
I agree that breaking compatibility is possible in the current state.
Let me go ahead and write down my considerations so far.
There are two places to add the padding I see.
- Before the type, so that the final format is [padding] + [extType] + [data]
- After the type, so that the final format is [extType] + [padding] + [data]
In the first case, we must be sure that the padding byte doesn't intersect with the other extension type bytes.
The byte 0xc1
was chosen due to its special meaning in the msgpack format. I understand though that I could misinterpret the meaning of that byte and de facto it could be used broadly as an extension type. In that case, users who use that particular byte to register their extension codec will be affected by changes introduced in that PR.
To protect them from the compatibility issue we can extend the DecoderOptions class with the useAlignment
boolean option. That option is false
by default and does not change the current behaviour of the decoder. To enable the alignment logic in the decoder user must enable it explicitly like in the code below.
const decoded = decode(encoded, { extensionCodec, useAlignment: true });
In the second case, we must be sure that we do not mess up the padding and data. So far I see only one way to do that, it is by adding an extra byte that tells us the number of the padding bytes we should offset to reach the actual data. That solution means that we add an extra byte for any data encoded with the extension codec with enabled alignment even if the data is already aligned.
The implementation of that solution could be done in different ways. I can think of two of them.
One implementation could allow utilisation of the encode/decode functions of the extension codec to make the encoding/decoding with the alignment. That solution requires having access to the current position of the final buffer that is filling in (const dataPos = this.pos + 1;
) not only in the Encoder but in the encode
function of the extension codec. Due to the current implementation of the Encoder
, ExtData
and ExtensionCodec
it involves significant change.
Another implementation could be done as a continuation of the proposed PR. It still requires extra options to protect the current users from compatibility issues.
What do you think?
For me, the ideal solution would be to provide the current position in the filling buffer to the encode
function of the extension codec. In that case, I can calculate how many padding bytes I should add to my data to keep it aligned. Then in the decode function, I skip the padding bytes and build a typed array.
I like it as the API is still clean, with no need for extra parameters such as align
and realignment
. It gives me more freedom to choose the way of the data alignment, such as the padding byte, format, and possible optimizations.
I'll give it a try to verify that it is conceptually doable.
I've proposed another solution here #248
Please, take a look.
Yes, we should conentrait on #248
This update introduces a solution to handle data alignment in custom extensions, ensuring that encoded data is aligned in memory according to specified requirements. The main benefit of this approach is enabling zero-copy deserialization, which significantly enhances performance when working with data types like
Float32Array
that have strict alignment needs.Key Steps:
Alignment Handling in Encoding:
Encoder
class, the code checks if an alignment requirement (align
) is specified for the extension before encoding it.0xc1
, known as noop bytes) are inserted before the extension type to ensure the data starts at an aligned memory position.Padding Skipping in Decoding:
Decoder
class is enhanced to recognize and skip padding bytes (0xc1
) when decoding an extension. This ensures the actual extension type is correctly identified, regardless of any alignment padding added during encoding.Storing Alignment Information:
ExtensionCodec
class is modified to store the alignment requirement for each registered extension. This alignment information is utilized during encoding to ensure that data is correctly aligned in memory.Modification of the
ExtData
Class:ExtData
class is extended to include an optionalalign
property. This property stores the alignment requirement for each extension, allowing the encoder to handle data alignment appropriately.