-
Notifications
You must be signed in to change notification settings - Fork 330
Client X I/O #2689
-
|
VERY VERY LONG POST ALERT Purpose:Starting this discussion thread to brainstorm the future of SqlClientX I/O to be performant, while adhering to the various requirements Transport requirements
TDC protocol basicsSql server clients and server exchange information as messages. Messages are a logical concept, where the last packet of the message has a status field with value of EOM which states that its the last packet of the message. Message
Outgoing Packet Header
The rest of the bytes in the packet follow the TDS protocol spec and structure is dependant on the message being transmitted. Incoming packet headerThe incoming packet header follows a similar structure, but some unused bytes in outgoing packet, end up being meaningful
Packets are buffers which have a pre-negotiated size with the server. The packet size is 4096 till negotiated at the end of login. After the login negotiation, all the packets on the connection, except the last packet of a message must be the negotiated size. For simplicity of discussion we will use a packet size of 8k, but this is modifiable in the connection string. Details of values is https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.packetsize?view=netframework-4.8.1 WritesWhile sending out the data over TDS, the client needs to fill in the payload in the TDS packet, and set the appropriate header bytes and flush the packet to the network. In case of writes, the Message type, status, length of data, packet number need to be adjusted for every packet. ReadsIn case of reads, the client needs to read the packet, and parse out the header to understand how much data is expected. In case of reading the packets, the client needs to assume that the complete TDS packet may not be available in a single transport read, and needs to account for partial packet being read. The following scenarios are possible:
StreamsTDS Write streamIn the current implementation of ClientX, writing the TDS packet is modeled as a When a TDS Read StreamIn the current implementation of ClientX, the readers of data from the network, calls into the stream to get the data as bytes, the consumers follow the TDS protocol, and are expected to request the right amount of data. They read the complete data of the stream and follow the TDS protocol structure. The stream takes care of making sure that it can account for split packets. If the stream has the data available in the buffer, it returns it, else it will read from the underlying transport and read the requested number of bytes, and return them to the consumer. Motivation behind streamsThe idea of using streams was to make the consumers unaware of the nuances of TDS packet header (for writes) and not having to worry about split packets while reading from the network. Streams offer a nice layering mechanism too, where the TDS stream can be layered with SSL Stream, which can be layered with SSL over TdsStream (for TDS 7.4), and then the actual transport stream (Network/NamedPipes). This separates the responsibilities of the stream. There are also intentions of adding a layer of MARS stream under the TDS stream, which will be backed by a Muxer and Demuxer which has a 1x1 relationship with the SSL/Transport stream. Each TDS stream would get its own MARS stream, which will write / read to/from muxer-demuxer which will understand the nuances of MARS over a single physical transport stream. This was to further separate the concerns and have a layered approach in adding features to ClientX. Deviation from streamThere are new methods on TdsStream in addition to what Problem
Potential solution
Fig 1: Current implementation in ClientX Fig 2: Potential improvement PipelinesPipelines in principle seem great for the usecase of SqlClient. Pipelines let the consumers use the buffers directly, which is an important takeaway from the above discussion post, however we have the following hurdles with pipelines for which solutions are needed.
Ask from communityBased on the above, please ask more clarifying questions/provide suggestions / improvements to the design/alternative ways of looking at the problem statement. |
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 5 comments 17 replies
-
Please tag others, who may have been active on the PRs for ClientX
@cheenamalhotra @benrr101 @samsharma2700 @mdaigle @VladimirReshetnikov
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
Thanks for this @saurabh500. I agree with the way the streams are laid out - whether we use streams or pipelines, I think it's pretty much the only sensible way to separate responsibilities.
We're pretty much on the same page with problem 1. In an earlier PR, I've commented on TdsWriteStream.WriteAsync - if it currently receives a large byte array, it'll copy a packet-sized chunk of the input to the TdsWriteStream buffer, then flush the buffer out to the underlying transport, then repeat. To pull that thought from the PR (and the TODO in the method) to this thread, my view is that for inputs which exceed the size of the TdsWriteStream buffer, we should be writing down to the underlying transport directly, then patching up _writeBufferOffset to cope. This would cut one layer of buffering out, and it doesn't need any design changes.
Writes: buffer
This sounds like a good idea. Having a method which requests a buffer up to a specific size would be helpful, and would eliminate some of the work in TdsBufferAlloc.
Writes: streaming and tiny writes
To speak to the larger design of problem 1, I think part of the underlying issue is that we currently handle multiple sizes of writes identically:
- Tiny writes (i.e. 2-8 byte array)
- Medium writes, such as a 1-2MB byte array
- Large writes - 2-3GB byte array
- Very large writes - byte arrays which don't fit in memory, such as Streams loaded from disk
Our current approach is a decent middle ground; I think most third-party libraries will probably handle IO fairly similarly. The earlier suggestion about directly writing to the underlying transport will reduce execution time for medium and large writes.
To better handle large writes (and to handle very large writes at all) we'll need to use Stream.CopyTo[Async], or manually read from the source stream into the appropriate buffer and write that to TdsWriteStream. I'd suggest doing this in TdsWriter, but depending where the TdsWriteStream buffer is exposed we could do this upstream.
The way we handle tiny writes is difficult though. Even writing an Int16 could theoretically cross TDS packets, incurring network IO. In the async case, this means that we're forced to await each WriteShortAsync in sequence. Although I'm not sure whether or not ValueTask allocates if it completes synchronously, (I don't think it does) there are potentially still state machines to deal with. This could be unpleasant if we're sending a large number of parameters, sending table-valued parameters (or something else which generates a large number of small writes.)
I've alluded to this in the past, but I think we should aim to completely eliminate explicit tiny writes, instead trying to build up a structure in memory and writing that. This could be slightly more complex if we're going to support a table-valued parameter with a varbinary field streamed in from a file (because we're dealing with a combination of in-memory bytes and a Stream) but the net result is that we reduce the number of awaits - ideally, to a single WriteAsync.
Reads
I've not got any firm opinions here - I think we're hedged into performing many tiny reads because of the TDS protocol. I'd like to avoid that where possible; if we receive row metadata which indicates that every column in the recordset is fixed-length and non-nullable then can we optimise for a single fixed-length read?
Streaming data (whether byte arrays or strings) remains important here too. When presented with a varbinary(N), we might need to return a specific type of unseekable stream to the client which guarantees that we'll read N bytes from the network (but never more than that.) The implications of this would probably be difficult though - if we return a stream to the client, the client retrieves the next field and then reads from the stream, the transport stream will already have moved past the end of the varbinary data. There'd need to be some buffering in that stream instance, but this'd naturally cause memory usage to balloon with large amounts of data.
General streaming IO
When I normally think of streaming data, I usually think about byte arrays! When streaming large varchar/nvarchar/xml columns though, there's the overhead of converting from a byte array to a string. Encoding.GetString typically only works with a specific range of bytes stored in memory, and this can be difficult if we're dealing with Unicode characters - each character is two bytes, so we could have a character "split" between TDS packets.
This is purely a reminder: when decoding a string, we should use the relevant System.Text.Decoder or System.Text.Encoder classes. These can handle decoding from multiple byte buffers.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
One idea I had percolating around when I was considering how to handle MARS read streams was to effectively link together byte arrays to allow processing without the need to copy. I wonder if we could utilize something like this to better address one of the sorta problems you mentioned.
Eg, when we write an int:
- Get byte[]
a - Write int bytes into
a - Call WriteAsync(
a) - Copy from
ainto stream bufferb - When
bwould be full, flush contents ofb
Introducing a slightly more complex data structure, we could have:
- Rent a special returnable byte[]
a - Write in bytes int
a - Call WriteAsync(
a) - Append
ato a linked list structure of bytes to write - When total length of bytes to write meets or exceeds the length of a packet, the linked list is scanned and dumped to the stream
- As each element in the linked list is written, it is disposed, returning it to the array pool it was allocated from.
The main benefit of this is that instead of copying a to b to stream, there is only one copy a to stream. The main drawback of this is that there would be more calls to stream.WriteAsync.
Beta Was this translation helpful? Give feedback.
All reactions
-
Avoid byte arrays wherever possible. They lock you into arrays not spans. So, how about
let tds = static helper
let a = some value
let stream = writing primitive
let length = tds.get_byte_length_of(a)
let span= stream.get_span(length)
let written_bytes_count = tds.write_bytes(a, span)
stream.advance(span, written_bytes_count)
This pattern allows you to start with a single packet sized (4, or 8k usually) byte[] and reserve the first 8 ( or more if you know you need an smux header too) bytes of it.
When something requests a buffer you give it back a span from the buffer most of the time. When advance is called you can do a simple ref equality check on the span to make sure it's the current head you rented and then increment the head counter by the written bytes count.
When something requests a buffer longer than the amount of space you've got available in the packet you can allocate a temporary array and pass it back to the caller. Then when they call advance and pass you back the span you'll be able to tell if wasn't to your original buffer and work out how much data to copy from it to fill your current buffer. This is the point where you'd dispatch a packet.
Then create a new buffer, reserve header, copy the remainder of the temporary buffer into it.
Doing it this way has two really useful properties.
- it abstracts away all the logic needed to handle partial buffers so you don't have to handle it in WriteInt, WriteLong etc.
- it removes the need to handle packetization at higher layers and makes it entirely handled by the output writer.
Beta Was this translation helpful? Give feedback.
All reactions
-
I like the approach, with a caveat that we'd want to think about how we write large byte spans. Perhaps the get_span / advance approach should be limited to around 16k-32k bytes? This is enough to cross 4-5 packets, which should be plenty for most peoples' needs. Once we cross packet boundaries by sending a large (e.g. 10MB) byte span, the allocation of the temporary array can double memory usage. If we're in the middle of a partial packet, even a well-meaning client which chunks its writes by the packet size would trigger this behaviour.
One way to optimise this could be to leak the packet buffer state to the client with a get_remaining_buffer_span method. Writing a large file could be a matter of:
let file = new byte[int.MaxValue]
let position = 0;
while (position < file.length)
{
let span = stream.get_remaining_buffer_span();
file.AsSpan(position, span.Length).CopyTo(span);
stream.advance(span, span.Length);
position += span.Length;
}
I think we'd end up with zero temporary buffer allocations outside of stream with this method. The same method could also apply to large strings via Encoder.GetBytes.
In either event, the method we're currently talking about also reminds me a little of ReadOnlySequence and IBufferWriter, so there might be something in the design / implementation of this which helps.
Beta Was this translation helpful? Give feedback.
All reactions
-
I like the approach, with a caveat that we'd want to think about how we write large byte spans. Perhaps the get_span / advance approach should be limited to around 16k-32k bytes?
For non-fixed length data you'd need to be able to ask the stream for the largest buffer it could support, a GetRemainingBuffer function if you like, and then use the appropriate Encoder or Decoder to translate the data directly from the input into the outgoing buffer. Fill the buffer, advance, get next buffer and repeat. We shouldn't really need temporary buffers for non-fixed length content.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
@Wraith2 In my example I tend to use byte[] and span interchangeably. I understand there's semantic differences between the two, but ultimately a span has to be backed by concrete memory somewhere. You could just as easily build a linked list of spans as a linked list of byte[]s. The main benefit of my idea is that it doesn't rely on leasing memory, limited by the size of the buffer. Instead, it allows for free-form construction of a message and sending it to the underlying stream directly.
To address the two benefits of your idea:
abstracts away logic to handle partial buffers
That would still be handled by the stream or the writer. If we have a block of memory in the linked list, we packetize it and send however much will fit in the packet. Then we start writing the rest of the memory block before disposing it from the linked list.
removes the need to handle packetization at higher levels
That's been part of the design since the beginning - the stream handles the packetization of what's written to it. But the question is whether it needs to write to a buffer before writing to the underlying output stream.
One thing to consider here is that both of these ideas deviate from the goal of using a Stream. If we keep making changes to how things should be buffered, I think we may need to take a slight step back and double check what belongs in the TdsStream and what belongs in the TdsWriter. The TdsStream should aim to adhere to the Stream interface while we can pretty much build whatever we want into the TdsWriter.
Beta Was this translation helpful? Give feedback.
All reactions
-
As the PR is now open, can we excercise the above concerns (and maybe mine below, which in part overlap):
Lines 14 to 32 in 84bc56d
I still have a few questions (this is generic, I'll just take this token as an example):
- How many of the tokens (any token) exist per typical sql operation, I'm a bit concerned about the amount of objects created
- Are there any tokens with only a single Read operation? because the whole ValueTask as return type is kinda moot, if there is always an await in there, the same applies to the ValueTasks in TdsReader.
Honestly, the amount of awaits here is frightening, we're talking about a protocol parser after all, there should be 0 (maybe 1) awaits in here besides a possible await for VarCharAsync read as you know the minimum bytes required to build this token.
Beta Was this translation helpful? Give feedback.
All reactions
-
It looks like my README doesn't list it, but SequenceReader<T> is part of the library, in an API compatible way so that we type forward to the .NET version when it's available.
https://github.com/dotnet/Nerdbank.Streams/blob/main/src/Nerdbank.Streams/SequenceReader.cs
Beta Was this translation helpful? Give feedback.
All reactions
-
We are discussing the types that need to be read in case of Length prefixed data, where the length of complete data will be prefixed before the data stream.
However there is a case of partial length prefixed data, where we know the incoming datatype, but the server sends data in chunks.
The data stream begins with a Long Constant which signifies that the length is not known unfront.
Each chunk has the length prefixed to it. The last chunk is of length 0, which causes the clients to stop looking for data, and then assemble all the chunks into the datatype.
[Constant][chunk_len][array of bytes of chunk len][chunk_len][array of bytes of chunk len]...[chunk_len][array of bytes of chunk len][0]
XML payload may be sent this way, which is ultimately a string, which will need to be assembled from multiple chunks, and then converted to an XmlReader or whatever type the user has requested from the SqlDataReader.
Beta Was this translation helpful? Give feedback.
All reactions
-
Thanks @Tornhoof and @AArnott - that makes sense to me! We might need a slightly different method to maintain support for .NET Framework though. SequenceReader is a ref struct, but these require C# 11, which in turn requires .NET 7.0 and newer.
PLP data's a little trickier to deal with. We can still daisy-chain ReadOnlyMemory<byte> views over the same buffer in a linked list/ReadOnlySequence<byte> (so we're still not copying data around.) If a PLP chunk straddles TDS packets, I'd consider one chunk to be represented by multiple ReadOnlyMemory<byte> instances - one per packet.
Being able to construct a Stream or a TextReader over this would be helpful. We could then load the XmlReader from it, to spare the need to construct a string (in this instance, at least.) I agree on the underlying point though: assembling a byte[] or a string from PLP will likely be a little frustrating and memory-heavy.
If we can make this I/O design work efficiently, it might be a good idea to look at these discussions/PRs with a view to writing performance optimisation guidance for developers using SqlClient.
Beta Was this translation helpful? Give feedback.
All reactions
-
SequenceReader is a ref struct, but these require C# 11, which in turn requires .NET 7.0 and newer.
Not true, actually. You can use the latest version of C# with any version of .NET or .NET Framework. We do it all over the place. C# will disable its own features where necessary based on the target framework and runtime version.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
Thanks @AArnott, @Tornhoof - you're right, and I'd just misread the spec. yesterday evening. In that case, we can just use your approach for SequenceReader as-is for both .NET and .NET Framework.
Beta Was this translation helpful? Give feedback.
All reactions
-
@edwardneal ref structs are c# 7.2, you might have mixed that up with ref fields, so you can use ref structs since .NET core 2.0. ref fields just made the implementation of Span a lot cleaner/easier.
For netfx you'd still need something like the type by Andrew, no way around that.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1