saurabh500
Jul 19, 2024
Maintainer

VERY VERY LONG POST ALERT

Purpose:

Starting this discussion thread to brainstorm the future of SqlClientX I/O to be performant, while adhering to the various requirements

Transport requirements

Named Pipes
Shared Memory (it is essentially Named pipe with a special NP path)
TCP

TDC protocol basics

Sql server clients and server exchange information as messages.
Messages contain a series of Packets.

Messages are a logical concept, where the last packet of the message has a status field with value of EOM which states that its the last packet of the message.

Message

Packet 1	Packet 2	Packet 3	...	Last Packet
[x, 0x04,x,x,x,x,x]	[x, 0x04,x,x,x,x,x]	[x, 0x04,x,x,x,x,x]	...	[x, 0x01,x,x,x,x,x]

Outgoing Packet Header

Message Type (Prelogin/login etc)	Status (0x1 signifies last packet of message)	Length (2 bytes)	Unused byte	Unused byte	Packet number	Unused byte (window)
1/2/3/4	0x01 0x04 0x02	Length including 8 bytes of header	...	...	1...Count	...

The rest of the bytes in the packet follow the TDS protocol spec and structure is dependant on the message being transmitted.

Incoming packet header

The incoming packet header follows a similar structure, but some unused bytes in outgoing packet, end up being meaningful

Message Type (Prelogin/login etc)	Status (0x1 signifies last packet of message)	Length (2 bytes)	SPID (2 bytes)	Packet number	Window (unused)
1/2/3/4	0x01 0x04 0x02	Length including 8 bytes of header	SPID of connection	1...Count	...

Packets are buffers which have a pre-negotiated size with the server. The packet size is 4096 till negotiated at the end of login.
The default packet size is 8000 bytes in connection string

After the login negotiation, all the packets on the connection, except the last packet of a message must be the negotiated size.

For simplicity of discussion we will use a packet size of 8k, but this is modifiable in the connection string. Details of values is https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.packetsize?view=netframework-4.8.1

Writes

While sending out the data over TDS, the client needs to fill in the payload in the TDS packet, and set the appropriate header bytes and flush the packet to the network.

In case of writes, the Message type, status, length of data, packet number need to be adjusted for every packet.

Reads

In case of reads, the client needs to read the packet, and parse out the header to understand how much data is expected.
Though there is a status bit, it is omitted, and the client rely on the size of data specified in the header to parse the information and parse it according to the TDS protocol.

In case of reading the packets, the client needs to assume that the complete TDS packet may not be available in a single transport read, and needs to account for partial packet being read.

The following scenarios are possible:

Partial packet header read: which means that the number of bytes read are less than 8 bytes. In this case the client needs to atleast have 8 bytes packet header to understand how much more data to expect in the packet.
Full header read, but partial payload read: In this case the client could have enough information to respond to the APIs, but would need to read the rest of the packet eventually.

Streams

TDS Write stream

In the current implementation of ClientX, writing the TDS packet is modeled as a TdsWriteStream. This stream ingests the TDS packet payload. It needs the Type of stream being sent out. If the incoming data spans across TDS packets, then the flushes the packet to the network. While writing data spanning multiple TDS packet, the stream calculates the packet number, the status message and the data length, and send out a Packet with 0x04 status. (Cancellation is not yet implemented, which requires other status bytes to be used)

When a Flush / FlushAsync is called on the stream, it assumes that no more data needs to be sent in this message and sets the EOM status (0x01) of the packet, computes the packet number, the length and flushes the buffer to the network.

TDS Read Stream

In the current implementation of ClientX, the readers of data from the network, calls into the stream to get the data as bytes, the consumers follow the TDS protocol, and are expected to request the right amount of data. They read the complete data of the stream and follow the TDS protocol structure. The stream takes care of making sure that it can account for split packets. If the stream has the data available in the buffer, it returns it, else it will read from the underlying transport and read the requested number of bytes, and return them to the consumer.
The consumer reads the data and uses it or discards it.

Motivation behind streams

The idea of using streams was to make the consumers unaware of the nuances of TDS packet header (for writes) and not having to worry about split packets while reading from the network.

Streams offer a nice layering mechanism too, where the TDS stream can be layered with SSL Stream, which can be layered with SSL over TdsStream (for TDS 7.4), and then the actual transport stream (Network/NamedPipes). This separates the responsibilities of the stream. There are also intentions of adding a layer of MARS stream under the TDS stream, which will be backed by a Muxer and Demuxer which has a 1x1 relationship with the SSL/Transport stream.

Each TDS stream would get its own MARS stream, which will write / read to/from muxer-demuxer which will understand the nuances of MARS over a single physical transport stream.

This was to further separate the concerns and have a layered approach in adding features to ClientX.

image

Deviation from stream

There are new methods on TdsStream in addition to what System.IO.Stream offers like peek byte, read byte async etc, since each of these methods could result in an I/O operation.

Problem

While the streams are successful at abstracting away the complexities of TDS packet header, they rely on the stream readers to bring in their own buffer to copy the data into. Or the writers need to buffer the data from CLR types to the byte[], which may need e.g. Consider writing an int value to the stream, this needs to be converted to a byte[] before it can be handed down to the stream for writing. This means that some kind of buffer management needs to be incorporated by the writer. Consider writing a string to the TDS stream. This would require that the string be converted to a byte[] first, and then written to the stream. This can either be done with chunked buffers, or this could have been made better by writing the string to the available buffer in the stream itself, flushing it, and repeating with the rest of the string, till the whole string is flushed out, without needing an intermediate buffer.
Async: While doing any reads/write operations, due to the nature of the protocol, the readers/writers may have to do network I/O calls, based on the space available in the buffer while writing and data available in the buffer while reading. We started clientX with a "written for async but usable for sync" philosophy. While the streams lend itself well to this philosophy, it almost always causes a statemachine to be created and executed if the code is written with async await patterns. The alternative is to manage ValueTask/Task and check for its completion or setup continuations. Streams also cause 1 more level of depth in the call stack, likely causing another statemachine to be generated for every async operation, where the data may just need to copied over to the buffer in the stream. However this can be mitigated by moving away from async await pattern

Potential solution

Expose the buffer for the streams directly. Allow the writers/readers of the CLR types to manipulate/read from the buffer if it has space/data available. When the buffer is full, then use write stream to flush the buffer. Writing to buffer when we dont need it to be flushed, would be lot more efficient.
For the "async first and use for sync" approach, The readers and writers will almost always have to have a return type of ValueTask / Task being exposed. However the readers / writers and their consumers need not always use async await and intelligently manage the ValueTask / Task being returned from network reads/writes.

image

Fig 1: Current implementation in ClientX

image

Fig 2: Potential improvement

Pipelines

Pipelines in principle seem great for the usecase of SqlClient. Pipelines let the consumers use the buffers directly, which is an important takeaway from the above discussion post, however we have the following hurdles with pipelines for which solutions are needed.

Sync support. Pipelines are async
Support on NetFx. SqlClient needs to cater to the customers on .Net Framework as well. (Pipelines are available as a Nuget package targeting .NET standard)
Named Pipe support. Do pipelines lend themselves well to non-TCP transport?

Ask from community

Based on the above, please ask more clarifying questions/provide suggestions / improvements to the design/alternative ways of looking at the problem statement.

Replies: 5 comments 17 replies

saurabh500
Jul 19, 2024
Maintainer Author

@Wraith2 @edwardneal @roji

Please tag others, who may have been active on the PRs for ClientX

@cheenamalhotra @benrr101 @samsharma2700 @mdaigle @VladimirReshetnikov

0 replies

edwardneal
Jul 20, 2024

Thanks for this @saurabh500. I agree with the way the streams are laid out - whether we use streams or pipelines, I think it's pretty much the only sensible way to separate responsibilities.

We're pretty much on the same page with problem 1. In an earlier PR, I've commented on TdsWriteStream.WriteAsync - if it currently receives a large byte array, it'll copy a packet-sized chunk of the input to the TdsWriteStream buffer, then flush the buffer out to the underlying transport, then repeat. To pull that thought from the PR (and the TODO in the method) to this thread, my view is that for inputs which exceed the size of the TdsWriteStream buffer, we should be writing down to the underlying transport directly, then patching up _writeBufferOffset to cope. This would cut one layer of buffering out, and it doesn't need any design changes.

Writes: buffer

This sounds like a good idea. Having a method which requests a buffer up to a specific size would be helpful, and would eliminate some of the work in TdsBufferAlloc.

Writes: streaming and tiny writes

To speak to the larger design of problem 1, I think part of the underlying issue is that we currently handle multiple sizes of writes identically:

Tiny writes (i.e. 2-8 byte array)
Medium writes, such as a 1-2MB byte array
Large writes - 2-3GB byte array
Very large writes - byte arrays which don't fit in memory, such as Streams loaded from disk

Our current approach is a decent middle ground; I think most third-party libraries will probably handle IO fairly similarly. The earlier suggestion about directly writing to the underlying transport will reduce execution time for medium and large writes.

To better handle large writes (and to handle very large writes at all) we'll need to use Stream.CopyTo[Async], or manually read from the source stream into the appropriate buffer and write that to TdsWriteStream. I'd suggest doing this in TdsWriter, but depending where the TdsWriteStream buffer is exposed we could do this upstream.

The way we handle tiny writes is difficult though. Even writing an Int16 could theoretically cross TDS packets, incurring network IO. In the async case, this means that we're forced to await each WriteShortAsync in sequence. Although I'm not sure whether or not ValueTask allocates if it completes synchronously, (I don't think it does) there are potentially still state machines to deal with. This could be unpleasant if we're sending a large number of parameters, sending table-valued parameters (or something else which generates a large number of small writes.)

I've alluded to this in the past, but I think we should aim to completely eliminate explicit tiny writes, instead trying to build up a structure in memory and writing that. This could be slightly more complex if we're going to support a table-valued parameter with a varbinary field streamed in from a file (because we're dealing with a combination of in-memory bytes and a Stream) but the net result is that we reduce the number of awaits - ideally, to a single WriteAsync.

Reads

I've not got any firm opinions here - I think we're hedged into performing many tiny reads because of the TDS protocol. I'd like to avoid that where possible; if we receive row metadata which indicates that every column in the recordset is fixed-length and non-nullable then can we optimise for a single fixed-length read?

Streaming data (whether byte arrays or strings) remains important here too. When presented with a varbinary(N), we might need to return a specific type of unseekable stream to the client which guarantees that we'll read N bytes from the network (but never more than that.) The implications of this would probably be difficult though - if we return a stream to the client, the client retrieves the next field and then reads from the stream, the transport stream will already have moved past the end of the varbinary data. There'd need to be some buffering in that stream instance, but this'd naturally cause memory usage to balloon with large amounts of data.

General streaming IO

When I normally think of streaming data, I usually think about byte arrays! When streaming large varchar/nvarchar/xml columns though, there's the overhead of converting from a byte array to a string. Encoding.GetString typically only works with a specific range of bytes stored in memory, and this can be difficult if we're dealing with Unicode characters - each character is two bytes, so we could have a character "split" between TDS packets.

This is purely a reminder: when decoding a string, we should use the relevant System.Text.Decoder or System.Text.Encoder classes. These can handle decoding from multiple byte buffers.

0 replies

benrr101
Jul 22, 2024
Maintainer

One idea I had percolating around when I was considering how to handle MARS read streams was to effectively link together byte arrays to allow processing without the need to copy. I wonder if we could utilize something like this to better address one of the sorta problems you mentioned.

Eg, when we write an int:

Get byte[] a
Write int bytes into a
Call WriteAsync(a)
Copy from a into stream buffer b
When b would be full, flush contents of b

Introducing a slightly more complex data structure, we could have:

Rent a special returnable byte[] a
Write in bytes int a
Call WriteAsync(a)
Append a to a linked list structure of bytes to write
When total length of bytes to write meets or exceeds the length of a packet, the linked list is scanned and dumped to the stream
As each element in the linked list is written, it is disposed, returning it to the array pool it was allocated from.

The main benefit of this is that instead of copying a to b to stream, there is only one copy a to stream. The main drawback of this is that there would be more calls to stream.WriteAsync.

4 replies

@Wraith2

Wraith2 Jul 23, 2024

Avoid byte arrays wherever possible. They lock you into arrays not spans. So, how about

let tds = static helper
let a = some value
let stream = writing primitive
let length = tds.get_byte_length_of(a)
let span= stream.get_span(length)
let written_bytes_count = tds.write_bytes(a, span)
stream.advance(span, written_bytes_count)

This pattern allows you to start with a single packet sized (4, or 8k usually) byte[] and reserve the first 8 ( or more if you know you need an smux header too) bytes of it.

When something requests a buffer you give it back a span from the buffer most of the time. When advance is called you can do a simple ref equality check on the span to make sure it's the current head you rented and then increment the head counter by the written bytes count.

When something requests a buffer longer than the amount of space you've got available in the packet you can allocate a temporary array and pass it back to the caller. Then when they call advance and pass you back the span you'll be able to tell if wasn't to your original buffer and work out how much data to copy from it to fill your current buffer. This is the point where you'd dispatch a packet.
Then create a new buffer, reserve header, copy the remainder of the temporary buffer into it.

Doing it this way has two really useful properties.

it abstracts away all the logic needed to handle partial buffers so you don't have to handle it in WriteInt, WriteLong etc.
it removes the need to handle packetization at higher layers and makes it entirely handled by the output writer.

@edwardneal

edwardneal Jul 23, 2024

I like the approach, with a caveat that we'd want to think about how we write large byte spans. Perhaps the get_span / advance approach should be limited to around 16k-32k bytes? This is enough to cross 4-5 packets, which should be plenty for most peoples' needs. Once we cross packet boundaries by sending a large (e.g. 10MB) byte span, the allocation of the temporary array can double memory usage. If we're in the middle of a partial packet, even a well-meaning client which chunks its writes by the packet size would trigger this behaviour.

One way to optimise this could be to leak the packet buffer state to the client with a get_remaining_buffer_span method. Writing a large file could be a matter of:

let file = new byte[int.MaxValue]
let position = 0;
while (position < file.length)
{
 let span = stream.get_remaining_buffer_span();
 file.AsSpan(position, span.Length).CopyTo(span);
 stream.advance(span, span.Length);
 position += span.Length;
}

I think we'd end up with zero temporary buffer allocations outside of stream with this method. The same method could also apply to large strings via Encoder.GetBytes.

In either event, the method we're currently talking about also reminds me a little of ReadOnlySequence and IBufferWriter, so there might be something in the design / implementation of this which helps.

@Wraith2

Wraith2 Jul 23, 2024

I like the approach, with a caveat that we'd want to think about how we write large byte spans. Perhaps the get_span / advance approach should be limited to around 16k-32k bytes?

For non-fixed length data you'd need to be able to ask the stream for the largest buffer it could support, a GetRemainingBuffer function if you like, and then use the appropriate Encoder or Decoder to translate the data directly from the input into the outgoing buffer. Fill the buffer, advance, get next buffer and repeat. We shouldn't really need temporary buffers for non-fixed length content.

@benrr101

benrr101 Jul 23, 2024
Maintainer

@Wraith2 In my example I tend to use byte[] and span interchangeably. I understand there's semantic differences between the two, but ultimately a span has to be backed by concrete memory somewhere. You could just as easily build a linked list of spans as a linked list of byte[]s. The main benefit of my idea is that it doesn't rely on leasing memory, limited by the size of the buffer. Instead, it allows for free-form construction of a message and sending it to the underlying stream directly.

To address the two benefits of your idea:

abstracts away logic to handle partial buffers

That would still be handled by the stream or the writer. If we have a block of memory in the linked list, we packetize it and send however much will fit in the packet. Then we start writing the rest of the memory block before disposing it from the linked list.

removes the need to handle packetization at higher levels

That's been part of the design since the beginning - the stream handles the packetization of what's written to it. But the question is whether it needs to write to a buffer before writing to the underlying output stream.

One thing to consider here is that both of these ideas deviate from the goal of using a Stream. If we keep making changes to how things should be buffered, I think we may need to take a slight step back and double check what belongs in the TdsStream and what belongs in the TdsWriter. The TdsStream should aim to adhere to the Stream interface while we can pretty much build whatever we want into the TdsWriter.

Tornhoof
Aug 8, 2024

As the PR is now open, can we excercise the above concerns (and maybe mine below, which in part overlap):

SqlClient/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClientX/Tds/Tokens/LoginAck/LoginAckTokenParser.cs

Lines 14 to 32 in 84bc56d

public override async ValueTask<Token> ParseAsync(TokenType tokenType, TdsStream tdsStream, bool isAsync, CancellationToken ct)

{

_ = await tdsStream.TdsReader.ReadUInt16Async(isAsync, ct).ConfigureAwait(false); // length

byte type = await tdsStream.TdsReader.ReadByteAsync(isAsync, ct).ConfigureAwait(false);

SqlInterfaceType interfaceType = LoginAckTokenParser.GetSqlInterfaceType(type);

uint version = await tdsStream.TdsReader.ReadUInt32BEAsync(isAsync, ct).ConfigureAwait(false);

TdsVersion tdsVersion = LoginAckTokenParser.GetTdsVersion(version);

tdsStream.TdsVersion = tdsVersion;

string progName = await tdsStream.TdsReader.ReadBVarCharAsync(isAsync, ct).ConfigureAwait(false);

byte major = await tdsStream.TdsReader.ReadByteAsync(isAsync, ct).ConfigureAwait(false);

byte minor = await tdsStream.TdsReader.ReadByteAsync(isAsync, ct).ConfigureAwait(false);

byte buildHi = await tdsStream.TdsReader.ReadByteAsync(isAsync, ct).ConfigureAwait(false);

byte buildLow = await tdsStream.TdsReader.ReadByteAsync(isAsync, ct).ConfigureAwait(false);

ProgVersion progVersion = new ProgVersion(major, minor, buildHi, buildLow);

return new LoginAckToken(interfaceType, tdsVersion, progName, progVersion);

I still have a few questions (this is generic, I'll just take this token as an example):

How many of the tokens (any token) exist per typical sql operation, I'm a bit concerned about the amount of objects created
Are there any tokens with only a single Read operation? because the whole ValueTask as return type is kinda moot, if there is always an await in there, the same applies to the ValueTasks in TdsReader.

Honestly, the amount of awaits here is frightening, we're talking about a protocol parser after all, there should be 0 (maybe 1) awaits in here besides a possible await for VarCharAsync read as you know the minimum bytes required to build this token.

13 replies

@AArnott

AArnott Aug 9, 2024

It looks like my README doesn't list it, but SequenceReader<T> is part of the library, in an API compatible way so that we type forward to the .NET version when it's available.
https://github.com/dotnet/Nerdbank.Streams/blob/main/src/Nerdbank.Streams/SequenceReader.cs

@saurabh500

saurabh500 Aug 9, 2024
Maintainer Author

We are discussing the types that need to be read in case of Length prefixed data, where the length of complete data will be prefixed before the data stream.

However there is a case of partial length prefixed data, where we know the incoming datatype, but the server sends data in chunks.
The data stream begins with a Long Constant which signifies that the length is not known unfront.
Each chunk has the length prefixed to it. The last chunk is of length 0, which causes the clients to stop looking for data, and then assemble all the chunks into the datatype.

[Constant][chunk_len][array of bytes of chunk len][chunk_len][array of bytes of chunk len]...[chunk_len][array of bytes of chunk len][0]

XML payload may be sent this way, which is ultimately a string, which will need to be assembled from multiple chunks, and then converted to an XmlReader or whatever type the user has requested from the SqlDataReader.

@edwardneal

edwardneal Aug 9, 2024

Thanks @Tornhoof and @AArnott - that makes sense to me! We might need a slightly different method to maintain support for .NET Framework though. SequenceReader is a ref struct, but these require C# 11, which in turn requires .NET 7.0 and newer.

@saurabh500

PLP data's a little trickier to deal with. We can still daisy-chain ReadOnlyMemory<byte> views over the same buffer in a linked list/ReadOnlySequence<byte> (so we're still not copying data around.) If a PLP chunk straddles TDS packets, I'd consider one chunk to be represented by multiple ReadOnlyMemory<byte> instances - one per packet.

Being able to construct a Stream or a TextReader over this would be helpful. We could then load the XmlReader from it, to spare the need to construct a string (in this instance, at least.) I agree on the underlying point though: assembling a byte[] or a string from PLP will likely be a little frustrating and memory-heavy.

If we can make this I/O design work efficiently, it might be a good idea to look at these discussions/PRs with a view to writing performance optimisation guidance for developers using SqlClient.

@AArnott

AArnott Aug 9, 2024

SequenceReader is a ref struct, but these require C# 11, which in turn requires .NET 7.0 and newer.

Not true, actually. You can use the latest version of C# with any version of .NET or .NET Framework. We do it all over the place. C# will disable its own features where necessary based on the target framework and runtime version.

@edwardneal

edwardneal Aug 10, 2024

Thanks @AArnott, @Tornhoof - you're right, and I'd just misread the spec. yesterday evening. In that case, we can just use your approach for SequenceReader as-is for both .NET and .NET Framework.

Tornhoof
Aug 9, 2024

@edwardneal ref structs are c# 7.2, you might have mixed that up with ref fields, so you can use ref structs since .NET core 2.0. ref fields just made the implementation of Span a lot cleaner/easier.

For netfx you'd still need something like the type by Andrew, no way around that.

0 replies

Client X I/O #2689

Uh oh!

Uh oh!

saurabh500 Jul 19, 2024 Maintainer

Purpose:

Transport requirements

TDC protocol basics

Message

Outgoing Packet Header

Incoming packet header

Writes

Reads

Streams

TDS Write stream

TDS Read Stream

Motivation behind streams

Deviation from stream

Problem

Potential solution

Pipelines

Ask from community

Replies: 5 comments · 17 replies

Uh oh!

saurabh500 Jul 19, 2024 Maintainer Author

Uh oh!

Writes: buffer

Writes: streaming and tiny writes

Reads

General streaming IO

Uh oh!

benrr101 Jul 22, 2024 Maintainer

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

benrr101 Jul 23, 2024 Maintainer

Uh oh!

Uh oh!

Uh oh!

Uh oh!

saurabh500 Aug 9, 2024 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

saurabh500
Jul 19, 2024
Maintainer

Replies: 5 comments 17 replies

saurabh500
Jul 19, 2024
Maintainer Author

benrr101
Jul 22, 2024
Maintainer

benrr101 Jul 23, 2024
Maintainer

saurabh500 Aug 9, 2024
Maintainer Author