Wraith2
Jun 25, 2024

There have been a number of PR's referencing SqlClientX and they seem to be adding new functionality.
I'd like to understand the parameters of this effort, what platforms and runtimes is it targeting.
I'd like to know what features it is going to be used to add if it is possible to do so.
I'd like to know how this changes the datalab/woodstar work that seems to have stalled.
I'd like to understand the driving force behind the effort because it seems to be coming from non-EF set of developers meaning that political things inside MS have changed.
I'd like to know how external contributors can be involved and informed about this and help out if it is possible.

/cc @saurabh500

Replies: 6 comments 8 replies

edwardneal
Jun 25, 2024

I'm not sure what the larger design is, and what's started to drive the broader SqlClientX effort; I get the impression that a design's been developed offline and is being committed piecemeal.

Based on discussions within the PRs, my understanding is that the long-term goal is to replace the managed TDS implementation (completely or in part) and eliminate the various async-over-sync / sync-over-async code paths, then to move towards removing the native TDS implementation. This would maintain the .NET Framework and .NET targets, but enforce the documented SQL Server version compatibility more stringently and deadhead the code which supports it (so .NET Framework would probably lose SQL Server 2000 support and both versions would probably lose SQL Server 2005/2008/2012 support.)

We looked at an issue around connection-level packet locking a few months ago, and I'd tried to implement MARS and transports with minimal locking. I eventually found that while the transport layer could be improved, this increased lock contention in TdsParser and rendered the changes nearly meaningless until that could be addressed. I don't personally think that a partial TDS replacement would make a major performance difference, so suspect that the goal is a full replacement.

I'd also like to understand the wider SqlClientX effort though - a lot of this is educated guesswork and speculation.

0 replies

saurabh500
Jun 25, 2024
Maintainer

Thanks @Wraith2 for starting this discussion.

Introduction

Microsoft.Data.SqlClient (will be called SqlClient in the rest of the discussions), is a feature rich driver which has been serving the needs of the SQL Server customers since a really long time. The driver has evolved from System.Data.SqlClient namespace in System.Data.dll in .Net framework, to System.Data.SqlClient.dll in .Net core to now Microsoft.Data.SqlClient which can be used by applications on .Net framework and .Net runtime(s).

We looked at the issues in SqlClient and though there are many reported on Github, but there are two issues which stood out, which the team decided to prioritize, to solve the needs of the modern applications built on .Net, which intend to use SQL server as the database

Connection Pool throughput.
Async API performance/throughput of SqlClient.

Connection pool

For Connection Pool throughput issue, the issue is quite simple. The driver's connection pool was created for conservative warm up, and doesn't allow a burst of connections. We need to change the connection pool design to allow more than one connection to be opened in parallel. We will cover the strategy to tackle the connection pool in separate discussions, or different post in this discussion.

However to increase the throughput of the connection pool, we need to move away from current Connection establishment method, where we invoke a constructor SqlInternalConnectionTds. We need a modern connectivity internal which is async all the way down to the I/O but at the same time, something that serves the purpose of Sync connection.Open() calls as well.

Async API performance improvement

Today, the Async APIs of SqlClient are much slower to perform, when compared to the sync APIs. Async is not expected to be always faster than Sync, but even with a modest workload using SqlClient, with reasonably large amount of data, the async APIs performance can be lot better. This can be seen in various issues reported on Github, and many issues brought to the attention of our team, from customers inside Microsoft itself.

We have had significant contributions from the community to improve the async behavior of SqlClient, and it is clear that the SqlClient today, is lot better than what it was about 6-7 years ago.

Q: But can the driver performance be even better, with a completely managed .Net implementation of SqlClient?

Based on the feedback from EF team, and the folks who have worked on the driver, we spent some time working on writing the driver to be able to fetch some large data with async APIs (1MB, 5 MB , 10 MB, 20MB) varchar (max) from the server and test its performance.

@roji had opened the issue #593 which shows that the driver performance while reading large data using Async APIs leaves a lot to be desired from the driver performance.

At this point we had the following to work with:

Modern .Net can be used for high performance networking, which is visible with the successful of Npgsql driver, which is a very high perf ADO.net driver.
Async-first should be the principle to be followed. Its easier (not trivial), to add sync on top of an async implementation, but not the other way around.
Synchronization primitives have async capabilities as well. Reduce the usage of synchronization primitives, and use them wisely when needed.

Working with these principles, an implementation of SqlClient focussed on Async I/O using .Net 8 runtime, was tried. https://github.com/saurabh500/SqlClient/tree/refactor
The implementation was focused on reading integers and varchar(max) from the TDS protocol. It changes the way the driver does I/O, while reusing a lot of existing code for reading data/metadata etc.

We noticed a VERY VERY significant difference between the current implementation of MDS and the experiment in the link I have posted above. It turns out that we can improve the driver's async performance for large data reads by atleast 2X (and I am being very very conservative with this number). The benchmarks were done on Windows and on Linux. There was no native SNI involved in the new implementation.

I haven't performed any benchmarks with Tech empower.

Keep in mind, that the implementation that I have posted above was a matter of writing code for only a week, to see if using the principles above, without any micro-optimizations can really improve SqlClient! Well, turns out that the principles can hold true for SqlClient's perf improvements.

Well, good. Now what? SqlClientX is born

We have this large driver codebase, with a lot of features, and this proof that it can be better. So how do we combine the two?

We all agree that if we change the way the I/O is rewritten in the driver, then we need to make sure that all the parts of the driver which touch I/O (which is a lot of parts of the driver), will need to be changed.

It all starts with the connection, and then goes into the Query execution.

It is hard to remove pieces of the driver, and write them and plug them back in, so we decided to start with Connection Pool, and start by taking care of connection pool throughput improvement, followed by query execution.

What we are doing is, for public surface area like SqlConnection SqlCommand and SqlDataReader we will be introducing their X counterparts. SqlConnectionX, SqlCommandX and SqlDataReaderX.

These X types will never be exposed in the API of the SqlClient. But X types allow us to build the important pieces without disrupting the main driver.
So how do we use X ?

We want to be able to somehow be able to tell SqlConnection to use its SqlConnectionX counterpart to do direct all its APIs.
We don't want to expose any connection string parameters. At some point we want the capability of X to become the default, but there is still some time to get there. Hence we will be leveraging SqlDataSourceBuilder with experimental APIs or ExperimentalDataSourceBuilder (still TBD) to be able to use SqlConnectionX.

When we know that SqlConnectionX and its other X friends are ready ( I will define "ready"), we want to be able to offer SqlConnectionX's implementation as the default implementation.

Goals

Backward compatibility. The SqlConnectionX implementation needs to be backward compatible. Remember that the customers will eventually get an upgraded Nuget package. They should only expect a fast driver, not a driver where the behavior was changed. There are some areas, where back compat may be impacted, e.g. Error messages.
We need absolute Focus. This means that while we embark on the journey of improving the driver performance, which focussed on I/O implementation improvement, we may be tempted to think that we can re-imagine every part of the driver. Though that makes a lot of sense, but we don't have unlimited time. For this effort to be successful, we need to absolutely focus on the I/O changes.
Make the I/O of the driver a bit more structured. There is some more structure that is possible in the driver apart from a TdsParser, TdsParserStateObject and SNI 😄
This is NOT AN EXPERIMENT. This effort is to productize a better SqlClient driver. We want to remove dependency on SNI.dll, start by unifying the codebase for SqlClient for .net core (no OS specific switches), and then take on NetFx.

@Wraith2 I hope I have begun to answer your questions. I imagine more questions coming up about SNI and what not. But I think I will stop my post at this point. We can take more discussions around why Stream, why not Pipeline, or when Pipeline etc. But I hope to focus this discussion on the goals of SqlClientX. I hope my point about Backward compatibility speaks of the platform and runtime.
However, there is an execution plan that we have in our mind as well, which will talk about what platforms will be supported when.

@edwardneal

I get the impression that a design's been developed offline and is being committed piecemeal.

Yes and no. We have a general idea about what we are trying to do and how we want to structure the connectivity pieces of the driver. Not everything is ironed out, like Exception handling, event source tracing as you can tell from my PR. However, we felt that we need to have a point of view of how we want ClientX to look like. We have folks working on the driver, who have a lot of experience with the driver and other team members, for who the driver space is new. Hence you will see some patterns emerge in how the driver would look which is the outcome of the discussion inside the team. Overall the patterns would not change, but there is some fine tuning that is needed.

/cc @Wraith2 @ErikEJ @edwardneal @roji @benrr101 @mdaigle @VladimirReshetnikov @samsharma2700 @ajcvickers @David-Engel @JRahnama

Folks, I am sure I haven't copied all those who might be interested in this blurb. Feel free to tag other contributors for SqlClient.

4 replies

@Wraith2

Wraith2 Jun 26, 2024
Author

We have a general idea about what we are trying to do and how we want to structure the connectivity pieces of the driver. Not everything is ironed out, like Exception handling, event source tracing as you can tell from my PR. However, we felt that we need to have a point of view of how we want ClientX to look like.

Can you given an overview of the things that you want to change and ideas about changing them? You've mentioned sni and pooling, are you looking at completely reworking those? and if so and the interfaces change the requirements of the layers that they depend on are we going to end up with a second internal codebase or attempt to keep sharing things like the TdsParser or just rewrite it?

We've discussed before that postgres had a good pooling strategy but has now changed to using Channels from the bcl, are we investigating that sort of thing?

For woodstar my hope was that a complete rewrite would give is a lot more freedom to have separate and individually testable layers. Testing the parser should not need an active sql server and we should be able to comformance check the parser while adding new features.

@saurabh500

saurabh500 Jun 26, 2024
Maintainer

@Wraith2 your points align with our intentions as well.
To work together and take this forward, I propose I close this discussion since this revolves around creating clarity around SqlClientX and open individual discussion items which talk about the following

Connection Pool.
Usage of Handlers which to an extent, tie into testing the driver's pieces without a server.
Streams and how we envision using them. Would like to bring the community to speed on how the Streams fit into the design and how we want to leverage them further. If we want to use pipelines, then we can abstract out the usage of stream, so that the consumers of streams don't know whether they are using a stream or pipeline, but they should only know that they can read and write bytes and there is a sync and async path forward.
Testing. We can likely unit test a bunch of stuff in SqlClientX but then how do we approach the problem of serverless testing when we are parsing/sending a data stream and not just individual tokens.
Anything else.

I think each of these items can lead to very in-depth discussion, and hence they deserve their own discussion items.

@Wraith2

Wraith2 Jun 26, 2024
Author

Ok. Seems reasonable. On streams vs pipelines it's worth remembering that using NetworkStream is likely to be one of the reasons behind the long running threading issue #422 #1543 (comment)

@benrr101

benrr101 Jul 2, 2024
Maintainer

@Wraith2 to address your desire for serverless testing, I am definitely pushing for good design practices that allow for unit testing of each layer. Although I acknowledge we will always need some degree of integration testing that leverages a real server, I think the existing codebase has overindexed on integration testing, and significantly underindexed on granular unit tests.

saurabh500
Jun 25, 2024
Maintainer

@roji is the right person to comment on Woodstar.

0 replies

saurabh500
Jun 25, 2024
Maintainer

@Wraith2

I'd like to understand the driving force behind the effort because it seems to be coming from non-EF set of developers meaning that political things inside MS have changed.

Many of the folks contributing to the PRs work in the SQL Server organization.
We actively consult with the EF team and they proactively help us.

0 replies

Wraith2
Jun 25, 2024
Author

t turns out that we can improve the driver's async performance for large data reads by atleast 2X

I can linearize the performance and bring it into line with sync performance in terms of time, there is additional memory overhead (largely inline with existing async) but it's tolerable. I've been able to do this for years, see #593 (comment)

I spent the weekend reimplementing that PR on the current codebase and adding test coverage for the new packet handling that is needed to make it stable. I've been waiting for the merge of the codebases to get to a point where i didn't have to duplicate the work onto the two coedbases but i've given up waiting.

A major problem with SqlClient is that there isn't enough investment in it. We could move a lot faster if PR's could be reviewed and tested faster.

Another major problem is communication, if you'd opened a feature request for some of the things that need doing a community member might have picked them up and given it a go but because any internal-to-MS requirements aren't communicated the first contributors know of it is when a PR turns up to add the feature.
This SqlClientX is a good example of communication done poorly. There have been PR's opened adding all sorts of interesting things that community members are interested in but nothing is said until i directly query what's going on? Why not start with the communication and get us all involved?

4 replies

@saurabh500

saurabh500 Jun 25, 2024
Maintainer

@Wraith2 I disagree that SqlClientX is an example of communication done poorly.
It's OK to take sometime to build a point of view before coming out to the community about the plan. I love the community but ultimately when CSS comes with a dump / trace of a customer production issue, we are responsible for debugging it. So I don't see anything wrong with the team thinking about not just improving but supporting the driver.

A major problem with SqlClient is that there isn't enough investment in it. We could move a lot faster if PR's could be reviewed and tested faster.

I have worked on this driver long enough to confidently say that anyone would simply be bleeding money/time and effort if we keep building on top of the I/O of the current driver. It was built on top of native SNI, which was written for Sync programming, was then hurriedly modified for async APIs in ADO.Net and then the same patterns simply got into the managed SNI implementation, again because this wasn't thought through. There are many comments after the link you have provided which support a similar point of view.

If you talk about investments, then the fact that we are actually looking into the driver with a fresh perspective into the I/O speaks of the investment we are ready to make.

@Wraith2

Wraith2 Jun 25, 2024
Author

In an ideal world we could rework the SNI layer entirely without people noticing. I don't think we can do that because of so many quirks people might be relying on. Any small change in locking behaviour breaks someone. See my attempt to rework the mars muxer for the sort of thing I'm talking about. Swapping out the implementation and having people not need to adapt their code is a nice idea but unlikely to work for a small but significant amount of users with legacy codebases.

It's nice that people are looking at fresh approaches to improve the library. It's a bit strange that the existing attempt to do that (woodstar) seem to have died and we can't get a response about it. It smells like the internal disconnect between two different divisions with MS are being surfaced to users again.

There contributors to the library who have worked around the IO in the library who might have valuable information about how to rewrite efficiently, using pipelines for example. It would be a good idea to work with them.

@roji

roji Jun 25, 2024
Collaborator

Here's my perspective on things...

Over the past few years we've occasionally had the discussion on whether SqlClient can/should be evolved in its current form, or whether it basically needs a deeper rewrite. For a long time, I've been of the opinion that SqlClient is very hard to evolve precisely of how the codebase looks; the way async is implemented is probably the best (but not the only) example of this - it represents a huge amount of technical debt that is extremely hard to untangle. Another form of technical debt in SqlClient is the split across Windows (native SNI) vs. non-Windows (managed), as well as the split across .NET Framework vs. modern .NET. To be frank, I think there's simply no reason for a database driver to be as complicated as SqlClient - it's merely an accumulation of many years of technical debt etc.

Rewriting anything indeed carries the risk of unintended backwards compatibility breaks - that's a very valid point. But I believe that at some point, trying to untangle something as complicated as SqlClient can be even more risky than a rewrite. I'm thinking of various PRs that I've seen along the years - some of which you authored - which tried to fix a specific point but which then had to reverted because they regressed something. This happened not because the contributor didn't wasn't smart enough or didn't do enough work - I think it's just very hard to do anything with the current state of things. I also have a strong sense of an fundamental rewrite simply being needed to move forward in certain areas, such as the I/O layer; there's sometimes no other way for a product that's been around for as long as SqlClient - don't forget that EF and .NET were also "rebooted" (and to a large extent rewritten) with .NET/EF Core.

Re Woodstar... I see this new effort as basically being Woodstar, in a sense. True, it's not a greenfield new driver since it's being done within SqlClient and must respect backwards compat, but the goals of the two projects are the same - arrive at a modern, efficient SQL Server driver without all the technical debt, and which is able to evolve safely and quickly. In fact, it's much better for this effort to be done by SqlClient engineers, rather than by outsiders as an experimental project, with uncertain maintenance/stability guarantees (which is what Woodstar would have become). This way, we have the SqlClient team itself taking on this task, and ensuring its evolution and maintenance going forward - and I have full confidence in them doing the right things.

Finally, from an external contributor perspective, I'd propose looking at this as an opportunity: it's a chance to provide insights and contribute to building the driver right, without being bound to the current codebase and all the difficulties it imposes.

@roji

roji Jul 1, 2024
Collaborator

For people interested in Woodstar, I've written a quick summary of the project: dotnet/datalab#22

mdaigle
Jun 26, 2024
Maintainer

I've started a discussion for the connection pool redesign over at #2612.

0 replies

What is SqlClientX #2603

Uh oh!

Wraith2 Jun 25, 2024

Replies: 6 comments · 8 replies

Uh oh!

edwardneal Jun 25, 2024

Uh oh!

saurabh500 Jun 25, 2024 Maintainer

Introduction

Connection pool

Async API performance improvement

Well, good. Now what? SqlClientX is born

Goals

Uh oh!

Wraith2 Jun 26, 2024 Author

Uh oh!

saurabh500 Jun 26, 2024 Maintainer

Uh oh!

Wraith2 Jun 26, 2024 Author

Uh oh!

benrr101 Jul 2, 2024 Maintainer

Uh oh!

saurabh500 Jun 25, 2024 Maintainer

Uh oh!

saurabh500 Jun 25, 2024 Maintainer

Uh oh!

Uh oh!

Wraith2 Jun 25, 2024 Author

Uh oh!

saurabh500 Jun 25, 2024 Maintainer

Uh oh!

Wraith2 Jun 25, 2024 Author

Uh oh!

Uh oh!

roji Jun 25, 2024 Collaborator

Uh oh!

roji Jul 1, 2024 Collaborator

Uh oh!

mdaigle Jun 26, 2024 Maintainer

Wraith2
Jun 25, 2024

Replies: 6 comments 8 replies

edwardneal
Jun 25, 2024

saurabh500
Jun 25, 2024
Maintainer

Wraith2 Jun 26, 2024
Author

saurabh500 Jun 26, 2024
Maintainer

Wraith2 Jun 26, 2024
Author

benrr101 Jul 2, 2024
Maintainer

saurabh500
Jun 25, 2024
Maintainer

saurabh500
Jun 25, 2024
Maintainer

Wraith2
Jun 25, 2024
Author

saurabh500 Jun 25, 2024
Maintainer

Wraith2 Jun 25, 2024
Author

roji Jun 25, 2024
Collaborator

roji Jul 1, 2024
Collaborator

mdaigle
Jun 26, 2024
Maintainer