-
Notifications
You must be signed in to change notification settings - Fork 6.3k
Prefer matching editor sessions when opening files. #6191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b61e5d6
to
8680610
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution and nice work!
The test failures seem to be unrelated to your changes. Not sure what is going on there.
@code-asher Here's a product question: If two tabs for the same workspace are opened, and if the later-opened tab is closed, should we open files in the first tab? Or are we okay with the first tab being orphaned?
To rephrase in pseudocode:
tab1 = openEditor("/foo/")
tab2 = openEditor("/foo/")
tab2.close()
openFile("/foo/README.md") // Should this error, or open in tab1?
Another product question: We should support editor resolution across multiple daemonized code-server instances, right? Or is it sufficient to assume a single a daemonized code-server instance?
Also I think appending like this will cause us to endlessly accumulate socket paths? I know we have the deletion code but that only deletes until it finds a good socket and since the good socket will always be the latest we will never clean up the old ones. Plus that code only runs when you are trying to open a file via the CLI which some folks may never do. Changing how we write here would more or less remove the need to clean up there.
Yes, you're right, the PR right now has the issue of only deleting when opening files via CLI.
If we don't want orphan tabs (#6191 (comment)), we cannot overwrite an entry based on some key (i.e. workspace id) as a mechanism to limit the growth of entries, as each session needs to be stored independently.
AFAIK this means we need to perform some sort of explicit "garbage collection" procedure to limit the growth of entries.
Here's my thought process into how to do this, which builds up to what I'm ultimately proposing (3).
1) Garbage collect on startup
Suppose we "garbage collect" when a daemonized code-server instance starts up: iterate through all sockets and delete dead ones.
This still leaves the possibility for entries to grow unbounded if only a single code-server instance runs and never stops.
2) Periodic garbage collection
Same as (1), but we garbage collect periodically as to limit entries from growing unbounded given a single code-server instance.
Note that if n daemonized code-server instances are running, we have to live with n times more garbage collections.
Read-write races are okay here since two garbage collections should return compatible if not same results.
3) Periodic garbage collection by a single instance
If we're not okay with n-times more garbage collections, we can allow at most a single instance to own responsibility of garbage collection (a "leader"):
Same as (2), but upon garbage collection, we write a pid, timestamp
pair to vscode-ipc
that indicates which instance performed the GC, and when it happened.
All instances will periodically check the vscode-ipc
file:
- If the PID of the instance matches the pid of the
pid, timestamp
pair, we garbage collect - Otherwise:
- If the timestamp is beyond a certain threshold (i.e. 2.5x the gc period), we assume that the leader is down. And so we garbage collect, changing the leader.
- Otherwise, we skip garbage collection.
@code-asher WDYT?
There is also another possibility where we use an HTTP server instead of a file to manage editor sessions. This could be served by a daemonized code-server instance, and run under a dedicated UDS socket.
- Editor registration: Editor calls a POST handler at the socket with a JSON body.
- Editor resolution: CLI calls a GET handler.
This way, because a single server handles registration, we never have to deal with races. Also, we can just use JSON instead of new-line delimited JSON. Also, we have an API :)
To account for multiple daemonized code-server instances, we can serve under a single instance, using the mechanism detailed in #6191 (comment) to select the serving instance (and garbage-collecting instance).
Here's a product question: If two tabs for the same workspace are opened, and if the later-opened tab is closed, should we open files in the first tab? Or are we okay with the first tab being orphaned?
Interesting scenario! All else being equal opening in the first tab does seem like the better option but opening the same workspace multiple times at once seems like an unusual workflow to me (maybe there is a use case I am missing though) so I personally would not be opposed to having it be orphaned if that makes things easier for us.
We should support editor resolution across multiple daemonized code-server instances, right? Or is it sufficient to assume a single a daemonized code-server instance?
We have generally been operating under the assumption that there is a single code-server daemon per user. I know some people run one instance per project/workspace but that is not how it was meant to be used. That does make me realize we might want to use ~/.local/share
or something instead of /tmp
though as multiple users could be writing their own sockets to it.
These are very well thought-out options! If we end up deciding a tab can be orphaned if a second tab with that same workspace is opened then we could do this file-based (named by workspace ID or something) that way we would not need any garbage collection. Actually that would eliminate a type of theoretical race the garbage collection system has: if a tab is opened as garbage collection is running after the collector reads but before it writes.
code-server is meant to be long-lived so I would avoid only garbage collecting on startup and since we expect one instance per user I think it will not be worth the effort to do any coordination between multiple instances.
I think the API idea is brilliant. Come to think of it, we actually already use this pattern where VS Code checks for updates against an /update
endpoint that the daemon serves so there is some prior art here.
We could probably skip the GET handler if we wanted, since it would be the same instance we could just call something directly. But maybe a GET handler would be nice if there are external tools that want to hook into this API or something.
The only other ideas I have are to do some kind of file locking or store the socket paths via VS Code's storage provider which presumably does its own file locking. The advantage to one of those is that the change would be more likely to be accepted upstream one day, but since this is already a feature specific to code-server going the API route seems reasonable to me.
So I was trying to see how native VS Code does this and I think they do it by creating predictable socket names (see createStaticIPCHandle
). For example we could patch the socket paths to be /tmp/vscode-ipc/<hashed directory>.sock
. If it already exists we see if it is in use and if so we append -2
and so on until we find a free suffix to use (or instead of hashing we could have a nested directory structure so /tmp/vscode-ipc/my/workspace/ipc-2.sock
or something).
Then when opening a file like /home/coder/foo/bar/baz
we hash /home/coder/foo/bar
, look for a match, hash /home/coder/foo
, look for a match, and so on. If no match we select the most recently created socket.
Presumably VS Code cleans up their own sockets so we would not have to worry about that. Or even if they do not clean up their own sockets at least it would be an already existing problem rather than a regression.
We have generally been operating under the assumption that there is a single code-server daemon per user. I know some people run one instance per project/workspace but that is not how it was meant to be used.
Great, not having to worry about multiple instances should dramatically simplify things :)
If we combine this assumption with going the server route, we can simplify things even more by keeping the session registry entirely in memory (instead of in a file). Once the daemon dies, all the sessions end up dying too, so we'll never have sessions from previous startups.
All else being equal opening in the first tab does seem like the better option but opening the same workspace multiple times at once seems like an unusual workflow to me (maybe there is a use case I am missing though)
If I reflect on my own usage of code-server, for most part I converge to one editor session per workspace. But then sometimes I accumulate enough Chrome windows and tabs to the point that spinning up a new editor session to a workspace I might already have opened is less work than finding my original tab.
Or suppose I accidentally open the workspace again in some tab. In either case, I would not expect opening and closing the later opened tab to affect the behavior of some previously opened tab.
Here's another thing to consider: once #1964 is resolved, you could imagine use cases popping up that involve hotlinking / bookmarking files, which probably only makes sense to open in new editor sessions (without something like a Chrome extension to foreground existing tabs, but I digress). For example, I currently use code-server to jot down notes in markdown in a single workspace, and once #1964 is resolved I'd want to be able to make quick ephemeral edits to certain files by just clicking into my bookmarks bar.
FWIW VSCode will open files in an earlier-opened tab if a later-opened tab is closed.
So I was trying to see how native VS Code does this and I think they do it by creating predictable socket names (see createStaticIPCHandle).
I traced through where the IPC path comes from in extHostExtensionService and it seems like it actually uses createRandomIPCHandle
, which also makes sense if you look through the logic specified in the functions. If you search for createStaticIPCHandle
it seem to be used only in the electron case.
For example we could patch the socket paths to be
/tmp/vscode-ipc/<hashed directory>.sock
. If it already exists we see if it is in use and if so we append-2
and so on until we find a free suffix to use (or instead of hashing we could have a nested directory structure so/tmp/vscode-ipc/my/workspace/ipc-2.sock
or something).
That's an interesting idea, to use the filesystem itself as some sort of registry, where the paths are esssentially registry keys. And not have to maintain a separate source of state.
We still have an orphan tab problem that any sort of registry with overwrites would cause.
But even if we ignore that, we run into issues once we consider multi-root workspaces. For example, a workspace with two roots would have to be registered under two keys, aka two paths. And a socket can only live under one path.
(...unless we symlinked the socket under different path... hmm, yeah I'm not sure if this is worth it all just to avoid having a separate source of state...)
What I'm thinking:
I think if we go the http server route, the registry can just live entirely in memory, which should avoid a lot of cleanup since the editor session registry is always effectively cleared on restart.
- If we're really okay with orphan tabs, then we can key on
workspace_id
(which generalizes to multi-root workspaces, unlike keying on folder). If we assume people are opening the same workspaces, then we won't have to deal with garbage collection, because we are overwriting entries per-workspace. - Otherwise, if we're concerned about memory growing unbounded, we can do something like A) a periodic gc procedure, or B) only gc when a new session is registered, or C) when a new session is registered, attempt a gc just for sessions sharing the same workspace.
If you search for createStaticIPCHandle it seem to be used only in the electron case
Yup exactly, by native VS Code I mean the Electron version.
My thinking was that if all of these cases work as expected in Electron-based VS Code and we want parity then we could just copy what they are doing into the web version.
Buuuuut I completely misread what they do; I thought they were creating a static IPC handle per workspace and using that to open in the right workspace but I think they actually just always have the single socket and connect every instance to that, then I imagine they find the right instance in memory. Since the web always spins up a new socket on every connection (unfortunately) there is no way we can copy that without changing the architecture to match. 😢
We still have an orphan tab problem that any sort of registry with overwrites would cause.
No orphan tabs I think because we would append with a suffix if the socket is in use by another tab (so we never overwrite active sockets). We might have inactive sockets if VS Code is not cleaning them up but no orphaned tabs.
multi-root workspaces
Ahh excellent point! The symlink idea is quite clever but yeah I agree with abandoning the file system idea; I was wrong about Electron-based VS Code doing the same thing which was the only reason I even brought it up. 🤦
All things considered HTTP seems like the best path forward to me.
If we're really okay with orphan tabs
From reading your use cases I feel convinced we should avoid orphaning tabs. One alternative idea to GC is to remove the socket when the extension host terminates, maybe through a DELETE endpoint. We could put the API calls in https://github.com/microsoft/vscode/blob/9ff83d41a3268ec578c032ae54102285b447e947/src/vs/workbench/api/node/extensionHostProcess.ts in the initialize
and terminate
functions maybe.
Hmm one thing to consider is that the extension host will need to know where to send the HTTP request which I suppose means we need to send the port or socket that code-server is listening on down to the extension host.
But if we have to send a message down anyway, maybe we could send a path for it to register its socket on (rather than have it generate one randomly) then we would not need the extension host to make any HTTP calls back since we already know the socket it will use. When the host process exits we can remove the socket.
Or alternatively we could post messages back up to code-server through process.send()
.
Ah wait no we could not send down the socket path because we need to know all the workspaces attached to it as well. Although we could figure this out by checking the folder
and workspace
query params as they should always be set. That might be pretty chill.
And I think sending the port/socket would not work either because it might be authenticated and the extension host will not have credentials. We could probably come up with something for that but maybe it would be easier to send the IPC handle and workspaces up through process.send()
instead.
A separate non-authenticated socket just for this could work although I suppose an extra socket is somewhat unfortunate.
We could add a new command to VS Code's socket that replies with its workspaces and get them that way but that seems like a lot to me.
Hmm one thing to consider is that the extension host will need to know where to send the HTTP request which I suppose means we need to send the port or socket that code-server is listening on down to the extension host.
(削除) I was thinking of just generating a UDS socket with a filename based the value of CODE_SERVER_PARENT_PID, which I checked was available in the extension host. Something like code-server-ipc-[PID].sock
. (削除ここまで)
Hmm, why don't we just do something like $TMPDIR/code-server-ipc.sock
, assuming that we only need to support a single daemonized instance at a time?
One alternative idea to GC is to remove the socket when the extension host terminates, maybe through a DELETE endpoint. We could put the API calls in https://github.com/microsoft/vscode/blob/9ff83d41a3268ec578c032ae54102285b447e947/src/vs/workbench/api/node/extensionHostProcess.ts in the initialize and terminate functions maybe.
Oooh, that's not a bad idea. I like the simplicity of it, though it does require yet another patch, for arguably a pretty non-critical piece of functionality.
- Periodic GC is slightly icky (i.e. having to reason about some periodic process).
- Per-workspace GC on workspace open means needing to additionally maintain a n:1 relation of session to workspace id (on top of the existing n:m relation of session to folder path).
Let me take a first pass with the HTTP server and I'll try your suggestion with the DELETE endpoint.
Yeah we can add another socket although we may want to avoid $tmpdir
and do something user-based to avoid users being able to interfere with other users' instances.
Come to think of it, I guess we actually have to add another socket with a predictable file path so the external CLI can interact with it otherwise we would have to go back to storing a list of sockets in a file which is the whole thing we are trying to avoid hahaha
f678836
to
a3bee4d
Compare
2554c19
to
15156a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally got around to finishing this!
Now a daemonized instance will spin up an additional http server at $TMPDIR/code-server.sock
.
- The extension host connects to it to register sessions on creation and deregister on termination. No garbage collection needed :)
- The CLI connects to it to get the best matching active socket
Yeah we can add another socket although we may want to avoid
$tmpdir
and do something user-based to avoid users being able to interfere with other users' instances.
Both the extension host's vscode-ipc-XXX.sock
files and the old vscode-ipc
file use $TMPDIR, so perhaps this could be done a in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you're aware of this but there are maximum unix socket path lengths of ~100 chars, so I had to shorten the test name to get this working on MacOS.
Without shortening it, there's some really wonky behavior where it thinks the socket already exists, which seems to be caused by the fact we're creating two sockets now.
Spent 2/3 hours debugging before I realized the socket path lengths were the issue. I'd suggest that we error out explicitly if the socket path lengths exceed the OS maximum, or at least shorten the test tmpdir paths, lest some poor developer come across this silent failure mode again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow that is wild. Good to know! Erroring out seems very wise to me. Sounds like the perfect nightmare of a bug.
Absolutely stellar! I will review the changes first thing tomorrow. Thank you sticking through all the back and forth!
Both the extension host's vscode-ipc-XXX.sock files and the old vscode-ipc file use $TMPDIR, so perhaps this could be done a in a separate PR?
That makes perfect sense! 👍
One thing we will have to do is update VS Code to 1.78.2 and make sure the patches still apply since mainline was updated to 1.78.2 somewhat recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just rebased it to the latest version on main (with 1.78.2), works so far for me!
Codecov Report
Additional details and impacted files@@ Coverage Diff @@ ## main #6191 +/- ## ========================================== + Coverage 72.61% 73.54% +0.93% ========================================== Files 30 31 +1 Lines 1749 1860 +111 Branches 387 399 +12 ========================================== + Hits 1270 1368 +98 - Misses 402 415 +13 Partials 77 77
Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the linter wants something formatted then we are good to go!
For some reason when I run yarn watch
I still get no folders but I downloaded the build from CI and it works perfectly fine for me so I figure it is something weird on my machine.
Signed-off-by: Sean Lee <freshdried@gmail.com>
Cool, fixed the formatting issue!
Thanks again!
Signed-off-by: Sean Lee <freshdried@gmail.com>
Uh oh!
There was an error while loading. Please reload this page.
Fixes #5709
readSocketPath
is now replaced byVscodeSocketResolver
which will prefer editor sessions that match the files being opened.How it works:
vscode-ipc
.vscode-ipc
vscode-ipc
vscode-ipc
with those sockets deleted. This also serves as a mechansim to prevent the file from growing unbounded.Notes:
Future improvements:
/foo/bar/
beats/foo/
given/foo/bar/baz.txt
). This was left out, but should not be too hard to add.