Search code, repositories, users, issues, pull requests...

patches/store-socket.diff Show resolved Hide resolved

@sleexyz sleexyz force-pushed the open branch from 8680610 to f50ab99 Compare

May 5, 2023 22:21

@sleexyz sleexyz marked this pull request as ready for review

May 5, 2023 22:24

@sleexyz sleexyz requested a review from a team as a code owner

May 5, 2023 22:24

@sleexyz sleexyz force-pushed the open branch from f50ab99 to ea70a42 Compare

May 8, 2023 18:17

code-asher reviewed

May 8, 2023

Copy link

Member

@code-asher code-asher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution and nice work!

The test failures seem to be unrelated to your changes. Not sure what is going on there.

patches/store-socket.diff Outdated Show resolved Hide resolved

src/node/vscodeSocket.ts Outdated Show resolved Hide resolved

src/node/vscodeSocket.ts Show resolved Hide resolved

test/unit/node/vscodeSocket.test.ts Outdated Show resolved Hide resolved

patches/store-socket.diff Show resolved Hide resolved

code-asher reviewed

May 8, 2023

src/node/main.ts Outdated Show resolved Hide resolved

@sleexyz sleexyz force-pushed the open branch from ea70a42 to 36c4816 Compare

May 10, 2023 15:32

Copy link

Contributor Author

sleexyz commented May 10, 2023 •

edited

Loading

@code-asher Here's a product question: If two tabs for the same workspace are opened, and if the later-opened tab is closed, should we open files in the first tab? Or are we okay with the first tab being orphaned?

To rephrase in pseudocode:

tab1 = openEditor("/foo/")
tab2 = openEditor("/foo/")
tab2.close()
openFile("/foo/README.md") // Should this error, or open in tab1?

Copy link

Contributor Author

sleexyz commented May 10, 2023

Another product question: We should support editor resolution across multiple daemonized code-server instances, right? Or is it sufficient to assume a single a daemonized code-server instance?

Copy link

Contributor Author

sleexyz commented May 10, 2023 •

edited

Loading

Also I think appending like this will cause us to endlessly accumulate socket paths? I know we have the deletion code but that only deletes until it finds a good socket and since the good socket will always be the latest we will never clean up the old ones. Plus that code only runs when you are trying to open a file via the CLI which some folks may never do. Changing how we write here would more or less remove the need to clean up there.

Yes, you're right, the PR right now has the issue of only deleting when opening files via CLI.

If we don't want orphan tabs (#6191 (comment)), we cannot overwrite an entry based on some key (i.e. workspace id) as a mechanism to limit the growth of entries, as each session needs to be stored independently.

AFAIK this means we need to perform some sort of explicit "garbage collection" procedure to limit the growth of entries.

Here's my thought process into how to do this, which builds up to what I'm ultimately proposing (3).

1) Garbage collect on startup

Suppose we "garbage collect" when a daemonized code-server instance starts up: iterate through all sockets and delete dead ones.

This still leaves the possibility for entries to grow unbounded if only a single code-server instance runs and never stops.

2) Periodic garbage collection

Same as (1), but we garbage collect periodically as to limit entries from growing unbounded given a single code-server instance.

Note that if n daemonized code-server instances are running, we have to live with n times more garbage collections.
Read-write races are okay here since two garbage collections should return compatible if not same results.

3) Periodic garbage collection by a single instance

If we're not okay with n-times more garbage collections, we can allow at most a single instance to own responsibility of garbage collection (a "leader"):

Same as (2), but upon garbage collection, we write a pid, timestamp pair to vscode-ipc that indicates which instance performed the GC, and when it happened.

All instances will periodically check the vscode-ipc file:

If the PID of the instance matches the pid of the pid, timestamp pair, we garbage collect
Otherwise:
- If the timestamp is beyond a certain threshold (i.e. 2.5x the gc period), we assume that the leader is down. And so we garbage collect, changing the leader.
- Otherwise, we skip garbage collection.

@code-asher WDYT?

Copy link

Contributor Author

sleexyz commented May 10, 2023

There is also another possibility where we use an HTTP server instead of a file to manage editor sessions. This could be served by a daemonized code-server instance, and run under a dedicated UDS socket.

Editor registration: Editor calls a POST handler at the socket with a JSON body.
Editor resolution: CLI calls a GET handler.

This way, because a single server handles registration, we never have to deal with races. Also, we can just use JSON instead of new-line delimited JSON. Also, we have an API :)

To account for multiple daemonized code-server instances, we can serve under a single instance, using the mechanism detailed in #6191 (comment) to select the serving instance (and garbage-collecting instance).

Copy link

Member

code-asher commented May 11, 2023

Here's a product question: If two tabs for the same workspace are opened, and if the later-opened tab is closed, should we open files in the first tab? Or are we okay with the first tab being orphaned?

Interesting scenario! All else being equal opening in the first tab does seem like the better option but opening the same workspace multiple times at once seems like an unusual workflow to me (maybe there is a use case I am missing though) so I personally would not be opposed to having it be orphaned if that makes things easier for us.

We should support editor resolution across multiple daemonized code-server instances, right? Or is it sufficient to assume a single a daemonized code-server instance?

We have generally been operating under the assumption that there is a single code-server daemon per user. I know some people run one instance per project/workspace but that is not how it was meant to be used. That does make me realize we might want to use ~/.local/share or something instead of /tmp though as multiple users could be writing their own sockets to it.

These are very well thought-out options! If we end up deciding a tab can be orphaned if a second tab with that same workspace is opened then we could do this file-based (named by workspace ID or something) that way we would not need any garbage collection. Actually that would eliminate a type of theoretical race the garbage collection system has: if a tab is opened as garbage collection is running after the collector reads but before it writes.

code-server is meant to be long-lived so I would avoid only garbage collecting on startup and since we expect one instance per user I think it will not be worth the effort to do any coordination between multiple instances.

I think the API idea is brilliant. Come to think of it, we actually already use this pattern where VS Code checks for updates against an /update endpoint that the daemon serves so there is some prior art here.

We could probably skip the GET handler if we wanted, since it would be the same instance we could just call something directly. But maybe a GET handler would be nice if there are external tools that want to hook into this API or something.

The only other ideas I have are to do some kind of file locking or store the socket paths via VS Code's storage provider which presumably does its own file locking. The advantage to one of those is that the change would be more likely to be accepted upstream one day, but since this is already a feature specific to code-server going the API route seems reasonable to me.

Copy link

Member

code-asher commented May 11, 2023 •

edited

Loading

So I was trying to see how native VS Code does this and I think they do it by creating predictable socket names (see createStaticIPCHandle). For example we could patch the socket paths to be /tmp/vscode-ipc/<hashed directory>.sock. If it already exists we see if it is in use and if so we append -2 and so on until we find a free suffix to use (or instead of hashing we could have a nested directory structure so /tmp/vscode-ipc/my/workspace/ipc-2.sock or something).

Then when opening a file like /home/coder/foo/bar/baz we hash /home/coder/foo/bar, look for a match, hash /home/coder/foo, look for a match, and so on. If no match we select the most recently created socket.

Presumably VS Code cleans up their own sockets so we would not have to worry about that. Or even if they do not clean up their own sockets at least it would be an already existing problem rather than a regression.

Copy link

Contributor Author

sleexyz commented May 12, 2023

We have generally been operating under the assumption that there is a single code-server daemon per user. I know some people run one instance per project/workspace but that is not how it was meant to be used.

Great, not having to worry about multiple instances should dramatically simplify things :)

If we combine this assumption with going the server route, we can simplify things even more by keeping the session registry entirely in memory (instead of in a file). Once the daemon dies, all the sessions end up dying too, so we'll never have sessions from previous startups.

All else being equal opening in the first tab does seem like the better option but opening the same workspace multiple times at once seems like an unusual workflow to me (maybe there is a use case I am missing though)

If I reflect on my own usage of code-server, for most part I converge to one editor session per workspace. But then sometimes I accumulate enough Chrome windows and tabs to the point that spinning up a new editor session to a workspace I might already have opened is less work than finding my original tab.

Or suppose I accidentally open the workspace again in some tab. In either case, I would not expect opening and closing the later opened tab to affect the behavior of some previously opened tab.

Here's another thing to consider: once #1964 is resolved, you could imagine use cases popping up that involve hotlinking / bookmarking files, which probably only makes sense to open in new editor sessions (without something like a Chrome extension to foreground existing tabs, but I digress). For example, I currently use code-server to jot down notes in markdown in a single workspace, and once #1964 is resolved I'd want to be able to make quick ephemeral edits to certain files by just clicking into my bookmarks bar.

FWIW VSCode will open files in an earlier-opened tab if a later-opened tab is closed.

Copy link

Contributor Author

sleexyz commented May 12, 2023 •

edited

Loading

So I was trying to see how native VS Code does this and I think they do it by creating predictable socket names (see createStaticIPCHandle).

I traced through where the IPC path comes from in extHostExtensionService and it seems like it actually uses createRandomIPCHandle, which also makes sense if you look through the logic specified in the functions. If you search for createStaticIPCHandle it seem to be used only in the electron case.

For example we could patch the socket paths to be /tmp/vscode-ipc/<hashed directory>.sock. If it already exists we see if it is in use and if so we append -2 and so on until we find a free suffix to use (or instead of hashing we could have a nested directory structure so /tmp/vscode-ipc/my/workspace/ipc-2.sock or something).

That's an interesting idea, to use the filesystem itself as some sort of registry, where the paths are esssentially registry keys. And not have to maintain a separate source of state.

We still have an orphan tab problem that any sort of registry with overwrites would cause.

But even if we ignore that, we run into issues once we consider multi-root workspaces. For example, a workspace with two roots would have to be registered under two keys, aka two paths. And a socket can only live under one path.

(...unless we symlinked the socket under different path... hmm, yeah I'm not sure if this is worth it all just to avoid having a separate source of state...)

What I'm thinking:

I think if we go the http server route, the registry can just live entirely in memory, which should avoid a lot of cleanup since the editor session registry is always effectively cleared on restart.

If we're really okay with orphan tabs, then we can key on workspace_id (which generalizes to multi-root workspaces, unlike keying on folder). If we assume people are opening the same workspaces, then we won't have to deal with garbage collection, because we are overwriting entries per-workspace.
Otherwise, if we're concerned about memory growing unbounded, we can do something like A) a periodic gc procedure, or B) only gc when a new session is registered, or C) when a new session is registered, attempt a gc just for sessions sharing the same workspace.

Copy link

Member

code-asher commented May 15, 2023 •

edited

Loading

If you search for createStaticIPCHandle it seem to be used only in the electron case

Yup exactly, by native VS Code I mean the Electron version.

My thinking was that if all of these cases work as expected in Electron-based VS Code and we want parity then we could just copy what they are doing into the web version.

Buuuuut I completely misread what they do; I thought they were creating a static IPC handle per workspace and using that to open in the right workspace but I think they actually just always have the single socket and connect every instance to that, then I imagine they find the right instance in memory. Since the web always spins up a new socket on every connection (unfortunately) there is no way we can copy that without changing the architecture to match. 😢

We still have an orphan tab problem that any sort of registry with overwrites would cause.

No orphan tabs I think because we would append with a suffix if the socket is in use by another tab (so we never overwrite active sockets). We might have inactive sockets if VS Code is not cleaning them up but no orphaned tabs.

multi-root workspaces

Ahh excellent point! The symlink idea is quite clever but yeah I agree with abandoning the file system idea; I was wrong about Electron-based VS Code doing the same thing which was the only reason I even brought it up. 🤦

All things considered HTTP seems like the best path forward to me.

If we're really okay with orphan tabs

From reading your use cases I feel convinced we should avoid orphaning tabs. One alternative idea to GC is to remove the socket when the extension host terminates, maybe through a DELETE endpoint. We could put the API calls in https://github.com/microsoft/vscode/blob/9ff83d41a3268ec578c032ae54102285b447e947/src/vs/workbench/api/node/extensionHostProcess.ts in the initialize and terminate functions maybe.

Copy link

Member

code-asher commented May 15, 2023 •

edited

Loading

Hmm one thing to consider is that the extension host will need to know where to send the HTTP request which I suppose means we need to send the port or socket that code-server is listening on down to the extension host.

But if we have to send a message down anyway, maybe we could send a path for it to register its socket on (rather than have it generate one randomly) then we would not need the extension host to make any HTTP calls back since we already know the socket it will use. When the host process exits we can remove the socket.

Or alternatively we could post messages back up to code-server through process.send().

Copy link

Member

code-asher commented May 15, 2023 •

edited

Loading

Ah wait no we could not send down the socket path because we need to know all the workspaces attached to it as well. Although we could figure this out by checking the folder and workspace query params as they should always be set. That might be pretty chill.

And I think sending the port/socket would not work either because it might be authenticated and the extension host will not have credentials. We could probably come up with something for that but maybe it would be easier to send the IPC handle and workspaces up through process.send() instead.

A separate non-authenticated socket just for this could work although I suppose an extra socket is somewhat unfortunate.

We could add a new command to VS Code's socket that replies with its workspaces and get them that way but that seems like a lot to me.

Copy link

Contributor Author

sleexyz commented May 16, 2023 •

edited

Loading

Hmm one thing to consider is that the extension host will need to know where to send the HTTP request which I suppose means we need to send the port or socket that code-server is listening on down to the extension host.

(削除) I was thinking of just generating a UDS socket with a filename based the value of CODE_SERVER_PARENT_PID, which I checked was available in the extension host. Something like code-server-ipc-[PID].sock. (削除ここまで)

Hmm, why don't we just do something like $TMPDIR/code-server-ipc.sock, assuming that we only need to support a single daemonized instance at a time?

One alternative idea to GC is to remove the socket when the extension host terminates, maybe through a DELETE endpoint. We could put the API calls in https://github.com/microsoft/vscode/blob/9ff83d41a3268ec578c032ae54102285b447e947/src/vs/workbench/api/node/extensionHostProcess.ts in the initialize and terminate functions maybe.

Oooh, that's not a bad idea. I like the simplicity of it, though it does require yet another patch, for arguably a pretty non-critical piece of functionality.

Periodic GC is slightly icky (i.e. having to reason about some periodic process).
Per-workspace GC on workspace open means needing to additionally maintain a n:1 relation of session to workspace id (on top of the existing n:m relation of session to folder path).

Let me take a first pass with the HTTP server and I'll try your suggestion with the DELETE endpoint.

@sleexyz sleexyz force-pushed the open branch from 36c4816 to 9ead6d5 Compare

May 16, 2023 22:12

Copy link

Member

code-asher commented May 17, 2023 •

edited

Loading

Yeah we can add another socket although we may want to avoid $tmpdir and do something user-based to avoid users being able to interfere with other users' instances.

Copy link

Member

code-asher commented May 17, 2023

Come to think of it, I guess we actually have to add another socket with a predictable file path so the external CLI can interact with it otherwise we would have to go back to storing a list of sockets in a file which is the whole thing we are trying to avoid hahaha

@sleexyz sleexyz force-pushed the open branch 3 times, most recently from f678836 to a3bee4d Compare

May 29, 2023 19:24

@sleexyz sleexyz force-pushed the open branch 8 times, most recently from 2554c19 to 15156a4 Compare

May 30, 2023 20:17

sleexyz

sleexyz commented

May 30, 2023

Copy link

Contributor Author

@sleexyz sleexyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got around to finishing this!

Now a daemonized instance will spin up an additional http server at $TMPDIR/code-server.sock.

The extension host connects to it to register sessions on creation and deregister on termination. No garbage collection needed :)
The CLI connects to it to get the best matching active socket

Yeah we can add another socket although we may want to avoid $tmpdir and do something user-based to avoid users being able to interfere with other users' instances.

Both the extension host's vscode-ipc-XXX.sock files and the old vscode-ipc file use $TMPDIR, so perhaps this could be done a in a separate PR?

patches/store-socket.diff Show resolved Hide resolved

src/node/vscodeSocket.ts Show resolved Hide resolved

patches/store-socket.diff Outdated Show resolved Hide resolved

test/unit/node/app.test.ts

beforeAll(async () => {

mockLogger()

const testName = "unlink-socket"

Copy link

Contributor Author

@sleexyz sleexyz May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you're aware of this but there are maximum unix socket path lengths of ~100 chars, so I had to shorten the test name to get this working on MacOS.

Without shortening it, there's some really wonky behavior where it thinks the socket already exists, which seems to be caused by the fact we're creating two sockets now.

Spent 2/3 hours debugging before I realized the socket path lengths were the issue. I'd suggest that we error out explicitly if the socket path lengths exceed the OS maximum, or at least shorten the test tmpdir paths, lest some poor developer come across this silent failure mode again...

Copy link

Member

@code-asher code-asher May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow that is wild. Good to know! Erroring out seems very wise to me. Sounds like the perfect nightmare of a bug.

Copy link

Member

code-asher commented May 30, 2023

Absolutely stellar! I will review the changes first thing tomorrow. Thank you sticking through all the back and forth!

Both the extension host's vscode-ipc-XXX.sock files and the old vscode-ipc file use $TMPDIR, so perhaps this could be done a in a separate PR?

That makes perfect sense! 👍

Copy link

Member

code-asher commented May 31, 2023

One thing we will have to do is update VS Code to 1.78.2 and make sure the patches still apply since mainline was updated to 1.78.2 somewhat recently.

code-asher reviewed

May 31, 2023

src/node/app.ts Outdated Show resolved Hide resolved

src/node/app.ts Show resolved Hide resolved

src/node/vscodeSocket.ts Outdated Show resolved Hide resolved

test/unit/node/vscodeSocket.test.ts Show resolved Hide resolved

patches/store-socket.diff Outdated Show resolved Hide resolved

@sleexyz sleexyz force-pushed the open branch from 15156a4 to ca152e5 Compare

June 9, 2023 23:00

sleexyz

sleexyz commented

Jun 9, 2023

Continue to review full report in Codecov by Sentry.

Copy link

Contributor Author

@sleexyz sleexyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just rebased it to the latest version on main (with 1.78.2), works so far for me!

src/node/app.ts Outdated Show resolved Hide resolved

src/node/app.ts Show resolved Hide resolved

src/node/vscodeSocket.ts Outdated Show resolved Hide resolved

patches/store-socket.diff Outdated Show resolved Hide resolved

@codecov

Copy link

codecov bot commented Jun 12, 2023 •

edited

Loading

Codecov Report

Merging #6191 (ca152e5) into main (0703ef0) will increase coverage by 0.93%.
The diff coverage is 88.40%.

❗ Current head ca152e5 differs from pull request most recent head 192f579. Consider uploading reports for the commit 192f579 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@ Coverage Diff @@
## main #6191 +/- ##
==========================================
+ Coverage 72.61% 73.54% +0.93% 
==========================================
 Files 30 31 +1 
 Lines 1749 1860 +111 
 Branches 387 399 +12 
==========================================
+ Hits 1270 1368 +98 
- Misses 402 415 +13 
 Partials 77 77

Impacted Files	Coverage Δ
src/node/main.ts	`50.52% <50.00%> (+1.05%)`	⬆️
src/node/vscodeSocket.ts	`87.37% <87.37%> (ø)`
src/node/cli.ts	`90.56% <91.66%> (-0.42%)`	⬇️
src/node/app.ts	`100.00% <100.00%> (ø)`

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ccb0d3a...192f579. Read the comment docs.

code-asher approved these changes

Jun 12, 2023

Copy link

Member

@code-asher code-asher left a comment •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the linter wants something formatted then we are good to go!

For some reason when I run yarn watch I still get no folders but I downloaded the build from CI and it works perfectly fine for me so I figure it is something weird on my machine.

@sleexyz sleexyz force-pushed the open branch from d81dcac to 80877c2 Compare

June 14, 2023 18:53


 Prefer matching editor sessions when opening files.

192f579

Signed-off-by: Sean Lee <freshdried@gmail.com>

@sleexyz sleexyz force-pushed the open branch from 80877c2 to 192f579 Compare

June 14, 2023 19:17

Copy link

Contributor Author

sleexyz commented Jun 14, 2023

Cool, fixed the formatting issue!

@code-asher code-asher merged commit fb73742 into coder:main

Jun 14, 2023