Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Better connection lost handling #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
leob merged 2 commits into georgeboot:master from Adesin-fr:master
Nov 27, 2024
Merged

Better connection lost handling #33

leob merged 2 commits into georgeboot:master from Adesin-fr:master
Nov 27, 2024

Conversation

@Adesin-fr
Copy link
Contributor

@Adesin-fr Adesin-fr commented Nov 27, 2024

There are some issues with the current version :

  • Before this, if there were no internet connection on first connect, it would fail and did not retry. That use-case is now handled.
  • After a connection failure, channels were not re-subscribed. That is now fixed.
  • The "Ping" interval was triggered everytime we did tried to connect, even without success. So if we tried to connect 40x, we would have 40 "setInterval" triggered, and 40x ping/minute , instead of one !

That's a bit more production-ready now :)

leob reacted with thumbs up emoji
- Before this, if there were no internet connection on first connect, it would fail and did not retry. That use-case is now handled.
- After a connection failure, channels were not re-subscribed. That is now fixed.
- The "Ping" interval was triggered everytime we did tried to connect, even without success. So if we tried to connect 40x, we would have 40 "setInterval" triggered, and 40x ping/minute , instead of one !
That's a bit more production-ready now :)
Copy link
Contributor Author

@leob I think you should have a look at this ;)
Since I'm using this in a 24/24 connected machine, I had to check it would survive rude conditions.
And it seems some of theme weren't ready enough for prod.
So I fixed them !

My WIFI ON/OFF switch had a hard time, being switched back-forth for a while ;)

leob reacted with thumbs up emoji

Copy link
Collaborator

leob commented Nov 27, 2024

@leob I think you should have a look at this ;) Since I'm using this in a 24/24 connected machine, I had to check it would survive rude conditions. And it seems some of theme weren't ready enough for prod. So I fixed them !

My WIFI ON/OFF switch had a hard time, being switched back-forth for a while ;)

@Adesin-fr Thanks, will check it out !

@leob leob merged commit 5474b52 into georgeboot:master Nov 27, 2024
2 of 4 checks passed
Copy link
Collaborator

leob commented Nov 27, 2024
edited
Loading

@Adesin-fr Okay, I've merged this PR and published a new version to npmjs.

Then I went on to create a new build of our app, and deploy and test it - but unfortunately it looks like the web sockets functionality in our app is more or less broken now, I'm not seeing any "notifications" (messages) being delivered over our private channels anymore - nothing is coming through ...

Does it work for you?

What I noticed is that I seem to be seeing fewer log messages than I would expect - here's a partial screenshot of my JS console:

image

I do see:

"just set socketId ..."
"Subscribing to ..."
"Received event ..."
"Sending ping"

but I'm not seeing:

"Trying to connect..."
"Connected !"
"... onmessage ..."

Not saying that I should be seeing all of these, but shouldn't I be seeing at least some of them?

Can you debug this a bit? From the testing that I did with our app this seems to have more or less fallen apart now - maybe the solution turns out to be something very simple, or even something which we need to change in our app, but right now I don't really have a clue (but I admit that I didn't really debug it yet either).

(by the way - the only way that I know of to debug this stuff is to go through a full AWS/lamba/app deployment, which makes iterating/debugging this really slow and painful ... not sure if there's a "better" way, I guess not)

I also wonder if our "Websocket" implementation needs this much "specific" code, while all we really do is piggyback on top of the standard Laravel Echo protocol - we might be reinventing the wheel a bit here, although I also don't know if there's a better way.

P.S. if you want to work on this, please pull master first - I've cleaned up the code a little bit by adding semicolons where they were missing - the JS code in this project was consistently using semicolons, but we now ended up with a mix of "semicolon" and "no semicolon" - I've added the semicolons where they were missing, please let's keep it consistent ...

Another thing I did was to use a "logging prefix" for the console.log statements - I'm now putting the string [AG-WS] (short for "Api Gateway Web Sockets") in front of the log messages, so that I can see which log messages originate from this lib, rather than from our app ...

Copy link
Contributor Author

Adesin-fr commented Nov 27, 2024
edited
Loading

First, thanks for having merged this.

That's strange since I don't think it should not break private channels ...

To see "trying to connect" messages and others , you must enable debug mode on the Echo instruction (remember the last dockbloc of the readme ?)

I'll have a look at this with private channels later today :)

PS : have you checked the auth requests on your laravel app ? They must succeed and give back an auth key, that is then used for the websocket things...
If your auth requests are failing (and not giving back auth key), the private features will fail silently.
I remember having a hard time with this , I'll dig a bit more later, I have another project that also use private channels and this lib !

Copy link
Collaborator

leob commented Nov 27, 2024

First, thanks for having merged this.

That's strange since I don't think it should not break private channels ...

To see "trying to connect" messages and others , you must enable debug mode on the Echo instruction (remember the last dockbloc of the readme ?)

I'll have a look at this with private channels later today :)

@Adesin-fr Yes, I did enable the debug option, otherwise I wouldn't have seen the messages which I saw ... so, I do see some messages, but I'm not seeing all of the messages that I would expect to see :)

Thanks for looking at it! If you can't find anything then let me know, and I'll put a bit more effort into debugging it ...

Copy link
Collaborator

leob commented Nov 27, 2024
edited
Loading

@Adesin-fr P.S. please don't forget to pull master before you start working on it - I've pushed a commit 1 minute ago ...

By the way, I still see some Github jobs failing, e.g. one of the Jest tests - don't know whether that's relevant or not ... but for now I'd say the failing Github jobs are low priority :)

Copy link
Contributor Author

I've tried to have a look.
I'd like to push my project to a staging env with the 0.5.2 version, but it missed axios (seems you removed it some commits ago !!)

In my project, I'm seeing the "trying to connect messages and every others.
Are you sure you correctly imported Echo in your project ?

Copy link
Collaborator

leob commented Nov 28, 2024
edited
Loading

I've tried to have a look. I'd like to push my project to a staging env with the 0.5.2 version, but it missed axios (seems you removed it some commits ago !!)

In my project, I'm seeing the "trying to connect messages and every others. Are you sure you correctly imported Echo in your project ?

@Adesin-fr I've merged your new PR to bring Axios back - I saw that orginally the Axios import wasn't there, then you added it in your previous PR, and I then deleted it again - but I don't know why I deleted it ... no, actually I do know why now - it was because of a build error which I was getting - see comment in the other PR ...

But okay, I don't know why you do see those log messages, while I don't see them!

I'm going to debug the problem, I think I'm going to add more log statements, including in the PHP code, so that I can see exactly what's going on ... probably not today, but maybe tomorrow - work in progress!

P.S. we do have "echo" in our app's package.json:

"laravel-echo": "^1.14.0"

but it's a different (slightly newer) version than the one required by laravel-echo-api-gateway (which has "laravel-echo": "^1.10.0") ...

Copy link
Collaborator

leob commented Dec 7, 2024

@Adesin-fr I was trying to debug my websocket issues - what was just baffling was that when I opened our app in two browsers (Chrome and Firefox) and logged in in both browser windows, I could send a message (via websockets) from "Chrome" to "Firefox", but not the other way around (from "Firefox" to "Chrome") !

Debugging it was an ordeal (I had a hard time to even log anything to CloudWatch), but what I did see was this - a "410 Gone" exception from API Gateway:

[2024年12月07日 10:57:01] DEBUG: Preparing to send to connections for channel .........
[2024年12月07日 10:57:01] DEBUG: Sending message to connection 'CascQeh7oAMCJLw='
[2024年12月07日 10:57:01] DEBUG: Sending message to connection 'K8gVIdmDIAMCLIA='
[2024年12月07日 10:57:02] ERROR: MESSAGE ###### MessageService@sendNotification - exception - error: Error executing "PostToConnection" on "https://..............."; AWS HTTP error: Client error: `POST https://...........` resulted in a `410 Gone` response Unable to parse error information from response - Error parsing JSON: Syntax error[#0 /var/task/vendor/aws/aws-sdk-php/src/WrappedHttpHandler.php(98): Aws\WrappedHttpHandler->parseError()

So, sending the message to the first connection worked, but sending it to the second connection failed with a "410 Gone" ...

And that gave me an idea - "let's look in the Connections table in DynamoDB" - and bingo! When I cleared that table, things started to work ...

Pretty clear that there are sometimes "stale" connections left behind in that table - which is strange, because we DO have this piece of code in the catch block in ConnectionRepository::sendMessage:

if ($e->getAwsErrorCode() === 'GoneException') {
 $this->subscriptionRepository->clearConnection($connectionId);
 return;
}

So the idea is that in case of an HTTP 410 Gone" it SHOULD remove that connection from the DynamoDB table ... strange that that doesn't always seem to work.

This still looks like a potential weakness - the fact that the Connections table can contain "stale" connections ... I mean, in theory "it should just work", but it feels somewhat fragile (potentially).

P.S. I was also struggling with getting any logging to work - I cloned the repo (laravel-api-gateway) and then I made changes locally (I added log statements in the PHP code), and I linked to the local directory in my app's composer.json, like this:

 "repositories": [
 {
 "type": "path",
 "url": "../laravel-echo-api-gateway"
 }
 ],

and then with:

"georgeboot/laravel-echo-api-gateway": "dev-master",

but it seemed to just refuse to pick up the dependency from my local folder - I did NOT get to see any of the logging that I added in the PHP code - somehow it did not seem to use the "local" version, but the version from Github ... ?

And to top it all off I had issues with the AWS S3 file adapter because of the upgraded aws-sdk-php version - it threw this exception when trying to fetch a file from S3:

Invalid configuration value provided for "token". Expected Aws\Token\TokenInterface|Aws\CacheInterface|array|bool|callable, but got string

Apparently that requires a PHP fix/upgrade:

aws/aws-sdk-php#2567 (comment)
laravel/framework#44979

Copy link
Contributor Author

Well I did not yet experienced the 410 errors !

We'll surely need to address it soon !

leob reacted with thumbs up emoji

Copy link
Collaborator

leob commented Dec 7, 2024
edited
Loading

Well I did not yet experienced the 410 errors !

We'll surely need to address it soon !

@Adesin-fr I've seen these "410" errors more than once in our app, but today I finally found out what causes it: "stale" connections in the DynamoDB "Connections" table! That should never happen, but apparently it did ... clearing the table solved the issue, but if it happens again then we should indeed address it.

Copy link
Collaborator

leob commented Dec 7, 2024
edited
Loading

Well I did not yet experienced the 410 errors !

We'll surely need to address it soon !

@Adesin-fr Look at this change from another PR (#25):

image

Maybe this change makes it more robust - checking for

e->getStatusCode() === Response::HTTP_GONE || $e->getAwsErrorCode() === 'GoneException'

instead of ONLY for e->getAwsErrorCode() === 'GoneException' ...

This could be a reason why in some cases the "stale" connection isn't getting cleared ... I've added this to the latest release.

Copy link
Collaborator

leob commented Dec 7, 2024
edited
Loading

@Adesin-fr So, after upgrading my app to the latest version of laravel-echo-api-gateway (with the import axios statement), I was getting this:

image

But, it was easy to fix - I had to remove this line from my frontend app's initialization code:

window.axios.defaults.headers.common['Authorization'] = 'Bearer ' + token;

and instead I had to pass the bearerToken option to Echo:

window.Echo = new Echo({
 authEndpoint: ....................,
 bearerToken: token,
 broadcaster,
 host,
 debug: isStaging === true
});

Works now! And we're no longer using yarn, only npm ... productive day today :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

AltStyle によって変換されたページ (->オリジナル) /