-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Implement DTLS restart #1846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DTLS restart #1846
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@ ## master #1846 +/- ## ========================================== - Coverage 76.89% 69.98% -6.92% ========================================== Files 87 64 -23 Lines 8937 3258 -5679 ========================================== - Hits 6872 2280 -4592 + Misses 1647 860 -787 + Partials 418 118 -300
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@Antonito This is absolutely amazing work, your PRs are the best I have seen in months. Gives me the energy to get back into it :)
I think we could make the test a lot simpler.
- Create two PeerConnections and connect them.
- Send one RTP packet
- Create a new PeerConnection and generate an offer, and send it to one of the PeerConnections in the first flow.
- Send one RTP packet
I think that gets us the coverage we need and keeps the LoC low still? Push back if I am wrong!
@Antonito This is absolutely amazing work, your PRs are the best I have seen in months. Gives me the energy to get back into it :)
I think we could make the test a lot simpler.
- Create two PeerConnections and connect them.
- Send one RTP packet
- Create a new PeerConnection and generate an offer, and send it to one of the PeerConnections in the first flow.
- Send one RTP packet
I think that gets us the coverage we need and keeps the LoC low still? Push back if I am wrong!
So, I tried to rewrite the unit test in a simpler way, and it uncovered several problems. I'm not sure if there are caused by the DTLS restart or were already there (I'm thinking the later, for most of them).
- Sending a single RTP packet doesn't trigger an
OnTrackevent on the other peer. Sending two in a row doesn't either, but sending 1 / waiting a few ms / sendign another one works. - When ran multiple times in a row, the test sometimes logs
pc ERROR: 2021年06月28日 13:30:33 Incoming unhandled RTP ssrc(1595441956), OnTrack will not be fired. single media section has an explicit SSRC. I initially thought this was a difference in behavior with the test I already pushed (since with your method, the first peer connection doesn't go todisconnectedorfailedbefore the other one connects), but turns out it's also happening with this one – I just had lucky runs. It doesn't seem to be a race condition, tsan didn't detect anything. - I found out weird behaviors with DataChannels
- I added a call to
CreateDataChannel, behavior unchanged (as expected) - I then set an empty
OnDataChannelcallback -> this causes the test to sometimes hang (again I'm not doing anything more with DataChannels here) - I then tried to send data from
ClientBand read withClientA1andClientA2, sometime it hangs, but when it doesn't:- As expected, ClientA1 can read
- After ClientA2 connects, ClientB tries to send data – which causes a
io: read/write on closed pipeerror log - ClientA2 can't read, since ClientB can't send anything (this is most likely an issue with DTLS restart)
- I added a call to
b0d56c6 to
0d1e50c
Compare
I rewrote the unit test without pion/transport, and split it in two very similar tests, for debugging purpose:
- TestPeerConnection_DTLS_Restart_MediaOnly
- TestPeerConnection_DTLS_Restart_MediaAndDataChannel
Before merging, we should only keep TestPeerConnection_DTLS_Restart_MediaAndDataChannel.
SCTP restart is now implemented and DataChannels are re-opened, if needed.
There seems to be some kind of race condition, but I really don't think it's related to the stuff this PR modifies – I think the unit tests just show an already existing problem.
It's what I was describing above, and now have a bit more validation of what's really happening.
The TestPeerConnection_DTLS_Restart_MediaOnly behaves correctly, in go test. To make it work with tsan, I had to add an extra:
webrtc/peerconnection_media_test.go
Line 1329 in 0d1e50c
I think this is a consequence of the first point I was developing in my previous message (about RTP packets).
TestPeerConnection_DTLS_Restart_MediaAndDataChannel behaves correctly with go test, but hangs with tsan. I also tried to add a few extra RTP packets, but it doesn't change any thing. Somehow, the hang seems to be caused by the simple fact of calling CreateDataChannel and OnDataChannel.
0d1e50c to
a10ddec
Compare
a10ddec to
13df7ae
Compare
bb27b83 to
2e55618
Compare
2e55618 to
ce2fd9b
Compare
Following up discussion with @davidzhao, I updated both this PR and pion/srtp#148.
In order to get some progress on this, I think we should first focus on merging pion/srtp#148 – we need to figure out @Sean-Der 's concerns about rolloverCounter and lastSequenceNumber. I was careful and really don't think this one introduces any security issue, but wouldn't mind a second opinion :)
As for this PR, it requires a bit more work, there are still issues with TestPeerConnection_DTLS_Restart_MediaAndDataChannel, and I'm far less confident about security issues here. Unfortunately I don't have much time to dedicate to this for now, I'd gladly welcome any help 😁
Thanks for picking this back up @Antonito! We'd love to collaborate on this. Will have a look in a few weeks.
hey @Antonito, do you still want to work together to get this in?
Uh oh!
There was an error while loading. Please reload this page.
pion/srtp#148 is needed before being able to merge
(削除) The unit test needs pion/transport#141 (削除ここまで)Fixes #1636
(削除) I'm not sure aboutif t.state != DTLSTransportStateNew && t.state != DTLSTransportStateClosed, one other option would be to set the state toNewbefore restarting, but I don't want to trigger a StateChanged event – which could be wrong. (削除ここまで)Turns out that's what libwebrtc does
Also the condition on which DTLS is restarted might be too permissive – but it doesn't seem to break any test so far.