Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Implement DTLS restart #1846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Antonito wants to merge 2 commits into master
base: master
Choose a base branch
Loading
from fix/issue-#1636
Open

Implement DTLS restart #1846

Antonito wants to merge 2 commits into master from fix/issue-#1636

Conversation

@Antonito
Copy link
Member

@Antonito Antonito commented Jun 26, 2021
edited
Loading

pion/srtp#148 is needed before being able to merge

(削除) The unit test needs pion/transport#141 (削除ここまで)

Fixes #1636

(削除) I'm not sure about if t.state != DTLSTransportStateNew && t.state != DTLSTransportStateClosed, one other option would be to set the state to New before restarting, but I don't want to trigger a StateChanged event – which could be wrong. (削除ここまで)
Turns out that's what libwebrtc does

Also the condition on which DTLS is restarted might be too permissive – but it doesn't seem to break any test so far.

Copy link

codecov bot commented Jun 26, 2021
edited
Loading

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.98%. Comparing base (1765e9b) to head (ce2fd9b).
⚠️ Report is 589 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (1765e9b) and HEAD (ce2fd9b). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (1765e9b) HEAD (ce2fd9b)
go 2 0
Additional details and impacted files
@@ Coverage Diff @@
## master #1846 +/- ##
==========================================
- Coverage 76.89% 69.98% -6.92% 
==========================================
 Files 87 64 -23 
 Lines 8937 3258 -5679 
==========================================
- Hits 6872 2280 -4592 
+ Misses 1647 860 -787 
+ Partials 418 118 -300 
Flag Coverage Δ
go ?
wasm 69.98% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@Antonito This is absolutely amazing work, your PRs are the best I have seen in months. Gives me the energy to get back into it :)

I think we could make the test a lot simpler.

  • Create two PeerConnections and connect them.
  • Send one RTP packet
  • Create a new PeerConnection and generate an offer, and send it to one of the PeerConnections in the first flow.
  • Send one RTP packet

I think that gets us the coverage we need and keeps the LoC low still? Push back if I am wrong!

Antonito reacted with heart emoji

Copy link
Member Author

@Antonito This is absolutely amazing work, your PRs are the best I have seen in months. Gives me the energy to get back into it :)

I think we could make the test a lot simpler.

  • Create two PeerConnections and connect them.
  • Send one RTP packet
  • Create a new PeerConnection and generate an offer, and send it to one of the PeerConnections in the first flow.
  • Send one RTP packet

I think that gets us the coverage we need and keeps the LoC low still? Push back if I am wrong!

So, I tried to rewrite the unit test in a simpler way, and it uncovered several problems. I'm not sure if there are caused by the DTLS restart or were already there (I'm thinking the later, for most of them).

  • Sending a single RTP packet doesn't trigger an OnTrack event on the other peer. Sending two in a row doesn't either, but sending 1 / waiting a few ms / sendign another one works.
  • When ran multiple times in a row, the test sometimes logs pc ERROR: 2021年06月28日 13:30:33 Incoming unhandled RTP ssrc(1595441956), OnTrack will not be fired. single media section has an explicit SSRC. I initially thought this was a difference in behavior with the test I already pushed (since with your method, the first peer connection doesn't go to disconnected or failed before the other one connects), but turns out it's also happening with this one – I just had lucky runs. It doesn't seem to be a race condition, tsan didn't detect anything.
  • I found out weird behaviors with DataChannels
    • I added a call to CreateDataChannel, behavior unchanged (as expected)
    • I then set an empty OnDataChannel callback -> this causes the test to sometimes hang (again I'm not doing anything more with DataChannels here)
    • I then tried to send data from ClientB and read with ClientA1 and ClientA2, sometime it hangs, but when it doesn't:
      • As expected, ClientA1 can read
      • After ClientA2 connects, ClientB tries to send data – which causes a io: read/write on closed pipe error log
      • ClientA2 can't read, since ClientB can't send anything (this is most likely an issue with DTLS restart)

Copy link
Member Author

I rewrote the unit test without pion/transport, and split it in two very similar tests, for debugging purpose:

  • TestPeerConnection_DTLS_Restart_MediaOnly
  • TestPeerConnection_DTLS_Restart_MediaAndDataChannel

Before merging, we should only keep TestPeerConnection_DTLS_Restart_MediaAndDataChannel.

SCTP restart is now implemented and DataChannels are re-opened, if needed.

There seems to be some kind of race condition, but I really don't think it's related to the stuff this PR modifies – I think the unit tests just show an already existing problem.
It's what I was describing above, and now have a bit more validation of what's really happening.

The TestPeerConnection_DTLS_Restart_MediaOnly behaves correctly, in go test. To make it work with tsan, I had to add an extra:

triggerMedia()

I think this is a consequence of the first point I was developing in my previous message (about RTP packets).

TestPeerConnection_DTLS_Restart_MediaAndDataChannel behaves correctly with go test, but hangs with tsan. I also tried to add a few extra RTP packets, but it doesn't change any thing. Somehow, the hang seems to be caused by the simple fact of calling CreateDataChannel and OnDataChannel.

Copy link
Member Author

Antonito commented Mar 4, 2022

Following up discussion with @davidzhao, I updated both this PR and pion/srtp#148.

In order to get some progress on this, I think we should first focus on merging pion/srtp#148 – we need to figure out @Sean-Der 's concerns about rolloverCounter and lastSequenceNumber. I was careful and really don't think this one introduces any security issue, but wouldn't mind a second opinion :)

As for this PR, it requires a bit more work, there are still issues with TestPeerConnection_DTLS_Restart_MediaAndDataChannel, and I'm far less confident about security issues here. Unfortunately I don't have much time to dedicate to this for now, I'd gladly welcome any help 😁

davidzhao reacted with thumbs up emoji

Copy link
Member

Thanks for picking this back up @Antonito! We'd love to collaborate on this. Will have a look in a few weeks.

Antonito reacted with rocket emoji

Copy link
Member

hey @Antonito, do you still want to work together to get this in?

@Sean-Der Sean-Der added this to the V4.1.0 milestone Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Milestone

V4.1.0

Development

Successfully merging this pull request may close these issues.

DTLS should be restarted when the remote DTLS fingerprint has been changed or the tls-id has changed during applying an answer

AltStyle によって変換されたページ (->オリジナル) /