New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add generateKeyframe() API #37
Conversation
I like the API! How does it behave with simulcast and SVC (K-SVC)? Can an key frame be requested for particular layer/stream (like the RTCP LRR)? |
Good question. I think it needs a layer argument (with default being "all layers"). |
@aboba is there a way in some SVC spec to refer consistently to a specific layer? |
@alvestrand related to e2ee stuff? IIRC a SRD/SLD also triggers a keyframe (or rather: encoder reset) |
What is this API needed for? |
@alvestrand The FIR message utilizes SSRC(s) to indicate the target(s) of the request. So the FIR is assumed to apply to all layers on each targeted SSRC, as well as to the SSRCs associated with other layers in the case of layered codecs. The LRR utilizes an SSRC, but also includes other fields such as the target temporal and spatial layers. Note the updated definition of "decoder refresh point" in RFC 8082. |
@murillo128 How it behaves will depend on how it is interpreted (e.g like an FIR or LRR). For an FIR, the encoder(s) associated with the target SSRC(s) will generate a refresh point. However, if a single encoder is generating simulcast, this will result in all of the simulcast streams generating a key frame, as described here. LRR behavior is described here. |
It would be good to understand the use case. A few additional questions:
|
@youennf the use case that came up this time is to make sure a keyframe is sent each time you change the encryption key on a group call. Perhaps this is better called "produceKeyframe"? |
This is for the sender, not the receiver. So it's not really a request, it's more like "generateKeyFrame". |
I see, so it might make sense for audio if codecs use some prediction. Maybe the promise should not reject then? An alternative solution is for the e2e JS transform module to state that it needs a key frame as part of the additional frame metadata given to the packetizer for each processed frame. That might be also useful for a non JS e2e module for which the key changes might not be visible to the script. |
Let me insist, please. What is this API needed for? And if the answer is e2ee, why? |
the e2ee key is exchanged in a different path than sfu signaling, so it may
be updated on the sender shortly after the new participant has joined and
the sfu jas already requested the key frame.
Updating the key on a key frame makes a lot of sense to me.
Anyway, I am always in favor of more APIs that allow lower level control of
the underlaying objects.
Do you foresee any way this API could cause any issue to existing code?
El jue., 14 may. 2020 18:17, Iñaki Baz Castillo <notifications@github.com>
escribió:
… Let me insist, please. What is this API needed for? And if the answer is
e2ee, why?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#37 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIFN43V35GF5LLGQSVQW43RRQKQDANCNFSM4M63A53A>
.
|
No, but neither I can figure out how your response explains what this API is needed for :) |
Forward and Post-Compromise Security requires that the e2ee keys are updated anytime a participant joins/leave the call. The key exchange happens async and on a different path than the webrtc signaling and media (by definition). So it may happen that when a new participant joins the SFU side request a key frame and the sender generates the e2ee encrypted frame with a key not known by the receiver, so it will be discarded. When the sender updates his sending key with the new key, it will send it in a non-key frame frame, so the receiver will be able to decrypt it, but not decode it. Receiver will re-request an key frame then, but due to sender and sfu policies, that new key frame could take some time to be generated. It the sender sends a key frame when the new e2ee key is in use, the time required for the new participant to display the video is minimized. Also, there are several optimizations (specially on authenticating and signing) that can be done on key frames, so it is always interesting to be able to control the key frame generation from the e2ee side. Hope it is clearer now. |
In particular in the context of insertable streams I don't care about when the next keyframe happens, no? I could request one to be generated and would rotate the key at the next keyframe that the transform sees. Now if there are two changes to the e2ee key in rapid succession... then what? I don't think resolving with a single timestamp works for simulcast where we have three different timestamp sequences. |
Do we need a requestKeyFrame on the receiver? Mainly to handle decrypt errors. |
Given the use case, this API might better fit with insertable streams.
Seems to make sense, for metadata as well in theory. That is a bit more tricky since this API is trying to influence the encoder of the other side, communication channel being potentially lossy. |
@alverstrand, given the usecase is insertable streams, should we move this issue to webrtc-insertable-streams repo? |
@alvestrand , I think we should resurrect this work. RTCRtpSender::replaceTrack workaround is not working reliably (seems like a race), and other workarounds (e.g. scaleResolutionBy toggle) are fundamentally flawed too. It seems that time is right for adding a first party API to request key frames on sender. |
@solmaks, can you share your usecase? Is it tied to using webrtc encoded transforms? |
Actually it is not. Scenario is dead simple - adjust send stream bitrate and trigger key frame right after. We do have a mechanism for adjusting bitrate of individual encodings, per encoding key frame triggering is missing fully. It would be great if we could avoid bundling KF API with insertable streams or any other mechanisms involving heavy per-packet processing, as the latter has impact on power and performance. |
I see. In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.
I agree it would be cumbersome to use, not sure about power impact if frames are not mutated. Cost might be reduced to two thread hops. |
Yes, this is what I was thinking of as well. This new parameter might need a special treatment though, as it is transient in nature. I.e. configuring it on an encoding and reading it back would not make much sense.
Unfortunately active=false/true might have undesirable side-effects, such as actually suspending video stream prior to generating a key frame. Idea is to be able to generate a key frame on a continuous stream. Let me reiterate the scenario - update send bitrate on an already active stream and request KF to be generated with new bitrate constraint being effective. |
|
@youennf , I am not sure how experimentation can help. Observing some behavior in current implementation does not mean it will not change or break in subsequent version. Unless of course we explicitly want to update specification, demanding for active=true/false toggle in a single even loop should not result in stream getting suspended (which I hope we are not willing to do). |
API needs to be robust and produce defined behavior regardless of the frequency it is called. Be it once per minute or once per second. My scenario does not require sub-second processing. However, 1s+ is something that is pretty much required. |
I quickly looked at active usage in WebRTC backend. It seems it currently triggers recreation of the whole encoder, so it probably triggers keyframe (good for a short term fix), but for all simulcasted encoders (not great...). |
@youennf , @alvestrand , any recommendation on how we can proceed with the proposed extension? |
#72 is probably the natural place to continue the discussion. |
No. Simulcast is (at least in chrome) not able to do keyframes for individual layers, see https://bugs.chromium.org/p/webrtc/issues/detail?id=10107 |
@fippo, I think it is OK to have an API enabling the flexibility to control individual layers, and individual user agents to choose not to do this optimization. |
That would be sad if different browsers have different behaviours for this API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest adding an optional argument to specify the streams on which a refresh point should be generated. Inclusion of streams would be interpretted the same as reception of an FIR (which can specify multiple SSRCs). Since RIDs are exposed and SSRCs may not be, it seems like the best way to accomplish this would be by providing an optional argument containing a sequence of RIDs. If the optional argument is not provided, then it would apply to all streams sent by the RTCRtpSender
.
Good idea, @aboba . This seems to be a better place for it, as opposed to having it on top of encoding configuration. |
Closed due to merger of PR 125. |
Fix for Issue #72
This was the simplest API I could think of.
The use of the operations chain (which I haven't been able to cite correctly) is to make it not racy with replaceTrack(), which is also on the operations chain.
I'd like to have comments from hbos and jib before proposing this to the WG.
Preview | Diff