Add generateKeyframe() API #37

Closed. alvestrand wants to merge 7 commits into main from keyframe-request

Conversation

@alvestrand (Collaborator) commented May 12, 2020

Fix for Issue #72

This was the simplest API I could think of.
The call is queued on the operations chain (which I haven't managed to cite correctly yet) so that it cannot race with replaceTrack(), which is also queued on the operations chain.

I'd like to have comments from hbos and jib before proposing this to the WG.
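
A rough usage sketch of what I have in mind (the method name and exact semantics are provisional and up for discussion):

```js
// Provisional sketch: generateKeyFrame() is the method this PR proposes on
// RTCRtpSender; it is assumed to return a promise that resolves once the
// encoder has emitted the key frame. (Run inside an async function.)
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const pc = new RTCPeerConnection();
const sender = pc.addTrack(stream.getVideoTracks()[0], stream);

// Both calls are queued on the operations chain, so generateKeyFrame()
// cannot race with a pending replaceTrack().
const replacementTrack = stream.clone().getVideoTracks()[0];
await sender.replaceTrack(replacementTrack);
await sender.generateKeyFrame();
```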


Preview | Diff

@murillo128

I like the API!

How does it behave with simulcast and SVC (K-SVC)? Can a key frame be requested for a particular layer/stream (like the RTCP LRR)?

@alvestrand (Collaborator, Author)

Good question. I think it needs a layer argument (with default being "all layers").

@alvestrand (Collaborator, Author)

@aboba is there a way in some SVC spec to refer consistently to a specific layer?
Does the question even make sense?
For simulcast, indexing the layers in Encodings should be sufficient.

@fippo (Contributor) commented May 12, 2020

@alvestrand related to e2ee stuff?
Please note the opinion of the Google team on PLIs/keyframes in relation to simulcast and per-stream keyframes; see https://bugs.chromium.org/p/webrtc/issues/detail?id=10107 (also 👋 @ibc)

IIRC an SRD/SLD also triggers a keyframe (or rather: an encoder reset).

@ibc commented May 12, 2020

What is this API needed for?

@aboba (Contributor) commented May 12, 2020

@alvestrand The FIR message utilizes SSRC(s) to indicate the target(s) of the request. So the FIR is assumed to apply to all layers on each targeted SSRC, as well as to the SSRCs associated with other layers in the case of layered codecs. The LRR utilizes an SSRC, but also includes other fields such as the target temporal and spatial layers.

Note the updated definition of "decoder refresh point" in RFC 8082.

@aboba (Contributor) commented May 12, 2020

@murillo128 How it behaves will depend on how it is interpreted (e.g. like an FIR or LRR). For an FIR, the encoder(s) associated with the target SSRC(s) will generate a refresh point. However, if a single encoder is generating simulcast, this will result in all of the simulcast streams generating a key frame, as described here. LRR behavior is described here.

@youennf (Contributor) commented May 14, 2020

It would be good to understand the use case.
For instance, would applications like to control the maximum time between key frames an encoder uses, with the two extreme cases being key frames only and no key frames except in case of an FIR?

A few additional questions:

  • Would it also be useful for some audio codecs?
  • Is it WebRTC-specific, or is there a use case for MediaRecorder as well?
  • What does it mean for muted content? I guess the next frame would be a key frame, which might require waiting for 1 second, so the promise would resolve after 1 second. Is that correct?
  • What should the behaviour be if this method is called 10 times synchronously? Should there be one key frame, or should the next 10 frames be key frames?
  • Is it an issue that one can measure how long it takes to generate a key frame (say I call this API on a canvas track, then after some time generate a canvas frame and compute the time between the canvas frame generation and the promise resolution)?
  • This is using the operations chain, probably for consistency with the replaceTrack() trick. Is that so? Is there any other reason? Is there a future need to control exactly which frame gets encoded as a key frame?

@alvestrand (Collaborator, Author)

@youennf the use case that came up this time is to make sure a keyframe is sent each time you change the encryption key on a group call.

Perhaps this is better called "produceKeyframe"?
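
For that use case, something like this is what I'd expect applications to write (setSenderEncryptionKey() is a stand-in for whatever mechanism the app uses to hand a new key to its encoded-frame transform; generateKeyFrame() is the API proposed here):

```js
// Sketch of the key-rotation flow; setSenderEncryptionKey() is hypothetical,
// generateKeyFrame() is the API proposed in this PR.
async function rotateKey(sender, newKey) {
  // Switch outgoing frames to the new key first...
  await setSenderEncryptionKey(sender, newKey);
  // ...then force a key frame, so receivers that only have the new key can
  // start decoding from a frame that is already encrypted with it.
  await sender.generateKeyFrame();
}
```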

@aboba (Contributor) commented May 14, 2020

This is for the sender, not the receiver. So it's not really a request, it's more like "generateKeyFrame".

@youennf (Contributor) commented May 14, 2020

I see, so it might make sense for audio if codecs use some prediction. Maybe the promise should not reject then?

An alternative solution is for the e2e JS transform module to state that it needs a key frame as part of the additional frame metadata given to the packetizer for each processed frame. That might also be useful for a non-JS e2e module whose key changes might not be visible to the script.
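
To make that concrete, a hypothetical shape for the alternative (the keyFrameNeeded field does not exist; it and encryptBody() are made up for illustration only):

```js
// Hypothetical: an encoded-frame transform that tells the packetizer/encoder
// it needs a key frame whenever the e2ee key has just been rotated.
// keyFrameNeeded is a made-up per-frame signal; encryptBody() is the app's
// own encryption routine.
let keyJustRotated = false;

const transform = new TransformStream({
  transform(encodedFrame, controller) {
    encodedFrame.data = encryptBody(encodedFrame.data);
    if (keyJustRotated) {
      encodedFrame.keyFrameNeeded = true; // made-up signal to the packetizer
      keyJustRotated = false;
    }
    controller.enqueue(encodedFrame);
  }
});
```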

@ibc commented May 14, 2020

Let me insist, please. What is this API needed for? And if the answer is e2ee, why?

@murillo128 commented May 14, 2020 via email

Do you foresee any way this API could cause any issue to existing code?

@ibc commented May 14, 2020

Do you foresee any way this API could cause any issue to existing code?

No, but I still can't figure out how your response explains what this API is needed for :)

@murillo128

Forward and Post-Compromise Security require that the e2ee keys are updated any time a participant joins or leaves the call.
https://tools.ietf.org/html/draft-ietf-mls-architecture-04#section-3.2.2.1

The key exchange happens asynchronously and on a different path than the WebRTC signaling and media (by definition). So it may happen that when a new participant joins, the SFU side requests a key frame and the sender generates the e2ee-encrypted frame with a key not known by the receiver, so it will be discarded. When the sender updates its sending key to the new key, it will send it in a non-key frame, so the receiver will be able to decrypt it, but not decode it.

The receiver will then re-request a key frame, but due to sender and SFU policies, that new key frame could take some time to be generated.

If the sender sends a key frame as soon as the new e2ee key is in use, the time required for the new participant to display the video is minimized.

Also, there are several optimizations (especially on authenticating and signing) that can be done on key frames, so it is always interesting to be able to control key frame generation from the e2ee side.

Hope it is clearer now.

@fippo (Contributor) commented May 14, 2020

In particular, in the context of insertable streams, I don't care about when the next keyframe happens, do I? I could request one to be generated and rotate the key at the next keyframe that the transform sees. Now if there are two changes to the e2ee key in rapid succession... then what?

I don't think resolving with a single timestamp works for simulcast, where we have three different timestamp sequences.

@fippo (Contributor) commented May 14, 2020

Didn't realize this was for the sender, not the receiver. So it's not really a request, it's more like "generateKeyFrame".

Do we need a requestKeyFrame on the receiver? Mainly to handle decrypt errors.

@youennf (Contributor) commented May 15, 2020

Given the use case, this API might fit better with insertable streams.
To reduce race issues, this API should be tied as closely as possible to the encoder, the ReadableStream being the closest thing we have.

Do we need a requestKeyFrame on the receiver? Mainly to handle decrypt errors.

Seems to make sense, for metadata as well in theory. That is a bit trickier, since this API would be trying to influence the encoder on the other side, and the communication channel is potentially lossy.
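
A sketch of what that receiver-side counterpart could look like (requestKeyFrame() on the receiver does not exist; it is purely illustrative):

```js
// Purely illustrative: ask the remote encoder for a key frame when a frame
// cannot be decrypted. On the wire this would end up as a PLI/FIR, so it is
// a request over a potentially lossy path, not a guarantee.
async function onDecryptError(receiver) {
  await receiver.requestKeyFrame(); // hypothetical API
}
```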

@alvestrand added the Icebox (Not worked on at present) label on Jan 7, 2021
Base automatically changed from master to main on February 15, 2021 07:35
@youennf (Contributor) commented Mar 5, 2021

@alvestrand, given the use case is insertable streams, should we move this issue to the webrtc-insertable-streams repo?

@solmaks commented Mar 22, 2021

@alvestrand, I think we should resurrect this work. The RTCRtpSender.replaceTrack() workaround is not working reliably (it seems like a race), and other workarounds (e.g. toggling scaleResolutionDownBy) are fundamentally flawed too. It seems the time is right to add a first-party API to request key frames on the sender.

@youennf (Contributor) commented Mar 22, 2021

@solmaks, can you share your use case? Is it tied to using WebRTC encoded transforms?
Safari has experimental support for this requestKeyFrame API as part of its insertable streams API experiment (in RTCRtpScriptTransformer), since there are known use cases there.
Using an identity transform just to get access to this API would be cumbersome and might prove that this is not the right place (or the only place) for such an API.

@solmaks commented Mar 22, 2021

Actually, it is not. The scenario is dead simple: adjust the send stream bitrate and trigger a key frame right after. We do have a mechanism for adjusting the bitrate of individual encodings, but per-encoding key frame triggering is missing entirely. It would be great if we could avoid bundling a key frame API with insertable streams or any other mechanism involving heavy per-packet processing, as the latter has an impact on power and performance.
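
Roughly the code I have in mind (generateKeyFrame() is the API proposed in this PR; getParameters()/setParameters() already exist):

```js
// Lower the bitrate of one encoding, then force a key frame so the new
// constraint takes effect from a clean refresh point.
// generateKeyFrame() is the proposed API; everything else exists today.
async function adjustBitrateAndRefresh(sender, bitsPerSecond) {
  const params = sender.getParameters();
  params.encodings[0].maxBitrate = bitsPerSecond;
  await sender.setParameters(params);
  await sender.generateKeyFrame();
}
```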

@youennf (Contributor) commented Mar 23, 2021

adjust the send stream bitrate and trigger a key frame right after.

I see. In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.
The closest API surface we have here is RTCRtpSendParameters.encodings.
I would have hoped that, when we go from active=false to active=true, video encoders would start with a key frame.
Maybe this is something we could agree on.

It would be great if we could avoid bundling a key frame API with insertable streams or any other mechanism involving heavy per-packet processing, as the latter has an impact on power and performance.

I agree it would be cumbersome to use; I'm not sure about the power impact if frames are not mutated. The cost might be reduced to two thread hops.

@solmaks commented Mar 23, 2021

I see. In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.
The closest API surface we have here is RTCRtpSendParameters.encodings.

Yes, this is what I was thinking of as well. This new parameter might need special treatment though, as it is transient in nature, i.e. configuring it on an encoding and then reading it back would not make much sense.

I would have hoped that, when we go from active=false to active=true, video encoders would start with a key frame.
Maybe this is something we could agree on.

Unfortunately, toggling active=false/true might have undesirable side effects, such as actually suspending the video stream prior to generating a key frame. The idea is to be able to generate a key frame on a continuous stream. Let me reiterate the scenario: update the send bitrate on an already active stream and request a key frame to be generated with the new bitrate constraint in effect.
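
For illustration, the transient per-encoding trigger mentioned above might look something like this (the generateKeyFrame member is purely hypothetical):

```js
// Purely hypothetical: a one-shot flag on an encoding, consumed by
// setParameters() and never reported back by getParameters().
// sender is an RTCRtpSender with simulcast encodings; run inside an async function.
const params = sender.getParameters();
params.encodings[1].maxBitrate = 300_000;    // adjust the bitrate...
params.encodings[1].generateKeyFrame = true; // ...and request a key frame for this encoding only
await sender.setParameters(params);
```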

@youennf (Contributor) commented Mar 23, 2021

actually suspending the video stream prior to generating a key frame

The sender could probably stop sending during just one event loop, so the suspension might not be that big a deal in practice.
This would need experimentation.
Also, generating a key frame is probably not something that should happen often.

@solmaks commented Mar 23, 2021

@youennf, I am not sure how experimentation can help. Observing some behaviour in a current implementation does not mean it will not change or break in a subsequent version. Unless, of course, we explicitly want to update the specification to demand that an active=true/false toggle within a single event loop must not result in the stream getting suspended (which I hope we are not willing to do).

@solmaks commented Mar 23, 2021

Also, generating a key frame is probably not something that should happen often.

The API needs to be robust and produce well-defined behaviour regardless of how frequently it is called, be it once per minute or once per second. My scenario does not require sub-second processing; however, handling calls one second or more apart is pretty much required.

@youennf (Contributor) commented Mar 25, 2021

I quickly looked at how active is used in the WebRTC backend. It seems it currently triggers recreation of the whole encoder, so it probably triggers a keyframe (good for a short-term fix), but for all simulcast encoders (not great...).

@youennf (Contributor) commented Mar 25, 2021

@solmaks, I filed #72 to keep track of this issue, given that this thread is attached to a PR.

@solmaks commented Mar 31, 2021

@youennf, @alvestrand, any recommendation on how we can proceed with the proposed extension?

@youennf (Contributor) commented Apr 1, 2021

#72 is probably the natural place to continue the discussion.
I think we would need to build a proposal and present it at the next WebRTC interim.
Would you be able to carry that forward?
A look at how it could be implemented in browsers in the simulcast case would also be valuable.

@fippo (Contributor) commented Apr 4, 2021

In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.

No. Simulcast is (at least in Chrome) not able to do keyframes for individual layers; see https://bugs.chromium.org/p/webrtc/issues/detail?id=10107

@solmaks commented Apr 7, 2021

@fippo, I think it is OK to have an API that enables the flexibility to control individual layers, and to let individual user agents choose not to do this optimization.

@youennf (Contributor) commented Apr 7, 2021

I think it is OK to have an API that enables the flexibility to control individual layers, and to let individual user agents choose not to do this optimization.

It would be sad if different browsers had different behaviours for this API.
This seems like a good topic for the next interim.

@aboba (Contributor) left a review comment

I'd suggest adding an optional argument to specify the streams on which a refresh point should be generated. Inclusion of streams would be interpreted the same as reception of an FIR (which can specify multiple SSRCs). Since RIDs are exposed and SSRCs may not be, it seems like the best way to accomplish this would be by providing an optional argument containing a sequence of RIDs. If the optional argument is not provided, then it would apply to all streams sent by the RTCRtpSender.
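
For example (the rids argument is what is being suggested here; it is not in the current PR, and "hi"/"mid" are example RID values):

```js
// Suggested shape: an optional sequence of RIDs selecting which simulcast
// streams should produce a refresh point; omitting it applies to all streams
// sent by the RTCRtpSender.
await sender.generateKeyFrame();              // all streams
await sender.generateKeyFrame(["hi", "mid"]); // only the "hi" and "mid" encodings
```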

@solmaks commented Apr 8, 2021

Good idea, @aboba. This seems a better place for it than putting it on top of the encoding configuration.

@aboba changed the title from "Add a requestKeyframe() API" to "Add generateframe() API" on Nov 14, 2021
@aboba changed the title from "Add generateframe() API" to "Add generateKeyframe() API" on Nov 14, 2021
@aboba (Contributor) commented Feb 2, 2022

Closed due to the merge of PR 125.

@aboba closed this on Feb 2, 2022
@alvestrand deleted the keyframe-request branch on September 13, 2023 10:01