Add generateKeyframe() API #37

Closed. alvestrand wants to merge 7 commits into main from keyframe-request

Conversation

@alvestrand (Collaborator) commented May 12, 2020

Fix for Issue #72

This was the simplest API I could think of.
The call is queued on the operations chain (which I haven't managed to cite correctly yet) so that it cannot race with replaceTrack(), which is also queued on the operations chain.

I'd like to have comments from hbos and jib before proposing this to the WG.
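
A rough usage sketch of what I have in mind (the method name and exact semantics are provisional and up for discussion):

```js
// Provisional sketch: generateKeyFrame() is the method this PR proposes on
// RTCRtpSender; it is assumed to return a promise that resolves once the
// encoder has emitted the key frame. (Run inside an async function.)
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const pc = new RTCPeerConnection();
const sender = pc.addTrack(stream.getVideoTracks()[0], stream);

// Both calls are queued on the operations chain, so generateKeyFrame()
// cannot race with a pending replaceTrack().
const replacementTrack = stream.clone().getVideoTracks()[0];
await sender.replaceTrack(replacementTrack);
await sender.generateKeyFrame();
```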


Preview | Diff

@murillo128

I like the API!

How does it behave with simulcast and SVC (K-SVC)? Can a key frame be requested for a particular layer/stream (like the RTCP LRR)?

@alvestrand (Collaborator, Author)

Good question. I think it needs a layer argument (with default being "all layers").

@alvestrand (Collaborator, Author)

@aboba is there a way in some SVC spec to refer consistently to a specific layer?
Does the question even make sense?
For simulcast, indexing the layers in Encodings should be sufficient.

@fippo (Contributor) commented May 12, 2020

@alvestrand related to e2ee stuff?
Please note the opinion of the Google team on PLIs/keyframes in relation to simulcast and per-stream keyframes; see https://bugs.chromium.org/p/webrtc/issues/detail?id=10107 (also 👋 @ibc)

IIRC an SRD/SLD also triggers a keyframe (or rather: an encoder reset).

@ibc commented May 12, 2020

What is this API needed for?

@aboba (Contributor) commented May 12, 2020

@alvestrand The FIR message utilizes SSRC(s) to indicate the target(s) of the request. So the FIR is assumed to apply to all layers on each targeted SSRC, as well as to the SSRCs associated with other layers in the case of layered codecs. The LRR utilizes an SSRC, but also includes other fields such as the target temporal and spatial layers.

Note the updated definition of "decoder refresh point" in RFC 8082.

@aboba (Contributor) commented May 12, 2020

@murillo128 How it behaves will depend on how it is interpreted (e.g. like an FIR or LRR). For an FIR, the encoder(s) associated with the target SSRC(s) will generate a refresh point. However, if a single encoder is generating simulcast, this will result in all of the simulcast streams generating a key frame, as described here. LRR behavior is described here.

@youennf (Contributor) commented May 14, 2020

It would be good to understand the use case.
For instance, would applications like to control the maximum time between key frames an encoder uses, with the two extreme cases being key frames only and no key frames except in case of an FIR?

A few additional questions:

  • Would it also be useful for some audio codecs?
  • Is it WebRTC-specific, or is there a use case for MediaRecorder as well?
  • What does it mean for muted content? I guess the next frame would be a key frame, which might require waiting for 1 second, so the promise would resolve after 1 second. Is that correct?
  • What should the behaviour be if this method is called 10 times synchronously? Should there be one key frame, or should the next 10 frames be key frames?
  • Is it an issue that one can measure how long it takes to generate a key frame (say I call this API on a canvas track, then after some time generate a canvas frame and compute the time between the canvas frame generation and the promise resolution)?
  • This is using the operations chain, probably for consistency with the replaceTrack() trick. Is that so? Is there any other reason? Is there a future need to control exactly which frame gets encoded as a key frame?

@alvestrand (Collaborator, Author)

@youennf the use case that came up this time is to make sure a keyframe is sent each time you change the encryption key on a group call.

Perhaps this is better called "produceKeyframe"?
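
For that use case, something like this is what I'd expect applications to write (setSenderEncryptionKey() is a stand-in for whatever mechanism the app uses to hand a new key to its encoded-frame transform; generateKeyFrame() is the API proposed here):

```js
// Sketch of the key-rotation flow; setSenderEncryptionKey() is hypothetical,
// generateKeyFrame() is the API proposed in this PR.
async function rotateKey(sender, newKey) {
  // Switch outgoing frames to the new key first...
  await setSenderEncryptionKey(sender, newKey);
  // ...then force a key frame, so receivers that only have the new key can
  // start decoding from a frame that is already encrypted with it.
  await sender.generateKeyFrame();
}
```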

@aboba (Contributor) commented May 14, 2020

This is for the sender, not the receiver. So it's not really a request, it's more like "generateKeyFrame".

@youennf (Contributor) commented May 14, 2020

I see, so it might make sense for audio if codecs use some prediction. Maybe the promise should not reject then?

An alternative solution is for the e2e JS transform module to state that it needs a key frame as part of the additional frame metadata given to the packetizer for each processed frame. That might also be useful for a non-JS e2e module whose key changes might not be visible to the script.
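
To make that concrete, a hypothetical shape for the alternative (the keyFrameNeeded field does not exist; it and encryptBody() are made up for illustration only):

```js
// Hypothetical: an encoded-frame transform that tells the packetizer/encoder
// it needs a key frame whenever the e2ee key has just been rotated.
// keyFrameNeeded is a made-up per-frame signal; encryptBody() is the app's
// own encryption routine.
let keyJustRotated = false;

const transform = new TransformStream({
  transform(encodedFrame, controller) {
    encodedFrame.data = encryptBody(encodedFrame.data);
    if (keyJustRotated) {
      encodedFrame.keyFrameNeeded = true; // made-up signal to the packetizer
      keyJustRotated = false;
    }
    controller.enqueue(encodedFrame);
  }
});
```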

@ibc commented May 14, 2020

Let me insist, please. What is this API needed for? And if the answer is e2ee, why?

@murillo128 commented May 14, 2020 via email

Do you foresee any way this API could cause any issue to existing code?

@ibc commented May 14, 2020

Do you foresee any way this API could cause any issue to existing code?

No, but I still can't figure out how your response explains what this API is needed for :)

@murillo128

Forward and Post-Compromise Security require that the e2ee keys are updated any time a participant joins or leaves the call.
https://tools.ietf.org/html/draft-ietf-mls-architecture-04#section-3.2.2.1

The key exchange happens asynchronously and on a different path than the WebRTC signaling and media (by definition). So it may happen that when a new participant joins, the SFU side requests a key frame and the sender generates the e2ee-encrypted frame with a key not known by the receiver, so it will be discarded. When the sender updates its sending key to the new key, it will send it in a non-key frame, so the receiver will be able to decrypt it, but not decode it.

The receiver will then re-request a key frame, but due to sender and SFU policies, that new key frame could take some time to be generated.

If the sender sends a key frame as soon as the new e2ee key is in use, the time required for the new participant to display the video is minimized.

Also, there are several optimizations (especially on authenticating and signing) that can be done on key frames, so it is always interesting to be able to control key frame generation from the e2ee side.

Hope it is clearer now.

@fippo (Contributor) commented May 14, 2020

In particular, in the context of insertable streams, I don't care about when the next keyframe happens, do I? I could request one to be generated and rotate the key at the next keyframe that the transform sees. Now if there are two changes to the e2ee key in rapid succession... then what?

I don't think resolving with a single timestamp works for simulcast, where we have three different timestamp sequences.

@fippo (Contributor) commented May 14, 2020

Didn't realize this was for the sender, not the receiver. So it's not really a request, it's more like "generateKeyFrame".

Do we need a requestKeyFrame on the receiver? Mainly to handle decrypt errors.

@youennf (Contributor) commented May 15, 2020

Given the use case, this API might fit better with insertable streams.
To reduce race issues, this API should be tied as closely as possible to the encoder, the ReadableStream being the closest thing we have.

Do we need a requestKeyFrame on the receiver? Mainly to handle decrypt errors.

Seems to make sense, for metadata as well in theory. That is a bit trickier, since this API would be trying to influence the encoder on the other side, and the communication channel is potentially lossy.
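
A sketch of what that receiver-side counterpart could look like (requestKeyFrame() on the receiver does not exist; it is purely illustrative):

```js
// Purely illustrative: ask the remote encoder for a key frame when a frame
// cannot be decrypted. On the wire this would end up as a PLI/FIR, so it is
// a request over a potentially lossy path, not a guarantee.
async function onDecryptError(receiver) {
  await receiver.requestKeyFrame(); // hypothetical API
}
```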

@alvestrand added the Icebox (Not worked on at present) label on Jan 7, 2021
Base automatically changed from master to main on February 15, 2021 07:35
@youennf (Contributor) commented Mar 5, 2021

@alvestrand, given the use case is insertable streams, should we move this issue to the webrtc-insertable-streams repo?

@solmaks commented Mar 22, 2021

@alvestrand, I think we should resurrect this work. The RTCRtpSender.replaceTrack() workaround is not working reliably (it seems like a race), and other workarounds (e.g. toggling scaleResolutionDownBy) are fundamentally flawed too. It seems the time is right to add a first-party API to request key frames on the sender.

@youennf (Contributor) commented Mar 22, 2021

@solmaks, can you share your use case? Is it tied to using WebRTC encoded transforms?
Safari has experimental support for this requestKeyFrame API as part of its insertable streams API experiment (in RTCRtpScriptTransformer), since there are known use cases there.
Using an identity transform just to get access to this API would be cumbersome and might prove that this is not the right place (or the only place) for such an API.

@solmaks commented Mar 22, 2021

Actually, it is not. The scenario is dead simple: adjust the send stream bitrate and trigger a key frame right after. We do have a mechanism for adjusting the bitrate of individual encodings, but per-encoding key frame triggering is missing entirely. It would be great if we could avoid bundling a key frame API with insertable streams or any other mechanism involving heavy per-packet processing, as the latter has an impact on power and performance.
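
Roughly the code I have in mind (generateKeyFrame() is the API proposed in this PR; getParameters()/setParameters() already exist):

```js
// Lower the bitrate of one encoding, then force a key frame so the new
// constraint takes effect from a clean refresh point.
// generateKeyFrame() is the proposed API; everything else exists today.
async function adjustBitrateAndRefresh(sender, bitsPerSecond) {
  const params = sender.getParameters();
  params.encodings[0].maxBitrate = bitsPerSecond;
  await sender.setParameters(params);
  await sender.generateKeyFrame();
}
```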

@youennf (Contributor) commented Mar 23, 2021

adjust the send stream bitrate and trigger a key frame right after.

I see. In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.
The closest API surface we have here is RTCRtpSendParameters.encodings.
I would have hoped that, when we go from active=false to active=true, video encoders would start with a key frame.
Maybe this is something we could agree on.

It would be great if we could avoid bundling a key frame API with insertable streams or any other mechanism involving heavy per-packet processing, as the latter has an impact on power and performance.

I agree it would be cumbersome to use; I'm not sure about the power impact if frames are not mutated. The cost might be reduced to two thread hops.

@solmaks commented Mar 23, 2021

I see. In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.
The closest API surface we have here is RTCRtpSendParameters.encodings.

Yes, this is what I was thinking of as well. This new parameter might need special treatment though, as it is transient in nature, i.e. configuring it on an encoding and then reading it back would not make much sense.

I would have hoped that, when we go from active=false to active=true, video encoders would start with a key frame.
Maybe this is something we could agree on.

Unfortunately, toggling active=false/true might have undesirable side effects, such as actually suspending the video stream prior to generating a key frame. The idea is to be able to generate a key frame on a continuous stream. Let me reiterate the scenario: update the send bitrate on an already active stream and request a key frame to be generated with the new bitrate constraint in effect.
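
For illustration, the transient per-encoding trigger mentioned above might look something like this (the generateKeyFrame member is purely hypothetical):

```js
// Purely hypothetical: a one-shot flag on an encoding, consumed by
// setParameters() and never reported back by getParameters().
// sender is an RTCRtpSender with simulcast encodings; run inside an async function.
const params = sender.getParameters();
params.encodings[1].maxBitrate = 300_000;    // adjust the bitrate...
params.encodings[1].generateKeyFrame = true; // ...and request a key frame for this encoding only
await sender.setParameters(params);
```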

@youennf (Contributor) commented Mar 23, 2021

actually suspending the video stream prior to generating a key frame

The sender could probably stop sending during just one event loop, so the suspension might not be that big a deal in practice.
This would need experimentation.
Also, generating a key frame is probably not something that should happen often.

@solmaks commented Mar 23, 2021

@youennf, I am not sure how experimentation can help. Observing some behaviour in a current implementation does not mean it will not change or break in a subsequent version. Unless, of course, we explicitly want to update the specification to demand that an active=true/false toggle within a single event loop must not result in the stream getting suspended (which I hope we are not willing to do).

@solmaks commented Mar 23, 2021

Also, generating a key frame is probably not something that should happen often.

The API needs to be robust and produce well-defined behaviour regardless of how frequently it is called, be it once per minute or once per second. My scenario does not require sub-second processing; however, handling calls one second or more apart is pretty much required.

@youennf (Contributor) commented Mar 25, 2021

I quickly looked at how active is used in the WebRTC backend. It seems it currently triggers recreation of the whole encoder, so it probably triggers a keyframe (good for a short-term fix), but for all simulcast encoders (not great...).

@youennf (Contributor) commented Mar 25, 2021

@solmaks, I filed #72 to keep track of this issue, given that this thread is attached to a PR.

@solmaks commented Mar 31, 2021

@youennf, @alvestrand, any recommendation on how we can proceed with the proposed extension?

@youennf (Contributor) commented Apr 1, 2021

#72 is probably the natural place to continue the discussion.
I think we would need to build a proposal and present it at the next WebRTC interim.
Would you be able to carry that forward?
A look at how it could be implemented in browsers in the simulcast case would also be valuable.

@fippo (Contributor) commented Apr 4, 2021

In the case of simulcast, you would like to trigger a key frame for one encoding but not the others.

No. Simulcast is (at least in Chrome) not able to do keyframes for individual layers; see https://bugs.chromium.org/p/webrtc/issues/detail?id=10107

@solmaks commented Apr 7, 2021

@fippo, I think it is OK to have an API that enables the flexibility to control individual layers, and to let individual user agents choose not to do this optimization.

@youennf (Contributor) commented Apr 7, 2021

I think it is OK to have an API that enables the flexibility to control individual layers, and to let individual user agents choose not to do this optimization.

It would be sad if different browsers had different behaviours for this API.
This seems like a good topic for the next interim.

@aboba (Contributor) left a review comment

I'd suggest adding an optional argument to specify the streams on which a refresh point should be generated. Inclusion of streams would be interpreted the same as reception of an FIR (which can specify multiple SSRCs). Since RIDs are exposed and SSRCs may not be, it seems like the best way to accomplish this would be by providing an optional argument containing a sequence of RIDs. If the optional argument is not provided, then it would apply to all streams sent by the RTCRtpSender.
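
For example (the rids argument is what is being suggested here; it is not in the current PR, and "hi"/"mid" are example RID values):

```js
// Suggested shape: an optional sequence of RIDs selecting which simulcast
// streams should produce a refresh point; omitting it applies to all streams
// sent by the RTCRtpSender.
await sender.generateKeyFrame();              // all streams
await sender.generateKeyFrame(["hi", "mid"]); // only the "hi" and "mid" encodings
```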

@solmaks commented Apr 8, 2021

Good idea, @aboba. This seems a better place for it than putting it on top of the encoding configuration.

@aboba changed the title from "Add a requestKeyframe() API" to "Add generateframe() API" on Nov 14, 2021
@aboba changed the title from "Add generateframe() API" to "Add generateKeyframe() API" on Nov 14, 2021
@aboba (Contributor) commented Feb 2, 2022

Closed due to the merge of PR 125.

@aboba closed this on Feb 2, 2022
@alvestrand deleted the keyframe-request branch on September 13, 2023 10:01