Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generateKeyFrame takes a "rid" argument, but is invoked with "rids" #143

Open
alvestrand opened this issue Aug 30, 2022 · 12 comments
Open
Labels
TPAC 2022 For discussion at TPAC

Comments

@alvestrand
Copy link
Contributor

The generateKeyFrame algorithm takes a single RID, but the generateKeyFrame API function invokes it with a sequence of RIDs.

The rtcRtpScriptTransformer invokes it with a single RID.

This should be just a matter of wordsmithing; the obvious decision to be taken is behavior of generateKeyFrame([valid rid, invalid rid]).

@fippo
Copy link
Collaborator

fippo commented Aug 30, 2022

The algorithm doesn't deal with the lack of rids either.
The IDL might need to specify a default of = [] but that might just be how it needs to be implemented in Chrome.

@youennf
Copy link
Collaborator

youennf commented Aug 30, 2022

Oh right, I missed it in #132.
Should we align RTCRtpSender.generateKeyFrame to RTCRtpScriptTransformer.generateKeyFrame?

@youennf
Copy link
Collaborator

youennf commented Sep 1, 2022

#146

@aboba aboba added the TPAC 2022 For discussion at TPAC label Sep 2, 2022
@youennf youennf changed the title Editorial: generateKeyFrame takes a "rid" argument, but is invoked with "rids" generateKeyFrame takes a "rid" argument, but is invoked with "rids" Sep 6, 2022
@alvestrand
Copy link
Contributor Author

(uplevel from PR comment)
there are arguments for making sender.generateKeyFrame take a list of RIDs. One of them is ensuring all the keyframes are sent at the same timestamp; another is dealing with encoders that can generate multiple keyframes from one encoder.

I'm not sure how valid those are for transformer.generateKeyFrame, however.

@solmaks
Copy link

solmaks commented Sep 29, 2022

(36c965e) removed ability to request keyframes on a sequence of RIDs. This imposes some unnecessary limitations, as now when applications need to request keyframes on N (N>1 && N<MAX_RIDS) RIDs, to guarantee video fluidity for a wide range of receivers, they can no longer do this optimally. With currently proposed API, application willing to request keyframes on 2 RIDs will need to either call into API 2 times (providing RID N and RID M as arguments), or 1 time (providing no arguments). Choosing between the 2 will require applications to know implementation details and limitations, which is not always possible. E.g. if user agent emits keyframes on all RIDs even when asked for a single RID, applications should prefer option 1. If user agent emits keyframes only on specific RIDs, application should prefer option 2. Having applications to implement such heuristic is something we can easily avoid, by offering a more flexible API that ALWAYS takes sequence of RIDs.

@youennf
Copy link
Collaborator

youennf commented Sep 29, 2022

At last TPAC, the room feeling was that supporting N=1 or N=ALL_ACTIVE was good enough as it covered the envisioned use cases.
@solmaks, I assume you have a use case for N > 1 && N < ALL_ACTIVE.
Can you share it?

@alvestrand
Copy link
Contributor Author

The resolution as recorded in https://www.w3.org/2022/09/13-webrtc-minutes.html#t07 was "go with proposal 3 without returning a timestamp", where proposal 3 was "providing no rid -> Trigger key frames for all rids".

So that would mean making the argument to requestKeyFrame optional.

@solmaks
Copy link

solmaks commented Sep 29, 2022

What I am after is optimal operation in cases where application would like to dynamically maximise video fidelity for every receiver group, by having tight control over the target bitrates of individual RIDs. A practical example:

  • 2+ RIDs are configured for low fidelity encodings, that target low capability receivers, running on poor/lossy networks
  • 1+ RIDs are configured for high fidelity encodings, that target high capability receivers, running on good networks
  • Target bitrates for all RIDs are dynamically configured based on the ever changing capabilities of receiver groups to maximise video fidelity across all N receivers.

With N growing big, the optimization exercise requires frequent changes both in target bitrates of every RID, but also in the number of receivers specific RID is targeted to serve. Consider a scenario that involves introducing receivers at a high pace (e.g. start of a larger conference). In this scenario there will be a number of receivers capable of consuming high fidelity encodings (due to their estimators having time to ramp up and stabilize), as well as continuous inflow of receivers capable of consuming different variants of low fidelity encodings (these groups will change a lot due to estimators ramping up and stabilizing around some values). Since applications would want to minimise the number of generated key frames on any encoding/RID, they would want to pace/aggregate the changes. In practice this would mean having multiple changes to apply in one go. In this example, most of the changes would target lower fidelity encodings (due to higher instability of these groups). Under these conditions it would be great to have ability not to penalise receivers consuming high fidelity stream(s). Hence the ask to not artificially limit the API surface.

@solmaks
Copy link

solmaks commented Sep 29, 2022

@alvestrand , do I read correctly that TPAC consensus was to:

  1. Remove the timestamp altogether
  2. Make argument optional, and have "ALL" semantic if it is missing
  3. Keep a single RID argument

@alvestrand
Copy link
Contributor Author

@solmaks that is how I read the minutes, and it fits with my memory of the meeting.
If compelling new arguments are raised, this can be revisited; the scenario you mention might be one such.

I do wonder - if the RID groups are fairly independent, is there really a need to synchronize the sending of keyframes?
If not, requestKeyframe(RID) would seem to be sufficient.

@fippo
Copy link
Collaborator

fippo commented Sep 29, 2022

@alvestrand see my objection to the consensus on the list.

This isn't a new request, see here

  • some encoders (libvpx?) prefer synchronizing because it allows sharing some work, see here, not prohibiting this seems useful
  • while PLIs are per-ssrc they may come as part of a compound packet (servers may aggregate even though because of libwebrtc bugs currently few do probably) hence you need some kind of API for those. Aggregating into a single call to the encoder seems useful for this too.

We're essentially talking about
image
(where "enc" is an abstract encoder on the left, SEA is a "simulcast encoder adapter" which resolves rids or ssrcs to layers and "enc" is a concrete encoder implementation on the right) vs
image
vs, with some encoder implementation that understands what this simulcast thing is
image

Please note that there is heavy threadhopping on the left of the first "enc". Leaving how to fulfill the request for keyframes to the encoder implementation seems preferable.

@aboba
Copy link
Contributor

aboba commented Sep 29, 2022

There were multiple process problems during the discussion. There were more than 3 proposals in the slide deck, but only the first 3 were presented. Then the "resolution" was determined without polling all the participants in the discussion, so as to allow remote participants to participate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TPAC 2022 For discussion at TPAC
Projects
None yet
Development

No branches or pull requests

5 participants