generateKeyFrame takes a "rid" argument, but is invoked with "rids" #143
The algorithm doesn't deal with the lack of rids either.
Oh right, I missed it in #132.
(uplevel from PR comment) I'm not sure how valid those are for transformer.generateKeyFrame, however.
(36c965e) removed the ability to request keyframes on a sequence of RIDs. This imposes some unnecessary limitations: when applications need to request keyframes on N RIDs (1 < N < MAX_RIDS) to guarantee video fluidity for a wide range of receivers, they can no longer do so optimally. With the currently proposed API, an application that wants keyframes on 2 RIDs has to either call the API twice (passing RID N and RID M as arguments) or once (passing no arguments). Choosing between the two requires applications to know implementation details and limitations, which is not always possible. E.g. if the user agent emits keyframes on all RIDs even when asked for a single RID, applications should prefer the single call. If the user agent emits keyframes only on the requested RIDs, applications should prefer the two calls. Making applications implement such a heuristic is something we can easily avoid by offering a more flexible API that ALWAYS takes a sequence of RIDs.
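The trade-off in the comment above can be modelled with a small sketch. Everything here is hypothetical: `keyFramesProduced`, `UaBehavior` and the RID names are illustration only and are not part of any spec or browser API. The function just counts how many key frames each RID would emit for a given call pattern under two assumed user agent behaviours.

```typescript
// Hypothetical model, not a real API: how a user agent might react to a
// request that names a single RID.
type UaBehavior = "per-rid" | "all-rids";

// Count the key frames each active RID would emit.
// An `undefined` entry in `requests` models a call with no argument.
function keyFramesProduced(
  activeRids: string[],
  requests: (string | undefined)[],
  behavior: UaBehavior,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const rid of activeRids) counts[rid] = 0;
  for (const request of requests) {
    // A no-argument call, or a user agent that always refreshes every
    // encoding, hits all active RIDs.
    const hitsAll = request === undefined || behavior === "all-rids";
    for (const rid of activeRids) {
      if (hitsAll || rid === request) counts[rid] += 1;
    }
  }
  return counts;
}

// Three assumed encodings; the application wants key frames on "q" and "h".
const rids = ["q", "h", "f"];
const twoCallsPerRid = keyFramesProduced(rids, ["q", "h"], "per-rid");
const twoCallsAllRids = keyFramesProduced(rids, ["q", "h"], "all-rids");
const oneCallNoArg = keyFramesProduced(rids, [undefined], "per-rid");
```

Under these assumptions, two per-RID calls leave "f" untouched only on a user agent that honours the RID, and double the key frames on one that does not, while the no-argument call always hits "f" as well. That is the implementation knowledge the application would otherwise need.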
At last TPAC, the feeling in the room was that supporting N=1 or N=ALL_ACTIVE was good enough as it covered the envisioned use cases.
The resolution as recorded in https://www.w3.org/2022/09/13-webrtc-minutes.html#t07 was "go with proposal 3 without returning a timestamp", where proposal 3 was "providing no rid -> Trigger key frames for all rids". So that would mean making the argument to requestKeyFrame optional.
What I am after is optimal operation in cases where an application would like to dynamically maximise video fidelity for every receiver group by having tight control over the target bitrates of individual RIDs. A practical example:
As N grows, the optimization exercise requires frequent changes both to the target bitrate of every RID and to the number of receivers each RID is meant to serve. Consider a scenario where receivers join at a high pace (e.g. the start of a larger conference). There will be a number of receivers capable of consuming high fidelity encodings (their estimators have had time to ramp up and stabilize), as well as a continuous inflow of receivers capable of consuming different variants of low fidelity encodings (these groups will change a lot while their estimators ramp up and stabilize around some values). Since applications want to minimise the number of key frames generated on any encoding/RID, they would pace/aggregate the changes. In practice this means having multiple changes to apply in one go. In this example, most of the changes would target lower fidelity encodings (due to the higher instability of these groups). Under these conditions it would be great to be able to avoid penalising receivers consuming the high fidelity stream(s). Hence the ask to not artificially limit the API surface.
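The pacing/aggregation idea can be illustrated with a minimal sketch. `KeyFrameBatcher` is a hypothetical helper, not something from the spec or from any implementation discussed here: it collects the RIDs that need a key frame during some application-chosen window and hands them back as one deduplicated list.

```typescript
// Hypothetical application-side helper: collects RIDs that need a key frame
// and releases them as a single batch.
class KeyFrameBatcher {
  private pending: string[] = [];

  // Record that a RID needs a key frame; repeated requests are collapsed.
  add(rid: string): void {
    if (this.pending.indexOf(rid) === -1) this.pending.push(rid);
  }

  // Return the aggregated RIDs in first-seen order and start a new batch.
  flush(): string[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

// Several changes hit the low fidelity encodings within one window.
const batcher = new KeyFrameBatcher();
batcher.add("q");
batcher.add("h");
batcher.add("q");
const batch = batcher.flush();
const nextBatch = batcher.flush();
```

Here `batch` holds only the two low fidelity RIDs, so an API that accepts a sequence could be called once with it and the assumed high fidelity RID "f" would not be asked for a key frame.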
@alvestrand , do I read correctly that TPAC consensus was to:
@solmaks that is how I read the minutes, and it fits with my memory of the meeting. I do wonder - if the RID groups are fairly independent, is there really a need to synchronize the sending of keyframes?
@alvestrand see my objection to the consensus on the list. This isn't a new request, see here
We're essentially talking about Please note that there is heavy threadhopping on the left of the first "enc". Leaving how to fulfill the request for keyframes to the encoder implementation seems preferable.
There were multiple process problems during the discussion. There were more than 3 proposals in the slide deck, but only the first 3 were presented. Then the "resolution" was determined without polling all the participants in the discussion, so as to allow remote participants to participate.
The generateKeyFrame algorithm takes a single RID, but the generateKeyFrame API function invokes it with a sequence of RIDs.
The RTCRtpScriptTransformer invokes it with a single RID.
This should be just a matter of wordsmithing; the one obvious decision to be made is the behavior of generateKeyFrame([valid rid, invalid rid]).
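To make the open question about a mixed list concrete, here is a sketch of two possible behaviours. Both policies, the `resolveRids` helper and the notion that "valid" means "matches a known encoding" are assumptions for illustration; the issue does not say which behaviour should be chosen.

```typescript
// Two hypothetical policies for a list that mixes known and unknown RIDs.
type Policy = "reject-all" | "skip-unknown";

interface Resolution {
  ok: boolean;        // false means the whole request would be rejected
  targets: string[];  // RIDs that would receive a key frame
  unknown: string[];  // requested RIDs that matched no known encoding
}

function resolveRids(requested: string[], known: string[], policy: Policy): Resolution {
  const unknown = requested.filter((rid) => known.indexOf(rid) === -1);
  const matched = requested.filter((rid) => known.indexOf(rid) !== -1);
  if (unknown.length > 0 && policy === "reject-all") {
    // Strict option: one bad RID fails the whole call, nothing is generated.
    return { ok: false, targets: [], unknown };
  }
  // Lenient option: generate for the RIDs that matched, report the rest.
  return { ok: true, targets: matched, unknown };
}

const known = ["q", "h", "f"];
const strict = resolveRids(["h", "bogus"], known, "reject-all");
const lenient = resolveRids(["h", "bogus"], known, "skip-unknown");
```

The strict policy is easier to specify, since the call either applies fully or not at all, while the lenient one still produces a key frame for "h" and leaves the application to notice that "bogus" was ignored.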