How to handle transforms largely changing frame size #50

Open
youennf opened this issue Dec 7, 2020 · 9 comments
Labels
TPAC 2022 For discussion at TPAC

Comments

@youennf
Collaborator

youennf commented Dec 7, 2020

Transforms may introduce large changes in frame size (decrease or increase).
It seems worth understanding how to handle these cases.

I can see different variations of these cases:

  1. Metadata size is not really negotiable by the JS transform; the size change is more or less fixed.
    The user agent can handle this case on its own.
    The user agent can detect the size of the added metadata and adjust the encoder bitrate according to the average size of the transformed data.

  2. The transform may decide to decrease the metadata size.
    The transform may add more or less metadata based on available bandwidth.
    The transform could be made aware of the target bit rate from the network side, and maybe of the encoder target bit rate as well.
    The transform would then compute how much space it can add to the frame.

  3. The transform might want to trade media quality to include more metadata.
    In that case, the transform can decide whether to reduce the encoder bit rate, the size of the metadata, or both.
    It seems useful to notify the JS whenever a change of the encoder bit rate is planned, and potentially to allow the JS to override the default behavior.

Case 1 requires no new API.
Case 2 could be implemented as getter APIs.
Case 3 can be implemented in various ways (ReadableStream/WritableStream pair, transform, events, maybe even frame-dedicated fields). All these variants should provide roughly the same functionality.

I feel like a single object that the JS could use for all of this when processing a frame might be the most convenient from a web developer perspective.
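
For reference, a minimal sketch of case 1 using the worker-side RTCRtpScriptTransformer API discussed later in this thread; the 16-byte metadata trailer is a made-up example, and only the frame-size growth is the point here:

```js
// worker.js — hedged sketch: append a fixed-size (hypothetical) metadata trailer
// to every encoded video frame, so the user agent sees a roughly constant overhead.
onrtctransform = (event) => {
  const { readable, writable } = event.transformer;
  const METADATA_BYTES = 16; // assumed fixed per-frame overhead, for illustration only

  const transform = new TransformStream({
    transform(frame, controller) {
      const original = new Uint8Array(frame.data);
      const enlarged = new Uint8Array(original.byteLength + METADATA_BYTES);
      enlarged.set(original);
      // ...fill the trailing METADATA_BYTES with application metadata here...
      frame.data = enlarged.buffer;
      controller.enqueue(frame);
    }
  });
  readable.pipeThrough(transform).pipeTo(writable);
};
```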

@youennf
Collaborator Author

youennf commented Feb 2, 2021

One good example here might be RED (https://tools.ietf.org/html/rfc2198).
IIUC, it is integrated as a codec, similarly to Opus, with the transform being applied to the output of the RED encoder.

The RED encoder uses an underlying encoder (say Opus) to compress data.
It then generates content by taking this compressed data and appending previously encoded data to it until a byte limit is reached. Ideally, the content will be serialised as one packet.

If the transform increases the size of the content by too much, things might actually break.
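
To make the size pressure concrete, here is a hedged sketch of how a RED payload is laid out per RFC 2198 with a single redundant block: each redundant block costs a 4-byte header plus the full previous payload, and the primary block a 1-byte header, so the payload roughly doubles before the transform adds anything. The function and its arguments are illustrative, not the actual encoder implementation:

```js
// Illustrative RED (RFC 2198) payload construction with one redundant block.
// payloadType is the inner codec's (e.g. Opus) payload type; tsOffset is the
// 14-bit timestamp offset of the redundant block.
function buildRedPayload(primaryOpus, previousOpus, payloadType, tsOffset) {
  const out = new Uint8Array(4 + 1 + previousOpus.byteLength + primaryOpus.byteLength);
  out[0] = 0x80 | (payloadType & 0x7f);                   // F=1, block PT
  out[1] = (tsOffset >> 6) & 0xff;                        // timestamp offset, high 8 of 14 bits
  out[2] = ((tsOffset & 0x3f) << 2) |                     // timestamp offset, low 6 bits
           ((previousOpus.byteLength >> 8) & 0x03);       // block length, high 2 of 10 bits
  out[3] = previousOpus.byteLength & 0xff;                // block length, low 8 bits
  out[4] = payloadType & 0x7f;                            // F=0, primary block PT
  out.set(previousOpus, 5);
  out.set(primaryOpus, 5 + previousOpus.byteLength);
  return out;
}
```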

@fippo
Collaborator

fippo commented Feb 3, 2021

Correct, the audio RED encoder wraps the Opus one, stores a packet (or two, three, four, ...) and then essentially concatenates them.

Since audio typically isn't split into multiple packets, this could create a problem with insertable streams, but I think we have a lot of margin here. The audio encoder does know the transport overhead, which is currently calculated from the RTP header size and RTP header extensions (see here for details).
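
As a rough illustration of that kind of calculation (all of these figures are assumptions for the sketch, not what the encoder actually computes):

```js
// Hedged back-of-the-envelope payload budget; every figure here is an assumption.
const MAX_PACKET = 1200;       // common conservative packet-size assumption
const RTP_HEADER = 12;         // fixed RTP header
const HEADER_EXTENSIONS = 24;  // e.g. mid, audio-level, transport-cc (varies per call)
const SRTP_AUTH_TAG = 10;      // depends on the negotiated SRTP cipher suite
const payloadBudget = MAX_PACKET - RTP_HEADER - HEADER_EXTENSIONS - SRTP_AUTH_TAG;
// ≈ 1154 bytes left for the (possibly RED-wrapped and transformed) audio payload
```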

@alvestrand
Contributor

The generic problem is transforms where the input is very different from the output.
"Support RED as a transform" may be a separate topic - it would require that the transform has the desired packet size available from the packetizer. It also gets into the whole issue of how an app that inserts a transform can do the correct SDP actions (preferably without having to munge everything in SDP).

@alvestrand
Contributor

If we switch to using EncodedVideoChunk, which is immutable, the problem becomes a different one.

@fippo
Collaborator

fippo commented Sep 10, 2021

I actually ran into this again with an audio RED encoder. Since audio payloads don't get packetized, they are size-limited (1200 bytes minus transport overhead). The oversized payload ended up being dropped silently deep down in libwebrtc.
Should
1/ enqueue throw for some definition of "too large audio frame"?
2/ the original frame specify a maximum size?
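
Until either option exists, a transform can only guard against this itself. A hedged sketch of such a guard follows; both the 1200-byte figure and the addRedundancy helper are assumptions, not part of any spec or implementation:

```js
// Defensive sketch: skip the size increase rather than risk a silent drop downstream.
const ASSUMED_MAX_AUDIO_PAYLOAD = 1200; // assumption, not an exposed or specified limit

function transformAudio(frame, controller) {
  // addRedundancy is a hypothetical helper returning an enlarged ArrayBuffer.
  const grown = addRedundancy(frame.data);
  if (grown.byteLength <= ASSUMED_MAX_AUDIO_PAYLOAD) {
    frame.data = grown;
  } // otherwise keep the original payload unchanged
  controller.enqueue(frame);
}
```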

@youennf
Collaborator Author

youennf commented Sep 12, 2023

@alvestrand, @aboba, I was initially thinking we could add an API for the web page to expose the target bitrate overhead of the transform.
Rethinking this issue though, in the case where transforms add a lot of overhead, I wonder now whether we need to provide any API at all, or whether this could just be an unexpected consequence of transforms that implementations need to deal with.

1/ enqueue throw for some definition of "too large audio frame"

If the write fails, it usually means the WritableStream will be errored, which we do not want.
I guess an error event on RTCRtpScriptTransformer is a possibility if we want the web app to get that info quickly, or a stat otherwise.

2/ the original frame specify a maximum size?

That is one possibility; another is to put it at the RTCRtpScriptTransformer level:

partial interface RTCRtpScriptTransformer {
    readonly attribute unsigned long maxAudioFrameSize;
};

with possibly an event in case the value changes.
One question though is how to compute this value (MTU - estimated RTP packetization overhead?)
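
A usage sketch for the proposed attribute (purely hypothetical, since nothing above is specified or shipped) might look like:

```js
// Hypothetical: assumes the proposed maxAudioFrameSize attribute existed on the transformer.
onrtctransform = (event) => {
  const transformer = event.transformer;
  const transform = new TransformStream({
    transform(frame, controller) {
      const budget = transformer.maxAudioFrameSize - frame.data.byteLength; // proposed attribute
      // ...append at most `budget` bytes of metadata to frame.data...
      controller.enqueue(frame);
    }
  });
  transformer.readable.pipeThrough(transform).pipeTo(transformer.writable);
};
```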

@fippo
Collaborator

fippo commented Sep 13, 2023

One question though is how to compute this value (MTU - estimated RTP packetization overhead?)

The question is what the MTU even is (and we have no PMTU discovery). For video, a common assumption in libWebRTC is 1200, but as https://bugs.chromium.org/p/chromium/issues/detail?id=1470254& has shown, stereo Opus at 510 kbps can result in an RTP payload of 1275 bytes, which exceeded the (arbitrary) limit of 1000. It is unclear whether that works on all networks even without adding more bytes.
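
For context, the 1275-byte figure falls out directly from the bitrate, assuming 20 ms Opus frames:

```
510 000 bit/s × 0.020 s ÷ 8 bit/byte = 1275 bytes per frame
```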

@youennf
Collaborator Author

youennf commented Sep 13, 2023

Right, given it might be hard for the UA to give a good value and the web app could probe this info on its own, it does not seem worth exposing such a value.

@alvestrand
Contributor

The comment about the audio frame size limitation came up in a different context today.
I think the proper venue for that is AVTCORE in the IETF - if we want to add a packetizer to an audio codec (and I think there are cases where we need it), that is the place where it should be defined.

Moving the arbitrary limit from 1000 to 1200 is probably pretty safe.
