
Encoded frame IDLs share definitions with WebCodecs #99

Open
youennf opened this issue Apr 12, 2021 · 10 comments
Labels
TPAC 2022 For discussion at TPAC

Comments

@youennf
Collaborator

youennf commented Apr 12, 2021

Some members of audio/video encoded frames are shared with WebCodecs, like data or timestamp.
Some other members like getMetadata might be specific to WebRTC encoded transform.
We should converge and explicitly mark what is webrtc-specific and what is shared with WebCodecs.

@dogben

dogben commented Apr 12, 2021

@chcunningham @sandersdan

@aboba
Contributor

aboba commented Apr 15, 2021

@youennf metadata relating to the frame (such as type, dependencies and TID/SID) is provided by WebCodecs.

@youennf
Collaborator Author

youennf commented Apr 15, 2021

From looking at the spec, here is the list of obvious things that differ right now:

  • data is settable here, not in WebCodecs. This is expected to change in WebCodecs
  • 'empty' is a valid video frame here, not in WebCodecs. It is not clear why we have 'empty' here, so I hope we can drop it.
  • getMetadata is defined here, not in WebCodecs. This is expected to change in WebCodecs.
  • Some metadata fields are RTP specific and would stay here.

The plan might be in this spec to:

  • Directly use EncodedVideoChunk and EncodedAudioChunk
  • Define the additional metadata fields using partial dictionaries

To do so, we need WebCodecs to first fix the data attribute setter and add getMetadata.
@chcunningham, any thoughts on when these will happen in WebCodecs?

If we directly use WebCodecs structures, we also need to ensure they stay aligned with our model and constraints here.
For instance, we probably want to stick with readonly timestamps.
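The "partial dictionaries" approach above can be sketched as follows. This is an illustrative TypeScript approximation, not WebIDL from either spec: it assumes the base metadata dictionary would live in WebCodecs and the RTP-specific fields (SSRC, CSRCs, payload type) would be layered on top by this spec.

```typescript
// Hypothetical split of frame metadata: fields shared with WebCodecs in
// the base dictionary, RTP-specific fields added by WebRTC Encoded
// Transform. All names here are illustrative assumptions.

// Shared with WebCodecs (e.g. what a getMetadata() there might return).
interface EncodedVideoChunkMetadata {
  frameId?: number;
  dependencies?: number[];
  temporalIndex?: number; // TID
  spatialIndex?: number;  // SID
}

// RTC-only additions, analogous to a WebIDL "partial dictionary".
interface RTCEncodedVideoFrameMetadata extends EncodedVideoChunkMetadata {
  synchronizationSource?: number; // RTP SSRC
  contributingSources?: number[]; // RTP CSRCs
  payloadType?: number;
}

const md: RTCEncodedVideoFrameMetadata = {
  frameId: 42,
  dependencies: [41],
  temporalIndex: 1,
  synchronizationSource: 0x1234,
};
```

With this shape, code that only cares about codec-level metadata can consume the base dictionary, while the packetizer sees the RTP extension.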

@chcunningham

Hi @youennf, sorry for the delay.

Encoded*Chunk currently exposes a readonly data attribute, but we were planning to remove it entirely in favor of a readInto() method that copies the data out (PR for this here). The reasoning is that we wish for all I/O types in webcodecs to be immutable to avoid surprising users and (critically) to avoid TOCTOU security issues.

I saw your comment here

Now that I look at it, data is readonly here but we want to be able to change it in RTCRtpSFrameTransform transforms.

Rather than setting the .data attribute, is it possible to instead construct a new chunk w/ the transformed data? We could design a copy constructor or similar to make this easy for authors.
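The "construct a new chunk with the transformed data" pattern can be sketched with a minimal stand-in class, since the real EncodedVideoChunk is immutable and its constructor details were still in flux at this point. The `Chunk` class and `transform` helper below are hypothetical, not the WebCodecs API.

```typescript
// Minimal stand-in for an immutable encoded chunk, illustrating the
// suggestion of building a fresh chunk rather than mutating .data.
class Chunk {
  readonly timestamp: number;
  readonly #data: Uint8Array;
  constructor(init: { timestamp: number; data: Uint8Array }) {
    this.timestamp = init.timestamp;
    this.#data = init.data.slice(); // copy in: keeps the chunk immutable
  }
  get byteLength(): number { return this.#data.byteLength; }
  copyTo(dest: Uint8Array): void { dest.set(this.#data); } // copy out
}

// An SFrame-style transform reads the payload out, transforms it, and
// constructs a new chunk instead of setting data in place.
function transform(chunk: Chunk, f: (b: Uint8Array) => Uint8Array): Chunk {
  const buf = new Uint8Array(chunk.byteLength);
  chunk.copyTo(buf);
  return new Chunk({ timestamp: chunk.timestamp, data: f(buf) });
}
```

A copy constructor on the real interface would let authors do this without spelling out the intermediate buffer.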

Define the additional metadata fields using partial dictionaries

We've had some other requests for metadata as well. I'd like to weigh the options of subclassing, vs wrapping, vs extending (with partial dictionaries) vs having an open ended metadata bucket (maybe a Map) on the Encoded*Chunk interfaces. I don't have an opinion just yet.

For the metadata you're seeking, is it important that the metadata be carried through decoding/encoding, such that a VideoFrame corresponding to an EncodedVideoChunk has all the same values?

For instance, we probably want to stick with readonly timestamps.

No objection.

@youennf
Collaborator Author

youennf commented Apr 21, 2021

I haven't looked at the readInto proposal.
In general, I would prefer we use move/neuter behavior here, instead of relying on garbage collection.
This is true for encoded chunks as well as raw data. There might be cases where we could also potentially reuse a buffer (say decryption).

The typical flow is: I read a frame F from the encoder, I change the frame F, I write the frame F to the packetizer.
After writing the frame F, the slimmer the frame F is, the better.
This is similar to transferring the frame.
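The move/neuter behavior described here can be demonstrated with plain ArrayBuffers: transferring a buffer detaches the source, so nothing downstream can race on it and nothing is left for garbage collection to reclaim lazily. This is a sketch of the semantics, not of any WebRTC API.

```typescript
// A frame's payload, standing in for encoded data read from the encoder.
const payload = new Uint8Array([1, 2, 3, 4]).buffer;

// structuredClone with a transfer list moves the buffer rather than
// copying it; the source ArrayBuffer is detached (byteLength becomes 0).
const moved = structuredClone(payload, { transfer: [payload] });

console.log(moved.byteLength);   // 4 — the data lives on in `moved`
console.log(payload.byteLength); // 0 — the source is neutered
```

After the move, any stage still holding the old reference sees an empty, detached buffer, which is exactly the "slimmer frame after writing" property wanted above.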

For the metadata you're seeking, is it important that the metadata be carried through decoding/encoding, such that a VideoFrame corresponding to an EncodedVideoChunk has all the same values?

For the typical RTC metadata we want, we do not need to get it exposed past the decoder.

We might in the future have metadata that would go from MediaStreamTrack to WebRTC transform through encoder (or vice versa on decoder side). That could be a separate bucket of metadata.

@chcunningham

In general, I would prefer we use move/neuter behavior here, instead of relying on garbage collection.
This is true for encoded chunks as well as raw data.

For raw frames, we definitely do not want anyone to rely on garbage collection. Both AudioFrame and (more importantly) VideoFrame offer a close() method (similar to ImageBitmap) that releases the object's reference to underlying resources (and may immediately free them if that was the only reference). The spec encourages users to close() immediately when they no longer need the frame (and Chrome actually warns users if it GC's a frame that wasn't closed()). I'm adding further clarity to the ref-counting behavior in w3c/webcodecs#167.

For encoded chunks, GC has been working fine since the data is relatively small, but I'd be open to adding a close() there as well.

There might be cases where we could also potentially reuse a buffer (say decryption).

Sorry, I didn't quite follow this. Can you say more?

The typical flow is: I read a frame F from encoder, I change the frame F, I write the frame F in the packetizer.
After writing the frame F, the slimmer frame F is, the better.
Similar to transferring the frame.

With the Encoded*Chunk readInto() PR, this entails one copy when reading from the encoder output. I think it strikes a good balance between keeping I/O types immutable (making it easy to reason about TOCTOU issues) and not being too onerous.

Additionally, for VideoFrame, the pixel data will often reside in GPU memory. We don't want to copy to CPU for the sake of making an ArrayBuffer unless the user explicitly requests that. We achieve this by again offering a readInto() API (this time, making it async to avoid requiring synchronization that hurts GPU pipeline perf). For folks that just want to go from MediaStreamTrack -> Canvas or from VideoDecoder -> Canvas, this API would never be called and everything stays on the GPU. This lazy async readback option isn't feasible with an interface that just offers an ArrayBuffer.
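The shape of the readInto() design being discussed can be sketched with a caller-allocated (BYOB) buffer and an async copy. The `RawFrame` class, `allocationSize`, and `readInto` names below are assumptions for illustration, not the final WebCodecs API.

```typescript
// Sketch of a BYOB readInto(): the caller allocates the destination, and
// readback is async so a GPU-backed implementation can avoid stalling
// the pipeline. CPU-backed here purely for illustration.
class RawFrame {
  constructor(private pixels: Uint8Array) {}
  allocationSize(): number { return this.pixels.byteLength; }
  async readInto(dest: ArrayBuffer): Promise<number> {
    if (dest.byteLength < this.pixels.byteLength)
      throw new RangeError("destination buffer too small");
    new Uint8Array(dest).set(this.pixels); // one explicit copy, on request
    return this.pixels.byteLength;
  }
}

// Callers that only render never call readInto(), so pixel data can stay
// on the GPU; callers that need the bytes pay for exactly one copy.
async function demo(): Promise<number> {
  const frame = new RawFrame(new Uint8Array([9, 8, 7]));
  const buf = new ArrayBuffer(frame.allocationSize());
  return frame.readInto(buf);
}
```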

In the long run, we hope the web platform will introduce a type of read-only ArrayBuffer, which enables us to do things like BYOB and expose buffer attributes (except for the GPU VideoFrame case mentioned above) without fear of TOCTOU issues. Extensive discussion on that in WICG/reducing-memory-copies#1

@youennf
Collaborator Author

youennf commented Apr 26, 2021

The spec encourages users to close() immediately when they no longer need the frame (and Chrome actually warns users if it GC's a frame that wasn't closed()). I'm adding further clarity to the ref-counting behavior in w3c/webcodecs#167.

This is the kind of thing I am worried about.
It would be nice if most web-dev flows did not need to call close(); enqueuing a frame in a sink could call close() directly.
That would make web pages actually call clone() only when they really need to.
But this is really outside the scope of the current thread.

I think I agree with you on raw frames vs. encoded frames.
It is nice if the same model applies to both frames but this is not a must.

With the Encoded*Chunk readInto() PR, this entails one copy when reading from the encoder output. I think it strikes a good balance between keeping I/O types immutable (making it easy to reason about TOCTOU issues) and not being too onerous.

Making it easy to reason about TOCTOU issues is mostly a concern for browser implementors.
We just need to make sure that a packetizer or a decoder will not read data that can be modified by JS during its processing.
Transferring data, as done in the WebRTC encoded transform spec, solves this issue AFAIUI.

@alvestrand
Contributor

For the metadata issue, I filed w3c/webcodecs#245 to track this in the WebCodec repo.
Things like the dependency relationships are generated by the encoder, so it seems logical that they need to be in the encoder output somehow.

@youennf
Collaborator Author

youennf commented Aug 26, 2021

Discussed at the editors' meeting: it seems we could reuse WebCodecs metadata and extend it for RTC.

@aboba added the TPAC 2022 (For discussion at TPAC) label on Sep 2, 2022
@alvestrand
Contributor

Given the deployment of the RTCEncoded* types, it seems more Web-compatible to define a well-documented way to turn a WebCodecs encoded frame into an RTC encoded frame and vice versa.

Metadata is important.
