
Encoded frame IDLs share definitions with WebCodecs #99

Open
youennf opened this issue Apr 12, 2021 · 10 comments
Labels
TPAC 2022 For discussion at TPAC

Comments

@youennf
Collaborator

youennf commented Apr 12, 2021

Some members of audio/video encoded frames are shared with WebCodecs, like data or timestamp.
Some other members like getMetadata might be specific to WebRTC encoded transform.
We should converge and explicitly mark what is webrtc-specific and what is shared with WebCodecs.

@dogben

dogben commented Apr 12, 2021

@chcunningham @sandersdan

@aboba
Contributor

aboba commented Apr 15, 2021

@youennf metadata relating to the frame (such as type, dependencies and TID/SID) is provided by WebCodecs.

@youennf
Collaborator Author

youennf commented Apr 15, 2021

From looking at the spec, here is the list of obvious things that differ right now:

  • data is settable here, not in WebCodecs. This is expected to change in WebCodecs
  • 'empty' is a valid video frame here, not in WebCodecs. It is not clear why we have 'empty' here, so I hope we can drop it.
  • getMetadata is defined here, not in WebCodecs. This is expected to change in WebCodecs.
  • Some metadata fields are RTP specific and would stay here.

The plan might be in this spec to:

  • Directly use EncodedVideoChunk and EncodedAudioChunk
  • Define the additional metadata fields using partial dictionaries

To do so, we need WebCodecs to first fix the data attribute setter and add getMetadata.
@chcunningham, any thoughts on when these will happen in WebCodecs?

If we directly use WebCodecs structures, we also need to ensure they stay aligned with our model and constraints here.
For instance, we probably want to stick with readonly timestamps.
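The "partial dictionaries" approach above can be sketched as follows. This is an illustrative TypeScript approximation, not WebIDL from either spec: it assumes the base metadata dictionary would live in WebCodecs and the RTP-specific fields (SSRC, CSRCs, payload type) would be layered on top by this spec.

```typescript
// Hypothetical split of frame metadata: fields shared with WebCodecs in
// the base dictionary, RTP-specific fields added by WebRTC Encoded
// Transform. All names here are illustrative assumptions.

// Shared with WebCodecs (e.g. what a getMetadata() there might return).
interface EncodedVideoChunkMetadata {
  frameId?: number;
  dependencies?: number[];
  temporalIndex?: number; // TID
  spatialIndex?: number;  // SID
}

// RTC-only additions, analogous to a WebIDL "partial dictionary".
interface RTCEncodedVideoFrameMetadata extends EncodedVideoChunkMetadata {
  synchronizationSource?: number; // RTP SSRC
  contributingSources?: number[]; // RTP CSRCs
  payloadType?: number;
}

const md: RTCEncodedVideoFrameMetadata = {
  frameId: 42,
  dependencies: [41],
  temporalIndex: 1,
  synchronizationSource: 0x1234,
};
```

With this shape, code that only cares about codec-level metadata can consume the base dictionary, while the packetizer sees the RTP extension.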

@chcunningham

Hi @youennf, sorry for the delay.

Encoded*Chunk currently exposes a readonly data attribute, but we were planning to remove it entirely in favor of a readInto() method that copies the data out (PR for this here). The reasoning is that we wish for all I/O types in webcodecs to be immutable to avoid surprising users and (critically) to avoid TOCTOU security issues.

I saw your comment here

Now that I look at it, data is readonly here but we want to be able to change it in RTCRtpSFrameTransform transforms.

Rather than setting the .data attribute, is it possible to instead construct a new chunk w/ the transformed data? We could design a copy constructor or similar to make this easy for authors.
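The "construct a new chunk with the transformed data" pattern can be sketched with a minimal stand-in class, since the real EncodedVideoChunk is immutable and its constructor details were still in flux at this point. The `Chunk` class and `transform` helper below are hypothetical, not the WebCodecs API.

```typescript
// Minimal stand-in for an immutable encoded chunk, illustrating the
// suggestion of building a fresh chunk rather than mutating .data.
class Chunk {
  readonly timestamp: number;
  readonly #data: Uint8Array;
  constructor(init: { timestamp: number; data: Uint8Array }) {
    this.timestamp = init.timestamp;
    this.#data = init.data.slice(); // copy in: keeps the chunk immutable
  }
  get byteLength(): number { return this.#data.byteLength; }
  copyTo(dest: Uint8Array): void { dest.set(this.#data); } // copy out
}

// An SFrame-style transform reads the payload out, transforms it, and
// constructs a new chunk instead of setting data in place.
function transform(chunk: Chunk, f: (b: Uint8Array) => Uint8Array): Chunk {
  const buf = new Uint8Array(chunk.byteLength);
  chunk.copyTo(buf);
  return new Chunk({ timestamp: chunk.timestamp, data: f(buf) });
}
```

A copy constructor on the real interface would let authors do this without spelling out the intermediate buffer.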

Define the additional metadata fields using partial dictionaries

We've had some other requests for metadata as well. I'd like to weigh the options of subclassing, vs wrapping, vs extending (with partial dictionaries) vs having an open ended metadata bucket (maybe a Map) on the Encoded*Chunk interfaces. I don't have an opinion just yet.

For the metadata you're seeking, is it important that the metadata be carried through decoding/encoding, such that a VideoFrame corresponding to an EncodedVideoChunk has all the same values?

For instance, we probably want to stick with readonly timestamps.

No objection.

@youennf
Collaborator Author

youennf commented Apr 21, 2021

I haven't looked at the readInto proposal.
In general, I would prefer we use move/neuter behavior here, instead of relying on garbage collection.
This is true for encoded chunks as well as raw data. There might be cases where we could also potentially reuse a buffer (say decryption).

The typical flow is: I read a frame F from the encoder, I change the frame F, I write the frame F to the packetizer.
After writing the frame F, the slimmer the frame F is, the better.
This is similar to transferring the frame.
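The move/neuter behavior described here can be demonstrated with plain ArrayBuffers: transferring a buffer detaches the source, so nothing downstream can race on it and nothing is left for garbage collection to reclaim lazily. This is a sketch of the semantics, not of any WebRTC API.

```typescript
// A frame's payload, standing in for encoded data read from the encoder.
const payload = new Uint8Array([1, 2, 3, 4]).buffer;

// structuredClone with a transfer list moves the buffer rather than
// copying it; the source ArrayBuffer is detached (byteLength becomes 0).
const moved = structuredClone(payload, { transfer: [payload] });

console.log(moved.byteLength);   // 4 — the data lives on in `moved`
console.log(payload.byteLength); // 0 — the source is neutered
```

After the move, any stage still holding the old reference sees an empty, detached buffer, which is exactly the "slimmer frame after writing" property wanted above.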

For the metadata you're seeking, is it important that the metadata be carried through decoding/encoding, such that a VideoFrame corresponding to an EncodedVideoChunk has all the same values?

For the typical RTC metadata we want, we do not need to get it exposed past the decoder.

We might in the future have metadata that would go from MediaStreamTrack to WebRTC transform through encoder (or vice versa on decoder side). That could be a separate bucket of metadata.

@chcunningham

In general, I would prefer we use move/neuter behavior here, instead of relying on garbage collection.
This is true for encoded chunks as well as raw data.

For raw frames, we definitely do not want anyone to rely on garbage collection. Both AudioFrame and (more importantly) VideoFrame offer a close() method (similar to ImageBitmap) that releases the object's reference to underlying resources (and may immediately free them if that was the only reference). The spec encourages users to close() immediately when they no longer need the frame (and Chrome actually warns users if it GC's a frame that wasn't closed()). I'm adding further clarity to the ref-counting behavior in w3c/webcodecs#167.

For encoded chunks, GC has been working fine since the data is relatively small, but I'd be open to adding a close() there as well.

There might be cases where we could also potentially reuse a buffer (say decryption).

Sorry, I didn't quite follow this. Can you say more?

The typical flow is: I read a frame F from encoder, I change the frame F, I write the frame F in the packetizer.
After writing the frame F, the slimmer frame F is, the better.
Similar to transferring the frame.

With the Encoded*Chunk readInto() PR, this entails one copy when reading from the encoder output. I think it strikes a good balance between keeping I/O types immutable (making it easy to reason about TOCTOU issues) and not being too onerous.

Additionally, for VideoFrame, the pixel data will often reside in GPU memory. We don't want to copy to CPU for the sake of making an ArrayBuffer unless the user explicitly requests that. We achieve this by again offering a readInto() API (this time, making it async to avoid requiring synchronization that hurts GPU pipeline perf). For folks that just want to go from MediaStreamTrack -> Canvas or from VideoDecoder -> Canvas, this API would never be called and everything stays on the GPU. This lazy async readback option isn't feasible with an interface that just offers an ArrayBuffer.
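The shape of the readInto() design being discussed can be sketched with a caller-allocated (BYOB) buffer and an async copy. The `RawFrame` class, `allocationSize`, and `readInto` names below are assumptions for illustration, not the final WebCodecs API.

```typescript
// Sketch of a BYOB readInto(): the caller allocates the destination, and
// readback is async so a GPU-backed implementation can avoid stalling
// the pipeline. CPU-backed here purely for illustration.
class RawFrame {
  constructor(private pixels: Uint8Array) {}
  allocationSize(): number { return this.pixels.byteLength; }
  async readInto(dest: ArrayBuffer): Promise<number> {
    if (dest.byteLength < this.pixels.byteLength)
      throw new RangeError("destination buffer too small");
    new Uint8Array(dest).set(this.pixels); // one explicit copy, on request
    return this.pixels.byteLength;
  }
}

// Callers that only render never call readInto(), so pixel data can stay
// on the GPU; callers that need the bytes pay for exactly one copy.
async function demo(): Promise<number> {
  const frame = new RawFrame(new Uint8Array([9, 8, 7]));
  const buf = new ArrayBuffer(frame.allocationSize());
  return frame.readInto(buf);
}
```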

In the long run, we hope the web platform will introduce a type of read-only ArrayBuffer, which enables us to do things like BYOB and expose buffer attributes (except for the GPU VideoFrame case mentioned above) without fear of TOCTOU issues. Extensive discussion on that in WICG/reducing-memory-copies#1

@youennf
Collaborator Author

youennf commented Apr 26, 2021

The spec encourages users to close() immediately when they no longer need the frame (and Chrome actually warns users if it GC's a frame that wasn't closed()). I'm adding further clarity to the ref-counting behavior in w3c/webcodecs#167.

This is the kind of thing I am worried about.
It would be nice if most web-dev flows did not need to call close(); enqueuing a frame in a sink could call close() directly.
That would make web pages actually call clone() only when they really need to.
But this is really outside the scope of the current thread.

I think I agree with you on raw frames vs. encoded frames.
It is nice if the same model applies to both frames but this is not a must.

With the Encoded*Chunk readInto() PR, this entails one copy when reading from the encoder output. I think it strikes a good balance between keeping I/O types immutable (making it easy to reason about TOCTOU issues) and not being too onerous.

Making it easy to reason about TOCTOU issues is mostly a concern for browser implementors.
We just need to make sure that a packetizer or a decoder will not read data that can be modified by JS during its processing.
Transferring data, as done in the WebRTC encoded transform spec, solves this issue AFAIUI.

@alvestrand
Contributor

For the metadata issue, I filed w3c/webcodecs#245 to track this in the WebCodec repo.
Things like the dependency relationships are generated by the encoder, so it seems logical that they need to be in the encoder output somehow.

@youennf
Collaborator Author

youennf commented Aug 26, 2021

Discussed at the editors' meeting: it seems we could reuse WebCodecs metadata and extend it for RTC.

@aboba added the TPAC 2022 (For discussion at TPAC) label on Sep 2, 2022
@alvestrand
Contributor

Given the deployment of the RTCEncoded* types, it seems more Web-compatible to define a well-documented way to turn a WebCodecs encoded frame into an RTC encoded frame and vice versa.

Metadata is important.
