Emit metadata (SPS,VUI,SEI,...) during decoding #198

Open
chcunningham opened this issue Apr 28, 2021 · 8 comments
Labels
extension Interface changes that extend without breaking.

Comments

@chcunningham
Collaborator

Splitting off an idea from @ytakio in #94 (comment)

I think WebCodecs needs a method to get metadata (including a feature-check method, as informative text in the spec?)

Can you clarify what's meant by "including feature check method, as informative in spec"? Do you mean you'd like to query for what types of metadata could potentially be emitted?

It may be very useful even if WebCodecs just passes the start point of each metadata bytestream (like SPS, VUI, SEI, ...) to web apps.

Can you elaborate on how it would be useful to apps?

@ytakio

ytakio commented Apr 29, 2021

My explanation is too bad... I'm so sorry for my poor English 😅

Can you clarify what's meant by "including feature check method, as informative in spec"? Do you mean you'd like to query for what types of metadata could potentially be emitted?

I meant that a web app may want to know, before decoding, whether WebCodecs is able to detect and parse each kind of metadata and pass it along.
(Here, "parsing" means generating a metadata object from the bitstream.)

That parsing capability need not be mandatory. (That is why I suggested it could be "informative" in the W3C spec.)

But I think it is hard to support parsing metadata for every codec's stream. So...

It may be very useful even if WebCodecs just passes the start point of each metadata bytestream (like SPS, VUI, SEI, ...) to web apps.

Can you elaborate on how it would be useful to apps?

Some codec bitstreams carry metadata such as the following:

  • aspect ratio, frame rate
  • color information (range, coefficients to derive RGB)
  • HDR parameters
  • time code
  • user data

Most of the above are used for post-processing, which I think web apps may want to handle themselves.
Also, most codecs' metadata blocks are designed to be easy to parse, but I think it's a bit tough for a JavaScript app to find the boundaries of each block.

So I think it would be useful even if WebCodecs just passed along the byte array of each metadata block it finds in the bitstream. (I imagine a registration-style notifier with labels such as "sps" and "sei", e.g. .on("sei") 🤔)

I'd appreciate it if you would confirm.
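To make the boundary-finding point concrete, here is a minimal TypeScript sketch of what an app currently has to do itself. It assumes an H.264 Annex B bytestream (NAL units separated by 0x000001 or 0x00000001 start codes, not the length-prefixed format used in MP4) and that each buffer holds whole NAL units. The `MetadataNotifier` class and its `.on("sei")` method are hypothetical, modeled on the idea above; they are not part of WebCodecs.

```typescript
// Assumption: H.264 Annex B input (start-code delimited NAL units).
type NalLabel = "sei" | "sps" | "pps" | "other";

interface NalUnit {
  type: number; // nal_unit_type: low 5 bits of the NAL header byte
  bytes: Uint8Array; // the NAL unit, header byte included
}

// Split an Annex B buffer into NAL units by locating 0x000001 start codes.
function splitAnnexB(data: Uint8Array): NalUnit[] {
  const starts: number[] = []; // offset of the first byte after each start code
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      starts.push(i + 3);
      i += 2; // continue scanning after this start code
    }
  }
  const units: NalUnit[] = [];
  for (let k = 0; k < starts.length; k++) {
    const begin = starts[k];
    // A unit ends where the next 3-byte start code begins.
    let end = k + 1 < starts.length ? starts[k + 1] - 3 : data.length;
    // Drop trailing zeros (e.g. the extra 0x00 of a 4-byte start code).
    while (end > begin && data[end - 1] === 0) end--;
    if (end > begin) {
      units.push({ type: data[begin] & 0x1f, bytes: data.subarray(begin, end) });
    }
  }
  return units;
}

// H.264 nal_unit_type values: 6 = SEI, 7 = SPS, 8 = PPS.
function labelFor(type: number): NalLabel {
  if (type === 6) return "sei";
  if (type === 7) return "sps";
  if (type === 8) return "pps";
  return "other";
}

type Handler = (bytes: Uint8Array) => void;

// Hypothetical registration-style notifier; not a WebCodecs API.
class MetadataNotifier {
  private handlers = new Map<NalLabel, Handler[]>();

  on(label: NalLabel, handler: Handler): this {
    const list = this.handlers.get(label) ?? [];
    list.push(handler);
    this.handlers.set(label, list);
    return this;
  }

  // Scan one buffer (assumed to contain whole NAL units) and call the
  // handlers registered for each NAL unit's label.
  feed(data: Uint8Array): void {
    for (const nal of splitAnnexB(data)) {
      const list = this.handlers.get(labelFor(nal.type)) ?? [];
      for (const handler of list) handler(nal.bytes);
    }
  }
}
```

Other codecs (HEVC, AV1, ...) use different framing and type codes, which is part of why supporting this inside WebCodecs for every codec is hard.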

@chcunningham
Collaborator Author

My explanation is too bad... I'm so sorry for my poor English 😅

Your English is good. I wish I spoke Japanese!

It may be very useful even if WebCodecs just passes the start point of each metadata bytestream (like SPS, VUI, SEI, ...) to web apps.

Thanks, I follow the proposal now.

Would it generally make sense for such metadata to accompany a frame in the output callback? With semantics being: this metadata describes the current and subsequent frames?

@chcunningham
Collaborator Author

Triage note: marking 'extension', as the proposal would likely be implemented with additional callbacks or arguments.

@chcunningham chcunningham added the extension Interface changes that extend without breaking. label May 12, 2021
@ytakio

ytakio commented May 13, 2021

Thank you for encouraging me ;)

Would it generally make sense for such metadata to accompany a frame in the output callback?

I think that would work.
A frame may have multiple metadata blocks (e.g. an SPS whose VUI carries aspect ratio information, plus SEI NAL units), so it may be good for VideoFrame to hold an array of metadata bytes (BufferSource?).

On the other hand, if VideoDecoderConfig.description includes the SPS NAL unit, web apps can parse it themselves.
(e.g. when an initialization segment has been received)
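For the `description` route: for H.264 in the length-prefixed ("avc") format, `VideoDecoderConfig.description` holds an AVCDecoderConfigurationRecord ("avcC"), so the SPS and PPS can be read out of it directly. A minimal sketch, assuming such a record; `parseAvcC` is an illustrative name, and the optional extension fields that High-profile records may carry after the PPS list are ignored.

```typescript
interface AvcParameterSets {
  lengthSize: number; // bytes per NAL length prefix in the samples (1, 2 or 4)
  sps: Uint8Array[];
  pps: Uint8Array[];
}

// Parse the fixed part of an AVCDecoderConfigurationRecord.
function parseAvcC(desc: Uint8Array): AvcParameterSets {
  if (desc.length < 7 || desc[0] !== 1) {
    throw new Error("not an avcC record (configurationVersion must be 1)");
  }
  const view = new DataView(desc.buffer, desc.byteOffset, desc.byteLength);
  const lengthSize = (desc[4] & 0x03) + 1; // low 2 bits: lengthSizeMinusOne
  let offset = 5;

  // Read `count` entries, each a 16-bit big-endian length followed by NAL bytes.
  const readSets = (count: number): Uint8Array[] => {
    const sets: Uint8Array[] = [];
    for (let i = 0; i < count; i++) {
      const len = view.getUint16(offset);
      offset += 2;
      if (offset + len > desc.length) throw new Error("truncated avcC");
      sets.push(desc.subarray(offset, offset + len));
      offset += len;
    }
    return sets;
  };

  const sps = readSets(desc[offset++] & 0x1f); // low 5 bits: numOfSequenceParameterSets
  const pps = readSets(desc[offset++]); // numOfPictureParameterSets
  return { lengthSize, sps, pps };
}
```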

With semantics being: this metadata describes the current and subsequent frames?

I think your understanding is correct :)
(Strictly speaking, some metadata is tied to a specific sequence ID for the whole presentation, but the web app can be in charge of managing that.)
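One way to picture the "current and subsequent frames" semantics is a small tracker that keeps parameter sets in effect until they are replaced and attaches them, plus any per-frame SEI, to each output frame. This is a hypothetical sketch, not a proposed API: `MetadataTracker` and its types are made up for illustration, it keeps only one SPS and one PPS rather than keying them by parameter set ID, and it treats every SEI block as applying to a single frame, whereas real SEI persistence rules vary by payload type.

```typescript
type BlockKind = "sps" | "pps" | "sei";

interface MetadataBlock {
  kind: BlockKind;
  bytes: Uint8Array;
}

interface FrameWithMetadata {
  timestamp: number;
  metadata: MetadataBlock[]; // everything considered in effect for this frame
}

class MetadataTracker {
  // Most recent SPS/PPS; simplification: one of each, not keyed by ID.
  private sticky = new Map<BlockKind, MetadataBlock>();

  // Combine the blocks that arrived with one frame with the persisted ones.
  attach(timestamp: number, blocks: MetadataBlock[]): FrameWithMetadata {
    const perFrame: MetadataBlock[] = [];
    for (const block of blocks) {
      if (block.kind === "sei") {
        perFrame.push(block); // applies to this frame only (simplification)
      } else {
        this.sticky.set(block.kind, block); // stays in effect until replaced
      }
    }
    return { timestamp, metadata: Array.from(this.sticky.values()).concat(perFrame) };
  }
}
```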

@cvanwinkle

cvanwinkle commented Nov 16, 2021

Regarding reading SEI information, one use case is reading pre-existing CEA-608/CEA-708 closed captions. A tool could then modify them, retime them, and re-export them or convert them to TTML or similar. I previously worked on a (desktop) tool that had to read raw closed captions from media files in other file formats. In that scenario, emitting the data as part of the frame callback could work, but a way to retrieve the SEI information without decoding the actual frames would be even better. The reason is that video frames may only be requested on demand (e.g. when starting playback midway through a video file in a video editing application), whereas for the captions scenario above (and perhaps others) it may be useful to know all of the SEI information up front. This is not a deal breaker, though. In other words, it would help to be able to scan the entire file for SEI data without also decoding each frame.
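Scanning for SEI without decoding is mostly byte parsing. The sketch below lists the messages inside one H.264 SEI NAL unit (type 6): it strips emulation prevention bytes (0x000003 becomes 0x0000) and then reads each message's payloadType and payloadSize, which are coded as runs of 0xFF bytes plus a final byte. It assumes the NAL unit has already been located (e.g. by splitting the bytestream at start codes) and includes its 1-byte header. CEA-608/708 captions are typically carried in payloadType 4 (user_data_registered_itu_t_t35); decoding the caption data itself is out of scope here.

```typescript
interface SeiMessage {
  payloadType: number; // e.g. 4 = user_data_registered_itu_t_t35, 5 = user_data_unregistered
  payload: Uint8Array;
}

// Remove emulation prevention bytes: 0x00 0x00 0x03 becomes 0x00 0x00.
function toRbsp(nal: Uint8Array): Uint8Array {
  const out: number[] = [];
  let zeros = 0; // number of consecutive 0x00 bytes just copied
  for (let i = 0; i < nal.length; i++) {
    const b = nal[i];
    if (zeros >= 2 && b === 0x03) {
      zeros = 0; // skip the emulation prevention byte
      continue;
    }
    out.push(b);
    zeros = b === 0x00 ? zeros + 1 : 0;
  }
  return Uint8Array.from(out);
}

function parseSeiMessages(nal: Uint8Array): SeiMessage[] {
  const rbsp = toRbsp(nal);
  const messages: SeiMessage[] = [];
  let i = 1; // skip the 1-byte NAL header

  // payloadType and payloadSize: each 0xFF adds 255, then a final byte < 0xFF.
  const readValue = (): number => {
    let value = 0;
    while (i < rbsp.length && rbsp[i] === 0xff) {
      value += 255;
      i++;
    }
    if (i >= rbsp.length) throw new Error("truncated SEI message header");
    return value + rbsp[i++];
  };

  // Stop at the final rbsp_trailing_bits byte (0x80).
  while (i < rbsp.length && !(i === rbsp.length - 1 && rbsp[i] === 0x80)) {
    const payloadType = readValue();
    const payloadSize = readValue();
    if (i + payloadSize > rbsp.length) throw new Error("truncated SEI payload");
    messages.push({ payloadType, payload: rbsp.subarray(i, i + payloadSize) });
    i += payloadSize;
  }
  return messages;
}
```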

@darkvertex

This would have been useful for me. I have a use case where I worked around the fact that WebRTC didn't let me do frame-accurate sync of A+V+data by embedding small JSON metadata in SEI payload type 5 (aka "unregistered user data").

If the WebCodecs API had a callback or some other way to consume the SEIs, I could have built a web app to debug it instead of relying on inconvenient, separate standalone software.
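As an illustration of the consumer side of that workaround, a sketch that decodes such a payload. Assumptions: the input is the payload of one payloadType 5 SEI message with emulation prevention bytes already removed, and the sender wrote UTF-8 JSON immediately after the mandatory 16-byte UUID (uuid_iso_iec_11578). `decodeUnregisteredJson` is a hypothetical helper; a real app would also check the UUID against the value its sender uses.

```typescript
interface UnregisteredUserData {
  uuid: string; // uuid_iso_iec_11578 as 32 lowercase hex digits
  json: unknown; // the parsed user data
}

function decodeUnregisteredJson(payload: Uint8Array): UnregisteredUserData {
  if (payload.length < 16) throw new Error("payload shorter than the 16-byte UUID");
  let uuid = "";
  for (let i = 0; i < 16; i++) uuid += payload[i].toString(16).padStart(2, "0");
  // Everything after the UUID is the app-defined user data (assumed UTF-8 JSON).
  const text = new TextDecoder("utf-8").decode(payload.subarray(16));
  return { uuid, json: JSON.parse(text) };
}
```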

@leonardoFu

leonardoFu commented Jun 21, 2022

I have several use cases related to this feature: we use SEI information to render effects on video and to calculate the end-to-end delay from the broadcaster to the web client. I am pushing a proposal that allows web apps to get SEI information from the video element. With WebCodecs, we could get the SEI with frame-level accuracy, which is really helpful in video editing scenarios.
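For the end-to-end delay use case, the arithmetic is just receiver time minus sender time. The sketch assumes a made-up layout (not a standard): a payloadType 5 payload whose 16-byte UUID is followed by the broadcaster's capture time as a big-endian unsigned 64-bit count of milliseconds since the Unix epoch. The result is only meaningful if the two clocks are synchronized (e.g. via NTP).

```typescript
// Hypothetical payload layout (not a standard):
//   bytes 0-15  : uuid_iso_iec_11578 chosen by the broadcaster
//   bytes 16-23 : capture time, unsigned 64-bit big-endian, ms since Unix epoch
function captureTimeMs(payload: Uint8Array): number {
  if (payload.length < 24) throw new Error("payload too short for UUID + timestamp");
  const view = new DataView(payload.buffer, payload.byteOffset, payload.byteLength);
  const high = view.getUint32(16); // DataView reads big-endian by default
  const low = view.getUint32(20);
  return high * 2 ** 32 + low; // exact for any realistic millisecond timestamp (< 2^53)
}

// Broadcaster-to-client delay; meaningful only if both clocks are synchronized.
function endToEndDelayMs(payload: Uint8Array, nowMs: number = Date.now()): number {
  return nowMs - captureTimeMs(payload);
}
```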

@sandersdan
Contributor

The ability to attach metadata to a frame has recently been discussed in #189, and it seems likely that progress will be made there. That would solve one of the blockers here: a convenient way to expose the metadata.

What we are primarily missing is certainty that all future WebCodecs implementations will be able and willing to extract this metadata from the bytestream. Since the extraction can be implemented in JS, there needs to be a compelling reason to have the WebCodecs implementation do it.

Would it help significantly to support passing through user-provided metadata from EncodedVideoChunk to VideoFrame? I anticipate Chrome would do so purely by matching timestamps, so it may not be any more powerful than what JS can do.
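For comparison, the timestamp matching mentioned above can already be done in JS with a side table. A minimal sketch, assuming the decoder passes each chunk's `timestamp` through to the corresponding `VideoFrame` and that timestamps are unique; `ChunkMetadataTable` is a hypothetical helper, not a WebCodecs type.

```typescript
interface HasTimestamp {
  timestamp: number; // microseconds, as on EncodedVideoChunk and VideoFrame
}

// Hypothetical helper: app metadata keyed by chunk timestamp.
class ChunkMetadataTable<M> {
  private pending = new Map<number, M>();

  // Call right before decoder.decode(chunk).
  remember(chunk: HasTimestamp, metadata: M): void {
    this.pending.set(chunk.timestamp, metadata);
  }

  // Call from the output callback; removes the entry so the table does not grow.
  take(frame: HasTimestamp): M | undefined {
    const metadata = this.pending.get(frame.timestamp);
    this.pending.delete(frame.timestamp);
    return metadata;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

In a page, `remember(chunk, meta)` would be called just before `decoder.decode(chunk)`, and `take(frame)` inside the `output` callback passed to `new VideoDecoder({ output, error })`. Entries for chunks that never produce a frame (e.g. after `reset()`) would need to be cleared separately.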

Development

No branches or pull requests

6 participants