Is using ReadableStream/WritableStream the right approach? #4
Comments
It's not clear to me what "not as safe as" refers to here: what is the threat model that makes this a safety difference? This is important for several applications, including using WebCodecs to encode/decode, and using transports other than PeerConnection to move data around.
That is indeed an important point. While anyone can construct streams however they want, streams are not consumable by any existing media API (HTMLMediaElement, MediaRecorder, RTCPeerConnection).
These are new APIs. Do they plan to use streams, frames, or both as input?
At the risk of sounding like I have not been paying attention: can someone explain the difference between a) a separate read and write stream, and b) a TransformStream? The latter contains a readable and a writable anyway, so isn't this the same thing?
A TransformStream can be implemented natively or in JS. TransformStreams have been used to implement native transforms like compression or text encoding/decoding. If you expose a readable and a writable stream, it is easy to apply a transform, using something like readable.pipeThrough(transform).pipeTo(writable). The issue I am investigating here is whether we want to expose ReadableStream/WritableStream as a sort of replacement for MediaStreamTrack. I would derive some use cases/requirements for video:
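To illustrate the pipe composition described in the comment above, here is a minimal sketch using plain WHATWG streams. The string "frames" and the uppercase transform are stand-ins; in the actual proposal the readable would come from a MediaStreamTrackProcessor and the writable from a MediaStreamTrackGenerator.

```javascript
// A generic source readable piped through a transform into a sink,
// mirroring processor.readable.pipeThrough(transform).pipeTo(generator.writable).
const source = new ReadableStream({
  start(controller) {
    for (const frame of ["f1", "f2", "f3"]) controller.enqueue(frame);
    controller.close();
  },
});

const transform = new TransformStream({
  transform(frame, controller) {
    // Stand-in for per-frame processing (e.g. a pixel-level effect).
    controller.enqueue(frame.toUpperCase());
  },
});

const received = [];
const sink = new WritableStream({
  write(frame) { received.push(frame); },
});

await source.pipeThrough(transform).pipeTo(sink);
console.log(received); // received: ["F1", "F2", "F3"]
```

This also shows why the two designs compose: a TransformStream is just a `{ readable, writable }` pair, which is exactly what `pipeThrough()` expects.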
An advantage of the readable/writable approach here is that processors and generators allow the creation of custom sources and sinks.
This seems to be use case 1:
https://wicg.github.io/web-codecs/#mediastreamtrack-monitoring can be used for the same purpose.
VideoTrackReader in WebCodecs is being removed in favor of MediaStreamTrackProcessor because they have the same purpose, but MediaStreamTrackProcessor has better support for running on workers.
I think separating readable and writable, which allows implementing custom sources and sinks as well as transformations, provides more flexibility than a transform-only approach. It is also easy to understand in terms of the familiar source-track-sink model.
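As a sketch of what a custom JS source and sink could look like, plain streams stand in below for the proposal's objects (a custom source would feed a MediaStreamTrackGenerator's writable; a custom sink would consume a MediaStreamTrackProcessor's readable). The `{ id }` frame objects are hypothetical placeholders for VideoFrames.

```javascript
// Custom source: produces a frame only when the consumer asks for one.
let frameId = 0;
const customSource = new ReadableStream({
  pull(controller) {
    controller.enqueue({ id: frameId });
    frameId += 1;
    if (frameId === 3) controller.close();
  },
});

// Custom sink: records the frames it is handed.
const delivered = [];
const customSink = new WritableStream({
  write(frame) { delivered.push(frame.id); },
});

await customSource.pipeTo(customSink);
console.log(delivered); // delivered: [0, 1, 2]
```

The `pull()` callback runs only when the consumer needs data, so backpressure is built in; this is the streams-native analogue of a sink asking a source for a new frame.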
ScriptProcessorNode is evidence of this issue for audio, and the plan is to obsolete it, not complement it with AudioWorklet. Why should we reintroduce this potential issue?
There are other potential issues that might get solved (or become more acute) once the relationship between a track and its streams is made more precise. Currently the spec is light there, so it is difficult to assess:
Also, this somehow creates two APIs doing roughly the same thing: MediaStreamTrack and the ReadableStream/WritableStream pair. API-wise, we would prefer sticking with MediaStreamTrack producers and consumers. Or do we anticipate other APIs getting raw content directly from streams and replacing MediaStreamTrack progressively? As for the signal control API, I am not clear on what happens if the processor does not process the signals or is not prepared to understand some of them (given that the set is extensible). I can imagine some transforms simply forgetting to send the signals up the chain. Maybe I should file separate issues, but I want to ensure we pick the right design before we spend time on editing work.
I don't think it's exactly the same case. Could you send a ScriptProcessorNode to a worker to free up the main thread?
Yes, the processor is able to specify the number of buffered frames (we recently updated the spec to reflect this).
Cloning a track in general gives you another track backed by the same source. Cloning a MediaStreamTrackGenerator (which is a track) returns a regular MediaStreamTrack that will carry the same frames written to the MediaStreamTrackGenerator via its writable. Minor correction: tee() sends the same VideoFrame object to both streams, not two different objects backed by the same underlying frame.
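The tee() behavior described above is easy to observe with a plain ReadableStream; a generic object stands in for a VideoFrame here.

```javascript
// tee() hands the very same chunk object to both branches,
// unlike track.clone(), which yields an independent track on the same source.
const frame = { label: "frame-1" };
const stream = new ReadableStream({
  start(c) { c.enqueue(frame); c.close(); },
});

const [a, b] = stream.tee();
const readA = (await a.getReader().read()).value;
const readB = (await b.getReader().read()).value;
console.log(readA === readB); // true: both branches see the same object
```

For lifetime-managed objects like VideoFrame, sharing one object across branches means releasing it in one branch is visible in the other, which is part of why cloning a ReadableStream behaves differently from cloning a track.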
I don't anticipate replacing MediaStreamTracks with streams, since they do not even function as tracks. Streams allow applications to define custom MediaStreamTrack producers and consumers (i.e., sources and sinks). These custom sources and sinks defined in JS are not intended to replace platform sources and sinks, let alone tracks.
The purpose of writableControl and readableControl is to expose a control plane that already exists behind the scenes between sinks and sources. The idea is to allow control feedback to flow between custom and platform sinks and sources. They are neither a replacement nor an alternative to enabled/disabled/applyConstraints. There might be some overlap with applyConstraints, but the concept is different. For example, a peer connection platform implementation cannot disable a track, but it can (and in some practical implementations does) sometimes ask a source for a new frame.
Nothing should happen if a source (custom or platform) does not understand a signal, or if a sink (custom or platform) does not send a signal. They are intended to be hints to which sources can (but are not required to) react. If a signal does not make sense for a particular source or sink, they do not need to use it. For example, today some platform sinks can request a new frame from a platform source. Some sources are designed to produce a new frame when that signal is received and other sources just ignore the signal. This happens behind the scenes using a regular MediaStreamTrack with platform sources and sinks, independent from the API proposed here.
The spec currently does not describe what to do in this case, which means it's up to the implementation. I agree that the spec should be clearer here. Signals are just hints, so we should not provide delivery guarantees, but we could allow the application to specify buffer sizes as is the case for media.
An application producing frames using a generator can interpret the signals it receives via readableControl in any way it wants, so overriding is the only possible way to proceed there. Platform sources in practice already handle signals, either by ignoring them or by acting on them if it makes sense. For example, a camera capturer could ignore signals for new frames, while a screen capturer might produce a new frame if requested.
The WebCodecs group compared VideoTrackReader with MediaStreamTrackProcessor and it was decided to remove VideoTrackReader from the WebCodecs spec (it's gone already). VideoTrackReader was basically the same as MediaStreamTrackProcessor, but always running on the main thread and without control signals. Processing in workers required the application to transfer the VideoFrames to the worker.
Another use case the separate processor/generator design enables is combining frames from different tracks into a single track. A possible example is a "weather report" application, where one track is a camera track showing the person making the presentation and the other track contains the map/presentation. An application could merge both tracks (each with a processor) into a single track (using a generator).
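A sketch of that merge, with plain streams standing in for the two processors' readables and the generator's writable. The string frames and the string-concatenation "compositing" are placeholders; a real application would draw both frames onto a canvas or composite them with WebGL.

```javascript
// Two per-track readables merged frame-by-frame into one sink.
function streamOf(frames) {
  return new ReadableStream({
    start(c) { frames.forEach((f) => c.enqueue(f)); c.close(); },
  });
}

const camera = streamOf(["person-1", "person-2"]); // stand-in: camera processor
const map = streamOf(["map-1", "map-2"]);          // stand-in: map processor

const merged = [];
const generatorWritable = new WritableStream({
  write(frame) { merged.push(frame); },            // stand-in: generator.writable
});

const camReader = camera.getReader();
const mapReader = map.getReader();
const writer = generatorWritable.getWriter();

for (;;) {
  const [c, m] = await Promise.all([camReader.read(), mapReader.read()]);
  if (c.done || m.done) break;
  await writer.write(`${c.value}+${m.value}`); // placeholder for compositing
}
await writer.close();
console.log(merged); // ["person-1+map-1", "person-2+map-2"]
```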
To summarize the answer to the question of whether ReadableStream/WritableStream is the right approach: b) Is separating readable and writable the right approach? I believe it is. Unlike a transform-only approach, separating readable and writable allows us to support more valid use cases, such as:
c) Should we allow processing on the main thread? Streams make it convenient to do processing both on the main thread and on workers. The fact that many applications currently use video element + canvas + canvas capture, with much (or all) of the processing on the main thread, shows that the main thread works well in many production scenarios, so it is not necessarily something we should try to forbid or make excessively difficult.
A transform is what you plug in between a MediaStreamTrackProcessor and a MediaStreamTrackGenerator. If there are native transforms defined, we can plug them straight in and use pipeThrough. For a callback-based API, no such pluggability is possible except on a case-by-case basis. So separating readable and writable is a prerequisite for using native transforms, not an alternative to them.
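Platform-provided transforms already demonstrate this pluggability today. The sketch below round-trips bytes through the native CompressionStream/DecompressionStream pair (global in browsers and Node 18+); a native video transform would slot into a processor-to-generator pipe chain in exactly the same way.

```javascript
// Native transforms compose in the same pipe chain as JS-defined ones.
const bytes = new TextEncoder().encode("hello frames");
const source = new ReadableStream({
  start(c) { c.enqueue(bytes); c.close(); },
});

const chunks = [];
await source
  .pipeThrough(new CompressionStream("gzip"))    // native transform
  .pipeThrough(new DecompressionStream("gzip"))  // native transform
  .pipeTo(new WritableStream({ write(c) { chunks.push(c); } }));

// Reassemble; decompression may emit multiple chunks.
const text = new TextDecoder().decode(
  chunks.reduce((acc, c) => {
    const out = new Uint8Array(acc.length + c.length);
    out.set(acc);
    out.set(c, acc.length);
    return out;
  }, new Uint8Array())
);
console.log(text); // "hello frames"
```

A callback-based API has no equivalent composition point, which is the pluggability argument made above.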
I've been asked to present to this group about WebCodecs' experience using ReadableStream/WritableStream. We wrote a public document outlining our reasons for moving away from streams in WebCodecs. WebCodecs also built a non-streams VideoTrackReader.
@sandersdan Your experience implementing VideoTrackReader would be relevant here.
```webidl
interface VideoTrackReader {
  constructor(MediaStreamTrack track);
  void start(OutputCallback callback);
  void stop();
};
```

At the time there was also a VideoTrackWriter:

```webidl
interface VideoTrackWriter {
  constructor(VideoTrackWriterInit init);
  readonly attribute MediaStreamTrack track;
  void write(VideoFrame frame);
};
```

I will review the discussion leading to the abandonment. My recollection is that the Insertable Streams proposal was developed in parallel, and it was uncontroversial to defer to the WebRTC experts in this area.
The decision was publicly documented in w3c/webcodecs#131 (comment). |
The drawbacks of VTR/VTW seem like they will apply to any callback API, due to basic properties of the problem statement (conversion between tracks and frames).
With respect to major criticisms of the current API (e.g. memory leaks and lack of support for mute/unmute), the Callback approach does not appear more promising:
Overall, it seems to me that the Callback approach will at best be "different but not better".
This issue partially duplicates https://github.com/w3c/mediacapture-extensions/issues/23.
@aboba, I am wondering what your thoughts are after the presentation on promise-based callbacks (à la TransformStream).
The current proposed API is based on ReadableStream of frames.
It does not seem that pros and cons of this approach have been documented.
It would also be interesting to list what other approaches could be envisioned and evaluate them.
For WebRTC insertable streams, we anticipate defining native transforms, so it is convenient to get stream algorithms like pipeTo/pipeThrough for free, and there is no existing stream-like object we can hook into.
If we look at media capture, we already have MediaStreamTrack, which is a kind of ReadableStream.
For instance, it can be piped into Web Audio, or generated from Web Audio or a canvas.
It can also be cloned.
If we want a native transform that does WebGL processing, we could simply have a method that takes a MediaStreamTrack and some WebGL state and creates a new MediaStreamTrack.
The downsides mentioned for WebRTC insertable streams can also be raised here.
For instance, cloning a ReadableStream is not as safe as cloning a MediaStreamTrack.
There is also the fact that this API makes it convenient to do processing on the main thread, which might be a real footgun.