Conditional Focus (When Display-Capture Starts) #190
Comments
This looks amazing, been waiting for this for a long time! In tella.tv we have the following flow:
If i'm looking at this proposal right, our new flow could be:
A much nicer flow. Would love to be in the Origin Trial when it's available :) |
Ah, reading the spec again, missed this point: That means we would not be able to do "We fire .focus on the stream to bring them where they need to be automatically." after countdown. |
Please note that 1s is the hard-limit, and normally the window of opportunity to calling The rationale for these limitations is preventing attacks where the capturing application induces the user to click at its chosen location in the captured application by switching focus at just the right time. As a somewhat contrived example, imagine the capturing application presenting at just the right location a button labelled "double-click here to confirm" and then switching focus after it registers the first click. Are these concerns realistic? I'm not yet sure, but it should be easier to reach consensus if we err on the side of caution. |
Thanks for explaining! Preventing the focus will already help us a lot. |
@jan-ivar and @youennf, I'd love to carry on from where we stopped at the end of the WebRTC WG Interim meeting yesterday. @youennf
Please note that the intention here is that if focus() is not called by the time MT terminates, then the browser immediately switches focus - without a 1s delay. The 1s delay is a backup in case MT takes an inordinately long time to finish.
Please let me know if I've forgotten any other open issues. |
Btw, following the interim it became clearer to me which parts of my proposed spec were in need of clarification. I have made adjustments, mostly around (a) explaining the window of opportunity and (b) providing examples. PTAL. |
A user's focus is a global property, so a per-track API simply isn't necessary, and we can avoid the subclassing headache.
I think you're looking at this the wrong way. How does a tighter window of opportunity benefit slow apps? What matters is the amount of code that runs up to the point of invoking the focus method. In your model, JS gets a whole second as long as it stays synchronous, but a single microtask switch (e.g. That seems too harsh to me, making things harder on apps, not easier. It also seems biased against apps that use promises. Microtask-switching is sometimes unavoidable when writing promise code (and arguably why microtasks exist: their whole point is that they're not tasks, and operate within a task). For example:
async function getUserMedia(constraints) {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  // store track ids in localStorage
  return stream;
}
This introduces a microtask switch on code calling this function as a drop-in for the real thing, but it's on the same task. 🤔 From a security perspective, tasks and microtasks are equally vulnerable to busy-looping. |
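The claim that an async wrapper adds a microtask hop without leaving the current task can be checked with a small, browser-free sketch. Nothing below is part of any proposed API: `demoWrapperHop` is a hypothetical name, an already-resolved promise stands in for the native result, and `setTimeout` callbacks stand in for queued tasks.

```javascript
// Toy model: setTimeout callbacks play the role of tasks; promise reactions are microtasks.
function demoWrapperHop() {
  const log = [];
  return new Promise(done => {
    // "Task A": the point where a UA would resolve its native promise.
    setTimeout(() => {
      const native = Promise.resolve("stream"); // stand-in for the native promise
      const wrapper = (async () => await native)(); // drop-in async wrapper
      native.then(() => log.push("native continuation"));
      wrapper.then(() => log.push("wrapper continuation"));
      // "Task B": queued during task A, so it runs only after A's microtasks drain.
      setTimeout(() => {
        log.push("task B");
        done(log);
      }, 0);
    }, 0);
  });
}

demoWrapperHop().then(log => console.log(log.join(" -> ")));
```

In this model the wrapper's continuation runs later than the native one, but still before task B, which is the distinction the comment above is drawing.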
@alvestrand asked me to outline the API shape I propose. I'd prefer:
console.log(navigator.mediaDevices.displayMediaFocusMode); // "focus"
const stream = await navigator.mediaDevices.getDisplayMedia();
navigator.mediaDevices.displayMediaFocusMode = "no-focus"; // stream's target won't be focused
It's a simple read-write enum attribute with two values. It never changes value except when JS changes it. UA reads it in a task queued from The app can set it ahead of time as well, if it doesn't care what the user picks:
navigator.mediaDevices.displayMediaFocusMode = "no-focus"; // stream's target won't be focused
const stream = await navigator.mediaDevices.getDisplayMedia();
TL;DR: it affects subsequent calls, but if you set it immediately, it affects what just resolved as well. |
I don't think this is going to "play nice" if the document is engaged in multiple concurrent captures.
I don't think that's simple, because the association between what gets captured and what gets focused is not there, and concurrent captures become clouded by questions of timing.
How is
One does not need a lot of code to place a "Click here to win a million dollars" button and then guess how long to wait before switching focus to the other tab and getting the user to click something there. (Moreover, if you use tasks rather than microtasks, I think it's easier to wait for an on-over event, making guesses about when the click is coming easier. But I could be wrong there. Let's call this a secondary argument.) I don't think we can measure and base a decision off of how much code runs, regardless of whether the window-of-opportunity is closed by a task or a microtask. Either could run an arbitrary amount of code. The user's reaction time, though...
Some of my best friends are apps that use Promises. I have found them to be capable of switching between using Promises and doing things synchronously, as the need arises. |
It should: There's only one user, who can only accept one prompt at a time. The association is with the prompt — the user interaction — not the track. Example: Imagine a hypothetical
This should suffice: if ('displayMediaFocusMode' in navigator.mediaDevices) { /* UA feature detected */ } All browsers today automatically focus all focusable surfaces ( I'd recommend we strongly encourage UAs to allow apps to override this automatic focus for all focusable surfaces. This removes any need for per-surface-type detection of policies.
We discussed offline that gDM can queue two tasks A and B, guaranteeing no other task runs in between, so events aren't an issue. The promise is resolved in A, creating microtask A₁ where JS runs. Your proposal is to limit calls to between A₁-A₂, whereas I'm saying A₁-B is fine and superior as it avoids tripping up promise-use. The 1 second timeout is orthogonal. |
I also prefer an attribute over an explicit call for each track. As for task vs. microtask, there are preexisting examples. The main advantage I see is that it is a clear web developer contract: the boundary to call focus is the next await. From a pure spec/developer point of view, it should also be easier to specify and implement: queue a task where the promise is resolved and a promise resolution callback to read the value and apply focus is registered. |
👍
Event listeners must respond synchronously to prevent bubbling/avoid default for historical reasons. Those "properties" appear to be fetch trying to support promise code with
I'll try to explain the shimming problem in more detail: Say a JS library (e.g. adapter.js) needs to shim gDM for some reason:
const nativeGDM = window.navigator.mediaDevices.getDisplayMedia;
window.navigator.mediaDevices.getDisplayMedia = async function getDisplayMedia(constraints) {
  // Here we're on Task A
  const stream = await nativeGDM.apply(this, arguments);
  // Here we're on Task B, Microtask B₁
  stream.newFeatureX = 3;
  return stream; // the implicit promise returned by this async function is resolved with stream
}
The shim is careful not to await anything else, yet a microtask checkpoint is unavoidable, because the promise returned by the shim is not the same as the one from
// Here we're on Task A
const stream = await navigator.mediaDevices.getDisplayMedia({video: true});
// Here we'd be on Task B, Microtask B₁ without the shim, but with the shim we're on Microtask B₂ instead
navigator.mediaDevices.displayMediaFocusMode = "no-focus"; // focuses with the shim but not without! |
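To make the shimming concern concrete, here is a toy model that runs outside a browser. It is a sketch under stated assumptions, not UA behavior: `mockGetDisplayMedia`, `runScenario`, `runAll`, and the `state` object are hypothetical, `setTimeout` callbacks stand in for tasks, and the "microtask" variant models a window that closes right after the first promise continuation, while the "task" variant models one that closes at the next task.

```javascript
// Toy model, not UA behavior: setTimeout callbacks stand in for tasks.
function mockGetDisplayMedia(state, windowKind) {
  return new Promise(resolve => {
    setTimeout(() => {
      resolve({ id: "mock-stream" }); // queues the awaiting code as a microtask
      const readMode = () => { state.decision = state.focusMode; };
      if (windowKind === "microtask") {
        Promise.resolve().then(readMode); // window closes right after the first continuation
      } else {
        setTimeout(readMode, 0); // window closes at the next task
      }
    }, 0);
  });
}

async function runScenario(windowKind, useShim) {
  const state = { focusMode: "focus", decision: null };
  const native = () => mockGetDisplayMedia(state, windowKind);
  // Minimal shim: awaits the native promise and returns the stream unchanged.
  const shimmed = async () => {
    const stream = await native();
    return stream;
  };
  await (useShim ? shimmed() : native());
  state.focusMode = "no-focus"; // the app's attempt to suppress focus
  await new Promise(r => setTimeout(r, 5)); // let the mock finish deciding
  return state.decision;
}

async function runAll() {
  const results = {};
  for (const kind of ["microtask", "task"]) {
    for (const useShim of [false, true]) {
      results[`${kind}/${useShim ? "shim" : "native"}`] = await runScenario(kind, useShim);
    }
  }
  return results;
}

runAll().then(results => console.log(results));
```

In this model only the microtask-bounded window combined with the shim misses the app's setting; the task-bounded window picks it up either way.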
Global or per-surface controls
These are two distinct preferences:
The browser can operate in modes which skip the prompt. Mechanisms to trigger these include extensions, enterprise policies and command-line arguments. The spec may be agnostic of these, but the fact remains that the user can accept multiple different captures within a very short span of time. At any rate, if the application fires off two calls to getDisplayMedia and wants to focus exactly one of these, then it's a lot more ergonomic to call focus() on the right track than to manipulate a global attribute at just the right time, ensuring it's the intended value when the UA reads it for the one display-surface and the other value when the UA reads it for another display-surface. It requires of the Web-developer much more in-depth understanding.
Method vs. Attribute
Assume, for the sake of argument, that my previous section convinced you to use per-surface controls. Do we want a method or attribute then? An application that can read the value may just as well set its own preferred value. An attribute for However, before even calling Btw, one challenge with writable attributes is that Web-developers would not as readily expect setting of attributes to potentially raise an exception.
Subclassing MediaStreamTrack
I think we have seen multiple cases where subclassing MediaStreamTrack would have conferred benefits, but each time a discussion arose over whether it's enough to sub-class just for that. The result of having everything on MediaStreamTrack is sub-optimal. Some immediate beneficiary APIs of a decision to sub-class would be:
IMHO, this list is sufficiently long and the benefits are sufficient. When someone calls
Tasks vs. Microtasks
IIANM, the only argument for tasks is that they are shim-friendly. (Please correct me if I'm wrong.) An argument against tasks is that in addition to shimming, it allows an application to This trade-off is easy to reason about (IMHO) because we can have both. If we use microtasks, shimming is possible with an adapter:
const nativeGDM = window.navigator.mediaDevices.getDisplayMedia;
function focusCallback(stream) {
// Return "no-focus-change" or "focus-captured-surface"
}
window.navigator.mediaDevices.getDisplayMedia = async function getDisplayMedia(constraints, focusCallback) {
const stream = await nativeGDM.apply(this, arguments);
const [track] = stream.getVideoTracks();
if (!!focusCallback && !!track.focus) {
const shouldFocus = focusCallback(stream);
track.focus(shouldFocus);
}
return stream;
}
The code outside the shim just plugs in its callback. Note that there are natural limits on what the app can do anyway until the window-of-opportunity closes, so I expect the code would easily and naturally fit inside of a synchronous callback. |
It is avoidable by having the shim return the native promise returned by getDisplayMedia. Looking at the HTML event loop, we can also see that after the microtask checkpoint, there are additional tasks that need to be done, some of which might trigger firing events.
I was meaning a global attribute, somewhere in navigator. Let's say the application wants a behavior for all getDisplayMedia calls. It will set the attribute once. A method is less intuitive to me.
I do not think the plan for setting this attribute is to raise an exception. |
A Web-developer would need non-trivial understanding of the feature in order to become convinced that there'd be no carry-over effect of focusing the wrong surface. They might explain it to themselves as "only the last capture gets focus or not" - but then, why does it sometimes not seem to work? There is no exception raised when they set the new value too late (e.g. |
@youennf How? Please be specific.
Agreed (not beyond WebIDL's |
Contrast:
1.
const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await someOtherPromisesThatResolvesMuchLater;
track.focus("no-focus-change");
2.
const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await someOtherPromisesThatResolvesMuchLater;
navigator.mediaDevices.focusPolicy = "no-focus-change";
3.
const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await immediatelyResolvedPromise;
navigator.mediaDevices.focusPolicy = "no-focus-change";
const otherStream = await navigator.mediaDevices.getDisplayMedia();
Only the first option produces consistent results (and raises a clear exception when used incorrectly). |
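As a rough illustration of what "raises a clear exception when used incorrectly" could look like for option 1, here is a browser-free mock. Everything in it is an assumption made for the sketch: `mockGetDisplayMedia` and `tryFocus` are hypothetical, the window is modeled as closing at the next task after resolution (the thread has not settled on task vs. microtask), and a plain `Error` stands in for whatever exception type a spec would choose.

```javascript
// Toy model: the window of opportunity closes at the task after resolution (an assumption).
function mockGetDisplayMedia() {
  return new Promise(resolve => {
    setTimeout(() => {
      let windowOpen = true;
      const track = {
        focusBehavior: "focus-captured-surface", // assumed default
        focus(behavior) {
          if (!windowOpen) {
            throw new Error("InvalidStateError: window of opportunity closed");
          }
          windowOpen = false; // in this model a decision can be made only once
          this.focusBehavior = behavior;
        },
      };
      resolve({ getVideoTracks: () => [track] });
      setTimeout(() => { windowOpen = false; }, 0); // close the window
    }, 0);
  });
}

async function tryFocus(delayMs) {
  const stream = await mockGetDisplayMedia();
  const [track] = stream.getVideoTracks();
  if (delayMs > 0) {
    // Stand-in for "some other promise that resolves much later".
    await new Promise(r => setTimeout(r, delayMs));
  }
  try {
    track.focus("no-focus-change");
    return "applied";
  } catch (e) {
    return "threw";
  }
}

Promise.all([tryFocus(0), tryFocus(5)]).then(r => console.log(r));
```

The timely call is applied and the late call throws, so the developer learns about the missed window instead of seeing a silently ignored setting.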
That is out of scope for this working group.
@eladalon1983 Not a problem. JS is single-threaded, and getDisplayMedia implicitly queues a task to resolve a promise from in parallel steps. My proposal would be to queue two tasks, guaranteed to happen in succession. There is no "span of time" short enough for two
You have to call focus() at just the right time as well. Also, if ergonomics is the issue, why isn't it |
I have foreseen this response and added the text which you quoted immediately below (agnosticism etc.). The topic is not whether we can handle it spec-wise, but rather whether it produces ergonomic results for the application. Consider:
const stream1 = await navigator.mediaDevices.getDisplayMedia();
doSomething(); // Maybe ends the task, maybe doesn't.
navigator.mediaDevices.focusPolicy = "no-focus"; // Who is affected? s1? s2? Both? Neither?
doSomethingElse(); // Maybe ends the task, maybe doesn't.
const stream2 = await navigator.mediaDevices.getDisplayMedia();
The global-attribute API produces code which is hard to reason about. Does it affect
My assertion is not that it's impossible to spec this properly. I argue that the result is not ergonomic and not simple.
Well, if you think ergonomics would be improved by moving |
|
Of the three listed, which is a false choice. We don't need to subclass anything to throw. See below:
So you'd introduce an attribute and a method? Contrast:
vs.
|
@youennf Sorry, but each |
No. We're not on the same page. Maybe a quick summary of my position would help. I am only interested in a per-track method, but I am analyzing all options in my attempt to convince you that the option I favor is superior. When I discuss inability to raise an exception, I do not mean it's because the attribute is an attribute. Rather, the problem arises because that global-attribute API is always valid to manipulate. Namely, an application could always be setting the value for the next gDM to follow. I claim that inability to reliably raise an exception is a drawback. It leaves Web-developers in the dark when their applications behave inconsistently. It fails silently. That's a problem. |
Why, when it is less ergonomic than
I showed an option 1 above that would throw outside the same envelope, to show there's no need to subclass MST here. From my end, the conversation has led to the options I show (1 and 2) which I think capture the remaining open question: Would we rather offer JS a default value, or a clear exception when JS misses an (obvious) time window, or both (at the cost of double the API surface)? |
The original motivation I heard for using a global attribute, was that it allows to:
So:
Do you now propose a version of the global that does raise exceptions? If so - that loses benefit (2). The UA cannot simultaneously, with a single API, allow the Web-developer to influence the future as well as warn them when they're too late to influence the past. So my question is - which API are you now proposing, and what are its benefits over |
It would be more like this:
(Where you place step 1.3 doesn't really matter as it'll only be observable after 1.1 and 1.2) |
I am not sure I understand your proposal. It seems like the one I classified as "before" in an earlier comment, but I could be misunderstanding.
function callback(mediaStream) {
  console.log('b');
  mediaStream.removeTrack(mediaStream.getVideoTracks()[0]);
  console.log('c');
}
console.log('a');
const mediaStream = navigator.mediaDevices.getDisplayMedia({video: true}, callback);
console.log('d');
const [track] = (await mediaStream).getVideoTracks();
console.log('e');
What's going to happen with the code above? |
Order would be a, d, b, c, e if I'm not mistaken. You can indeed remove tracks, but this can also happen if you hand out the promise in multiple places. That doesn't really seem problematic to me. |
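The stated order can be reproduced with a mock that runs outside a browser. This is only a sketch: `mockGetDisplayMedia` and `demoCallbackOrder` are hypothetical, the mock stream implements just the two methods used here, and it assumes the callback is invoked in the same task that resolves the promise, just before resolution.

```javascript
// Mock only: a getDisplayMedia-like function that takes a callback, which is
// invoked in the same task that resolves the promise, just before resolution.
function mockGetDisplayMedia(constraints, callback) {
  return new Promise(resolve => {
    setTimeout(() => {
      const tracks = [{ kind: "video" }];
      const stream = {
        getVideoTracks: () => tracks.slice(),
        removeTrack: t => {
          const i = tracks.indexOf(t);
          if (i >= 0) tracks.splice(i, 1);
        },
      };
      callback(stream);
      resolve(stream);
    }, 0);
  });
}

async function demoCallbackOrder() {
  const log = [];
  function callback(mediaStream) {
    log.push("b");
    mediaStream.removeTrack(mediaStream.getVideoTracks()[0]);
    log.push("c");
  }
  log.push("a");
  const mediaStream = mockGetDisplayMedia({ video: true }, callback);
  log.push("d");
  const [track] = (await mediaStream).getVideoTracks();
  log.push("e");
  return { order: log.join(", "), track };
}

demoCallbackOrder().then(result => console.log(result.order, result.track));
```

Under these assumptions the log reads a, d, b, c, e, and `track` is `undefined` because the callback removed the only video track before the awaiting code could see it.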
So you were indeed referring to the before option from this comment, which means I was right to edit away my misunderstanding in I think giving access to the MediaStream to something that executes before the Promise resolves violates POLA. Are you aware of precedents, perhaps? |
Yeah, e.g., https://notifications.spec.whatwg.org/#dom-notification-requestpermission. I'm pretty sure we also have places that dispatch an event and resolve a promise in the same task. Stems from how promises work. (And to be pedantic, the promise can be resolved before, but it being resolved cannot be observed until after.) |
How about this:
const stream = await navigator.mediaDevices.getDisplayMedia();
navigator.mediaDevices.addEventListener("focus", e => decide(stream) || e.preventDefault(), {once: true});
This would be the entire API surface. No need for exceptions. I'm open to bikeshedding the event name. |
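The event-based shape can be exercised with the standard `EventTarget` and `Event` classes, which are also available as globals in recent Node.js versions. The sketch below is a mock under assumptions: `createMockMediaDevices`, `capture`, `lastDecision`, and the `decide` functions are hypothetical, and the mock fires a cancelable "focus" event in the task after the promise resolves.

```javascript
// Mock only: fires a cancelable "focus" event in the task after resolution.
function createMockMediaDevices() {
  const devices = new EventTarget();
  devices.lastDecision = null;
  devices.getDisplayMedia = () => new Promise(resolve => {
    setTimeout(() => {
      resolve({ id: "mock-stream" });
      setTimeout(() => {
        const event = new Event("focus", { cancelable: true });
        // dispatchEvent returns false when a listener called preventDefault().
        const notCanceled = devices.dispatchEvent(event);
        devices.lastDecision = notCanceled ? "focus" : "no-focus";
      }, 0);
    }, 0);
  });
  return devices;
}

async function capture(decide) {
  const devices = createMockMediaDevices();
  const stream = await devices.getDisplayMedia();
  // Same listener shape as in the comment above.
  devices.addEventListener("focus", e => decide(stream) || e.preventDefault(), { once: true });
  await new Promise(r => setTimeout(r, 5)); // let the mock fire its event
  return devices.lastDecision;
}

Promise.all([capture(() => true), capture(() => false)])
  .then(decisions => console.log(decisions));
```

When `decide` returns a truthy value the default (focus) proceeds; otherwise `preventDefault()` cancels it, and no exception path is needed.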
The use of |
Yes. |
FWIW, as I understand it the task responsible for resolving |
I'm wondering if we could in the future run into an issue where some operating systems require a separate permission before they allow the UA to switch focus between applications. We might want to get around that by specifying the default behavior as "the user agent MUST try to focus the captured surface." Wdyt? |
I would also prefer just one task.
I do not see why we should delay the focus algorithm. |
I ran a test to disprove this https://jsfiddle.net/jib1/q75yb8pf/ but I was also surprised to learn that Firefox blurs the capturer window 15 ms before it resolves the promise (I thought it happened much later):
But if we're worried about web compat here we have a much bigger problem: Chrome's prompt steals focus.
We've never specified exactly when (non-browser) window is focused before, and I'm not sure we need to here. What we're specifying here is a JS control opportunity by requiring the browser to fire a warning ping "hey, I plan to focus another window, you ok with that?", which to me doesn't inherently need to match up with any user-visible behavior timing wise. |
I don't see a problem with returning the original. Clones are trouble.
I'd call it the "focus decision algorithm", since it doesn't seem necessary to specify when exactly actual focusing takes place.
Let's say I did this:
async function getDisplayMedia() {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  console.log("A");
  return stream;
}
const p = getDisplayMedia();
await new Promise(r => navigator.mediaDevices.onfocus = e => r(decide(e.stream) || e.preventDefault()));
console.log("B");
const stream = await p;
Would this produce
Tasks have cleaner semantics than messing with microtask order. |
It would produce B, A. As there's no JavaScript on the stack, the microtask queue is drained after event callback invocation. I think that indeed makes the order observable and therefore it's good to use the order @youennf proposed. |
Another approach to consider is to use a callback given to getDisplayMedia as input parameter. |
Problem
When an application starts capturing a display-surface, the user agent faces a decision - should the captured display-surface be brought to the forefront, or should the capturing application retain focus.
The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore ill-positioned to make an informed decision.
In contrast, the capturing application is familiar with its own properties, and is therefore better suited to make this decision. Moreover, by reading displaySurface and/or using Capture Handle, the capturing application can learn about the captured display-surface, driving an even more informed decision.
Sample Use Case
For example, a video conferencing application may wish to:
Suggested Solution and Demo
Sample Code
Security Concerns
One noteworthy security concern is that allowing switching focus at an arbitrary moment could allow clickjacking attacks. The suggested spec addresses this concern by limiting the time when focus-switching may be triggered/suppressed - the application may only decide about focus immediately[*] upon the resolution of the Promise<MediaStream>. (See the spec-draft for more details about what "immediately" means and how I suggest various edge-cases be handled.)