getViewportMedia(): Let pages opt-in to capture #1

Closed
jan-ivar opened this issue Dec 15, 2020 · 24 comments · Fixed by #3

@jan-ivar
Member

Compared to native, capturing a web page has been plagued by security concerns since this spec's inception. Can we imagine a web that's safer to screen-capture, similar to a web where SharedArrayBuffer is safer to use?

What if we came across a page that met these conditions? We could remove restrictions added for the legacy web, both for some choices in the existing getDisplayMedia and for new APIs like the proposed getViewportMedia (originally proposed as getTabMedia).

One idea for such opt-in is Cross-Origin-Embedder-Policy: require-corp; html-capture covered in slides here.

I'm opening an issue to discuss this independent of those APIs. Note it only covers the cross-origin issue. Permissions are still needed to cover user information harvesting.

I've also created a COEP playground on Glitch ([1], [2]) for those who are unfamiliar with the concept and want to experiment with it.

@jan-ivar
Member Author

@eladalon1983 @youennf

@eladalon1983
Member

eladalon1983 commented Mar 17, 2021

Chrome Security shared the view that a confirmation-only flow would require security measures roughly in line with what you suggest (that is - opt-in). I'd love to resume this conversation and try to agree on the details.

The way I see it, we have two new requirements: (1) site-isolation and (2) a new header. We think the new header should apply to documents. Site-isolation will allow sub-resources to opt into being captured by their document, and the new header will allow one document to opt into being captured by another. Embedded documents will also be opting in on behalf of the sub-resources they embed.

Rationale for applying the header only to documents:

  1. The site-isolation requirement suffices to receive some opt-in from resources.
  2. Resources are same-process with their embedding document, and therefore cannot reasonably expect to be fully protected from readability by those documents.
  3. Resources will typically be embedded from further afield than documents. The onus of COOP/COEP is already great enough. Requiring an additional header might stymie adoption.
  4. Headers have a non-zero bandwidth cost.

For the shape of the new header, I think it would be useful to first decide what we expect the behavior to be. I think it's important that a document's failure to opt in using the header would not prevent loading the document; rather, it would (1) break off ongoing captures and (2) prevent the initiation of new captures. One rationale for this choice is the expectation that for most applications, flows involving screen-capture are not the primary flows. This is especially true of screen-capture initiated via gCBCM. For example, in Google Slides, editing slide decks, reading decks written by others, commenting on decks, etc., are much more common use cases than sharing using gCBCM. It would be preferable to keep these flows intact 100% of the time, at the cost of (rarely) breaking off a gCBCM-initiated capture. I expect the majority of other applications will see things similarly.

If we agree on the above (suppress capture instead of prevent loading), then we need a new header that lends itself to such behavior. This rules out document policies, at least in their current form, because they prevent loading of non-conforming documents. Feature-policies, I think, could be bent to work here. It would require that we use two policies, and that we use the second one creatively, but I think it will work, and will spare us the need to define something new.

  1. First, we'll use a feature-policy like display-capture, probably with a new name, to allow embedders to give gCBCM-calling permission to embedded frames. For this feature-policy, we'll only care about it when it's specified on the iframe-tag; it only goes "down" the tree.
  2. Second, we'll define a new feature-policy called capturable-by or something similar. We'll only care about it when it appears in the HTTP header; it only goes "up" the tree. It will allow a document to opt into being embedded by specific embedders.
  3. Finally, we'll define capturability to be transitive. This means that if the top-level document (TL) has two frames F1 and F2, and if F1 has opted into being captured by TL using capturable-by, then F2 can call gCBCM if-and-only-if TL gives F2 permission to do so using display-capture. This means TL can give F2 permission on behalf of F1, too. This requires some custom code that maintains a permission tree, with permissions of type 1 going "down" the tree and permissions of type 2 going "up" the tree.
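The two-direction permission tree from step 3 can be sketched as a small model. This is a minimal sketch under stated assumptions, not a proposed implementation: the `Frame` shape, its `capturableBy`/`allowDisplayCapture` fields and the helper names are hypothetical; each document is assumed to list its direct embedder's origin in capturable-by; and same-origin children are assumed to consent implicitly.

```typescript
// Hypothetical model of the two-policy proposal; all names are illustrative only.
interface Frame {
  origin: string;
  // "capturable-by" (from this document's HTTP header): embedder origins it consents to be captured by.
  capturableBy: string[];
  // "display-capture" (from the embedder's <iframe> tag): permission delegated down to this frame.
  allowDisplayCapture: boolean;
  children: Frame[];
}

// Type 2 permissions flow "up": every embedded document must consent to capture by its
// direct embedder. Assumption: same-origin children consent implicitly.
function subtreeConsents(frame: Frame): boolean {
  return frame.children.every(
    (child) =>
      (child.origin === frame.origin || child.capturableBy.includes(frame.origin)) &&
      subtreeConsents(child),
  );
}

// Returns the chain of frames from `root` down to `target`, or null if `target` is not in the tree.
function pathTo(root: Frame, target: Frame): Frame[] | null {
  if (root === target) return [root];
  for (const child of root.children) {
    const path = pathTo(child, target);
    if (path !== null) return [root, ...path];
  }
  return null;
}

// Type 1 permissions flow "down": every frame on the path from TL to the caller must have been
// granted display-capture by its embedder, and the whole page must consent (transitivity).
function mayCallCapture(topLevel: Frame, caller: Frame): boolean {
  const path = pathTo(topLevel, caller);
  if (path === null) return false;
  const delegated = path.slice(1).every((frame) => frame.allowDisplayCapture);
  return delegated && subtreeConsents(topLevel);
}
```

With TL embedding F1 (consenting, no display-capture) and F2 (consenting, with display-capture), F2 may capture but F1 may not. If F1 withdraws consent, F2 loses the ability too, which is the "TL gives F2 permission on behalf of F1" property.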

Wdyt?

@jan-ivar
Member Author

jan-ivar commented Mar 24, 2021

two new requirements - (1) site-isolation and (2) a new header.

@elad Great to hear! To clarify, by "site-isolation" do you mean COOP+COEP or just COEP? I don't think COOP matters here, except perhaps in making it simpler to talk about (mapping 1-1 with window.crossOriginIsolated).
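The COOP-vs-COEP distinction can be illustrated with a deliberately simplified model. This is an assumption-laden sketch, not the HTML Standard's algorithm: only the COEP values relevant to this 2021 discussion are modeled, and the helper names are hypothetical.

```typescript
// Simplified model for illustration (not the full HTML algorithm).
type Coop = "unsafe-none" | "same-origin-allow-popups" | "same-origin";
type Coep = "unsafe-none" | "require-corp";

// Mirrors window.crossOriginIsolated in this simplified model: true only when the
// top-level document has COOP same-origin *and* COEP require-corp.
function isCrossOriginIsolated(coop: Coop, coep: Coep): boolean {
  return coop === "same-origin" && coep === "require-corp";
}

// Which content the page may embed at all is decided by COEP alone; COOP governs
// whether cross-origin windows (openers and popups) share a browsing context group.
function embedsOnlyOptedInContent(coep: Coep): boolean {
  return coep === "require-corp";
}
```

In this model, a page with COEP but no COOP already embeds only opted-in content, which is the property capture cares about, while `crossOriginIsolated` additionally requires COOP.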

Resources are same-process with their embedding document, and therefore cannot reasonably expect to be fully protected from readability by those documents.

I shudder using Spectre as a reason, but probably true. 👻

I think I'm ok with it. I wasn't planning to require headers for resources either. cc @annevk @mystor

To clarify what we're talking about, these are resources that set Cross-Origin-Resource-Policy = "cross-origin" but not Access-Control-Allow-Origin, keeping their opaqueness, yet would be captured by this feature.

E.g. "image2 canvas data CANNOT be read" in this example (/corp/logo.png served here).

But I'm not 100% sure this isn't a problem. Are there sites out there serving personal info from cookies this way, relying on it being opaque? A malicious site that tricks a user into granting this permission could quickly steal personal info from all over the web this way. But as you point out, the same malicious site could use Spectre for this today without any permission.
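To make the "CORP without ACAO" case concrete, here is a rough sketch of how a cross-origin image fares inside a COEP: require-corp document. It is a simplification under stated assumptions: CORS requests are assumed to be anonymous (so `*` is acceptable), same-site CORP and credentialed cases are left out, and the type and function names are hypothetical.

```typescript
// Illustrative model; assumes anonymous CORS and ignores same-site CORP nuances.
interface CrossOriginResponse {
  corp?: "same-origin" | "same-site" | "cross-origin"; // Cross-Origin-Resource-Policy
  acao?: string; // Access-Control-Allow-Origin
}

type Outcome = "blocked" | "loaded-opaque" | "loaded-readable";

function classify(res: CrossOriginResponse, mode: "cors" | "no-cors", embedder: string): Outcome {
  if (mode === "cors") {
    // CORS requests need an ACAO match; the page can then read the pixels (e.g. via canvas).
    return res.acao === "*" || res.acao === embedder ? "loaded-readable" : "blocked";
  }
  // no-cors requests under require-corp need CORP: cross-origin, and the result stays opaque.
  return res.corp === "cross-origin" ? "loaded-opaque" : "blocked";
}

// Capturing the rendered page reveals the pixels of anything that loaded, so the
// "loaded-opaque" case is the one that capture newly exposes to the page.
function newlyExposedByCapture(outcome: Outcome): boolean {
  return outcome === "loaded-opaque";
}
```

The `/corp/logo.png` example above corresponds to `classify({ corp: "cross-origin" }, "no-cors", origin)`, which loads but stays opaque, so capture would reveal it.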

@jan-ivar jan-ivar reopened this Mar 24, 2021
@jan-ivar
Member Author

jan-ivar commented Mar 24, 2021

... For example, in Google Slides, editing slide decks, reading decks written by others, commenting on decks, etc., are much more common use cases than sharing using gCBCM. It would be preferable to keep these flows intact 100% of the time, at the cost of (rarely) breaking off a gCBCM-initiated capture. ...

Most common, maybe, but != most important IMHO. Having just spent days prepping a presentation, I can confidently say that the moment when it matters most to me that nothing breaks is the final presentation in front of an online audience. 🙂 I'd much prefer it break during preview, when there is still time to fix things.

At the risk of sounding presumptuous, if this feature works out, it might quickly become the most important target for Google Slides, at least during this pandemic.

But behavior seems orthogonal to the shape of headers to me. E.g. I don't see why we couldn't treat Cross-Origin-Embedder-Policy: require-corp; html-capture as Cross-Origin-Embedder-Policy: require-corp and simply update some window.isCapturable accordingly and terminate captures if that's what we decide is best (though as I said above, I personally doubt it is).
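The "treat it as require-corp, then update a flag" idea can be sketched as a small header parser. This is a hedged sketch: the `html-capture` parameter was only proposed here, `isCapturable` stands in for the hypothetical `window.isCapturable`, and parameters are parsed loosely rather than with a full RFC 8941 Structured Fields parser.

```typescript
// Sketch assuming the proposed (hypothetical) "html-capture" parameter on COEP.
// Parameters are split loosely on ";" instead of using a full Structured Fields parser.
interface ParsedCoep {
  value: string;
  params: Set<string>;
}

function parseCoep(header: string | undefined): ParsedCoep {
  const parts = (header ?? "")
    .split(";")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  // A missing header means the default policy.
  const value = parts[0] ?? "unsafe-none";
  return { value, params: new Set(parts.slice(1)) };
}

// Embedding checks are exactly those of plain require-corp; the parameter only
// flips a capturability flag (a stand-in for the hypothetical window.isCapturable).
function isCapturable(header: string | undefined): boolean {
  const coep = parseCoep(header);
  return coep.value === "require-corp" && coep.params.has("html-capture");
}
```

Because the embedding behavior is unchanged, the same parsed value could drive either enforcement choice (blocking loads or terminating captures), which is the orthogonality point above.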

... Feature-policies, I think, could be bent to work here. ...

It's "Permissions policy" now, and the Mozilla position on the HTTP headers part is still lukewarm. We only implement the iframe part FWIW.

it only goes "up" the tree. It will allow a document to opt into being embedded by specific embedders.

Is this a requirement? Wouldn't you use Content-Security-Policy: frame-ancestors uri instead?

In any case, going "up" the tree would go against every other permissions policy, and there would be no analogous allow= version. Why do we need to fit in here?

Why not Cross-Origin-Embedder-Policy: require-corp; html-capture or just make up a header?

@annevk
Member

annevk commented Mar 24, 2021

https://github.com/w3c/mediacapture-screen-share/issues/155#issuecomment-801493527 talks about site isolation, but do you mean cross-origin isolation, i.e., requiring COOP+COEP? It would help to see something more fleshed out.

Once you have COOP+COEP, the main risk that remains (as Nika mentioned) is form controls and the like that might capture user state, or :visited, for instance.

@jan-ivar
Member Author

From meeting (slide):

  • Assume the stricter COOP+COEP for now, to move forward. Can always be relaxed later.
  • No objection to resources being on their own
  • No objection to name getViewportMedia
  • Continue discussion on blocking loading vs killing capture.

@youennf

youennf commented Mar 25, 2021

  • No objection to name getViewportMedia

We did not really talk about that.
It is unclear whether this API should expose the whole tab or just the context (and children) in which this API is called.
If we go down that road, I wonder whether we might not want to go to the granularity of any element: clipping will be computed based on element plus viewport.

@jan-ivar
Member Author

clipping will be computed based on element plus viewport.

So element.getViewportMedia()? The name still seems to hold even if we do that. See also w3c/mediacapture-screen-share#148 (comment).

@youennf

youennf commented Mar 25, 2021

We have HTMLMediaElement.captureStream.
I guess we could have captureViewPort.

@eladalon1983
Member

eladalon1983 commented Apr 1, 2021

Cross-Origin-Embedder-Policy: require-corp; html-capture

  1. I think html-capture is more appropriate for element-level capture. We should probably go with embedder-allowed-viewport-capture, viewport-capture, or something similar. Wdyt?
  2. I'm generally OK with this approach, and I believe Chrome Security would be too. (I got a positive signal both for this approach and for using a required document policy.) One potential problem I see with this approach is that, AFAICT, it mixes the concepts of requiring a feature and permitting it. IIUC, a document that wishes to consent to embedder-allowed-viewport-capture when embedded, but does not require it when loaded as a top-level document, would need the server to set the field conditionally, and I am not sure this is ideal. Wdyt?

To clarify, By "site-isolation" do you mean COOP+COEP or just COEP?

I see your point about COOP not being strictly necessary, but Chrome Security believes it would be less confusing to developers if feature requirements fall into one of a small number of buckets, and are not mixed-and-matched on a per-feature basis. I suggest we start out with COOP+COEP and relax later if appropriate, and I think you suggest the same.

To clarify what we're talking about, these are resources that set Cross-Origin-Resource-Policy = "cross-origin" but not Access-Control-Allow-Origin, keeping their opaqueness, yet would be captured by this feature.
...
But I'm not 100% sure this isn't a problem. Are there sites out there serving personal info from cookies this way, relying on it being opaque?

The Glitch sample is very useful for people who, like me, are making their first steps with COOP/COEP/etc. I'll be referring others to it. Thank you for setting it up!

I understand how CORP-without-ACAO, meaning the resource is embeddable but not readable, sounds problematic. But IIUC, a malicious document could use Spectre to read any embedded resource anyway, meaning CORP-without-ACAO would not really protect against a malicious document - only make legitimate sites' lives harder. Please correct me if I am wrong, though.

Continue discussion on blocking loading vs killing capture.

My own opinion is still that killing capture would be preferable, but Chrome Security has expressed the opinion that, should embedder-allowed-viewport-capture be joined by feature2, feature3, etc., these different features would all have to separately support unexpected break-off, do so in a race-condition-free way, etc. I understand this consideration. If the consensus (modulo me) is that we should suppress loading, then let's do that.

Why not Cross-Origin-Embedder-Policy: require-corp; html-capture or just make up a header?

Please forget my earlier reference to permission-policy; it's no longer relevant. I am not entirely opposed to COEP: require-corp; feature1... But I still see potential value in using document-policy instead. I think it would create a clear separation between requiring a feature and permitting it. And we won't have to standardize another header, which is a plus.

Or :visited for instance.

There is a finite set of user-data mining opportunities presented by either gVM/gDM. I think we should, separately, identify them and come up with proposals for what a user agent SHOULD/MAY do while a capture is active in order to protect the user's privacy. Note, however, that the user might actually be intending to share these, e.g. if receiving support from a trusted relative. I think it would be good to leave the final decision to individual UAs.

It is unclear whether this API should expose the whole tab or just the context (and children) in which this API is called.

Exposing just the context (and its children) would make the feature equivalent to element-level capture, modulo "element" being only frames. Element-level capture comes with some benefits and some drawbacks, and it's often a matter of perspective which is which. For example, it would capture obscured content but would not capture obscuring content - either a blessing (avoid capturing app-based menus hanging over a presentation) or a curse (subvert the user's expectations of what is being captured). My position is that this is a separate, desirable feature. @jan-ivar, IIUC, you're also now of this opinion?

Clipping to the dimensions of a given element, on the other hand, is something Chrome is very much interested in following up on later as issue w3c/mediacapture-screen-share#158. (Please ignore crop-ID vs. transferability vs. other solutions for the time being.)

Btw, perhaps we should save non-security topics for issue w3c/mediacapture-screen-share#148 (everything getViewportMedia-related other than security) and issue w3c/mediacapture-screen-share#158 (cropping/clipping). Wdyt?

@camillelamy
Member

To expand a bit on Chrome Security's position regarding capture opt-in: we see this as part of a larger problem of APIs that might leak data from cross-origin resources at the page level. So whatever we come up with, we would really want the mechanism to be reusable for other APIs with similar security impact.

We have determined that requiring cross-origin isolation is not enough for such APIs: the threat model for crossOriginIsolated covers APIs that might leak data at the agent-cluster level, not at the page level (the threat is a Spectre attack, which out-of-process iframes mitigate, and we don't want to lower that protection). Concretely, this means we need an additional opt-in from cross-origin frames. This rests on the assumption that COEP ensures that subresources loaded by the cross-origin iframe have either shared their content with that iframe through CORS or opted into being loaded in a potentially dangerous environment by setting CORP: cross-origin. Because of this, we can treat the cross-origin frame's opt-in as an opt-in on behalf of its subresources, without each of them having to opt in individually. After all, the iframe could simply send the content of subresources loaded through CORS to other frames in the page via postMessage, and resources with CORP: cross-origin have already opted into being loaded into a dangerous environment (i.e. an attacker could load them directly in a crossOriginIsolated environment and mount a Spectre attack on them). What matters to us is not leaking the iframe's own content, which is why we need an opt-in.

We also believe that the opt-in should be enforced by blocking the load rather than by disabling the feature. If we imagine extending this mechanism to other APIs, a disabling model means every single feature must implement a way to disable itself mid-execution if a cross-origin frame that has not opted into the API loads. This is likely to result in bugs that become security issues, whereas blocking the load is safe. There are also race conditions: we must prevent an API from executing in one frame in response to a navigation in a cross-origin frame, and those two events are asynchronous. The enforcement must apply to every frame in the page, which essentially means that the top-level frame must require it in its response headers, and the requirement cannot change over the lifetime of the document (otherwise we run into potential race conditions between frame creation and requests for access to the API).
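The contrast between the two enforcement models can be sketched as follows. This is a hypothetical comparison, not any browser's design: `BlockingPage`, `DisablingPage` and `CaptureFeature` are invented names, and frame navigation is reduced to a single "frame loaded" event.

```typescript
// Two hypothetical enforcement models for a page-level opt-in, for comparison only.
interface CaptureFeature {
  name: string;
  stop(): void; // the disabling model requires every feature to implement this correctly
}

// Blocking model: the requirement is fixed by the top-level response and never changes,
// so a non-consenting frame can never coexist with the feature.
class BlockingPage {
  readonly required: boolean;
  constructor(required: boolean) {
    this.required = required;
  }
  mayLoadFrame(frameOptedIn: boolean): boolean {
    return !this.required || frameOptedIn;
  }
  mayUseFeature(): boolean {
    return this.required;
  }
}

// Disabling model: frames always load, and every active feature must be torn down
// whenever a non-consenting frame appears.
class DisablingPage {
  private active: CaptureFeature[] = [];
  private allConsent = true;
  start(feature: CaptureFeature): boolean {
    if (!this.allConsent) return false;
    this.active.push(feature);
    return true;
  }
  onFrameLoaded(frameOptedIn: boolean): void {
    if (frameOptedIn) return;
    this.allConsent = false;
    for (const f of this.active) f.stop();
    this.active = [];
  }
}
```

In the blocking model, the only check happens at load time. In the disabling model, correctness depends on every `stop()` implementation and on `onFrameLoaded` running before any further use of the feature, which is exactly the per-feature burden and asynchronous race described above.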

With this in mind, we see two potential solutions.

  1. Extending the COEP header as you propose

We would be a bit more partial to something like:
Cross-Origin-Embedder-Policy: require-corp; required-apis html-capture x-bikeshed-other-api
This leaves the possibility of defining a wildcard to allow all APIs we put behind the mechanism.

If we go that route, we should also define a Fetch Metadata request header to inform the server of the COEP required by the parent, like:
Sec-Fetch-COEP: require-corp; required-apis html-capture
This way, the server can evaluate whether they want to opt-into the API and answer accordingly.

Our main issue with this solution is that it ends up mixing two concepts in the same COEP header: the policy to apply to the resources a document embeds, and an opt-in for APIs.
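The server side of the COEP-extension option can be sketched as below. It is a hedged illustration only: Sec-Fetch-COEP and the `required-apis` parameter are proposals from this thread, not shipped headers, the syntax simply follows the examples above, and the function names are hypothetical.

```typescript
// Server-side sketch for the proposed (hypothetical) Sec-Fetch-COEP request header.
// The "required-apis" syntax follows the examples in this comment.
function requiredApis(secFetchCoep: string | undefined): string[] {
  if (secFetchCoep === undefined) return [];
  for (const part of secFetchCoep.split(";")) {
    const trimmed = part.trim();
    if (trimmed.startsWith("required-apis")) {
      return trimmed
        .slice("required-apis".length)
        .trim()
        .split(/\s+/)
        .filter((s) => s.length > 0);
    }
  }
  return [];
}

// The server opts in only if it is willing to expose its content to every requested API,
// echoing them back in its own COEP response header. Returning null declines; under the
// blocking model, the frame would then fail to load.
function coepResponse(secFetchCoep: string | undefined, willing: Set<string>): string | null {
  const wanted = requiredApis(secFetchCoep);
  if (!wanted.every((api) => willing.has(api))) return null;
  return wanted.length > 0 ? `require-corp; required-apis ${wanted.join(" ")}` : "require-corp";
}
```

A server willing to allow only `html-capture` would answer a request carrying `require-corp; required-apis html-capture` with a matching COEP header, and would decline a request that also asks for `x-bikeshed-other-api`.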

  2. Use required DocumentPolicy

Basically, we define an html-capture DocumentPolicy (off by default) and only allow the API to be used if the top-level frame sends a Require-Document-Policy: html-capture header. Document Policy then enforces the opt-in (i.e. no frame in the page can load unless it sends a Document-Policy: html-capture header), and no Fetch Metadata request header is needed because we will send a Required-Document-Policy header with each document request.

The issue with that solution is that the opt-in to the APIs is now split across two mechanisms: crossOriginIsolated and DocumentPolicy.
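The required-Document-Policy option can be sketched as a load-time check. This is a simplified illustration: real Document Policy headers are Structured Field dictionaries, whereas here policies are treated as comma-separated bare feature names, the required policy is assumed to propagate unchanged to every nested frame, and the helper names are hypothetical.

```typescript
// Simplified: policies are treated as comma-separated bare feature names.
function parsePolicy(header: string | undefined): Set<string> {
  return new Set(
    (header ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
  );
}

// A frame may load only if its Document-Policy response header satisfies everything
// the page requires (assumed to be propagated unchanged from the top-level document).
function frameMayLoad(required: string | undefined, documentPolicy: string | undefined): boolean {
  const have = parsePolicy(documentPolicy);
  return Array.from(parsePolicy(required)).every((feature) => have.has(feature));
}

// The API is usable only when the top-level document itself required the policy.
function apiAvailable(topLevelRequired: string | undefined): boolean {
  return parsePolicy(topLevelRequired).has("html-capture");
}
```

Since enforcement happens when each frame loads, the API never needs to disable itself mid-capture, which matches the blocking model preferred above.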

Now, do we require just COEP for page capturing, or crossOriginIsolated?

From a security perspective, for page capture, COEP is sufficient. Since capture is scoped to a page, COOP doesn't apply. However, as Elad mentioned, we are concerned about creating very different ways to gain access to various APIs. Right now, we have:

  • secure context
  • crossOriginIsolated
  • DocumentPolicy
  • PermissionsPolicy

and various combinations of the above. We'd like to avoid adding yet another one. As for the additional burden on developers, COEP seems to be a lot harder to deploy than COOP, so also requiring COOP adds comparatively little.

Finally, crossOriginIsolated is currently exposed to the web, while the page's COEP status isn't. If we go with extending COEP headers, we probably need to expose it so that developers can query which APIs they have access to.
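For illustration, here is how a single API might combine the existing gating "buckets", reporting which one failed so developers can tell which requirement is missing. This is a hypothetical sketch: `ContextState` stands in for user-agent-internal state (only `isSecureContext` and `crossOriginIsolated` correspond to real web-exposed globals), and the check order is arbitrary.

```typescript
// Hypothetical UA-side gate; the struct is a stand-in for internal state.
interface ContextState {
  isSecureContext: boolean;
  crossOriginIsolated: boolean;
  requiredDocumentPolicy: Set<string>; // required by the top-level document
  permissionsPolicyAllows: Set<string>; // features delegated to this frame
}

// Returns null when allowed, or the name of the first failing requirement.
function mayUseViewportCapture(ctx: ContextState): string | null {
  if (!ctx.isSecureContext) return "secure context";
  if (!ctx.crossOriginIsolated) return "crossOriginIsolated";
  if (!ctx.requiredDocumentPolicy.has("html-capture")) return "DocumentPolicy";
  if (!ctx.permissionsPolicyAllows.has("display-capture")) return "PermissionsPolicy";
  return null;
}
```

Each additional bucket is one more line here, but also one more concept a developer must learn and be able to query, which is the cost being weighed above.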

cc @arturjanc and @letitz, who have also been looking at this.

@jan-ivar
Member Author

jan-ivar commented Apr 2, 2021

@camillelamy thanks for the concise writeup! I think I agree with most of it.

Our main issue with this solution is that it ends up mixing two concepts in the same COEP header: the policy to apply to the resources a document embed and an opt-in for APIs.

If it helps, I still see it as a policy applied to iframes, creating different (risk) environments, not tied to any one feature, much like it's not called COEP: shared-array-buffer today.

Consider if we renamed it: COEP: require-corp; require-non-opaque.

I.e. pages opt into different risk profiles, enabling one or perhaps more features like getViewportMedia.

But I'll concede that the original premise behind COEP might have been solely to create less risky environments. cc @annevk

@camillelamy
Member

@jan-ivar I agree that the goal of COEP is to get pages to opt into environments with different risk profiles. I am just wondering how easy it is to communicate this risk profile to the developer. We probably need to work on the wording, as I am not a big fan of what COEP: require-corp; require-non-opaque implies. To me it implies that the embedder could directly read the HTML of the embedded iframe, which it cannot do.

To be clear, we are ok with a COEP based solution, as long as we envision several risk profiles being supported there. For example COEP: require-corp; require-non-opaque-output for getViewportMedia or COEP: require-corp; require-non-opaque-resource-size if we were to release a version of the memory measurement API that applies to the whole page instead of just the Agent Cluster.

@camillelamy
Member

@annevk Do you have some opinion on the right mechanism here? In Chrome, I think we're a bit more in favor of using required Document Policy + crossOriginIsolated. However, an approach using COEP + crossOriginIsolated can also work.

@annevk
Member

annevk commented Apr 21, 2021

I think if it doesn't affect subresources (as seems to be the case), something like Document Policy might be more appropriate. Mozilla hasn't really taken a stance on Document Policy though and it's currently in WICG for that reason. I'll see if others have thoughts.

I have a question though, what about user state leaked in other ways, such as form controls or :visited? What's the story for that?

@jan-ivar
Member Author

what about user state leaked in other ways, such as form controls or :visited? What's the story for that?

The plan is to rely on permission prompting and privacy indicators, plus a list of other items here. Any kind of sanitizing risks creating an ad-blocker blocker mode.

There's some pushback on the visibility requirement, since there are use cases of capturing e.g. a Google slides presentation in another open tab, without leaving the Google Meet tab, and basically remote-control the presentation with "next page", "previous page" commands out of band from there.

Both Chrome and Firefox (behind a pref) support capture of tabs that are in the background today FWIW.

@eladalon1983
Member

eladalon1983 commented Apr 27, 2021

There's some pushback on the visibility requirement, since there are use cases of capturing e.g. a Google slides presentation in another open tab, without leaving the Google Meet tab, and basically remote-control the presentation with "next page", "previous page" commands out of band from there.

Pushback on visibility requirements for getViewportMedia cannot be motivated by the behavior of getDisplayMedia, unless the two must have the same visibility requirements, and I don't think that's the case.

Speaking of requiring visibility for gVM, I do think it would be counter-productive.

  • Attacks are generally possible within the span of a single frame, escaping the user's notice¹.
  • The tab is visible at the moment capture starts.

Taken together, these mean that attacks are possible immediately or never. Capture persisting after loss of visibility does not enable new attacks.

But maintaining the capture when visibility is lost does increase usability, even for gVM. The user can tab away and double-check a fact on Wikipedia without interrupting the call or alerting remote participants to their interaction with other tabs.


¹ Attacks that have been discussed thus far, that is. If new attacks are presented, I will recant.

@camillelamy
Member

I don't think requiring that the capture stop when the tab goes into the background adds much in terms of security. The attacks we are worried about can be accomplished by embedding an iframe at a very low level of visibility, making it unnoticeable to the user even when the tab is in the foreground. I do think it is important that the tab be in the foreground when permission to share it is requested, though.

@eladalon1983
Member

eladalon1983 commented May 3, 2021

I think if it doesn't affect subresources (as seems to be the case), something like Document Policy might be more appropriate. Mozilla hasn't really taken a stance on Document Policy though and it's currently in WICG for that reason. I'll see if others have thoughts.

Has this been discussed out-of-band by any chance? (@annevk?)

@jan-ivar jan-ivar changed the title Let pages opt-in to capture. getViewportMedia(): Let pages opt-in to capture May 13, 2021
@jan-ivar
Member Author

jan-ivar commented Sep 2, 2021

Update on blockers:

  1. Cropping: @eladalon1983 will present a navigator.mediaDevices.getViewportMedia() that will share the entire tab. This lets us revisit cropping in Add ability to crop a MediaStream obtained through the share-this-tab API mediacapture-screen-share#158.
  2. Document Policy: I'm fine with Require-Document-Policy: html-capture that @camillelamy proposed. @annevk?
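A page-side sketch of feature-detecting the proposed entry point, under explicit assumptions: `getViewportMedia()` is a proposal here rather than a shipped API, so the types are minimal hand-written stand-ins instead of the DOM library's, and `captureThisTab` is a hypothetical helper. In a browser, one would pass `navigator.mediaDevices` and `globalThis.crossOriginIsolated`.

```typescript
// getViewportMedia() is only proposed here; these are minimal stand-in types.
interface StreamLike {
  id: string;
}
interface MediaDevicesLike {
  getViewportMedia?: () => Promise<StreamLike>;
}

async function captureThisTab(
  devices: MediaDevicesLike,
  crossOriginIsolated: boolean,
): Promise<StreamLike | null> {
  // Without the opt-in discussed in this issue the call is expected to fail, so bail early.
  if (!crossOriginIsolated || typeof devices.getViewportMedia !== "function") return null;
  return devices.getViewportMedia();
}
```

Returning `null` rather than throwing lets the app fall back to getDisplayMedia or hide the "share this tab" button when the page has not opted in.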

@annevk
Member

annevk commented Sep 3, 2021

Once Document Policy has a proper name (that would also affect the name of that header) I would be okay with it as well. See WICG/document-policy#26.

@jan-ivar
Member Author

@camillelamy @annevk Note, the WG went with viewport-capture for document policy name. See #4. If you feel strongly about html-capture let us know, and we can raise it again on Tuesday.

@annevk
Member

annevk commented Mar 14, 2022

I don't care particularly strongly. What's not entirely clear to me is whether Document Policy is still something Google is pursuing. E.g., w3c/webappsec-permissions-policy#444 (comment) does not inspire much confidence.

@camillelamy
Member

I am ok with the new name. Required Document Policy is something we are interested in pursuing, provided there is a use case that requires it. Capture could be such a use case. In particular, we believe the mechanism could be useful for security checks like opting into tab capture, though it may be less useful for performance optimizations.

5 participants