Allow malicious ads prevention scripts to scan Parakeet ads for malicious activities #17

Open
pranay-prabhat opened this issue Apr 30, 2021 · 8 comments

Comments

@pranay-prabhat

It's understandable why Parakeet ads need to be rendered in a Fenced Frame. However, publishers are still responsible for the quality of such ads. I agree that a lot of quality checks can be done on the server side, but malicious behaviors are typically exhibited by ads while they render in the browser.

Such malicious ad detection is typically done by scripts provided by specialized vendors like GeoEdge, Clean.IO, The Media Trust, Confiant, etc.

We propose that such malicious ad detection companies be allowed to scan Parakeet ads rendering in fenced frames in the same way as is done today.

@darobin

darobin commented Apr 30, 2021

It would be useful to understand what kind of processing runtime safety scripts need, as well as what reporting needs they have.

My assumption is that ad creatives are going to be some form of locked bundle, such that they cannot access the network at all or communicate with the hosting page, except perhaps through some tiny holes with a small set of predetermined values. (I've been calling such creatives SLIC, for Safe Locally-Inlined Content.) Runtime safety scripts need to be able to work within these constraints.

One thing to keep in mind is that given no network access and strongly sandboxed framing, the attack surface is significantly reduced. (I would even favour an <ad> element over more general-purpose fenced frames so that there isn't a temptation to make it possible to poke holes in that system for other use cases.)

We might use this opportunity to actually improve how these scripts work. Right now it can be difficult for them to operate without being detectable by malicious scripts, and I know of cases of malicious creatives that wouldn't trigger when monitoring was present but would otherwise. Could they for instance be injected as a form of workers, with read-only access to the SLIC's DOM and resources, and the ability to flag content as inappropriate using a small number of error codes?
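
To make the worker idea a bit more concrete, here is a minimal sketch. It assumes a hypothetical read-only snapshot of the creative's DOM and a reply limited to a handful of predetermined codes. None of these names (`SafetyCode`, `FrozenAdNode`, `inspectCreative`) come from any proposal, and the two rules are only illustrative.

```typescript
// Hypothetical: the only values a safety script may hand back to the browser.
type SafetyCode = "OK" | "MALFORMED" | "MALICIOUS";

// Hypothetical read-only snapshot of one node in the creative's DOM.
interface FrozenAdNode {
  readonly tag: string;
  readonly attrs: { readonly [key: string]: string };
  readonly text: string;
  readonly children: ReadonlyArray<FrozenAdNode>;
}

// Walks the snapshot without mutating it and returns a single code.
function inspectCreative(root: FrozenAdNode): SafetyCode {
  let worst: SafetyCode = "OK";
  const stack: FrozenAdNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as FrozenAdNode;
    // Illustrative rule: treat eval() in an inline script as malicious.
    if (node.tag === "script" && /\beval\s*\(/.test(node.text)) {
      return "MALICIOUS"; // highest priority, so stop early
    }
    // Illustrative rule: autoplaying video that is not muted is malformed.
    if (node.tag === "video" && "autoplay" in node.attrs && !("muted" in node.attrs)) {
      worst = "MALFORMED";
    }
    for (let i = 0; i < node.children.length; i++) {
      stack.push(node.children[i]);
    }
  }
  return worst;
}
```

Because the script can only read the snapshot and answer with one of three strings, it has no side channel of its own, and the creative has no hook to detect that it is being inspected.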

@pranay-prabhat
Author

pranay-prabhat commented Apr 30, 2021

I agree with you that ad creatives shipped as locked bundles that make no additional network calls greatly reduce the complexity of detecting malicious behavior.

The SafeFrame API works on similar logic: the iframe on a secondary domain communicates with the parent page through a set of predefined channels that are locked and limited. Something similar can be done here too, but I truly think we need specialists from malicious ad detection companies to vet this concept.
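
As a rough illustration of a "locked and limited" channel, the sketch below only lets through exact strings from a fixed vocabulary. This is not the SafeFrame API; the signal names and `filterSignal` are hypothetical.

```typescript
// Hypothetical fixed vocabulary the framed creative may send to the host page.
const ALLOWED_SIGNALS: string[] = ["ad-rendered", "ad-clicked", "ad-error"];

// Returns the signal if it is on the allowlist, otherwise null.
// Anything that is not an exact, known string is dropped, so the channel
// cannot be used to carry arbitrary data out of the frame.
function filterSignal(raw: unknown): string | null {
  if (typeof raw !== "string") {
    return null;
  }
  return ALLOWED_SIGNALS.indexOf(raw) >= 0 ? raw : null;
}
```

With this shape, `filterSignal("ad-clicked")` passes through unchanged, while a string with extra data appended or a structured object is rejected.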

@mehulparsana
Contributor

In the PARAKEET flow, the set of ads returned by DSPs flows through the SSP. Would it be possible to scan these ad creatives on the SSP server to reduce the need for a script? We are assuming that malicious ad scanning does not need access to accurate user information S or publisher context C.

We can discuss this in the upcoming meeting on Wednesday.

@darobin

darobin commented May 2, 2021 via email

@erik-anderson
Contributor

Is the typical obfuscation pattern something like having the ad present one piece of content at approval time and then switching to different content later, e.g. once the device's clock is past some predefined date and time, or once the browser gives a set of API answers that imply it's a real user rather than a pre-validation environment?

I would like to better understand what logic these scripts typically have to make a determination (e.g. mostly focused on reading the DOM contents? Something else?) and how they typically report answers today (a boolean answer? a risk rating?).

There are also things that will likely reduce risk vs. today's ads:

  1. With an ad bundle needing its resources in one go and being loaded in a fenced frame without network access, there are fewer vectors for it to dynamically change the resources it uses as a part of its rendering.
  2. Since it's within a fenced frame, it will have less attack surface available to it, e.g. no ability to abuse an inadequately fuzzed postMessage listener in the parent page.

These mitigating factors are similar to how the iframe sandbox attribute and Feature Policy can be used today to limit the impact a bad ad can have.

Given we're looking at how to make this work in a new environment, I agree we should consider possible improvements. If reading the DOM is the primary thing that needs to be done, perhaps there could be an "isolated DOM reader script" concept that could be loaded within the fenced frame rendering the ad to allow it to inspect the ad and report a result to the browser.

As far as reporting goes, we likely need to think about this in a similar way to the Aggregate Reporting API proposal. If the detection script could report a result to a JavaScript API and have the browser report the specific ad bundle URL to be aggregated with other users' reports, would that be sufficient if the turnaround time from classification to the publisher/SSP/DSP getting an aggregate report is on the order of minutes?
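
To sketch what the aggregated side could look like: the snippet below counts on-device classifications per ad bundle URL and only releases rows that enough reports agree on. The `BundleFlag` shape, the `minReports` threshold, and `aggregateFlags` are all assumptions for illustration and are not taken from the Aggregate Reporting API proposal.

```typescript
// One on-device classification, as the browser might forward it (hypothetical shape).
interface BundleFlag {
  bundleUrl: string;
  code: string; // e.g. "MALICIOUS"; the vocabulary is hypothetical
}

// Counts flags per (bundle URL, code) pair and releases only the pairs that
// were reported at least minReports times, so a single user's report is
// never visible on its own.
function aggregateFlags(flags: BundleFlag[], minReports: number): { [key: string]: number } {
  const counts: { [key: string]: number } = {};
  for (let i = 0; i < flags.length; i++) {
    const key = flags[i].bundleUrl + " " + flags[i].code;
    counts[key] = (counts[key] || 0) + 1;
  }
  const released: { [key: string]: number } = {};
  const keys = Object.keys(counts);
  for (let j = 0; j < keys.length; j++) {
    if (counts[keys[j]] >= minReports) {
      released[keys[j]] = counts[keys[j]];
    }
  }
  return released;
}
```

Whether a turnaround of minutes is achievable then depends mostly on how quickly a bundle crosses the threshold.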

> One thing to keep in mind is that given no network access and strongly sandboxed framing, the attack surface is significantly reduced. (I would even favour an <ad> element over more general-purpose fenced frames so that there isn't a temptation to make it possible to poke holes in that system for other use cases.)

Since the "opaque src" of the fenced frame will be provided by calling an ad-centric API, the browser should have a very confident understanding of which fenced frames are tied to ads and apply existing resource caps on the ads as multiple browsers do today (which is currently heuristic based on looking at the URL loaded in the iframe).

That said, if there are "native ad" or similar use cases that would benefit from having those resource caps, that's probably interesting as a more general web primitive ("resource budgets for frames"?) that isn't ads-specific.

@AramZS

AramZS commented May 11, 2021

Going to break my response in two here for simplicity's sake. First, let's talk about the logic scripts use to identify "bad ads"; I'm using that term deliberately because not all bad ads are explicitly malicious. Ad safety vendors intervene on a variety of issues, which generally break down as follows (from lowest to highest priority):

  1. Heavy ads: these have been a particular issue since Chrome started blocking overly heavy ads. The concept of a "heavy ad" can be viewed in two ways.

    • The first is an ad that is memory heavy (it loads big images, script files, etc.). These are harder to capture, but will usually be seen in one location, have some version of the ad saved to a database, and be blocked on future detection.
    • The second is CPU-heavy ads. These are easier to detect via a scan, as they show up through unterminated interval use (e.g. setInterval), access to specific APIs like document.write, infinite mutation loops, etc. They can be intervened on live, or recorded and blocked in future instances.
  2. Malformed ads: These are ads that do not use HTML and JavaScript properly because of a lack of quality control on the ad builder's side. Sometimes the problem is easily detectable by observing specific code in the creative; sometimes it is detectable only after a period of execution, in which case it is captured and then blocked on later appearances. As above, some of these issues can be handled with a NOOP or other interventions hooked into the running ad code. Some examples of what triggers interventions, and of the active interventions currently in use:

    • lack of playsinline on iOS that can cause videos to go fullscreen. (Active intervention can be handled with mutation observer)
    • failure to mute on autoplay of video (Can scan an ad and add mute to the video tags in the moment)
    • document.write erasing the containing window (Can NOOP document.write)
  3. Malicious ads: These are ads that may be actively trying to harm the user through redirection, fraud, or non-standard practices like crypto-mining. They can be addressed in a variety of ways. The usual approach is to blocklist specific source URLs for ads, to blocklist certain URLs or code patterns via regex and stop the ad from appearing when they are found in its code, and to blocklist access to particular APIs. These interventions are usually handled at the level of ad execution, for a specific reason: the best countermeasure for malicious ads is to let the ad reach the user and then block its execution. This forces malicious operators to pay for the ad's display without getting results, and disincentivizes further fraud.

    There are a variety of folks who do more active work on this and have documented specific cases, including how these ads hide from developers attempting to stop them. I'll link them below and note that many of these are from Eliya Stein from Confiant, who has done a great job of documenting specific problems and including code examples of what is occurring. He has identified cases where the ad hides malicious executions or code based on specific geo, console being open, specific browser types, etc.
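
To make two of the techniques above concrete, here is a simplified sketch of a blocklist check (source URLs plus regex code patterns) and of the "NOOP document.write" intervention. The list entries, the `AdCreative` and `WritableDoc` shapes, and both function names are made up for illustration; real vendor lists are far larger and change constantly.

```typescript
// Hypothetical blocklists, for illustration only.
const BLOCKED_SOURCES: string[] = ["https://bad-cdn.example/"];
const BLOCKED_PATTERNS: RegExp[] = [
  /\btop\.location(\.href)?\s*=[^=]/i, // forced top-level redirect
  /coinhive|cryptonight/i,             // crypto-mining library names
];

interface AdCreative {
  sourceUrl: string;
  markup: string;
}

// Returns a short reason if the creative should be stopped, otherwise null.
function blockReason(ad: AdCreative): string | null {
  for (let i = 0; i < BLOCKED_SOURCES.length; i++) {
    if (ad.sourceUrl.indexOf(BLOCKED_SOURCES[i]) === 0) {
      return "source:" + BLOCKED_SOURCES[i];
    }
  }
  for (let j = 0; j < BLOCKED_PATTERNS.length; j++) {
    if (BLOCKED_PATTERNS[j].test(ad.markup)) {
      return "pattern:" + BLOCKED_PATTERNS[j].source;
    }
  }
  return null;
}

// Minimal stand-in for the part of a document the intervention touches.
interface WritableDoc {
  write: (html: string) => void;
}

// Active intervention: replace write with a no-op so a late call
// cannot erase the containing window.
function neutralizeWrite(doc: WritableDoc): void {
  doc.write = function (): void { /* intentionally ignored */ };
}
```

Running `blockReason` at render time rather than at bid time matches the point above: the impression has already been paid for by the time the ad is stopped.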

@erik-anderson
Contributor

Thanks for the info, Aram!

As we discussed on our last call, some concerns are reduced or even eliminated when we have the more sandboxed ad hosting environment.

For heavy ads, as you noted, Chrome and Edge (not sure about other browsers) currently unload iframes containing ads (where the browser has sufficient context to know it's an ad) if they exceed reasonable resource usage as determined by an on-device calculation of CPU and network activity. There is also Reporting API functionality tied to the interventions to allow ad networks to know what ads are triggering the logic (and sites can observe it as well via the ability to observe the iframe unloading). The use of fenced frames in new ad serving APIs as described in PARAKEET and FLEDGE/TURTLEDOVE will have some interesting interactions here that we need to think further about w.r.t. ensuring adequate reporting of in-browser "heavy ad" mitigations.
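
For reference, the on-device calculation could be sketched roughly as follows. The thresholds are the ones Chrome has published for its heavy ad intervention (4 MB of network data, 15 seconds of CPU in any 30-second window, 60 seconds of CPU in total); they are not stated in this thread, so treat them, the `AdUsage` shape, and `heavyAdReason` as assumptions.

```typescript
// Assumed limits, modelled on Chrome's published heavy ad thresholds.
const HEAVY_AD_LIMITS = {
  networkBytes: 4 * 1024 * 1024,
  windowSeconds: 30,
  windowCpuMs: 15000,
  totalCpuMs: 60000,
};

interface AdUsage {
  networkBytes: number;
  cpuMsPerSecond: number[]; // CPU milliseconds used in each wall-clock second
}

// Returns which limit the ad frame exceeded first, or null if none.
function heavyAdReason(usage: AdUsage): "network" | "cpu-peak" | "cpu-total" | null {
  if (usage.networkBytes > HEAVY_AD_LIMITS.networkBytes) {
    return "network";
  }
  let total = 0;
  let windowSum = 0;
  for (let i = 0; i < usage.cpuMsPerSecond.length; i++) {
    total += usage.cpuMsPerSecond[i];
    windowSum += usage.cpuMsPerSecond[i];
    // Slide the window: drop the sample that fell out of the last 30 seconds.
    if (i >= HEAVY_AD_LIMITS.windowSeconds) {
      windowSum -= usage.cpuMsPerSecond[i - HEAVY_AD_LIMITS.windowSeconds];
    }
    if (windowSum > HEAVY_AD_LIMITS.windowCpuMs) {
      return "cpu-peak";
    }
    if (total > HEAVY_AD_LIMITS.totalCpuMs) {
      return "cpu-total";
    }
  }
  return null;
}
```

The returned reason is the kind of small, fixed value that could be surfaced through reporting without exposing anything else about the frame.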

Similarly, some browsers have done various things in the past to reduce the ability to autoplay video with audio enabled. Browsers could consider ensuring that autoplay will not happen if it's an ad iframe and the video is unmuted (and forcibly pausing it if it becomes unmuted without user interaction); this is an example of something where we could build more durable built-in primitives.
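
As a sketch of what such a built-in primitive might decide, assuming the browser knows whether the frame is an ad frame and whether the user has interacted with it (the `AdVideoState` shape and `decidePlayback` are hypothetical):

```typescript
interface AdVideoState {
  isAdFrame: boolean;
  muted: boolean;
  userInteracted: boolean;
}

type PlaybackDecision = "allow" | "block-autoplay" | "pause";

// Unmuted playback inside an ad frame is only allowed after user interaction.
// If the video is already playing when it becomes unmuted, it is paused;
// otherwise the autoplay attempt is blocked before it starts.
function decidePlayback(state: AdVideoState, alreadyPlaying: boolean): PlaybackDecision {
  if (!state.isAdFrame || state.muted || state.userInteracted) {
    return "allow";
  }
  return alreadyPlaying ? "pause" : "block-autoplay";
}
```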

Re: both malformed and malicious ads, if ads need to become a bundle with all resources provided up front, does that lower the bar for static scanning to be sufficiently effective? For instance, a vendor might provide tooling to statically analyze an ad bundle and validate that it doesn't contain any JavaScript that has known code obfuscation patterns.
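
A very rough sketch of what such tooling might check, assuming the bundle can be unpacked into a list of text files. The `BundleFile` shape, the heuristics, and `scanBundle` are illustrative assumptions; a real scanner would need a much richer rule set and would have to allow legitimate click-through destinations.

```typescript
interface BundleFile {
  path: string;
  body: string;
}

// Illustrative heuristics for constructs that obfuscated code tends to rely on.
const OBFUSCATION_HINTS: { label: string; pattern: RegExp }[] = [
  { label: "eval", pattern: /\beval\s*\(/ },
  { label: "function-constructor", pattern: /new\s+Function\s*\(/ },
  { label: "char-code-decoding", pattern: /String\.fromCharCode\s*\(/ },
  { label: "long-hex-escape-run", pattern: /(\\x[0-9a-f]{2}){8,}/i },
];

// A self-contained bundle should not need absolute URLs for its resources.
const EXTERNAL_URL = /\bhttps?:\/\/[^\s"'<>)]+/i;

// Returns one "path: label" finding per heuristic that matches a file.
function scanBundle(files: BundleFile[]): string[] {
  const findings: string[] = [];
  for (let i = 0; i < files.length; i++) {
    for (let j = 0; j < OBFUSCATION_HINTS.length; j++) {
      if (OBFUSCATION_HINTS[j].pattern.test(files[i].body)) {
        findings.push(files[i].path + ": " + OBFUSCATION_HINTS[j].label);
      }
    }
    if (EXTERNAL_URL.test(files[i].body)) {
      findings.push(files[i].path + ": external-url");
    }
  }
  return findings;
}
```

An empty result does not prove a bundle is safe; it only means none of the known patterns were present.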

If the ad also can't send anything out over the network, then in theory there is no incentive to do cryptomining either, since it wouldn't be able to extract the result.

And beyond all of this, ad networks could still consider forcing the addition of a script with each ad bundle to do additional on-device enforcement against emerging patterns.

It sounds like we need to have some additional folks weigh in on whether there are still perceived blockers here that would need additional browser functionality to resolve.

@darobin

darobin commented May 17, 2021

I understand the desire to avoid having to do anything browser-side, and if we can get there that'd be great. However, I would like to caution against the idea that this can be solved at the ad network level. They aren't necessarily incentivised to be good citizens here and might not put more effort into it than is required to claim they tried.


5 participants