Allow malicious ads prevention scripts to scan Parakeet ads for malicious activities #17

pranay-prabhat · 2021-04-30T16:00:30Z

It's understandable why Parakeet ads need to be rendered in a Fenced Frame. However, publishers are still responsible for the quality of such ads. I agree a lot of quality checks can be done on the server side, but malicious behaviors are typically exhibited by ads while rendering in the browser.

Such malicious ad detection is typically done by scripts provided by specialized vendors like GeoEdge, Clean.IO, The Media Trust, Confiant, etc.

We propose that such malicious ad detection companies should be allowed to scan Parakeet ads rendering into fenced frames in the same way as is done today.

darobin · 2021-04-30T18:14:26Z

It would be useful to understand what kind of processing runtime safety scripts need, as well as what reporting needs they have.

My assumption is that ad creatives are going to be some form of locked bundle, such that they cannot access the network at all or communicate with the hosting page, except perhaps through some tiny holes with a small set of predetermined values. (I've been calling such creatives SLIC, for Safe Locally-Inlined Content.) Runtime safety scripts need to be able to work within these constraints.

One thing to keep in mind is that given no network access and strongly sandboxed framing, the attack surface is significantly reduced. (I would even favour an <ad> element over more general-purpose fenced frames so that there isn't a temptation to make it possible to poke holes in that system for other use cases.)

We might use this opportunity to actually improve how these scripts work. Right now it can be difficult for them to operate without being detectable by malicious scripts, and I know of cases of malicious creatives that wouldn't trigger when monitoring was present but would otherwise. Could they for instance be injected as a form of workers, with read-only access to the SLIC's DOM and resources, and the ability to flag content as inappropriate using a small number of error codes?
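To make the worker idea a bit more concrete, here is a minimal sketch of what "read-only access plus a small number of error codes" could look like. Everything in it is hypothetical: `Verdict`, `NodeSnapshot`, `Checker` and `runChecker` are names invented for illustration, not part of any browser API or proposal, and a frozen object tree merely stands in for whatever read-only view of the SLIC's DOM a browser might expose.

```typescript
// Hypothetical sketch: none of these names exist in any browser API.
// The checker's only output channel is one value from a small fixed set.
enum Verdict {
  Ok = 0,
  HeavyAd = 1,
  Malformed = 2,
  Malicious = 3,
  CheckerError = 4, // the checker threw or returned an unknown code
}

// A read-only stand-in for the creative's DOM.
interface NodeSnapshot {
  readonly tag: string;
  readonly attributes: Readonly<Record<string, string>>;
  readonly children: readonly NodeSnapshot[];
}

type Checker = (root: NodeSnapshot) => Verdict;

const KNOWN_VERDICTS: readonly Verdict[] = [
  Verdict.Ok, Verdict.HeavyAd, Verdict.Malformed, Verdict.Malicious,
];

// Freezes the snapshot in place so a checker cannot modify what it inspects.
function deepFreeze(node: NodeSnapshot): NodeSnapshot {
  Object.freeze(node.attributes);
  node.children.forEach(deepFreeze);
  Object.freeze(node.children);
  return Object.freeze(node);
}

// Runs a checker and collapses anything unexpected into CheckerError.
function runChecker(checker: Checker, root: NodeSnapshot): Verdict {
  try {
    const result = checker(deepFreeze(root));
    return KNOWN_VERDICTS.indexOf(result) >= 0 ? result : Verdict.CheckerError;
  } catch (_err) {
    return Verdict.CheckerError;
  }
}

// Example checker: flag a video that autoplays without being muted.
const autoplayChecker: Checker = (root) => {
  const stack: NodeSnapshot[] = [root];
  let node: NodeSnapshot | undefined;
  while ((node = stack.pop()) !== undefined) {
    if (
      node.tag === "video" &&
      "autoplay" in node.attributes &&
      !("muted" in node.attributes)
    ) {
      return Verdict.Malformed;
    }
    stack.push(...node.children);
  }
  return Verdict.Ok;
};
```

The point of the narrow return type is that the checker cannot smuggle arbitrary data out of the frame: whatever it computes, only one of a handful of codes leaves.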

pranay-prabhat · 2021-04-30T18:59:02Z

I agree with you that ad creatives being locked bundles that make no additional network calls greatly reduces the complexity of malicious behavior detection.

The SafeFrame API works on similar logic, where the iframe in a secondary domain communicates with the parent page in a set of predefined ways which are locked and limited. Something similar can be done here too, but I truly think we need specialists from malicious ad detection companies to vet this concept.

mehulparsana · 2021-05-01T23:52:06Z

In the PARAKEET flow, the set of ads returned by DSPs flows through the SSP. Would it be possible to scan these ad creatives on the SSP server to reduce the need for a script? We are assuming that malicious ad scanning does not need access to accurate user information S or publisher context C.

We can discuss this in the upcoming meeting on Wednesday.

darobin · 2021-05-02T00:56:22Z

The problem with doing this in the server is that the ads will obfuscate their behaviour and act innocuously with a variety of tactics, until they switch behaviour at runtime. A bundle helps, but it seems unlikely that the need for runtime checking will disappear, unless we rule out JS and go to some variation on declarative. It's tempting! But probably too much to ask.


erik-anderson · 2021-05-03T23:40:08Z

Is the typical obfuscation pattern something like having the ad present one piece of content during approval time and then switching to different content later, e.g. once the device's clock is past some predefined date and time, or once the browser gives some set of API answers that imply it's a real user vs. a pre-validation environment?

I would like to better understand what logic these scripts typically have to make a determination (e.g. mostly focused on reading the DOM contents? Something else?) and how they typically report answers today (a boolean answer? a risk rating?).

There are also things that will likely reduce risk vs. today's ads:

- With an ad bundle needing its resources in one go and being loaded in a fenced frame without network access, there are fewer vectors for it to dynamically change the resources it uses as a part of its rendering.
- Since it's within a fenced frame, it will have less attack surface available to it, e.g. no ability to abuse an inadequately fuzzed postMessage listener in the parent page.

These mitigating factors are similar to how the iframe sandbox attribute and Feature Policy can be used today to limit the impact a bad ad can have.

Given we're looking at how to make this work in a new environment, I agree we should consider possible improvements. If reading the DOM is the primary thing that needs to be done, perhaps there could be an "isolated DOM reader script" concept that could be loaded within the fenced frame rendering the ad to allow it to inspect the ad and report a result to the browser.

As far as reporting goes, we likely need to think about this in a similar way to the Aggregate Reporting API proposal. If the detection script could report a result to a JavaScript API and have the browser report the specific ad bundle URL to be aggregated with other users' reports, would that be sufficient if the turnaround time from classification to the publisher/SSP/DSP getting an aggregate report is on the order of minutes?
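As a rough illustration of the reporting shape I have in mind, here is a toy aggregation step. It is not the Aggregate Reporting API and makes no claim about how that proposal works; `Report`, `AggregateRow`, `aggregate` and the `minReports` threshold are all assumptions made up for this sketch. The idea is only that per-user verdicts are keyed by ad bundle URL and a row is released once enough browsers have reported on that bundle.

```typescript
// Hypothetical sketch: one report per browser that rendered the bundle.
interface Report {
  bundleUrl: string;
  flagged: boolean; // the detection script's verdict on this rendering
}

interface AggregateRow {
  bundleUrl: string;
  reports: number; // how many browsers reported on this bundle
  flagged: number; // how many of those reports flagged it
}

// Groups reports by bundle URL and withholds any bundle that has fewer
// than `minReports` reports, so no row reflects a single user.
function aggregate(
  reports: readonly Report[],
  minReports: number,
): AggregateRow[] {
  const rows = new Map<string, AggregateRow>();
  for (const report of reports) {
    let row = rows.get(report.bundleUrl);
    if (row === undefined) {
      row = { bundleUrl: report.bundleUrl, reports: 0, flagged: 0 };
      rows.set(report.bundleUrl, row);
    }
    row.reports += 1;
    if (report.flagged) {
      row.flagged += 1;
    }
  }
  const released: AggregateRow[] = [];
  rows.forEach((row) => {
    if (row.reports >= minReports) {
      released.push(row);
    }
  });
  return released;
}
```

A publisher, SSP or DSP receiving such rows could then block a bundle once its flagged share crosses whatever threshold they choose; how quickly that happens depends on the aggregation window, which is the turnaround question above.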

> One thing to keep in mind is that given no network access and strongly sandboxed framing, the attack surface is significantly reduced. (I would even favour an <ad> element over more general-purpose fenced frames so that there isn't a temptation to make it possible to poke holes in that system for other use cases.)

Since the "opaque src" of the fenced frame will be provided by calling an ad-centric API, the browser should have a very confident understanding of which fenced frames are tied to ads and can apply existing resource caps on the ads as multiple browsers do today (which currently rely on heuristics based on the URL loaded in the iframe).

That said, if there are "native ad" or similar use cases that would benefit from having those resource caps, that's probably interesting as a more general web primitive ("resource budgets for frames"?) that isn't ads-specific.

AramZS · 2021-05-11T16:17:35Z

Going to break my response in two here for simplicity's sake. First, let's talk about the logic scripts use to determine "bad ads"; I'm using that term deliberately because not all bad ads are explicitly malicious. Ad safety vendors intervene on a variety of issues, which generally break down as follows (from lowest to highest priority):

Heavy ads: especially an issue since Chrome started blocking overly heavy ads. The concept of a "heavy ad" can be viewed in two ways.
- The first is an ad that is memory heavy (it loads big images, script files, etc.). These are harder to capture, but will usually be seen in one location, have some version of the ad saved to a database, and be blocked on future detection.
- The second is CPU-heavy ads. These are easier to detect via a scan, as they can be seen via unterminated interval use, access to specific APIs like document.write, infinite mutation loops, etc., and can be intervened on live or recorded and blocked in future instances.
Malformed ads: Malformed ads are those which do not properly use HTML and JavaScript code due to a lack of quality control on the ad builder's side. Sometimes the problem is easily detectable by observing specific code in the creative; sometimes it is detectable only after a period of execution, in which case the ad is captured on first detection and blocked on later appearances. Like above, sometimes these issues can be handled with an active NOOP or other interventions hooked into the ad code. Some examples of what triggers interventions and potential active interventions currently in use:
- lack of playsinline on iOS that can cause videos to go fullscreen. (Active intervention can be handled with mutation observer)
- failure to mute on autoplay of video (Can scan an ad and add mute to the video tags in the moment)
- document.write erasing the containing window (Can NOOP document.write)
Malicious ads: These are ads that may be actively trying to harm the user through redirection, fraud, or non-standard practices like crypto-mining. These can be addressed in a variety of ways. The usual approach is to blocklist specific source URLs for ads, to blocklist certain URLs or code patterns via regex and stop the ad from appearing when they are present in its code, and to block access to particular APIs. Usually these interventions are handled at the level of ad execution. There is a specific reason for this: the best countermeasure for malicious ads is to force the ad to get to the point of the user and then block its execution. This requires malicious operators to pay for the ads' display without getting results, and disincentivizes further fraud.
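For anyone unfamiliar with what a "NOOP" intervention looks like in practice, here is a minimal sketch of the document.write case from the list above. It is an illustration only, not any vendor's actual script: `WriteTarget` and `noopWrite` are hypothetical names, and the sketch takes a generic object so it can run outside a browser. In a page, the ad frame's `document` would presumably be what gets passed in.

```typescript
// Hypothetical sketch: anything with a document.write-style method.
interface WriteTarget {
  write: (...text: string[]) => void;
}

// Replaces `target.write` with a function that does nothing except report
// what the ad tried to write. Returns a function that undoes the patch.
function noopWrite(
  target: WriteTarget,
  onBlocked: (text: string) => void,
): () => void {
  const original = target.write;
  target.write = (...text: string[]): void => {
    onBlocked(text.join(""));
  };
  return (): void => {
    target.write = original;
  };
}
```

The `onBlocked` callback is where recording would hook in, so the creative can be captured and blocked on later appearances. Note that this style of patching is exactly what a malicious creative can try to detect, which is part of why the monitoring question matters.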

There are a variety of folks who do more active work on this and have documented specific cases including how these ads hide from developers attempting to stop them. I'll link them below and note that many of these are from Eliya Stein from Confiant who has done a great job of documenting specific problems and including code examples of what is occurring. He has identified cases where the ad hides malicious executions or code based on specific geo, console being open, specific browser types, etc...

erik-anderson · 2021-05-17T17:24:37Z

Thanks for the info, Aram!

As we discussed on our last call, some concerns are reduced or even eliminated when we have the more sandboxed ad hosting environment.

For heavy ads, as you noted, Chrome and Edge (not sure about other browsers) currently unload iframes containing ads (where the browser has sufficient context to know it's an ad) if they exceed reasonable resource usage as determined by an on-device calculation of CPU and network activity. There is also Reporting API functionality tied to the interventions to allow ad networks to know what ads are triggering the logic (and sites can observe it as well via the ability to observe the iframe unloading). The use of fenced frames in new ad serving APIs as described in PARAKEET and FLEDGE/TURTLEDOVE will have some interesting interactions here that we need to think further about w.r.t. ensuring adequate reporting of in-browser "heavy ad" mitigations.

Similarly, some browsers have done various things in the past to reduce the ability to autoplay video with audio enabled. Browsers could consider ensuring that autoplay will not happen if it's an ad iframe and the video is unmuted (and forcibly pausing it if it becomes unmuted without user interaction); this is an example of something where we could build more durable built-in primitives.

Re: both malformed and malicious ads, if ads need to become a bundle with all resources provided up front, does that reduce the burden for static scanning to be sufficiently effective? For instance, a vendor might provide tooling to statically analyze an ad bundle and validate that it doesn't contain any JavaScript that has known code obfuscation patterns.
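As a sketch of what such static tooling might do, the following scans the files of an ad bundle against a few regex heuristics. The rule list is purely illustrative and is not taken from any vendor's ruleset; `BundleFile`, `Finding`, `RULES` and `scanBundle` are hypothetical names. It can only catch patterns visible in the source, so it says nothing about behaviour that is assembled or switched on at runtime.

```typescript
// Hypothetical sketch: one entry per file in the ad bundle.
interface BundleFile {
  path: string;
  source: string;
}

interface Finding {
  path: string;
  rule: string;
}

// Illustrative heuristics only; real obfuscation is far more varied.
const RULES: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  { name: "dynamic-eval", pattern: /\beval\s*\(|\bnew\s+Function\s*\(/ },
  { name: "base64-decode", pattern: /\batob\s*\(/ },
  { name: "charcode-assembly", pattern: /String\.fromCharCode\s*\(/ },
  // Sixteen or more consecutive \xNN escapes in the source text.
  { name: "long-escape-run", pattern: /(?:\\x[0-9a-fA-F]{2}){16,}/ },
];

// Returns one finding per (file, rule) pair whose pattern matches.
function scanBundle(files: readonly BundleFile[]): Finding[] {
  const findings: Finding[] = [];
  for (const file of files) {
    for (const rule of RULES) {
      if (rule.pattern.test(file.source)) {
        findings.push({ path: file.path, rule: rule.name });
      }
    }
  }
  return findings;
}
```

A bundle with no findings is not thereby safe; at best this kind of check raises the cost of shipping the most common obfuscation tricks.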

If the ad also can't exfiltrate data over the network, that also theoretically means there will be no incentive to do cryptomining, given it wouldn't be able to extract the result.

And beyond all of this, ad networks could still consider forcing the addition of a script with each ad bundle to do additional on-device enforcement against emerging patterns.

It sounds like we need to have some additional folks weigh in on whether there are still perceived blockers here that would need additional browser functionality to resolve.

darobin · 2021-05-17T17:48:24Z

I understand the desire to avoid having to do anything browser-side, and if we can get there that'd be great. However, I would like to caution against the idea that this can be solved at the ad network level. They aren't necessarily incentivised to be good citizens here and might not put more effort into it than is required to claim they tried.

This was referenced May 3, 2021

Scheduled calls for PARAKEET #3

Closed

How do we expect Programmatic Guaranteed and other Private auctions to work through Parakeet #19

Open
