
Watch For

Analyze, identify, and surface the most interesting parts of your media content in real time

Watch For is a large-scale, low-cost, highly programmable media AI platform from Microsoft Research that analyzes images, videos, live streams, and other media content in real time. Our infrastructure is designed to address a broad range of verticals, from digital safety to media analytics. The platform is currently used in production by several large organizations at Microsoft, processing more than 400M minutes of video content and more than 4B frames every month, with a reach of hundreds of millions of users.

Fortnite game play screenshot with overlapping labels identifying what information Watch For is processing

Watch For production releases

  • Digital Safety solutions for several organizations including Xbox, Flipgrid, Bing, MSN, and LinkedIn
  • Powering Bing’s live stream search serving hundreds of millions of user queries
  • Powering the AI experiences in MSN Esports Hub, including Search, Spotlight and Highlights
  • Powering Mixer’s HypeZone, monitoring and analyzing live video streams on behalf of tens of millions of users and notifying them when specified events occur

How does Watch For work?

Watch For runs analysis pipelines efficiently and at scale across large volumes of content.

There is no silver bullet for achieving high efficiency. Instead, a combination of different techniques in different parts of the pipeline provides significant savings. The techniques in Watch For can be broadly classified into three buckets.

  1. End-to-end resource utilization optimizations: Watch For optimizes resources at the cluster, node, and process levels. At the cluster level, Watch For manages both spot and dedicated instances and orchestrates between them to process content at low cost and low latency. At the node level, Watch For makes sure the network and CPUs are effectively utilized. The system’s workload is a combination of network transfer and processing, and it applies a few techniques to keep utilization of both high.
  2. ML optimizations: Watch For applies ML optimizations such as batching, model cascades, and bit-width optimizations. The Watch For team has been working closely with OctoML and piloting Apache TVM as a model runtime to achieve high inference efficiency.
  3. Efficient programming templates: Watch For exposes efficient programming templates for various content types, and pipelines written using those templates are executed efficiently at large scale. Watch For also exposes as many knobs to the developer as possible, so that optimizations such as adaptive sampling and deduping can be easily implemented.
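
To make two of the techniques named above more concrete, here is a minimal Python sketch of frame deduping followed by a two-stage model cascade. It is an illustration only, not Watch For's implementation or API: the names `dedupe`, `cascade`, `process`, the toy models, the frame representation (a flat sequence of pixel intensities), and all thresholds are assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical frame representation: a flat sequence of 0-255 pixel intensities.
Frame = Sequence[int]
Model = Callable[[Frame], float]  # returns a score in [0, 1]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dedupe(frames: Iterable[Frame], min_diff: float) -> Iterator[Frame]:
    """Yield a frame only if it differs enough from the last frame that was kept."""
    last = None
    for frame in frames:
        if last is None or mean_abs_diff(frame, last) >= min_diff:
            last = frame
            yield frame


def cascade(frame: Frame, cheap: Model, expensive: Model,
            low: float, high: float) -> Tuple[float, bool]:
    """Two-stage cascade: call the expensive model only when the cheap
    model's score is ambiguous (strictly between low and high).
    Returns (score, used_expensive)."""
    score = cheap(frame)
    if score <= low or score >= high:
        return score, False
    return expensive(frame), True


def process(frames: Iterable[Frame], cheap: Model, expensive: Model,
            min_diff: float = 5.0, low: float = 0.3,
            high: float = 0.7) -> List[Tuple[float, bool]]:
    """Drop near-duplicate frames, then score the survivors with the cascade."""
    return [cascade(f, cheap, expensive, low, high)
            for f in dedupe(frames, min_diff)]


# Toy stand-ins for real models (assumptions for the example only).
def cheap_model(frame: Frame) -> float:
    return sum(frame) / (255 * len(frame))  # mean brightness


def expensive_model(frame: Frame) -> float:
    return max(frame) / 255  # peak brightness


frames = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),            # near-duplicate of the previous frame
    (200, 200, 200, 200),
    (120, 130, 120, 130),
]
results = process(frames, cheap_model, expensive_model)
print(results)
```

In this sketch the second frame never reaches a model, and only the last frame, whose cheap score is ambiguous, pays for the expensive model. The thresholds `min_diff`, `low`, and `high` stand in for the kind of knobs a developer might tune to trade accuracy against cost.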