Microsoft and Harvard’s Institute for Quantitative Social Science Collaboration Develops Open Data Differential Privacy Platform, Opens New Research

 September 26, 2019 by John Kahan – Chief Data Analytics Officer 

We live in a world that is swimming in data, and every day we create more of it. However, data often contains sensitive personal information that can be manipulated in ways that allow re-identification of the people it describes. So we need a way to analyze data, and unlock its full potential, without risking the privacy of those who own it. Recent advances in AI and data science make that possible, which is why I’m excited to announce that Microsoft is collaborating with Harvard’s Gary King, Weatherhead University Professor and founder and director of Harvard’s Institute for Quantitative Social Science, to build a platform that uses differential privacy to keep data private while enabling researchers across academia, government and the private sector to gain novel insights that can rapidly advance human knowledge.

“We are thrilled to be working with Microsoft on this important project. Instead of balancing the interests of individuals in their own privacy and the interests the public has in researchers discovering knowledge to advance the public good, we aim to remove the conflict and achieve both,” said Gary King.

Differential privacy, a technology tailored to privacy-preserving statistical analysis of large datasets, was invented in 2006 as the culmination of a four-year research effort spearheaded by Cynthia Dwork, Microsoft Research Distinguished Scientist and Gordon McKay Professor of Computer Science at Harvard, in collaboration with Kobbi Nissim, Frank McSherry, and Adam Smith. Differential privacy enables researchers and analysts to extract useful insights for the public good from data sets containing personal information while, at the same time, offering the strongest privacy protections available today. This seemingly contradictory outcome is achieved by introducing relatively small inaccuracies, or statistical noise, into the calculations: large enough to protect privacy, but small enough that the answers provided to analysts and researchers are still useful. Our goal is to build a more general differential privacy platform to which a wide range of researchers and companies may eventually contribute.
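
To give a flavor of how that statistical noise works in practice, here is a minimal sketch of an epsilon-differentially-private counting query using the Laplace mechanism. The function names, data, and choice of a counting query are my own illustration, not part of the Microsoft/Harvard platform:

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) as the difference of two Exp(1) draws."""
    e1 = -math.log(1.0 - random.random())  # Exp(1) sample
    e2 = -math.log(1.0 - random.random())  # independent Exp(1) sample
    return scale * (e1 - e2)

def private_count(records, predicate, epsilon):
    """Epsilon-differentially-private count of records matching predicate.

    A counting query has sensitivity 1 (adding or removing one person
    changes the true count by at most 1), so adding Laplace noise with
    scale 1/epsilon yields epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon means more noise: stronger privacy, less accuracy.
ages = [23, 35, 41, 29, 52, 61, 38, 47]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Any one person's presence shifts the true count by at most one, and the injected noise masks that shift, which is exactly the small-inaccuracy trade-off described above.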

 “Differential privacy exemplifies the rewards of investing in fundamental research and demonstrates the vast potential of theoretical computer science as a vehicle for social change. Creation of an industrial-strength, publicly available, platform will advance the practice and the art,” said Cynthia Dwork.

Previously, researchers relied on techniques like de-identification, a process that strips identifying information out of larger data sets. Unfortunately, this method isn’t secure: it can be broken by repeated or cleverly constructed queries, or by combining the data with other data sets.
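
To make that failure mode concrete, here is a hypothetical differencing attack (the names and figures are invented for illustration): two aggregate queries that each look anonymous on their own, but whose difference reveals one person's exact value.

```python
# Hypothetical "anonymized" data set: only aggregates are ever released.
salaries = {"alice": 120_000, "bob": 95_000, "carol": 110_000}

# Query 1: total payroll for everyone.
total_all = sum(salaries.values())

# Query 2: total payroll for everyone except Alice.
total_without_alice = sum(v for k, v in salaries.items() if k != "alice")

# Neither query names a salary, yet subtracting them recovers Alice's exactly.
alice_salary = total_all - total_without_alice
```

Differential privacy defeats this attack by adding noise to every released aggregate, so the difference between the two answers no longer pins down any individual's value.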

The math behind differential privacy is complex, but if you want to read more about it, I recommend Cynthia Dwork’s brief A Firm Foundation for Private Data Analysis, written for a broad technical audience, and Differential Privacy: A Primer for Non-technical Audiences, co-written by Salil Vadhan, Vicky Joseph Professor of Computer Science and Applied Mathematics at Harvard, who will also be partnering with us on this project.  

These papers nicely summarize the benefits differential privacy offers to anyone looking to analyze sensitive data: 

  • Differential privacy essentially protects an individual’s information as if her information were not used in the analysis at all. 
  • Differential privacy essentially ensures that using an individual’s data will not reveal any personally identifiable information that is specific to her. Here, specific refers to information that cannot be inferred unless the individual’s information is used in the analysis. 
  • Differential privacy essentially masks the contribution of any single individual, making it impossible to infer any information specific to an individual, including whether the individual’s information was used at all. 

On the Microsoft side, the development of our platform is being led by our Azure AI team, run by Eric Boyd, Microsoft’s CVP for AI Platform, who said, “This collaboration brings together Microsoft’s vast engineering resources and Azure AI with some of the most talented data scientists, engineers and researchers to develop a data sharing tool that will fundamentally change how we do research. We’re working through real scenarios from government, health care, academic and business sectors that will show how differential privacy provides the strongest possible privacy protections available, and we’re excited to see the deeper insights and new solutions that will emerge as a result.”

Once the basic architecture is built and the governance is in place, we will open the platform and the algorithms to all developers, researchers, and companies worldwide to participate in building and supporting it for the future. We believe this open approach will be critical to success, as it ensures transparency so everyone can trust the output.

Our project also builds on Microsoft’s work on homomorphic encryption and confidential computing, which aims to advance security in the cloud. By combining homomorphic encryption, which keeps data secure, with differential privacy, which keeps it private, users will be able to unlock the full potential of their data and be confident it will remain safe and in their control.

Once the platform is available, researchers will be able to use it to make their own data sets available to other researchers around the world. As a result, we can combine previously unconnected or even unrelated data sets into massive data sets that can be analyzed by AI, further unlocking the power of data. Perhaps even more importantly, the resulting insights will open new avenues of research that will allow us to develop new solutions to some of the most pressing problems we face.

The objective is to use our collective innovation and make the kinds of breakthroughs that serve everyone: fighting cancer and other diseases, designing tools to help with learning disabilities, helping refugees find a place to live, and protecting our planet from the harms of climate change, all while protecting the privacy of individuals whose data helps us determine our results. 

This project will also be a key component of the Cascadia Data Discovery Initiative, a partnership that seeks to build a robust health data ecosystem that focuses on enabling collaboration, data sharing, and data-driven cancer research in the Northwest. I will be attending the Cascadia Innovation Corridor Conference next week where we will discuss our work on differential privacy and how it can advance the work our collaborator, Fred Hutchinson Cancer Research Center, and the other CDDI collaborators are doing.

We will announce further details this fall about how developers and researchers can participate along with us. 
