Sage | DP Infrastructure

Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. This creates a need to control the data’s leakage through these models. Sage is a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of the most pressing systems challenges of global DP: running out of privacy budget and the privacy-utility tradeoff. To address the former, we develop block composition, a new privacy loss accounting method that leverages the growing database regime of ML workloads to keep training models endlessly on a sensitive data stream while enforcing a global DP guarantee for the stream. To address the latter, we develop privacy-adaptive training, a process that trains a model on growing amounts of data and/or with increasing privacy parameters until, with high probability, the model meets developer-configured quality criteria. Sage’s methods are designed to integrate with TensorFlow-Extended, Google’s open-source ML platform. They illustrate how a systems focus on characteristics of ML workloads enables pragmatic solutions that are not apparent when one focuses on individual algorithms, as most DP ML literature does.
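The two mechanisms named above can be illustrated with a minimal sketch. It is not Sage's implementation. It assumes that the stream is split into blocks, that privacy loss is tracked per block and summed with basic (additive) composition, and that a retry doubles epsilon. The names BlockAccountant, privacy_adaptive_train, train_and_validate, and epsilon_global are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAccountant:
    """Tracks privacy loss per data block (illustrative, additive composition)."""
    epsilon_global: float                      # assumed per-block cap on total loss
    spent: dict = field(default_factory=dict)  # block id -> epsilon consumed so far

    def remaining(self, block_id):
        # A block never seen before still has its full budget, which is why
        # newly arriving data lets training continue on the stream.
        return self.epsilon_global - self.spent.get(block_id, 0.0)

    def can_charge(self, block_ids, epsilon):
        # Small tolerance guards against floating-point rounding.
        return all(self.remaining(b) >= epsilon - 1e-12 for b in block_ids)

    def charge(self, block_ids, epsilon):
        """Charge epsilon to every block a training run reads, or refuse."""
        if not self.can_charge(block_ids, epsilon):
            raise ValueError("privacy budget exhausted for a requested block")
        for b in block_ids:
            self.spent[b] = self.spent.get(b, 0.0) + epsilon


def privacy_adaptive_train(accountant, block_ids, train_and_validate,
                           epsilon_start, epsilon_max):
    """Retry training with a doubled epsilon until validation accepts.

    train_and_validate(block_ids, epsilon) -> bool is a hypothetical callback
    standing in for DP training plus a quality check. Every attempt,
    including failed ones, is charged to the blocks it read.
    Returns the accepted epsilon, or None if no attempt was accepted.
    """
    epsilon = epsilon_start
    while epsilon <= epsilon_max and accountant.can_charge(block_ids, epsilon):
        accountant.charge(block_ids, epsilon)
        if train_and_validate(block_ids, epsilon):
            return epsilon
        epsilon *= 2  # assumed schedule; growing the data is the other option
    return None
```

In this sketch a block whose budget is used up is simply refused for further training, while blocks that arrive later start with a full budget. The doubling schedule is one possible choice. The abstract also mentions growing the amount of training data, which would correspond to passing a longer list of block ids on the next attempt.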

References

2019

Privacy accounting and quality control in the Sage differentially private ML platform

Mathias Lécuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, and Daniel Hsu

In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP 2019, Huntsville, ON, Canada, October 27-30, 2019


@inproceedings{sage,
  author    = {L{\'{e}}cuyer, Mathias and Spahn, Riley and Vodrahalli, Kiran and Geambasu, Roxana and Hsu, Daniel},
  editor    = {Brecht, Tim and Williamson, Carey},
  title     = {Privacy accounting and quality control in the sage differentially private {ML} platform},
  booktitle = {Proceedings of the 27th {ACM} Symposium on Operating Systems Principles, {SOSP} 2019, Huntsville, ON, Canada, October 27-30, 2019},
  pages     = {181--195},
  publisher = {{ACM}},
  year      = {2019},
  url       = {https://doi.org/10.1145/3341301.3359639},
  doi       = {10.1145/3341301.3359639},
  timestamp = {Tue, 19 Nov 2019 12:45:13 +0100},
  biburl    = {https://dblp.org/rec/conf/sosp/LecuyerSVG019.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}