Publications | DP Infrastructure

2025

DPack: Efficiency-Oriented Privacy Budget Scheduling

Pierre Tholoniat, Kelly Kostopoulou, Mosharaf Chowdhury, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer, and Junfeng Yang

In Proceedings of the Twentieth European Conference on Computer Systems, 2025

Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of computing resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents a scheduler for the privacy resources that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPack, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPack: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness instead of efficiency (1.3–1.7× in Alibaba, 1.0–2.6× in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Using DPack, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
@inproceedings{dpack, title = {{{DPack}}: {{Efficiency-Oriented Privacy Budget Scheduling}}}, shorttitle = {{{DPack}}}, booktitle = {Proceedings of the {{Twentieth European Conference}} on {{Computer Systems}}}, author = {Tholoniat, Pierre and Kostopoulou, Kelly and Chowdhury, Mosharaf and Cidon, Asaf and Geambasu, Roxana and L\'ecuyer, Mathias and Yang, Junfeng}, year = {2025}, series = {{{EuroSys}} '25}, pages = {1194--1209}, publisher = {Association for Computing Machinery}, location = {New York, NY, USA}, doi = {10.1145/3689031.3696096}, url = {https://dl.acm.org/doi/10.1145/3689031.3696096}, isbn = {9798400711961}, }
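To make the knapsack framing in the DPack abstract concrete, here is a minimal Python sketch of a greedy heuristic for a multidimensional knapsack over privacy budgets. It is not the DPack algorithm from the paper. The `Task` type, the per-block epsilon demands, and the efficiency metric (weight divided by the sum of normalized demands) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    weight: float             # utility of running the task (1.0 = just count tasks)
    demand: dict[str, float]  # hypothetical epsilon demand per data block


def greedy_privacy_knapsack(tasks, capacity):
    """Schedule tasks in decreasing order of weight per unit of normalized demand.

    Illustrative heuristic only; the budget is never returned once consumed.
    """
    remaining = dict(capacity)

    def efficiency(task):
        # Normalized cost: fraction of each block's total budget the task requests.
        cost = sum(eps / capacity[block] for block, eps in task.demand.items())
        return task.weight / cost if cost > 0 else float("inf")

    scheduled = []
    for task in sorted(tasks, key=efficiency, reverse=True):
        # All-or-nothing: a task runs only if every block it needs has enough budget.
        if all(remaining[block] >= eps for block, eps in task.demand.items()):
            for block, eps in task.demand.items():
                remaining[block] -= eps
            scheduled.append(task.name)
    return scheduled, remaining


capacity = {"b1": 1.0, "b2": 1.0}
tasks = [
    Task("A", 1.0, {"b1": 0.5, "b2": 0.5}),
    Task("B", 1.0, {"b1": 0.5}),
    Task("C", 1.0, {"b2": 0.5}),
    Task("D", 1.0, {"b1": 0.75, "b2": 0.75}),
]
scheduled, remaining = greedy_privacy_knapsack(tasks, capacity)
print(scheduled, remaining)
```

In this toy workload the cheap tasks B and C are placed first, A still fits afterwards, and D is rejected because both blocks are exhausted. A greedy rule like this carries no optimality guarantee, which is consistent with the abstract's point that the exact problem is NP-hard.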
Big Bird: Privacy Budget Management for W3C’s Privacy-Preserving Attribution API

Pierre Tholoniat, Alison Caulfield, Giorgio Cavicchioli, Mark Chen, Nikos Goutzoulias, Benjamin Case, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer, and Martin Thomson

In arXiv preprint, 2025

Privacy-preserving advertising APIs like Privacy-Preserving Attribution (PPA) are designed to enhance web privacy while enabling effective ad measurement. PPA offers an alternative to cross-site tracking with encrypted reports governed by differential privacy (DP), but current designs lack a principled approach to privacy budget management, creating uncertainty around critical design decisions. We present Big Bird, a privacy budget manager for PPA that clarifies per-site budget semantics and introduces a global budgeting system grounded in resource isolation principles. Big Bird enforces utility-preserving limits via quota budgets and improves global budget utilization through a novel batched scheduling algorithm. Together, these mechanisms establish a robust foundation for enforcing privacy protections in adversarial environments. We implement Big Bird in Firefox and evaluate it on real-world ad data, demonstrating its resilience and effectiveness.
@inproceedings{bigbird, title = {Big {{Bird}}: {{Privacy Budget Management}} for {{W3C}}'s {{Privacy-Preserving Attribution API}}}, shorttitle = {Big {{Bird}}}, author = {Tholoniat, Pierre and Caulfield, Alison and Cavicchioli, Giorgio and Chen, Mark and Goutzoulias, Nikos and Case, Benjamin and Cidon, Asaf and Geambasu, Roxana and L\'ecuyer, Mathias and Thomson, Martin}, year = {2025}, doi = {10.48550/arXiv.2506.05290}, url = {http://arxiv.org/abs/2506.05290}, booktitle = {arXiv preprint}, keywords = {Computer Science - Cryptography and Security} }

2024

Cookie Monster: Efficient On-Device Budgeting for Differentially-Private Ad-Measurement Systems

Pierre Tholoniat, Kelly Kostopoulou, Peter McNeely, Prabhpreet Singh Sodhi, Anirudh Varanasi, Benjamin Case, Asaf Cidon, Roxana Geambasu, and Mathias Lécuyer

In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web’s privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Cookie Monster, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Cookie Monster into Chrome and evaluate it on microbenchmarks and advertising datasets. Across workloads, Cookie Monster significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.
@inproceedings{cookiemonster24, author = {Tholoniat, Pierre and Kostopoulou, Kelly and McNeely, Peter and Sodhi, Prabhpreet Singh and Varanasi, Anirudh and Case, Benjamin and Cidon, Asaf and Geambasu, Roxana and L\'{e}cuyer, Mathias}, title = {Cookie Monster: Efficient On-Device Budgeting for Differentially-Private Ad-Measurement Systems}, year = {2024}, isbn = {9798400712517}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3694715.3695965}, doi = {10.1145/3694715.3695965}, booktitle = {Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles}, pages = {693--708}, numpages = {16}, keywords = {differential privacy, budgeting, measurement}, location = {Austin, TX, USA}, series = {SOSP '24}, }

2023

Turbo: Effective Caching in Differentially-Private Databases

Kelly Kostopoulou, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, and Mathias Lécuyer

In Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Differentially-private (DP) databases allow for privacy-preserving analytics over sensitive datasets or data streams. In these systems, user privacy is a limited resource that must be conserved with each query. We propose Turbo, a novel, state-of-the-art caching layer for linear query workloads over DP databases. Turbo builds upon private multiplicative weights (PMW), a DP mechanism that is powerful in theory but ineffective in practice, and transforms it into a highly-effective caching mechanism, PMW-Bypass, that uses prior query results obtained through an external DP mechanism to train a PMW to answer arbitrary future linear queries accurately and "for free" from a privacy perspective. Our experiments on public Covid and CitiBike datasets show that Turbo with PMW-Bypass conserves 1.7–15.9× more budget compared to vanilla PMW and simpler cache designs, a significant improvement. Moreover, Turbo provides support for range query workloads, such as timeseries or streams, where opportunities exist to further conserve privacy budget through DP parallel composition and warm-starting of PMW state. Our work provides a theoretical foundation and general system design for effective caching in DP databases.
@inproceedings{turbo, title = {Turbo: {{Effective Caching}} in {{Differentially-Private Databases}}}, shorttitle = {Turbo}, booktitle = {Proceedings of the 29th {{Symposium}} on {{Operating Systems Principles}}}, author = {Kostopoulou, Kelly and Tholoniat, Pierre and Cidon, Asaf and Geambasu, Roxana and Lécuyer, Mathias}, year = {2023}, series = {{{SOSP}} '23}, pages = {579--594}, publisher = {{Association for Computing Machinery}}, location = {{New York, NY, USA}}, doi = {10.1145/3600006.3613174}, url = {https://doi.org/10.1145/3600006.3613174}, urldate = {2023-10-10}, isbn = {9798400702297}, }

2021

Privacy Budget Scheduling

Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, and Mathias Lécuyer

In 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2021, July 14-16, 2021

Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. This distinction forces a re-design of the scheduler. We present DPF (Dominant Private Block Fairness) – a variant of the popular Dominant Resource Fairness (DRF) algorithm – that is geared toward the non-replenishable privacy resource but enjoys similar theoretical properties as DRF. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.
@inproceedings{privatekube, title = {Privacy Budget Scheduling}, booktitle = {15th {{USENIX}} Symposium on Operating Systems Design and Implementation, {{OSDI}} 2021, July 14-16, 2021}, author = {Luo, Tao and Pan, Mingen and Tholoniat, Pierre and Cidon, Asaf and Geambasu, Roxana and Lécuyer, Mathias}, editor = {Brown, Angela Demke and Lorch, Jay R.}, year = {2021}, pages = {55--74}, publisher = {{USENIX Association}}, url = {https://www.usenix.org/conference/osdi21/presentation/luo}, bibsource = {dblp computer science bibliography, https://dblp.org}, biburl = {https://dblp.org/rec/conf/osdi/LuoPTCGL21.bib}, timestamp = {Thu, 12 Aug 2021 18:19:16 +0200}, }
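The PrivateKube abstract describes DPF as a DRF-style scheduler over a non-replenishable resource. The following Python sketch illustrates that idea under simplifying assumptions: tasks are ordered by their dominant share (the largest fraction of any single block's budget they request), grants are all-or-nothing, and granted budget is never returned. It is not the DPF algorithm as specified in the paper. The function names and the dict-based task format are hypothetical.

```python
def dominant_share(demand, capacity):
    """Largest fraction of any single block's total budget that a task requests."""
    return max(eps / capacity[block] for block, eps in demand.items())


def schedule_by_dominant_share(tasks, capacity):
    """Grant budget to tasks in increasing order of dominant share.

    Unlike CPU or memory, the budget in `remaining` only ever decreases:
    nothing is released when a task finishes.
    """
    remaining = dict(capacity)
    granted = []
    ordered = sorted(tasks.items(), key=lambda kv: dominant_share(kv[1], capacity))
    for name, demand in ordered:
        # All-or-nothing grant across every block the task touches.
        if all(remaining[block] >= eps for block, eps in demand.items()):
            for block, eps in demand.items():
                remaining[block] -= eps
            granted.append(name)
    return granted, remaining


capacity = {"b1": 1.0, "b2": 1.0}
tasks = {
    "big": {"b1": 0.75},
    "small": {"b1": 0.25, "b2": 0.25},
    "medium": {"b1": 0.5, "b2": 0.5},
}
granted, remaining = schedule_by_dominant_share(tasks, capacity)
print(granted, remaining)
```

Here `small` and `medium` are granted, and `big` is rejected because block `b1` no longer has 0.75 left. Favoring small dominant shares is what distinguishes a fairness-oriented order like this from an efficiency-oriented one.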

2019

Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform

Mathias Lécuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, and Daniel Hsu

In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP 2019, Huntsville, ON, Canada, October 27-30, 2019

Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. This creates a need to control the data’s leakage through these models. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of the most pressing systems challenges of global DP: running out of privacy budget and the privacy-utility tradeoff. To address the former, we develop block composition, a new privacy loss accounting method that leverages the growing database regime of ML workloads to keep training models endlessly on a sensitive data stream while enforcing a global DP guarantee for the stream. To address the latter, we develop privacy-adaptive training, a process that trains a model on growing amounts of data and/or with increasing privacy parameters until, with high probability, the model meets developer-configured quality criteria. Sage’s methods are designed to integrate with TensorFlow-Extended, Google’s open-source ML platform. They illustrate how a systems focus on characteristics of ML workloads enables pragmatic solutions that are not apparent when one focuses on individual algorithms, as most DP ML literature does.
@inproceedings{sage, author = {L{\'{e}}cuyer, Mathias and Spahn, Riley and Vodrahalli, Kiran and Geambasu, Roxana and Hsu, Daniel}, editor = {Brecht, Tim and Williamson, Carey}, title = {Privacy accounting and quality control in the sage differentially private {ML} platform}, booktitle = {Proceedings of the 27th {ACM} Symposium on Operating Systems Principles, {SOSP} 2019, Huntsville, ON, Canada, October 27-30, 2019}, pages = {181--195}, publisher = {{ACM}}, year = {2019}, url = {https://doi.org/10.1145/3341301.3359639}, doi = {10.1145/3341301.3359639}, timestamp = {Tue, 19 Nov 2019 12:45:13 +0100}, biburl = {https://dblp.org/rec/conf/sosp/LecuyerSVG019.bib}, bibsource = {dblp computer science bibliography, https://dblp.org}, }