PrivateKube & DPack

incorporate privacy as a resource in Kubernetes and show how to schedule it

Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models.

PrivateKube is an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution, while privacy budget cannot. This distinction forces a redesign of the scheduler. We developed Dominant Private Block Fairness (DPF) – a variant of the popular Dominant Resource Fairness (DRF) algorithm – that is geared toward the non-replenishable privacy resource but enjoys theoretical properties similar to those of DRF.
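To make the non-replenishable property concrete, here is a minimal Python sketch of a dominant-share ordering over privacy blocks. It is an illustration under simplifying assumptions, not the PrivateKube implementation: the names `PrivacyBlock`, `Task`, `dominant_share`, and `schedule` are hypothetical, budgets are plain epsilon values, and the gradual unlocking of budget that a real scheduler might apply is omitted.

```python
from dataclasses import dataclass


@dataclass
class PrivacyBlock:
    """A data block with a fixed privacy budget (hypothetical type)."""
    capacity: float
    consumed: float = 0.0

    @property
    def remaining(self) -> float:
        # Unlike CPU, consumed budget is never handed back.
        return self.capacity - self.consumed


@dataclass
class Task:
    """A training task requesting epsilon from one or more blocks."""
    name: str
    demand: dict  # block id -> epsilon requested


def dominant_share(task, blocks):
    """Largest fraction of any single block's capacity the task asks for."""
    return max(eps / blocks[b].capacity for b, eps in task.demand.items())


def schedule(tasks, blocks):
    """Grant tasks in order of increasing dominant share (toy policy)."""
    granted = []
    for task in sorted(tasks, key=lambda t: dominant_share(t, blocks)):
        fits = all(blocks[b].remaining >= eps for b, eps in task.demand.items())
        if fits:
            for b, eps in task.demand.items():
                blocks[b].consumed += eps  # permanent deduction
            granted.append(task.name)
    return granted


if __name__ == "__main__":
    blocks = {"b0": PrivacyBlock(1.0), "b1": PrivacyBlock(1.0)}
    tasks = [
        Task("big", {"b0": 0.8, "b1": 0.8}),
        Task("small1", {"b0": 0.3}),
        Task("small2", {"b0": 0.3, "b1": 0.5}),
    ]
    print(schedule(tasks, blocks))
```

In this toy run the two small tasks are granted first, and the large task is rejected because the budget they consumed on `b0` is never returned.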

The design, implementation, and evaluation of PrivateKube and DPF are described in a paper published at OSDI ’21: Privacy Budget Scheduling. A local copy of this paper is available here. An extended version of this paper, with some details we omitted from the conference paper, is available on arXiv. The PrivateKube repository contains the code we release as a reusable and extensible artifact of our research.

DPack, published at EuroSys ’25, proposes a new scheduling algorithm that optimizes for efficiency instead of fairness.
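As a rough illustration of what optimizing for efficiency means here, the Python sketch below finds, by exhaustive search, the largest set of tasks that fits within every block's budget. This is not the DPack algorithm: it is a plain multidimensional knapsack with one unit of value per task, it takes exponential time, and the name `best_schedule` and its input layout are assumptions made for this example.

```python
from itertools import combinations


def best_schedule(demands, capacities):
    """Return the largest feasible subset of tasks (brute force).

    demands:    task name -> {block id -> epsilon requested}
    capacities: block id -> total epsilon available
    """
    names = sorted(demands)
    # Try the biggest subsets first so the first feasible one is optimal.
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            used = {b: 0.0 for b in capacities}
            for task in subset:
                for block, eps in demands[task].items():
                    used[block] += eps
            if all(used[b] <= capacities[b] for b in capacities):
                return list(subset)
    return []


if __name__ == "__main__":
    capacities = {"b0": 1.0}
    demands = {"a": {"b0": 0.4}, "b": {"b0": 0.4}, "c": {"b0": 0.9}}
    print(best_schedule(demands, capacities))
```

Exhaustive search is only workable for a handful of tasks, which is why a practical scheduler needs an approximation instead.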

References

2025

DPack: Efficiency-Oriented Privacy Budget Scheduling

Pierre Tholoniat, Kelly Kostopoulou, Mosharaf Chowdhury, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer, and Junfeng Yang

In Proceedings of the Twentieth European Conference on Computer Systems, 2025


Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of computing resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents a scheduler for the privacy resources that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPack, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPack: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness instead of efficiency (1.3-1.7× in Alibaba, 1.0-2.6× in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Using DPack, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
@inproceedings{dpack,
  title = {{{DPack}}: {{Efficiency-Oriented Privacy Budget Scheduling}}},
  shorttitle = {{{DPack}}},
  booktitle = {Proceedings of the {{Twentieth European Conference}} on {{Computer Systems}}},
  author = {Tholoniat, Pierre and Kostopoulou, Kelly and Chowdhury, Mosharaf and Cidon, Asaf and Geambasu, Roxana and L\'ecuyer, Mathias and Yang, Junfeng},
  year = {2025},
  series = {{{EuroSys}} '25},
  pages = {1194--1209},
  publisher = {Association for Computing Machinery},
  location = {New York, NY, USA},
  doi = {10.1145/3689031.3696096},
  url = {https://dl.acm.org/doi/10.1145/3689031.3696096},
  isbn = {9798400711961},
}

2021

Privacy Budget Scheduling

Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, and Mathias Lécuyer

In 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2021, July 14-16, 2021


Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. This distinction forces a re-design of the scheduler. We present DPF (Dominant Private Block Fairness) – a variant of the popular Dominant Resource Fairness (DRF) algorithm – that is geared toward the non-replenishable privacy resource but enjoys similar theoretical properties as DRF. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.
@inproceedings{privatekube,
  title = {Privacy Budget Scheduling},
  booktitle = {15th {{USENIX}} Symposium on Operating Systems Design and Implementation, {{OSDI}} 2021, July 14-16, 2021},
  author = {Luo, Tao and Pan, Mingen and Tholoniat, Pierre and Cidon, Asaf and Geambasu, Roxana and Lécuyer, Mathias},
  editor = {Brown, Angela Demke and Lorch, Jay R.},
  year = {2021},
  pages = {55--74},
  publisher = {{USENIX Association}},
  url = {https://www.usenix.org/conference/osdi21/presentation/luo},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/conf/osdi/LuoPTCGL21.bib},
  timestamp = {Thu, 12 Aug 2021 18:19:16 +0200},
}