Bayesian Pseudo Posterior Synthesis for Data Privacy Protection
Tuesday, May 21, 2019
1:00 PM-2:00 PM
The Department of Epidemiology and Biostatistics at Dornsife presents Jingchen (Monika) Hu, PhD, of the Department of Mathematics and Statistics at Vassar College.
Statistical agencies use models to synthesize respondent-level data for release to the general public as an alternative to the actual data records. A Bayesian model synthesizer encodes privacy protection by employing a hierarchical prior construction that induces smoothing of the real data distribution. Agencies balance the utility of the synthetic data against disclosure risk, and hold a specific target threshold for disclosure risk before releasing synthetic datasets. This talk introduces a pseudo posterior likelihood that exponentiates each record's likelihood contribution by a record-indexed weight in (0, 1), defined to be inversely proportional to the disclosure risk for that record in the synthetic data. Using a vector of weights, rather than a single scalar weight, allows more precise downweighting of high-risk records and therefore better preserves utility. The method is illustrated with an application to the Consumer Expenditure Survey of the U.S. Bureau of Labor Statistics.
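The record-weighted likelihood described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's implementation: the normal data model, the function names, and the mapping from risk to weight (one minus the risk, clipped into (0, 1)) are all hypothetical choices made here for concreteness. On the log scale, exponentiating a record's likelihood contribution by a weight becomes multiplying its log-likelihood by that weight.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma^2) distribution at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def risk_to_weight(risk, eps=1e-6):
    """Hypothetical mapping (an assumption, not the talk's definition):
    higher disclosure risk gives a lower weight, clipped into (0, 1)."""
    return min(max(1.0 - risk, eps), 1.0 - eps)


def log_pseudo_likelihood(ys, weights, mu, sigma):
    """Sum of record-level log-likelihoods, each scaled by its own weight.

    Equivalent to the log of prod_i p(y_i | mu, sigma) ** w_i.
    """
    if len(ys) != len(weights):
        raise ValueError("ys and weights must have the same length")
    return sum(w * normal_logpdf(y, mu, sigma) for y, w in zip(ys, weights))


# Toy data: the last record is an outlier, assumed here to carry high risk.
ys = [1.0, 1.2, 0.9, 8.0]
risks = [0.1, 0.1, 0.2, 0.9]  # made-up per-record disclosure risks
weights = [risk_to_weight(r) for r in risks]

# Ordinary log-likelihood corresponds to all weights equal to 1.
full = log_pseudo_likelihood(ys, [1.0] * len(ys), mu=1.0, sigma=1.0)
weighted = log_pseudo_likelihood(ys, weights, mu=1.0, sigma=1.0)

print("weights:", weights)
print("full log-likelihood:", full)
print("weighted log pseudo-likelihood:", weighted)
```

In this toy setup the high-risk outlier contributes far less to the weighted sum than to the ordinary log-likelihood, while the low-risk records are left nearly untouched. A single scalar weight would instead shrink every record's contribution by the same factor.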
Hu is an assistant professor at Vassar College. Her main research interest is developing Bayesian statistical models to facilitate the release of microdata by statistical agencies.