Conference on Anonymization of Integrated and Georeferenced Data (AnigeD), 7-8 October, 2024

Session 6.1 "Synthetic microdata and tables – An investigation of metrics"

Simon Kolb*¹, Andreas Tang¹

Abstract

Synthetic data has begun to show potential as an alternative to traditional statistical disclosure control (SDC) methods in specific use cases. Since data synthesis predominantly happens at the microdata level, the development of utility and risk metrics is also focused on this domain. Statistical agencies, on the other hand, mostly limit data publication to aggregates, selecting various subsets of variables for cross-tabulation. Traditional SDC methods such as cell suppression tend to work at the tabular level, which makes detailed knowledge of the published data product paramount. With synthetic microdata, post-tabular adjustments are no longer required. However, since tabular and microdata metrics can differ significantly, we aim to investigate the relationship between the two.

Using a large real-life data set as an example for data synthesis, we show that certain global metrics may disproportionately represent small subsets of variables, making them inappropriate estimators of the quality of aggregates. On the other hand, we show strong similarities between certain microdata-level risk metrics and the risk of group disclosure in aggregated data.

*: Speaker
1: Statistisches Bundesamt
