Developing a Framework for Creating Structural Datasets for the Secure Pre-Analysis of Sensitive Georeferenced Survey Data
Jan Goebel1, Stefan Jünger2, Jonas Lieth2, Alexander Jung* 1
Abstract
Secure rooms in Research Data Centers (RDCs) are essential for protecting sensitive data, particularly in survey research. However, on-site access to these rooms is often time-consuming and inconvenient, which limits how efficiently researchers can carry out their analyses. The SoRa+ project (Development of the Social-Spatial Science Research Data Infrastructure SoRa: FAIR, Smart, Inclusive) seeks to streamline the integration of georeferenced survey data with spatial data, both in publicly accessible settings and within RDC secure rooms. To enable researchers to develop workflows and identify patterns before accessing the real data, SoRa+ is creating structural datasets of household coordinates. These datasets replicate the (spatial) distributions and attributes of the actual data without exposing sensitive information about survey participants.
The development of workflows for generating these structural datasets focuses on three key areas: 1) establishing spatial generalization techniques to identify similar regions across municipalities, 2) testing data protection requirements, and 3) developing a workflow to automatically generate structural datasets tailored to specific survey datasets. In this paper, we present structural datasets for household coordinates as a specialized form of synthetic data, facilitating precise and meaningful pre-analysis of georeferenced survey data. We explore the novelty and significance of this method for the spatial linking of georeferenced survey data within the SoRa+ project.
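To make the idea of a structural dataset concrete, the following minimal sketch draws placeholder household coordinates from aggregated counts per grid cell, so that the coarse spatial distribution is kept while no real location is used. It is an illustration under stated assumptions and not the SoRa+ workflow: the function name make_structural_points, the grid-cell counts, and the 1000-unit cell size are all hypothetical, and uniform sampling within cells stands in for whatever generalization technique the project adopts.

```python
import random


def make_structural_points(cell_counts, cell_size, seed=0):
    """Draw synthetic household coordinates from per-cell counts.

    cell_counts: dict mapping a (col, row) grid-cell index to the number of
                 households in that cell (hypothetical aggregated input).
    cell_size:   edge length of a square grid cell, in coordinate units.
    seed:        fixed seed so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    points = []
    for (col, row), n in sorted(cell_counts.items()):
        # Lower-left corner of the current cell.
        x0, y0 = col * cell_size, row * cell_size
        for _ in range(n):
            # Uniform position inside the cell; no real coordinate is involved.
            x = x0 + rng.random() * cell_size
            y = y0 + rng.random() * cell_size
            points.append((x, y))
    return points


# Hypothetical counts of households per grid cell.
cell_counts = {(0, 0): 3, (1, 0): 1, (2, 3): 5}
synthetic = make_structural_points(cell_counts, cell_size=1000.0)
```

The resulting points reproduce the number of households per cell, which is enough to test a spatial linking workflow end to end. A real structural dataset would additionally have to carry realistic attributes and pass the data protection checks named above.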
*: Speaker
1: DIW Berlin SOEP
2: GESIS – Leibniz-Institut für Sozialwissenschaften