Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 4.1 Measurement Error and Sampling

Weighting for internet quality measurements from a self-selected Brazilian public schools sample

Marcelo Pitta1, Thiago Meireles* 1, Pedro Luis do Nascimento Silva1

Abstract

Propensity score weighting has been widely used to reduce bias due to non-coverage or non-response, as well as to weight non-probability samples. We present an application that weights internet quality measurements from a self-selected sample of Brazilian public elementary schools. We developed models that estimate pseudo-weights through propensity score adjustment. The main goal is to produce estimates that represent the population of all public elementary schools in Brazil, based on a large self-selected sample of public elementary schools that joined the Internet Traffic Measurement System (SIMET) programme.

The measures provided by SIMET are part of Connected Education, a programme introduced by the Brazilian Ministry of Education in 2017. They are obtained either via firmware installed on the routing equipment or software installed on computers at participating schools for automatic collection of internet quality measurements.

Since the population of public elementary schools is known and its characteristics are measured by the Brazilian Annual School Census, the proposed weighting methodology aims to enable estimation of internet quality measures for all public elementary schools from the measurements obtained for those that have installed SIMET.

Because SIMET installation in schools does not follow a probability sampling design, the probability of having data for a given school is unknown. To estimate internet quality measures by weighting the information obtained from schools that have SIMET installed, we need to assess the potential bias associated with the realized sample and try to compensate for it. This is done in two steps:

(a) Determine which part of the school population can be represented by those that have SIMET installed.

(b) Construct pseudo-sampling weights for schools that have SIMET to provide estimates of the quality of internet for the population established in (a).
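
The two steps above can be illustrated with a minimal sketch. The abstract does not specify the propensity model, so the sketch assumes the simplest possible one: a saturated model over two categorical census covariates, where the estimated propensity in each cell is the share of census schools in that cell that have measurements. The covariates (region, location), the school records, and the speed values are all hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical census frame: (school_id, region, location). Not real data.
census = [
    (1, "N", "urban"), (2, "N", "urban"), (3, "N", "urban"), (4, "N", "urban"),
    (5, "N", "rural"), (6, "N", "rural"),
    (7, "S", "urban"), (8, "S", "urban"), (9, "S", "urban"),
    (10, "S", "rural"), (11, "S", "rural"), (12, "S", "rural"),
]

# Hypothetical download speeds (Mbps) for schools in the self-selected sample.
measured = {1: 50.0, 2: 70.0, 7: 100.0, 8: 80.0, 9: 90.0, 10: 20.0}


def cell(school):
    """Return the covariate cell (region, location) of a census record."""
    return school[1:]


# Census and sample counts per covariate cell.
pop_counts = Counter(cell(s) for s in census)
sample_counts = Counter(cell(s) for s in census if s[0] in measured)

# Step (a): only cells with at least one participating school can be
# represented; cells with no participants fall outside the target population.
covered_cells = {c for c in pop_counts if sample_counts[c] > 0}
covered_population = sum(pop_counts[c] for c in covered_cells)

# Step (b): estimated propensity in a cell is n_cell / N_cell, so the
# pseudo-weight (inverse propensity) is N_cell / n_cell.
pseudo_weights = {
    s[0]: pop_counts[cell(s)] / sample_counts[cell(s)]
    for s in census
    if s[0] in measured
}

# Weighted (Hajek-type) mean versus the naive unweighted mean.
weighted_mean = (
    sum(pseudo_weights[i] * measured[i] for i in measured)
    / sum(pseudo_weights.values())
)
unweighted_mean = sum(measured.values()) / len(measured)

print(f"covered population: {covered_population} of {len(census)} schools")
print(f"unweighted mean: {unweighted_mean:.2f} Mbps")
print(f"weighted mean:   {weighted_mean:.2f} Mbps")
```

In this toy example the rural schools in region N have no measurements, so they are excluded in step (a), and the pseudo-weights sum to the size of the remaining covered population. With many or continuous covariates, a fitted model such as logistic regression would typically replace the cell rates, but the inverse-propensity logic stays the same.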

*: Speaker

1: Regional Centre for Studies on the Development of the Information Society (Cetic.br) - Brazil