
Data quality in probability-based online panels

14:00 - 15:20 Tuesday, 7th September, 2021

Online

Social Statistics

In our current digital age, numerous online panels offer researchers inexpensive, fast, and flexible data collection. However, it has frequently been questioned whether these online panels can provide high enough data quality to allow valid inferences to the general population. This may in part depend on the choice of sampling and recruitment design. Most online panels rely on nonprobability sampling and recruitment based on volunteer self-selection on the internet. Some other online panels rely on traditional probability-based offline sampling and recruitment procedures. Many of the latter are recruited on the back of established interviewer-administered survey programmes, such as the European Social Survey or the British Social Attitudes Survey. In this session, we will explore the possibilities and challenges of probability-based and nonprobability online panels in the UK and abroad, including evidence from CRONOS, the NatCen Panel, and ten German online panels. Based on this evidence, we will discuss best-practice recommendations and paths for future research.

Organised by Olga Maslovskaya & Carina Cornesse for RSS Social Statistics Section


134 Investigation of nonresponse bias and representativeness in the first cross-national probability-based online panel (CRONOS)

Dr Olga Maslovskaya1, Dr Peter Lugtig2
1University of Southampton, Southampton, United Kingdom. 2Utrecht University, Utrecht, Netherlands

Abstract

Driven by innovations in the digital space, surveys across the world have started adopting new technologies and moving towards online data collection. However, evidence is needed to demonstrate that an online data collection strategy will produce reliable data which can be confidently used to inform policy and business decisions. This issue is even more pertinent in cross-national surveys, where the comparability of data is of the utmost importance. Due to differences in internet coverage across Europe, there is a risk that any strategy to move existing surveys online will introduce differential coverage and nonresponse bias.

This paper explores representativeness and nonresponse bias across waves in CRONOS, the first cross-national probability-based online panel, by employing R-indicators. The analysis allows comparison of the results over time and across three countries (Estonia, Great Britain, and Slovenia). The results suggest that there are differences in representativeness over time in each country and across countries. Those with lower levels of education and those in the oldest age category contribute most to the lack of representativeness in all three countries. Overall, we conclude that the representativeness of the CRONOS panel is no worse than that of the regular face-to-face interviewing conducted in the European Social Survey (ESS).
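For readers unfamiliar with R-indicators: the indicator is defined as R(ρ) = 1 − 2·S(ρ), where S(ρ) is the standard deviation of response propensities estimated from auxiliary variables, so a value of 1 indicates fully representative response. The sketch below is a minimal illustration of that computation, not the CRONOS analysis code; the data frame, variable names, and logistic propensity model are all illustrative assumptions.

```python
# Minimal sketch of an R-indicator, assuming a pandas DataFrame with a
# 0/1 response indicator and categorical auxiliary covariates.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def r_indicator(df, response_col, aux_cols):
    """R(rho) = 1 - 2 * SD(rho); 1 means fully representative response."""
    X = pd.get_dummies(df[aux_cols], drop_first=True).astype(float)
    y = df[response_col]
    # Estimate response propensities from the auxiliary variables.
    rho = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    return 1.0 - 2.0 * rho.std(ddof=1)

# Synthetic example (variable names and data are invented):
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "responded": rng.integers(0, 2, 500),
    "age_group": rng.choice(["16-34", "35-64", "65+"], 500),
    "education": rng.choice(["low", "mid", "high"], 500),
})
print(r_indicator(df, "responded", ["age_group", "education"]))
```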



135 Evaluating data quality in the UK NatCen probability-based online panel

Dr Olga Maslovskaya1, Mr Curtis Jessop2, Professor Gabriele Durrant1
1University of Southampton, Southampton, United Kingdom. 2NatCen Social Research, London, United Kingdom

Abstract

We live in a digital age with high levels of technology use, and surveys have also started adopting new technologies for data collection. There is a move towards online data collection across the world due to falling response rates and pressure to reduce survey costs. Evidence is needed to demonstrate that an online data collection strategy will work and produce reliable data which can be confidently used for policy decisions. No research has been conducted so far to assess data quality in the UK NatCen probability-based online panel; this paper is timely and fills this gap in knowledge. It compares data quality in the NatCen probability-based online panel and three non-probability panels. It also compares the NatCen online panel to the British Social Attitudes (BSA) probability-based survey, on the back of which the NatCen panel was created and which collects data using gold-standard face-to-face interviews.

Various absolute and relative measures of difference will be used for the analysis, such as the mean absolute difference and the Duncan dissimilarity index, among others. This analysis will help us to investigate how sample quality might drive differences in point estimates between probability and non-probability samples. All substantive questions that are asked identically across the five surveys will be modelled, and the coefficients will be compared across surveys.
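As a rough illustration of these comparison measures (a minimal sketch, not the authors' analysis code): the Duncan dissimilarity index is D = 0.5·Σ|p_i − q_i| over the categories of a variable, and the mean absolute difference averages |panel estimate − benchmark estimate| over a set of point estimates. All shares below are invented for illustration.

```python
# Minimal sketch of two comparison measures between a panel and a benchmark.
import numpy as np

def duncan_dissimilarity(p, q):
    """D = 0.5 * sum(|p_i - q_i|): the share of cases that would have to
    change category for the two distributions to match."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()

def mean_absolute_difference(panel_est, benchmark_est):
    """Average absolute difference across a set of point estimates."""
    return np.mean(np.abs(np.asarray(panel_est) - np.asarray(benchmark_est)))

panel = [0.32, 0.45, 0.23]      # e.g. education shares in an online panel
benchmark = [0.28, 0.47, 0.25]  # e.g. the same shares in a benchmark survey
print(duncan_dissimilarity(panel, benchmark))
print(mean_absolute_difference(panel, benchmark))
```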

Recommendations will be provided for future waves of data collection and new probability-based as well as non-probability-based online panels.




137 Investigating multiple aspects of data quality across ten German probability-based and nonprobability online panels

Dr. Carina Cornesse1, Dr. Daniela Ackermann-Piek2,3, Tobias Rettig1, Prof. Dr. Annelies Blom1
1University of Mannheim, Mannheim, Germany. 2GESIS - Leibniz Institute for the Social Sciences, Mannheim, Germany. 3SRH Mobile University, Riedlingen, Germany

Abstract

Online panels have been on the rise for over a decade now and are increasingly used for conducting research in the social sciences and beyond. However, online panels differ widely in their design and data quality. One particularly prominent aspect in which existing online panels differ is their sampling and recruitment strategy. Most online panels are based on nonprobability samples of volunteers recruited on the internet. Only a few online panels are instead based on an offline-recruited traditional probability sampling design. The majority of the research into how these different strategies impact data quality indicates that probability-based online panels provide more accurate univariate estimates than their nonprobability counterparts. However, little research has examined other aspects of data quality thus far. In this presentation, we will therefore provide evidence from a large-scale study with three waves of parallel data collection across ten online panels in Germany. We will particularly focus on evaluations of the accuracy of associations between variables to see whether nonprobability online panels are “fit-for-purpose” for bivariate and multivariate analyses. We will also provide an assessment of retention rates and biases across data collection waves to investigate the possibilities and challenges of re-surveying the same individuals over time.


139 Integrating Probability and Nonprobability Surveys to Improve Estimation and Reduce Costs

Joseph W Sakshaug1, Arkadiusz Wisniowski2, Diego Perez-Ruiz2, Annelies Blom3
1Institute for Employment Research, Nuremberg, Germany. 2University of Manchester, Manchester, United Kingdom. 3University of Mannheim, Mannheim, Germany

Abstract

Carefully designed probability-based sample surveys can be prohibitively expensive to conduct. As such, many survey organizations have shifted away from using expensive probability samples in favor of less expensive, but possibly less accurate, nonprobability web samples. However, their lower costs and abundant availability make them a potentially useful supplement to traditional probability-based samples. We examine this notion by proposing a method of supplementing small probability samples with nonprobability samples using Bayesian inference. We consider two semi-conjugate informative prior distributions for linear regression coefficients based on nonprobability samples: one accounting for the distance between maximum likelihood coefficients derived from parallel probability and nonprobability samples, and the second depending on the variability and size of the nonprobability sample. The method is evaluated in comparison with a reference prior through simulations and a real-data application involving multiple probability and nonprobability surveys fielded simultaneously using the same questionnaire. We show that the method reduces the variance and mean-squared error (MSE) of coefficient estimates and model-based predictions relative to probability-only samples. Using actual and assumed cost data, we also show that the method can yield substantial cost savings (up to 55%) for a fixed MSE.
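To make the basic idea concrete (a minimal sketch under simplifying assumptions, not the authors' estimator): with a normal prior on the regression coefficients centred at estimates from the nonprobability sample and a Gaussian likelihood from the probability sample, the posterior has a closed form. The sketch below treats the error variance as known and uses an arbitrary prior variance; the paper's semi-conjugate priors additionally adapt to the prior-likelihood distance and to the nonprobability sample's size and variability.

```python
# Minimal sketch: conjugate-normal update of regression coefficients, with
# the prior centred on a (possibly biased) nonprobability-sample estimate.
import numpy as np

def posterior_beta(X_prob, y_prob, beta_np, Sigma0, sigma2):
    """Prior N(beta_np, Sigma0) from the nonprobability sample combined
    with a Gaussian likelihood from the probability sample (sigma2 known)."""
    prec_prior = np.linalg.inv(Sigma0)
    prec_lik = X_prob.T @ X_prob / sigma2
    Sigma_post = np.linalg.inv(prec_prior + prec_lik)
    mu_post = Sigma_post @ (prec_prior @ beta_np + X_prob.T @ y_prob / sigma2)
    return mu_post, Sigma_post

# Synthetic illustration: small probability sample, large biased
# nonprobability sample (all numbers invented).
rng = np.random.default_rng(1)
beta_true = np.array([1.0, -0.5])
X_p = rng.normal(size=(100, 2))
y_p = X_p @ beta_true + rng.normal(size=100)
X_np = rng.normal(size=(2000, 2))
y_np = X_np @ beta_true + 0.3 + rng.normal(size=2000)   # selection bias
beta_np = np.linalg.lstsq(X_np, y_np, rcond=None)[0]    # prior mean
Sigma0 = np.eye(2)                                      # prior variance (tuning choice)
mu, _ = posterior_beta(X_p, y_p, beta_np, Sigma0, 1.0)
print(mu)
```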