Dealing with Missing Data
A Comparative Exploration of Approaches Utilizing the Integrated City Sustainability Database
Cali Curley, Rachel Krause, Richard Feiock, and Chris Hawkins
In our UAR article, we seek to raise awareness about how to treat missing data in urban studies research. A large proportion of the empirical research on urban politics and policy relies on data collected through surveys of local government or community organization leaders. Surveys provide a relatively efficient way to collect large amounts of consistently measured individual or organizational information needed to conduct comprehensive and accurate statistical analysis. This is particularly important if the aim of research is to produce generalizable findings and contribute to understanding a particular phenomenon by testing theory. However, missing data is a common and significant challenge in survey-based research. It often influences the selection of a statistical method of analysis and, depending on its severity, can undermine confidence in the analysis. Nonetheless, the problems associated with missing data are among the least acknowledged issues when conducting and reporting analysis.
The goal of this article is to compare three techniques for dealing with missing data (listwise deletion, mean replacement, and multiple imputation) and to demonstrate their utility in analyzing survey data. The table included below gives an overview of the detailed comparison of the three techniques explored throughout the paper. To demonstrate the performance of these three approaches, we utilize data from the Integrated City Sustainability Database (ICSD), portions of which are available to the public upon request. The ICSD merges seven different surveys administered to US cities during an 18-month period in 2010-2011, all of which include similar questions about local sustainability policy (economic, environmental, equity, climate governmental priority, collaboration, policy adoption, etc.). All seven of the surveys were sent to all US cities with populations greater than 50,000 in 2010. The ICSD provides scholars and practitioners with a unique opportunity to examine a very robust set of responses to important questions.
We generate three versions of the ICSD data, one for each of the common missing data techniques mentioned above (listwise deletion, mean replacement, and multiple imputation), and use them to run three identically specified models. Our analysis finds great variation in the models’ performance based on the version of data used. The paper suggests that why data is missing and how the missingness is treated can explain both the inflation of certain findings and null results that diminish theoretical progress.
One key finding of our study is the advantage of employing a theory-based imputation process. The mechanics of imputation may be relatively straightforward, but by developing ‘informing variables’ (broad groupings of variables that have theoretic relationships), we have greater confidence that the results reflect more accurate explanatory relationships than those produced by alternative methods of handling missing data. Overall, the results of our analysis confirm the usefulness of the ICSD in the study of environmental, sustainability, and other policy in U.S. cities, and suggest pathways for studying urban issues with survey data.
In our analysis, the multiple imputation approach was most appropriate and resulted in the strongest outcomes. This is because the missing values in the ICSD are Missing at Random (MAR), and the pattern of missingness that emerged in the multivariate regression model that we estimated would have resulted in a large number of observations being dropped in the absence of value replacement. Despite the strong performance of multiply imputed data in our example, we emphasize that there is no one-size-fits-all “best” approach for handling missing data. It is imperative that researchers understand the causes behind the missingness in their own data and the consequences of each potential approach.
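For readers who want to see the mechanics of the three techniques side by side, the following is a minimal Python sketch. It is not the procedure used in the article: the six toy rows are invented for illustration and are not ICSD data, the function names are hypothetical, and the imputation model is a single regression of one variable on another rather than the theory-based ‘informing variables’ process described above. It also simplifies multiple imputation by holding the regression coefficients fixed across imputations, whereas a full implementation would draw them anew each time. The pooling step follows Rubin's rules.

```python
import random
import statistics

# Hypothetical toy data: (x, y) pairs, where y is sometimes missing (None).
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, None), (6.0, 11.8)]


def listwise_deletion(data):
    """Drop every row that has at least one missing value."""
    return [r for r in data if all(v is not None for v in r)]


def mean_replacement(data):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(data[0])
    means = [
        statistics.mean(r[j] for r in data if r[j] is not None)
        for j in range(n_cols)
    ]
    return [tuple(means[j] if v is None else v for j, v in enumerate(r)) for r in data]


def fit_line(pairs):
    """Least-squares fit of y on x; returns intercept, slope, residual std. dev."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    return intercept, slope, (sse / (len(pairs) - 2)) ** 0.5


def multiple_imputation_mean(data, m=20, seed=0):
    """Estimate the mean of y from m imputed datasets, pooled with Rubin's rules."""
    rng = random.Random(seed)
    intercept, slope, sd = fit_line(listwise_deletion(data))
    estimates, variances = [], []
    for _ in range(m):
        # Fill each missing y with a regression prediction plus random noise.
        ys = [
            y if y is not None else intercept + slope * x + rng.gauss(0, sd)
            for x, y in data
        ]
        estimates.append(statistics.mean(ys))
        variances.append(statistics.variance(ys) / len(ys))
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)       # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance


deleted = listwise_deletion(rows)
replaced = mean_replacement(rows)
pooled_mean, pooled_var = multiple_imputation_mean(rows)
print(len(deleted), "complete rows remain after listwise deletion")
print("mean replacement fills missing y with", replaced[2][1])
print("pooled mean of y:", pooled_mean, "total variance:", pooled_var)
```

Even in this toy case the trade-offs are visible. Listwise deletion discards a third of the observations. Mean replacement keeps every row but gives each missing case the same value, which shrinks the variable's variance. Multiple imputation keeps every row and carries the uncertainty about the filled-in values into the pooled variance.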
Cali Curley is an assistant professor at Indiana University Purdue University Indianapolis School of Public and Environmental Affairs. Her research is focused on environmental policy, energy policy, and local governance.
Rachel M. Krause is an associate professor at the University of Kansas School of Public Affairs and Administration. She researches urban sustainability, local governance, and climate protection initiatives.
Richard Feiock holds the Jerry Collins Eminent Scholar Endowed Chair and is the Augustus B. Turnbull Professor of Public Administration and Policy in the Askew School at Florida State University, where he directs the FSU Local Governance Research Laboratory. He is an elected fellow of the National Academy of Public Administration and serves on the U.S. Environmental Protection Agency Board of Scientific Counselors.
Christopher V. Hawkins is an associate professor in the School of Public Administration at the University of Central Florida. His research focuses on local economic development, metropolitan governance, and urban sustainability policy.