Inaccuracies in Low Income Housing Geocodes

When and Why They Matter

Nicole Wilson (MIT), Michael Hankinson (MIT), Asya Magazinnik (MIT), Melissa Sands (MIT)

Since 1987, the Low Income Housing Tax Credit (LIHTC) has funded more than 90% of subsidized housing built in the United States. Thus, the LIHTC database—maintained by the Department of Housing and Urban Development (https://lihtc.huduser.gov)—is the primary source of insight into where and when affordable housing is built. For researchers or policymakers hoping to map affordable housing, it is critical to know if the geocodes provided by HUD in this dataset are accurate. By reviewing multiple samples of thousands of LIHTC developments individually, we find that HUD-provided geocodes for housing subsidized by the LIHTC are frequently inaccurate. In our article, forthcoming in Urban Affairs Review, we measure the degree of this inaccuracy and advise researchers on how best to navigate it.

How inaccurate are the data? To assess accuracy, we first reviewed the more than 1,250 LIHTC developments built in California from 1999 to 2010. We compared the HUD-provided geocodes from the LIHTC database to those generated by feeding each street address into the Google Geocoding API, manually verifying any discrepancies. Next, we repeated this process on a national sample of approximately 1,000 geocodes from 2012 to 2020. In all, only 55% of the HUD geocodes fell within the parcel boundaries—i.e., on the same property lot—of their LIHTC development. The median discrepancy between the HUD geocode and the center of its actual LIHTC parcel was 70 meters. Google performed much better: its geocodes fell in the correct parcel 95% of the time, with a median discrepancy of 0 meters. In fewer than 5% of cases, neither HUD nor Google correctly geocoded the development, and these rare dual failures were often due to typos in the original HUD-provided addresses.
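To give a concrete sense of how such point-to-point discrepancies can be measured, the sketch below computes the great-circle (haversine) distance in meters between two geocodes for the same address. The coordinates are hypothetical, not drawn from the LIHTC database, and our article's parcel-based comparison relies on actual lot boundaries rather than point distance alone.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical example: a HUD-style geocode vs. a Google-style geocode
# for the same address, a few ten-thousandths of a degree apart.
hud = (34.0522, -118.2437)
goog = (34.0528, -118.2441)
print(round(haversine_m(*hud, *goog)), "meters apart")
```

A discrepancy of this size is on the order of the 70-meter median we report, enough to place a point on the wrong lot in a dense urban neighborhood.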

Why are these inaccuracies significant? First, these inaccuracies directly affect researchers hoping to understand how spatial proximity to affordable housing affects diverse outcomes such as crime, property values, neighborhood change, and school quality. Second, the inaccuracies even affect researchers seeking to understand the context of affordable housing using Census data. For example, we find that HUD-provided geocodes placed 3% of the LIHTC developments in our California sample in the wrong census tract. At the more granular block level, 19% of the observations in our sample were in the incorrect block (compared to only 6% for Google-generated geocodes). To be clear, we ascribe neither ill motive nor negligence to HUD. When asked, HUD explained that it relies on interpolation procedures with varying levels of accuracy. Unfortunately, helpful data on this measurement error—such as whether a given address was geocoded to a rooftop (very accurate) or Census geography (less accurate)—are not carried over to the LIHTC database. In short, researchers are left in the dark.
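The tract and block checks above boil down to point-in-polygon tests. In practice one would overlay geocodes on boundary files such as the Census Bureau's TIGER/Line shapefiles using a GIS library, but a minimal ray-casting sketch shows the core test; the rectangular "block" and the geocode below are hypothetical, not real Census geography.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon,
    given as an ordered list of (lon, lat) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle if the edge (j, i) straddles the point's latitude and an
        # eastward ray from the point would cross it.
        if (yi > lat) != (yj > lat) and \
           lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

# Hypothetical census block approximated as a rectangle of (lon, lat) corners.
block = [(-118.246, 34.050), (-118.240, 34.050),
         (-118.240, 34.056), (-118.246, 34.056)]
hud_geocode = (-118.2437, 34.0522)  # hypothetical geocode
print(point_in_polygon(*hud_geocode, block))  # → True
```

A geocode that is off by tens of meters can land on the wrong side of a block boundary even when it passes this test at the tract level, which is why block-level error rates (19% for HUD) run so much higher than tract-level ones (3%).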

What should you do with this knowledge? Generally speaking, we recommend that researchers use Google-generated geocodes for most studies where the localized environment matters. But we also found that in the rare cases when Google was inaccurate, its error tended to be larger than HUD's. Thus, if LIHTC developments are going to be aggregated to a larger census geography, researchers may prefer HUD geocodes, which are inaccurate more often but typically by smaller distances.

We discovered these errors when using the LIHTC database to understand whether exposure to new affordable housing affects voters' support for funding more affordable housing. Doing so, we found that yes, new nearby affordable housing generates policy feedback loops in surprising ways for both homeowners and renters (“The Policy Adjacent”). But we also realized the need to systematically evaluate the inaccuracies we were finding in the LIHTC database. Since then, we have heard many housing scholars express similar frustrations with the unreliability of the HUD geocodes. While we are happy to have addressed these errors in this article, we also hope that our work encourages scholars to notify the intellectual community of inaccuracies in other popular datasets. Though such work is often tedious, the validation of data is the foundation of credible empirical research.

Read the full UAR article here.


Nicole Wilson is a PhD candidate in the Department of Political Science at the Massachusetts Institute of Technology (MIT). Her research focuses on urbanization and political behavior.

Michael Hankinson is an Assistant Professor in the Department of Political Science at George Washington University. His research focuses on political behavior, local politics, and representation with applications toward housing and land use policy.

Asya Magazinnik is an Assistant Professor in the Department of Political Science at MIT. Her research interests include electoral geography, federalism, local politics, and law enforcement.

Melissa Sands is an Assistant Professor of Politics and Data Science in the Department of Government at the London School of Economics and Political Science. Her research focuses on the link between local social and economic context and political behavior.
