ndi: Neighborhood Deprivation Indices

Ian D. Buller (GitHub: @idblr)

2023-02-01

Start with the necessary packages for the vignette.

loadedPackages <- c("dplyr", "ggplot2", "ndi", "tidycensus", "tigris")
invisible(lapply(loadedPackages, library, character.only = TRUE))
options(tigris_use_cache = TRUE)

Set your U.S. Census Bureau access key. Follow this link to obtain one. Specify your access key in the messer() or powell_wiley() functions using the key argument of the get_acs() function from the tidycensus package called within each or by using the census_api_key() function from the tidycensus package before running the messer() or powell_wiley() functions (see an example of the latter below).

tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API

Compute NDI (Messer)

Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on Messer et al. (2006) with the following socio-economic status (SES) variables:

Characteristic SES dimension ACS table source Description
OCC Occupation C24030 Percent males in management, science, and arts occupation
CWD Housing B25014 Percent of crowded housing
POV Poverty B17017 Percent of households in poverty
FHH Poverty B25115 Percent of female headed households with dependents
PUB Poverty B19058 Percent of households on public assistance
U30 Poverty B19001 Percent households earning <$30,000 per year
EDU Education B06009 Percent earning less than a high school education
EMP Employment B23001 (2010 only); B23025 (2011 onward) Percent unemployed
messer2010GA <- ndi::messer(state = "GA", year = 2010, round_output = TRUE)

One output from the messer() function is a tibble containing the identification, geographic name, NDI (Messer) values, and raw census characteristics for each tract.

messer2010GA$ndi
## # A tibble: 1,969 × 14
##    GEOID  state county tract     NDI NDIQu…¹   OCC   CWD   POV   FHH   PUB   U30
##    <chr>  <chr> <chr>  <chr>   <dbl> <fct>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 13001… Geor… Appli… 9501  -0.0075 2-Belo…     0   0     0.1   0.1   0.1   0.3
##  2 13001… Geor… Appli… 9502   0.0458 4-Most…     0   0     0.3   0.1   0.2   0.5
##  3 13001… Geor… Appli… 9503   0.0269 3-Abov…     0   0     0.2   0     0.2   0.4
##  4 13001… Geor… Appli… 9504  -0.0083 2-Belo…     0   0     0.1   0     0.1   0.3
##  5 13001… Geor… Appli… 9505   0.0231 3-Abov…     0   0     0.2   0     0.2   0.4
##  6 13003… Geor… Atkin… 9601   0.0619 4-Most…     0   0.1   0.2   0.2   0.2   0.5
##  7 13003… Geor… Atkin… 9602   0.0593 4-Most…     0   0.1   0.3   0.1   0.2   0.4
##  8 13003… Geor… Atkin… 9603   0.0252 3-Abov…     0   0     0.3   0.1   0.2   0.4
##  9 13005… Geor… Bacon… 9701   0.0061 3-Abov…     0   0     0.2   0     0.2   0.4
## 10 13005… Geor… Bacon… 9702…  0.0121 3-Abov…     0   0     0.2   0.1   0.1   0.5
## # … with 1,959 more rows, 2 more variables: EDU <dbl>, EMP <dbl>, and
## #   abbreviated variable name ¹​NDIQuart

A second output from the messer() function is the results from the principal component analysis used to compute the NDI (Messer) values.

messer2010GA$pca
## Principal Components Analysis
## Call: psych::principal(r = ndi_data_pca, nfactors = 1, n.obs = nrow(ndi_data_pca), 
##     covar = FALSE, scores = TRUE, missing = imp)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       PC1   h2   u2 com
## OCC -0.59 0.35 0.65   1
## CWD  0.47 0.22 0.78   1
## POV  0.87 0.76 0.24   1
## FHH  0.67 0.45 0.55   1
## PUB  0.89 0.79 0.21   1
## U30  0.90 0.81 0.19   1
## EDU  0.79 0.62 0.38   1
## EMP  0.46 0.21 0.79   1
## 
##                 PC1
## SS loadings    4.21
## Proportion Var 0.53
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.11 
##  with the empirical chi square  1221.09  with prob <  2.3e-246 
## 
## Fit based upon off diagonal values = 0.95

A third output from the messer() function is a tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Messer) values.

messer2010GA$missing
## # A tibble: 8 × 4
##   variable total n_missing percent_missing
##   <chr>    <int>     <int> <chr>          
## 1 CWD       1969        14 0.71 %         
## 2 EDU       1969        13 0.66 %         
## 3 EMP       1969        13 0.66 %         
## 4 FHH       1969        14 0.71 %         
## 5 OCC       1969        15 0.76 %         
## 6 POV       1969        14 0.71 %         
## 7 PUB       1969        14 0.71 %         
## 8 U30       1969        14 0.71 %

We can visualize the NDI (Messer) values geographically by linking them to spatial information from the tigris package and plotting with the ggplot2 package suite.

# Obtain the 2010 counties from the "tigris" package
county2010GA <- tigris::counties(state = "GA", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
county2010GA$GEOID <- substring(county2010GA$GEO_ID, 10) 

# Obtain the 2010 census tracts from the "tigris" package
tract2010GA <- tigris::tracts(state = "GA", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010GA$GEOID <- substring(tract2010GA$GEO_ID, 10) 

# Join the NDI (Messer) values to the census tract geometry
GA2010messer <- dplyr::left_join(tract2010GA, messer2010GA$ndi, by = "GEOID")
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts 
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer, 
                   ggplot2::aes(fill = NDI),
                   size = 0.05,
                   color = "transparent") +
   ggplot2::geom_sf(data = county2010GA,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "GA census tracts as the referent")

## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
GA2010messer$NDIQuartNA <- factor(replace(as.character(GA2010messer$NDIQuart), 
                                            GA2010messer$NDIQuart == "9-NDI not avail", NA),
                                  c(levels(GA2010messer$NDIQuart)[-5], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer, 
                   ggplot2::aes(fill = NDIQuartNA),
                   size = 0.05,
                   color = "transparent") +
   ggplot2::geom_sf(data = county2010GA,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
                   subtitle = "GA census tracts as the referent")

The results above are at the tract level. The NDI (Messer) values can also be calculated at the county level.

messer2010GA_county <- ndi::messer(geo = "county", state = "GA", year = 2010)

# Join the NDI (Messer) values to the county geometry
GA2010messer_county <- dplyr::left_join(county2010GA, messer2010GA_county$ndi, by = "GEOID")
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer_county, 
                   ggplot2::aes(fill = NDI),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "GA counties as the referent")

## Categorical Index

### Rename "9-NDI not avail" level as NA for plotting
GA2010messer_county$NDIQuartNA <- factor(replace(as.character(GA2010messer_county$NDIQuart), 
                                            GA2010messer_county$NDIQuart == "9-NDI not avail", NA),
                                         c(levels(GA2010messer_county$NDIQuart)[-5], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer_county, 
                   ggplot2::aes(fill = NDIQuartNA),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
                   subtitle = "GA counties as the referent")

Compute NDI (Powell-Wiley)

Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts. This metric is based on Andrews et al. (2020) and Slotman et al. (2022) with socio-economic status (SES) variables chosen by Roux and Mair (2010):

Characteristic SES dimension ACS table source Description
MedHHInc Wealth and income B19013 Median household income (dollars)
PctRecvIDR Wealth and income B19054 Percent of households receiving dividends, interest, or rental income
PctPubAsst Wealth and income B19058 Percent of households receiving public assistance
MedHomeVal Wealth and income B25077 Median home value (dollars)
PctMgmtBusSciArt Occupation C24060 Percent in a management, business, science, or arts occupation
PctFemHeadKids Occupation B11005 Percent of households that are female headed with any children under 18 years
PctOwnerOcc Housing conditions DP04 Percent of housing units that are owner occupied
PctNoPhone Housing conditions DP04 Percent of households without a telephone
PctNComPlmb Housing conditions DP04 Percent of households without complete plumbing facilities
PctEducHSPlus Education S1501 Percent with a high school degree or higher (population 25 years and over)
PctEducBchPlus Education S1501 Percent with a college degree or higher (population 25 years and over)
PctFamBelowPov Wealth and income S1702 Percent of families with incomes below the poverty level
PctUnempl Occupation S2301 Percent unemployed

More information about the codebook and computation of the NDI (Powell-Wiley) can be found on a GIS Portal for Cancer Research website.

powell_wiley2020DMVW <- ndi::powell_wiley(state = c("DC", "MD", "VA", "WV"), year = 2020, round_output = TRUE)

One output from the powell_wiley() function is a tibble containing the identification, geographic name, NDI (Powell-Wiley) values, and raw census characteristics for each tract.

powell_wiley2020DMVW$ndi
## # A tibble: 4,425 × 20
##    GEOID       state  county tract   NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
##    <chr>       <chr>  <chr>  <chr> <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>
##  1 11001000101 Distr… Distr… 1.01  -2.13 1-Leas…  187839    50.9     0.8  699100
##  2 11001000102 Distr… Distr… 1.02  -2.46 1-Leas…  184167    52.2     0.6 1556000
##  3 11001000201 Distr… Distr… 2.01  NA    9-NDI …      NA   NaN     NaN        NA
##  4 11001000202 Distr… Distr… 2.02  -2.30 1-Leas…  164261    49.6     0.9 1309100
##  5 11001000300 Distr… Distr… 3     -2.06 1-Leas…  156483    46       0.6  976500
##  6 11001000400 Distr… Distr… 4     -2.09 1-Leas…  153397    47.8     0   1164200
##  7 11001000501 Distr… Distr… 5.01  -2.11 1-Leas…  119911    44.5     0.8  674600
##  8 11001000502 Distr… Distr… 5.02  -2.21 1-Leas…  153264    46.8     0.5 1012500
##  9 11001000600 Distr… Distr… 6     -2.16 1-Leas…  154266    60.8     7.4 1109800
## 10 11001000702 Distr… Distr… 7.02  -1.20 1-Leas…   71747    22.9     0    289900
## # … with 4,415 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
## #   PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## #   PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## #   PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
## #   variable names ¹​NDIQuint, ²​MedHHInc, ³​PctRecvIDR, ⁴​PctPubAsst, ⁵​MedHomeVal

A second output from the powell_wiley() function is the results from the principal component analysis used to compute the NDI (Powell-Wiley) values.

powell_wiley2020DMVW$pca
## $loadings
## 
## Loadings:
##                 PC1    PC2    PC3   
## logMedHHInc     -0.638 -0.364       
## PctNoIDRZ        0.612  0.319       
## PctPubAsstZ      0.379  0.615       
## logMedHomeVal   -0.893              
## PctWorkClassZ    0.974              
## PctFemHeadKidsZ  0.128  0.697 -0.233
## PctNotOwnerOccZ -0.375  0.923       
## PctNoPhoneZ             0.329  0.524
## PctNComPlmbZ           -0.141  0.869
## PctEducLTHSZ     0.642  0.164       
## PctEducLTBchZ    1.020 -0.121       
## PctFamBelowPovZ  0.219  0.698       
## PctUnemplZ              0.596       
## 
##                  PC1   PC2   PC3
## SS loadings    4.340 2.971 1.102
## Proportion Var 0.334 0.229 0.085
## Cumulative Var 0.334 0.562 0.647
## 
## $rotmat
##             [,1]       [,2]       [,3]
## [1,]  0.68516447  0.4403017 0.04517229
## [2,] -0.95446519  1.0806634 0.03196818
## [3,] -0.09078904 -0.2028145 1.02053725
## 
## $rotation
## [1] "promax"
## 
## $Phi
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5250277 0.1649550
## [2,] 0.5250277 1.0000000 0.1923867
## [3,] 0.1649550 0.1923867 1.0000000
## 
## $Structure
##             [,1]        [,2]        [,3]
##  [1,] -0.8432264 -0.71542812 -0.26037289
##  [2,]  0.7750210  0.63477178  0.13357439
##  [3,]  0.7001489  0.81237803  0.17181801
##  [4,] -0.8931597 -0.46211477 -0.19777858
##  [5,]  0.9253217  0.42037509  0.12424330
##  [6,]  0.4559503  0.71962940 -0.07780352
##  [7,]  0.1195748  0.73799969  0.17788881
##  [8,]  0.2552300  0.42753150  0.58693047
##  [9,]  0.1155170  0.05008712  0.84948765
## [10,]  0.7325510  0.50620197  0.16403994
## [11,]  0.9557341  0.41335682  0.13787215
## [12,]  0.5881663  0.81650181  0.18717861
## [13,]  0.3841044  0.63036827  0.09925687
## 
## $communality
##  [1] 0.5468870 0.4774810 0.5220533 0.8012746 0.9576793 0.5566938 0.9968490
##  [8] 0.3829550 0.7774209 0.4398519 1.0560478 0.5359573 0.3616523
## 
## $uniqueness
##     logMedHHInc       PctNoIDRZ     PctPubAsstZ   logMedHomeVal   PctWorkClassZ 
##     0.453113047     0.522518997     0.477946655     0.198725386     0.042320711 
## PctFemHeadKidsZ PctNotOwnerOccZ     PctNoPhoneZ    PctNComPlmbZ    PctEducLTHSZ 
##     0.443306153     0.003150984     0.617045044     0.222579117     0.560148142 
##   PctEducLTBchZ PctFamBelowPovZ      PctUnemplZ 
##    -0.056047809     0.464042693     0.638347726 
## 
## $Vaccounted
##                            [,1]      [,2]       [,3]
## SS loadings           4.5987837 3.2131788 1.10386926
## Proportion Var        0.3537526 0.2471676 0.08491302
## Cumulative Var        0.3537526 0.6009202 0.68583321
## Proportion Explained  0.5157997 0.3603902 0.12381002
## Cumulative Proportion 0.5157997 0.8761900 1.00000000

A third output from the powell_wiley() function is a tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Powell-Wiley) values.

powell_wiley2020DMVW$missing
## # A tibble: 13 × 4
##    variable        total n_missing percent_missing
##    <chr>           <int>     <int> <chr>          
##  1 PctEducLTBchZ    4425        47 1.06 %         
##  2 PctEducLTHSZ     4425        47 1.06 %         
##  3 PctFamBelowPovZ  4425        63 1.42 %         
##  4 PctFemHeadKidsZ  4425        60 1.36 %         
##  5 PctNComPlmbZ     4425        60 1.36 %         
##  6 PctNoIDRZ        4425        60 1.36 %         
##  7 PctNoPhoneZ      4425        60 1.36 %         
##  8 PctNotOwnerOccZ  4425        60 1.36 %         
##  9 PctPubAsstZ      4425        60 1.36 %         
## 10 PctUnemplZ       4425        57 1.29 %         
## 11 PctWorkClassZ    4425        57 1.29 %         
## 12 logMedHHInc      4425        73 1.65 %         
## 13 logMedHomeVal    4425       148 3.34 %

A fourth output from the powell_wiley() function is a character string or numeric value of a standardized Cronbach’s alpha. A value greater than 0.7 is desired.

powell_wiley2020DMVW$cronbach
## [1] 0.931138

We can visualize the NDI (Powell-Wiley) values geographically by linking them to spatial information from the tigris package and plotting with the ggplot2 package suite.

# Obtain the 2020 counties from the "tigris" package
county2020 <- tigris::counties(cb = TRUE)
county2020DMVW <- county2020[county2020$STUSPS %in% c("DC", "MD", "VA", "WV"), ]

# Obtain the 2020 census tracts from the "tigris" package
tract2020D <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
tract2020M <- tigris::tracts(state = "MD", year = 2020, cb = TRUE)
tract2020V <- tigris::tracts(state = "VA", year = 2020, cb = TRUE)
tract2020W <- tigris::tracts(state = "WV", year = 2020, cb = TRUE)
tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)

# Join the NDI (Powell-Wiley) values to the census tract geometry
DMVW2020pw <- dplyr::left_join(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID")
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) 
## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts 
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020pw, 
                   ggplot2::aes(fill = NDI), 
                   color = NA) +
  ggplot2::geom_sf(data = county2020DMVW,
                   fill = "transparent", 
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c(na.value = "grey80") +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates")+
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
                   subtitle = "DC, MD, VA, and WV tracts as the referent")

## Categorical Index (Population-weighted quintiles)
### Rename "9-NDI not avail" level as NA for plotting
DMVW2020pw$NDIQuintNA <- factor(replace(as.character(DMVW2020pw$NDIQuint), 
                                        DMVW2020pw$NDIQuint == "9-NDI not avail", NA),
                                c(levels(DMVW2020pw$NDIQuint)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020pw, 
                   ggplot2::aes(fill = NDIQuintNA), 
                   color = NA) +
  ggplot2::geom_sf(data = county2020DMVW,
                   fill = "transparent", 
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates")+
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
                   subtitle = "DC, MD, VA, and WV tracts as the referent")

Like the NDI (Messer), we also compute county-level NDI (Powell-Wiley).

# Obtain the 2020 counties from the "tigris" package
county2020DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)

# NDI (Powell-Wiley) at the county level (2016-2020)
powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county",
                                                 state = c("DC", "MD", "VA", "WV"),
                                                 year = 2020)

# Join the NDI (Powell-Wiley) values to the county geometry
DMVW2020pw_county <- dplyr::left_join(county2020DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID")
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020pw_county, 
                   ggplot2::aes(fill = NDI),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
                   subtitle = "DC, MD, VA, and WV counties as the referent")

## Categorical Index

### Rename "9-NDI not avail" level as NA for plotting
DMVW2020pw_county$NDIQuintNA <- factor(replace(as.character(DMVW2020pw_county$NDIQuint), 
                                            DMVW2020pw_county$NDIQuint == "9-NDI not avail", NA),
                                         c(levels(DMVW2020pw_county$NDIQuint)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020pw_county, 
                   ggplot2::aes(fill = NDIQuint),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
                   subtitle = "DC, MD, VA, and WV counties as the referent")

Advanced Features

Imputing missing census variables

In the messer() and powell_wiley() functions, missing census characteristics can be imputed using the missing and impute arguments of the pca() function in the psych package called within the messer() and powell_wiley() functions. Impute values using the logical imp argument (currently only calls impute = "median" by default, which assigns the median values of each missing census variable for a geography).

powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020) # without imputation
powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation

table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values
table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values

# Obtain the 2020 census tracts from the "tigris" package
tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)

# Join the NDI (Powell-Wiley) values to the census tract geometry
DC2020pw <- dplyr::left_join(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
DC2020pw <- dplyr::left_join(DC2020pw, powell_wiley2020DCi$ndi, by = "GEOID", suffix = c("_nonimp", "_imp"))
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C., census tracts
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020pw, 
                   ggplot2::aes(fill = NDI_nonimp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed",
                   subtitle = "DC census tracts as the referent")

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020pw, 
                   ggplot2::aes(fill = NDI_imp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed",
                   subtitle = "DC census tracts as the referent")

## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
DC2020pw$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020pw$NDIQuint_nonimp), 
                                            DC2020pw$NDIQuint_nonimp == "9-NDI not avail", NA),
                                         c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020pw, 
                   ggplot2::aes(fill = NDIQuintNA_nonimp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed",
                   subtitle = "DC census tracts as the referent")

### Rename "9-NDI not avail" level as NA for plotting
DC2020pw$NDIQuintNA_imp <- factor(replace(as.character(DC2020pw$NDIQuint_imp), 
                                            DC2020pw$NDIQuint_imp == "9-NDI not avail", NA),
                                      c(levels(DC2020pw$NDIQuint_imp)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020pw, 
                   ggplot2::aes(fill = NDIQuintNA_imp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed",
                   subtitle = "DC census tracts as the referent")

Assign the referent (U.S.-Standardized Metric)

To conduct a contiguous US-standardized index, compute an NDI for all states as in the example below that replicates the nationally standardized NDI (Powell-Wiley) values (2013-2017 ACS-5) found in Slotman et al. (2022) and available from a GIS Portal for Cancer Research website. To replicate the nationally standardized NDI (Powell-Wiley) values (2006-2010 ACS-5) found in Andrews et al. (2020) change the year argument to 2010 (i.e., year = 2010).

us <- tigris::states()
n51 <- c("Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa",
         "Puerto Rico", "United States Virgin Islands")
y51 <- us$STUSPS[!(us$NAME %in% n51)]

start_time <- Sys.time() # record start time
powell_wiley2017US <- ndi::powell_wiley(state = y51, year = 2017)
end_time <- Sys.time() # record end time
time_srr <- end_time - start_time # Calculate run time
ggplot2::ggplot(powell_wiley2017US$ndi, 
                ggplot2::aes(x = NDI)) +
  ggplot2::geom_histogram(color = "black",
                          fill = "white") + 
  ggplot2::theme_minimal() +
  ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)",
                   subtitle = "U.S. census tracts as the referent (including AK, HI, and DC)")

The process to compute a US-standardized NDI (Powell-Wiley) took about 2.5 minutes to run on a machine with the features listed at the end of the vignette.

Additional metrics socio-economic deprivation and disparity

Since version v0.1.1, the ndi package can compute additional metrics of socio-economic deprivation and disparity beyond neighborhood deprivation indices with data from the ACS-5, including:

  1. anthopolos() function that computes the Racial Isolation Index (RI) based on Anthopolos et al. (2011)
  2. bravo() function that computes the Educational Isolation Index (EI) based on Bravo et al. (2021)
  3. gini() function that retrieves the Gini Index based on Gini (1921)
  4. krieger() function that computes the Index of Concentration at the Extremes based on based on Feldman et al. (2015) and Krieger et al. (2016)
  5. duncan() function that computes the Dissimilarity Index based on Duncan & Duncan (1955)
  6. atkinson() function that computes the Atkinson Index based on Atkinson (1970)
  7. bell() function that computes the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954)
  8. white() function that computes the aspatial racial/ethnic Correlation Ratio based on Bell (1954) and White (1986)
  9. sudano() function that computes the aspatial racial/ethnic Location Quotient based on Merton (1939) and Sudano et al. (2013)
  10. bemanian_beyer() function that computes the aspatial racial/ethnic Local Exposure and Isolation metric based on Bemanian & Beyer (2017)

Compute Racial Isolation Index (RI)

Compute the spatial RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on Anthopolos et al. (2011) that assessed the racial isolation of the population that identifies as non-Hispanic or Latino, Black or African American alone. Multiple racial/ethnic subgroups are available in the anthopolos() function, including:

ACS table source racial/ethnic subgroup character for subgroup argument
B03002_002 not Hispanic or Latino NHoL
B03002_003 not Hispanic or Latino, white alone NHoLW
B03002_004 not Hispanic or Latino, Black or African American alone NHoLB
B03002_005 not Hispanic or Latino, American Indian and Alaska Native alone NHoLAIAN
B03002_006 not Hispanic or Latino, Asian alone NHoLA
B03002_007 not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone NHoLNHOPI
B03002_008 not Hispanic or Latino, some other race alone NHoLSOR
B03002_009 not Hispanic or Latino, two or more races NHoLTOMR
B03002_010 not Hispanic or Latino, two races including some other race NHoLTRiSOR
B03002_011 not Hispanic or Latino, two races excluding some other race, and three or more races NHoLTReSOR
B03002_012 Hispanic or Latino HoL
B03002_013 Hispanic or Latino, white alone HoLW
B03002_014 Hispanic or Latino, Black or African American alone HoLB
B03002_015 Hispanic or Latino, American Indian and Alaska Native alone HoLAIAN
B03002_016 Hispanic or Latino, Asian alone HoLA
B03002_017 Hispanic or Latino, Native Hawaiian and other Pacific Islander alone HoLNHOPI
B03002_018 Hispanic or Latino, some other race alone HoLSOR
B03002_019 Hispanic or Latino, two or more races HoLTOMR
B03002_020 Hispanic or Latino, two races including some other race HoLTRiSOR
B03002_021 Hispanic or Latino, two races excluding some other race, and three or more races HoLTReSOR

A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., Not Hispanic or Latino, Black or African American alone) will have an RI value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not Not Hispanic or Latino, Black or African American alone) will have an RI value close to 0.

anthopolos2010NC <- ndi::anthopolos(state = "NC", year = 2010, subgroup = "NHoLB")

# Obtain the 2010 census tracts from the "tigris" package
tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 

# Obtain the 2010 counties from the "tigris" package
county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)

# Join the RI values to the census tract geometry
NC2010anthopolos <- dplyr::left_join(tract2010NC, anthopolos2010NC$ri, by = "GEOID")
# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = NC2010anthopolos, 
                   ggplot2::aes(fill = RI),
                   size = 0.05,
                   color = "transparent") +
   ggplot2::geom_sf(data = county2010NC,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
                   subtitle = "NC census tracts (not corrected for edge effects)")

The current version of the ndi package does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies, and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest.

# Compute RI for all census tracts in neighboring states
anthopolos2010GNSTV <- ndi::anthopolos(state = c("GA", "NC", "SC", "TN", "VA"),
                                     year = 2010, subgroup = "NHoLB")

# Crop to only North Carolina, U.S.A. census tracts
anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% anthopolos2010NC$ri$GEOID, ]

# Obtain the 2010 census tracts from the "tigris" package
tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 

# Obtain the 2010 counties from the "tigris" package
county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)

# Join the RI values to the census tract geometry
edgeNC2010anthopolos <- dplyr::left_join(tract2010NC, anthopolos2010NCe, by = "GEOID")
# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = edgeNC2010anthopolos, 
                   ggplot2::aes(fill = RI),
                   size = 0.05,
                   color = "transparent") +
   ggplot2::geom_sf(data = county2010NC,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
                   subtitle = "NC census tracts (corrected for interstate edge effects)")

Compute Educational Isolation Index (EI)

Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on Bravo et al. (2021) that assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the bravo() function, including:

ACS table source educational attainment category character for subgroup argument
B06009_002 less than high school graduate LtHS
B06009_003 high school graduate (includes equivalency) HSGiE
B06009_004 some college or associate’s degree SCoAD
B06009_005 Bachelor’s degree BD
B06009_006 graduate or professional degree GoPD

Note: The ACS-5 data (2005-2009) uses the “B15002” question.

A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a four-year college degree or more) will have an EI (Bravo) value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., with a four-year college degree) will have an EI (Bravo) value close to 0.

bravo2010NC <- ndi::bravo(state = "NC", year = 2010, subgroup = c("LtHS", "HSGiE", "SCoAD"))

# Obtain the 2010 census tracts from the "tigris" package
tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 

# Obtain the 2010 counties from the "tigris" package
county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)

# Join the EI (Bravo) values to the census tract geometry
NC2010bravo <- dplyr::left_join(tract2010NC, bravo2010NC$ei, by = "GEOID")
# Visualize the EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = NC2010bravo, 
                   ggplot2::aes(fill = EI),
                   size = 0.05,
                   color = "transparent") +
   ggplot2::geom_sf(data = county2010NC,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Educational Isolation Index (Bravo), without a four-year college degree",
                   subtitle = "NC census tracts (not corrected for edge effects)")

Can correct one source of edge effect in the same manner as shown for the RI metric.

Retrieve the Gini Index

Retrieve the aspatial Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on Gini (1921), and the gini() function retrieves the estimate from the ACS-5.

According to the U.S. Census Bureau: “The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.”

gini2010NC <- ndi::gini(state = "NC", year = 2010)

# Obtain the 2010 census tracts from the "tigris" package
tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 

# Obtain the 2010 counties from the "tigris" package
county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)

# Join the Gini Index values to the census tract geometry
NC2010gini <- dplyr::left_join(tract2010NC, gini2010NC$gini, by = "GEOID")
# Visualize the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = NC2010gini, 
                   ggplot2::aes(fill = gini),
                   size = 0.05,
                   color = "transparent") +
   ggplot2::geom_sf(data = county2010NC,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Gini Index",
                   subtitle = "NC census tracts")

Index of Concentration at the Extremes (ICE)

Compute the aspatial Index of Concentration at the Extremes values (2006-2010 5-year ACS) for Wayne County, Michigan, U.S.A., census tracts. Wayne County is the home of Detroit, Michigan, a highly segregated city in the U.S. This metric is based on Feldman et al. (2015) and Krieger et al. (2016) who expanded the metric designed by Massey in a chapter of Booth & Crouter (2001) initially designed for residential segregation. The krieger() function computes five ICE metrics using the following ACS-5 groups:

ACS table group ICE metric Comparison
B19001 Income, “ICE_inc” 80th income percentile vs. 20th income percentile
B15002 Education, “ICE_edu” less than high school vs. four-year college degree or more
B03002 Race/Ethnicity, “ICE_rewb” 80th income percentile vs. 20th income percentile
B19001 & B19001B & B19001H Income and race/ethnicity combined, “ICE_wbinc” white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile
B19001 & B19001H Income and race/ethnicity combined, “ICE_wpcinc” white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile

ICE metrics can range in value from −1 (most deprived) to 1 (most privileged). A value of 0 can thus represent two possibilities: (1) none of the residents are in the most privileged or most deprived categories, or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups.

ice2020WC <- krieger(state = "MI", county = "Wayne", year = 2010)

# Obtain the 2010 census tracts from the "tigris" package
tract2010WC <- tigris::tracts(state = "MI", county = "Wayne", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010WC$GEOID <- substring(tract2010WC$GEO_ID, 10) 

# Join the ICE values to the census tract geometry
ice2020WC <- dplyr::left_join(tract2010WC, ice2020WC$ice, by = "GEOID")
# Plot ICE for Income
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = ice2020WC, 
                   ggplot2::aes(fill = ICE_inc),
                   color = "white",
                   size = 0.05) +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
  ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome (Krieger)",
                   subtitle = "80th income percentile vs. 20th income percentile")

# Plot ICE for Education
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = ice2020WC, 
                   ggplot2::aes(fill = ICE_edu),
                   color = "white",
                   size = 0.05) +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
  ggplot2::ggtitle("Index of Concentration at the Extremes\nEducation (Krieger)",
                   subtitle = "less than high school vs. four-year college degree or more")

# Plot ICE for Race/Ethnicity
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = ice2020WC, 
                   ggplot2::aes(fill = ICE_rewb),
                   color = "white",
                   size = 0.05) +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
  ggplot2::ggtitle("Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)",
                   subtitle = "white non-Hispanic vs. black non-Hispanic")

# Plot ICE for Income and Race/Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. black (including Hispanic) in 20th income percentile
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = ice2020WC, 
                   ggplot2::aes(fill = ICE_wbinc),
                   color = "white",
                   size = 0.05) +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
  ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)",
                   subtitle = "white non-Hispanic in 80th inc ptcl vs. black alone in 20th inc pctl")

# Plot ICE for Income and Race/Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = ice2020WC, 
                   ggplot2::aes(fill = ICE_wpcinc),
                   color = "white",
                   size = 0.05) +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
  ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)",
                   subtitle = "white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl")