Start with the necessary packages for the vignette.
<- c("dplyr", "ggplot2", "ndi", "tidycensus", "tigris")
loadedPackages invisible(lapply(loadedPackages, library, character.only = TRUE))
options(tigris_use_cache = TRUE)
Set your U.S. Census Bureau access key. Follow this link to
obtain one. Specify your access key in the messer()
or
powell_wiley()
functions using the key
argument of the get_acs()
function from the
tidycensus
package called within each or by using the
census_api_key()
function from the tidycensus
package before running the messer()
or
powell_wiley()
functions (see an example of the latter
below).
::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API tidycensus
Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on Messer et al. (2006) with the following socio-economic status (SES) variables:
Characteristic | SES dimension | ACS table source | Description |
---|---|---|---|
OCC | Occupation | C24030 | Percent males in management, science, and arts occupation |
CWD | Housing | B25014 | Percent of crowded housing |
POV | Poverty | B17017 | Percent of households in poverty |
FHH | Poverty | B25115 | Percent of female headed households with dependents |
PUB | Poverty | B19058 | Percent of households on public assistance |
U30 | Poverty | B19001 | Percent households earning <$30,000 per year |
EDU | Education | B06009 | Percent earning less than a high school education |
EMP | Employment | B23001 (2010 only); B23025 (2011 onward) | Percent unemployed |
<- ndi::messer(state = "GA", year = 2010, round_output = TRUE) messer2010GA
One output from the messer()
function is a tibble
containing the identification, geographic name, NDI (Messer) values, and
raw census characteristics for each tract.
$ndi messer2010GA
## # A tibble: 1,969 × 14
## GEOID state county tract NDI NDIQu…¹ OCC CWD POV FHH PUB U30
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 13001… Geor… Appli… 9501 -0.0075 2-Belo… 0 0 0.1 0.1 0.1 0.3
## 2 13001… Geor… Appli… 9502 0.0458 4-Most… 0 0 0.3 0.1 0.2 0.5
## 3 13001… Geor… Appli… 9503 0.0269 3-Abov… 0 0 0.2 0 0.2 0.4
## 4 13001… Geor… Appli… 9504 -0.0083 2-Belo… 0 0 0.1 0 0.1 0.3
## 5 13001… Geor… Appli… 9505 0.0231 3-Abov… 0 0 0.2 0 0.2 0.4
## 6 13003… Geor… Atkin… 9601 0.0619 4-Most… 0 0.1 0.2 0.2 0.2 0.5
## 7 13003… Geor… Atkin… 9602 0.0593 4-Most… 0 0.1 0.3 0.1 0.2 0.4
## 8 13003… Geor… Atkin… 9603 0.0252 3-Abov… 0 0 0.3 0.1 0.2 0.4
## 9 13005… Geor… Bacon… 9701 0.0061 3-Abov… 0 0 0.2 0 0.2 0.4
## 10 13005… Geor… Bacon… 9702… 0.0121 3-Abov… 0 0 0.2 0.1 0.1 0.5
## # … with 1,959 more rows, 2 more variables: EDU <dbl>, EMP <dbl>, and
## # abbreviated variable name ¹NDIQuart
A second output from the messer()
function is the
results from the principal component analysis used to compute the NDI
(Messer) values.
$pca messer2010GA
## Principal Components Analysis
## Call: psych::principal(r = ndi_data_pca, nfactors = 1, n.obs = nrow(ndi_data_pca),
## covar = FALSE, scores = TRUE, missing = imp)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## OCC -0.59 0.35 0.65 1
## CWD 0.47 0.22 0.78 1
## POV 0.87 0.76 0.24 1
## FHH 0.67 0.45 0.55 1
## PUB 0.89 0.79 0.21 1
## U30 0.90 0.81 0.19 1
## EDU 0.79 0.62 0.38 1
## EMP 0.46 0.21 0.79 1
##
## PC1
## SS loadings 4.21
## Proportion Var 0.53
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.11
## with the empirical chi square 1221.09 with prob < 2.3e-246
##
## Fit based upon off diagonal values = 0.95
A third output from the messer()
function is a tibble
containing a breakdown of the missingness of the census characteristics
used to compute the NDI (Messer) values.
$missing messer2010GA
## # A tibble: 8 × 4
## variable total n_missing percent_missing
## <chr> <int> <int> <chr>
## 1 CWD 1969 14 0.71 %
## 2 EDU 1969 13 0.66 %
## 3 EMP 1969 13 0.66 %
## 4 FHH 1969 14 0.71 %
## 5 OCC 1969 15 0.76 %
## 6 POV 1969 14 0.71 %
## 7 PUB 1969 14 0.71 %
## 8 U30 1969 14 0.71 %
We can visualize the NDI (Messer) values geographically by linking
them to spatial information from the tigris
package and
plotting with the ggplot2
package suite.
# Obtain the 2010 counties from the "tigris" package
<- tigris::counties(state = "GA", year = 2010, cb = TRUE)
county2010GA # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(county2010GA$GEO_ID, 10)
county2010GA
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "GA", year = 2010, cb = TRUE)
tract2010GA # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010GA$GEO_ID, 10)
tract2010GA
# Join the NDI (Messer) values to the census tract geometry
<- dplyr::left_join(tract2010GA, messer2010GA$ndi, by = "GEOID") GA2010messer
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = GA2010messer,
ggplot2::aes(fill = NDI),
ggplot2size = 0.05,
color = "transparent") +
::geom_sf(data = county2010GA,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer)",
ggplot2subtitle = "GA census tracts as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuartNA <- factor(replace(as.character(GA2010messer$NDIQuart),
GA2010messer$NDIQuart == "9-NDI not avail", NA),
GA2010messerc(levels(GA2010messer$NDIQuart)[-5], NA))
::ggplot() +
ggplot2::geom_sf(data = GA2010messer,
ggplot2::aes(fill = NDIQuartNA),
ggplot2size = 0.05,
color = "transparent") +
::geom_sf(data = county2010GA,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
ggplot2subtitle = "GA census tracts as the referent")
The results above are at the tract level. The NDI (Messer) values can also be calculated at the county level.
<- ndi::messer(geo = "county", state = "GA", year = 2010)
messer2010GA_county
# Join the NDI (Messer) values to the county geometry
<- dplyr::left_join(county2010GA, messer2010GA_county$ndi, by = "GEOID") GA2010messer_county
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = GA2010messer_county,
ggplot2::aes(fill = NDI),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer)",
ggplot2subtitle = "GA counties as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuartNA <- factor(replace(as.character(GA2010messer_county$NDIQuart),
GA2010messer_county$NDIQuart == "9-NDI not avail", NA),
GA2010messer_countyc(levels(GA2010messer_county$NDIQuart)[-5], NA))
::ggplot() +
ggplot2::geom_sf(data = GA2010messer_county,
ggplot2::aes(fill = NDIQuartNA),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
ggplot2subtitle = "GA counties as the referent")
Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts. This metric is based on Andrews et al. (2020) and Slotman et al. (2022) with socio-economic status (SES) variables chosen by Roux and Mair (2010):
Characteristic | SES dimension | ACS table source | Description |
---|---|---|---|
MedHHInc | Wealth and income | B19013 | Median household income (dollars) |
PctRecvIDR | Wealth and income | B19054 | Percent of households receiving dividends, interest, or rental income |
PctPubAsst | Wealth and income | B19058 | Percent of households receiving public assistance |
MedHomeVal | Wealth and income | B25077 | Median home value (dollars) |
PctMgmtBusSciArt | Occupation | C24060 | Percent in a management, business, science, or arts occupation |
PctFemHeadKids | Occupation | B11005 | Percent of households that are female headed with any children under 18 years |
PctOwnerOcc | Housing conditions | DP04 | Percent of housing units that are owner occupied |
PctNoPhone | Housing conditions | DP04 | Percent of households without a telephone |
PctNComPlmb | Housing conditions | DP04 | Percent of households without complete plumbing facilities |
PctEducHSPlus | Education | S1501 | Percent with a high school degree or higher (population 25 years and over) |
PctEducBchPlus | Education | S1501 | Percent with a college degree or higher (population 25 years and over) |
PctFamBelowPov | Wealth and income | S1702 | Percent of families with incomes below the poverty level |
PctUnempl | Occupation | S2301 | Percent unemployed |
More information about the codebook and computation of the NDI (Powell-Wiley) can be found on a GIS Portal for Cancer Research website.
<- ndi::powell_wiley(state = c("DC", "MD", "VA", "WV"), year = 2020, round_output = TRUE) powell_wiley2020DMVW
One output from the powell_wiley()
function is a tibble
containing the identification, geographic name, NDI (Powell-Wiley)
values, and raw census characteristics for each tract.
$ndi powell_wiley2020DMVW
## # A tibble: 4,425 × 20
## GEOID state county tract NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 11001000101 Distr… Distr… 1.01 -2.13 1-Leas… 187839 50.9 0.8 699100
## 2 11001000102 Distr… Distr… 1.02 -2.46 1-Leas… 184167 52.2 0.6 1556000
## 3 11001000201 Distr… Distr… 2.01 NA 9-NDI … NA NaN NaN NA
## 4 11001000202 Distr… Distr… 2.02 -2.30 1-Leas… 164261 49.6 0.9 1309100
## 5 11001000300 Distr… Distr… 3 -2.06 1-Leas… 156483 46 0.6 976500
## 6 11001000400 Distr… Distr… 4 -2.09 1-Leas… 153397 47.8 0 1164200
## 7 11001000501 Distr… Distr… 5.01 -2.11 1-Leas… 119911 44.5 0.8 674600
## 8 11001000502 Distr… Distr… 5.02 -2.21 1-Leas… 153264 46.8 0.5 1012500
## 9 11001000600 Distr… Distr… 6 -2.16 1-Leas… 154266 60.8 7.4 1109800
## 10 11001000702 Distr… Distr… 7.02 -1.20 1-Leas… 71747 22.9 0 289900
## # … with 4,415 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
## # PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## # PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## # PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
## # variable names ¹NDIQuint, ²MedHHInc, ³PctRecvIDR, ⁴PctPubAsst, ⁵MedHomeVal
A second output from the powell_wiley()
function is the
results from the principal component analysis used to compute the NDI
(Powell-Wiley) values.
$pca powell_wiley2020DMVW
## $loadings
##
## Loadings:
## PC1 PC2 PC3
## logMedHHInc -0.638 -0.364
## PctNoIDRZ 0.612 0.319
## PctPubAsstZ 0.379 0.615
## logMedHomeVal -0.893
## PctWorkClassZ 0.974
## PctFemHeadKidsZ 0.128 0.697 -0.233
## PctNotOwnerOccZ -0.375 0.923
## PctNoPhoneZ 0.329 0.524
## PctNComPlmbZ -0.141 0.869
## PctEducLTHSZ 0.642 0.164
## PctEducLTBchZ 1.020 -0.121
## PctFamBelowPovZ 0.219 0.698
## PctUnemplZ 0.596
##
## PC1 PC2 PC3
## SS loadings 4.340 2.971 1.102
## Proportion Var 0.334 0.229 0.085
## Cumulative Var 0.334 0.562 0.647
##
## $rotmat
## [,1] [,2] [,3]
## [1,] 0.68516447 0.4403017 0.04517229
## [2,] -0.95446519 1.0806634 0.03196818
## [3,] -0.09078904 -0.2028145 1.02053725
##
## $rotation
## [1] "promax"
##
## $Phi
## [,1] [,2] [,3]
## [1,] 1.0000000 0.5250277 0.1649550
## [2,] 0.5250277 1.0000000 0.1923867
## [3,] 0.1649550 0.1923867 1.0000000
##
## $Structure
## [,1] [,2] [,3]
## [1,] -0.8432264 -0.71542812 -0.26037289
## [2,] 0.7750210 0.63477178 0.13357439
## [3,] 0.7001489 0.81237803 0.17181801
## [4,] -0.8931597 -0.46211477 -0.19777858
## [5,] 0.9253217 0.42037509 0.12424330
## [6,] 0.4559503 0.71962940 -0.07780352
## [7,] 0.1195748 0.73799969 0.17788881
## [8,] 0.2552300 0.42753150 0.58693047
## [9,] 0.1155170 0.05008712 0.84948765
## [10,] 0.7325510 0.50620197 0.16403994
## [11,] 0.9557341 0.41335682 0.13787215
## [12,] 0.5881663 0.81650181 0.18717861
## [13,] 0.3841044 0.63036827 0.09925687
##
## $communality
## [1] 0.5468870 0.4774810 0.5220533 0.8012746 0.9576793 0.5566938 0.9968490
## [8] 0.3829550 0.7774209 0.4398519 1.0560478 0.5359573 0.3616523
##
## $uniqueness
## logMedHHInc PctNoIDRZ PctPubAsstZ logMedHomeVal PctWorkClassZ
## 0.453113047 0.522518997 0.477946655 0.198725386 0.042320711
## PctFemHeadKidsZ PctNotOwnerOccZ PctNoPhoneZ PctNComPlmbZ PctEducLTHSZ
## 0.443306153 0.003150984 0.617045044 0.222579117 0.560148142
## PctEducLTBchZ PctFamBelowPovZ PctUnemplZ
## -0.056047809 0.464042693 0.638347726
##
## $Vaccounted
## [,1] [,2] [,3]
## SS loadings 4.5987837 3.2131788 1.10386926
## Proportion Var 0.3537526 0.2471676 0.08491302
## Cumulative Var 0.3537526 0.6009202 0.68583321
## Proportion Explained 0.5157997 0.3603902 0.12381002
## Cumulative Proportion 0.5157997 0.8761900 1.00000000
A third output from the powell_wiley()
function is a
tibble containing a breakdown of the missingness of the census
characteristics used to compute the NDI (Powell-Wiley) values.
$missing powell_wiley2020DMVW
## # A tibble: 13 × 4
## variable total n_missing percent_missing
## <chr> <int> <int> <chr>
## 1 PctEducLTBchZ 4425 47 1.06 %
## 2 PctEducLTHSZ 4425 47 1.06 %
## 3 PctFamBelowPovZ 4425 63 1.42 %
## 4 PctFemHeadKidsZ 4425 60 1.36 %
## 5 PctNComPlmbZ 4425 60 1.36 %
## 6 PctNoIDRZ 4425 60 1.36 %
## 7 PctNoPhoneZ 4425 60 1.36 %
## 8 PctNotOwnerOccZ 4425 60 1.36 %
## 9 PctPubAsstZ 4425 60 1.36 %
## 10 PctUnemplZ 4425 57 1.29 %
## 11 PctWorkClassZ 4425 57 1.29 %
## 12 logMedHHInc 4425 73 1.65 %
## 13 logMedHomeVal 4425 148 3.34 %
A fourth output from the powell_wiley()
function is a
character string or numeric value of a standardized Cronbach’s alpha. A
value greater than 0.7 is desired.
$cronbach powell_wiley2020DMVW
## [1] 0.931138
We can visualize the NDI (Powell-Wiley) values geographically by
linking them to spatial information from the tigris
package
and plotting with the ggplot2
package suite.
# Obtain the 2020 counties from the "tigris" package
<- tigris::counties(cb = TRUE)
county2020 <- county2020[county2020$STUSPS %in% c("DC", "MD", "VA", "WV"), ]
county2020DMVW
# Obtain the 2020 census tracts from the "tigris" package
<- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
tract2020D <- tigris::tracts(state = "MD", year = 2020, cb = TRUE)
tract2020M <- tigris::tracts(state = "VA", year = 2020, cb = TRUE)
tract2020V <- tigris::tracts(state = "WV", year = 2020, cb = TRUE)
tract2020W <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)
tracts2020DMVW
# Join the NDI (Powell-Wiley) values to the census tract geometry
<- dplyr::left_join(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID") DMVW2020pw
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = DMVW2020pw,
ggplot2::aes(fill = NDI),
ggplot2color = NA) +
::geom_sf(data = county2020DMVW,
ggplot2fill = "transparent",
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c(na.value = "grey80") +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates")+
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
ggplot2subtitle = "DC, MD, VA, and WV tracts as the referent")
## Categorical Index (Population-weighted quintiles)
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA <- factor(replace(as.character(DMVW2020pw$NDIQuint),
DMVW2020pw$NDIQuint == "9-NDI not avail", NA),
DMVW2020pwc(levels(DMVW2020pw$NDIQuint)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DMVW2020pw,
ggplot2::aes(fill = NDIQuintNA),
ggplot2color = NA) +
::geom_sf(data = county2020DMVW,
ggplot2fill = "transparent",
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates")+
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
ggplot2subtitle = "DC, MD, VA, and WV tracts as the referent")
Like the NDI (Messer), we also compute county-level NDI (Powell-Wiley).
# Obtain the 2020 counties from the "tigris" package
<- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)
county2020DMVW
# NDI (Powell-Wiley) at the county level (2016-2020)
<- ndi::powell_wiley(geo = "county",
powell_wiley2020DMVW_county state = c("DC", "MD", "VA", "WV"),
year = 2020)
# Join the NDI (Powell-Wiley) values to the county geometry
<- dplyr::left_join(county2020DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID") DMVW2020pw_county
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = DMVW2020pw_county,
ggplot2::aes(fill = NDI),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
ggplot2subtitle = "DC, MD, VA, and WV counties as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA <- factor(replace(as.character(DMVW2020pw_county$NDIQuint),
DMVW2020pw_county$NDIQuint == "9-NDI not avail", NA),
DMVW2020pw_countyc(levels(DMVW2020pw_county$NDIQuint)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DMVW2020pw_county,
ggplot2::aes(fill = NDIQuint),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
ggplot2subtitle = "DC, MD, VA, and WV counties as the referent")
In the messer()
and powell_wiley()
functions, missing census characteristics can be imputed using the
missing
and impute
arguments of the
pca()
function in the psych
package called
within the messer()
and powell_wiley()
functions. Impute values using the logical imp
argument
(currently only calls impute = "median"
by default, which
assigns the median values of each missing census variable for a
geography).
<- ndi::powell_wiley(state = "DC", year = 2020) # without imputation
powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation
powell_wiley2020DCi
table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values
table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values
# Obtain the 2020 census tracts from the "tigris" package
<- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
tract2020DC
# Join the NDI (Powell-Wiley) values to the census tract geometry
<- dplyr::left_join(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
DC2020pw <- dplyr::left_join(DC2020pw, powell_wiley2020DCi$ndi, by = "GEOID", suffix = c("_nonimp", "_imp")) DC2020pw
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C., census tracts
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = DC2020pw,
ggplot2::aes(fill = NDI_nonimp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed",
ggplot2subtitle = "DC census tracts as the referent")
::ggplot() +
ggplot2::geom_sf(data = DC2020pw,
ggplot2::aes(fill = NDI_imp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed",
ggplot2subtitle = "DC census tracts as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020pw$NDIQuint_nonimp),
DC2020pw$NDIQuint_nonimp == "9-NDI not avail", NA),
DC2020pwc(levels(DC2020pw$NDIQuint_nonimp)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DC2020pw,
ggplot2::aes(fill = NDIQuintNA_nonimp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed",
ggplot2subtitle = "DC census tracts as the referent")
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA_imp <- factor(replace(as.character(DC2020pw$NDIQuint_imp),
DC2020pw$NDIQuint_imp == "9-NDI not avail", NA),
DC2020pwc(levels(DC2020pw$NDIQuint_imp)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DC2020pw,
ggplot2::aes(fill = NDIQuintNA_imp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed",
ggplot2subtitle = "DC census tracts as the referent")
To conduct a contiguous US-standardized index, compute an NDI for all
states as in the example below that replicates the nationally
standardized NDI (Powell-Wiley) values (2013-2017 ACS-5) found in Slotman et
al. (2022) and available from a GIS Portal for
Cancer Research website. To replicate the nationally standardized
NDI (Powell-Wiley) values (2006-2010 ACS-5) found in Andrews et
al. (2020) change the year
argument to
2010
(i.e., year = 2010
).
<- tigris::states()
us <- c("Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa",
n51 "Puerto Rico", "United States Virgin Islands")
<- us$STUSPS[!(us$NAME %in% n51)]
y51
<- Sys.time() # record start time
start_time <- ndi::powell_wiley(state = y51, year = 2017)
powell_wiley2017US <- Sys.time() # record end time
end_time <- end_time - start_time # Calculate run time time_srr
::ggplot(powell_wiley2017US$ndi,
ggplot2::aes(x = NDI)) +
ggplot2::geom_histogram(color = "black",
ggplot2fill = "white") +
::theme_minimal() +
ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)",
ggplot2subtitle = "U.S. census tracts as the referent (including AK, HI, and DC)")
The process to compute a US-standardized NDI (Powell-Wiley) took about 2.5 minutes to run on a machine with the features listed at the end of the vignette.
Since version v0.1.1, the ndi
package can compute
additional metrics of socio-economic deprivation and disparity beyond
neighborhood deprivation indices with data from the ACS-5,
including:
anthopolos()
function that computes the Racial
Isolation Index (RI) based on Anthopolos et
al. (2011)bravo()
function that computes the Educational
Isolation Index (EI) based on Bravo et
al. (2021)gini()
function that retrieves the Gini Index based on
Gini (1921)krieger()
function that computes the Index of
Concentration at the Extremes based on based on Feldman et
al. (2015) and Krieger et
al. (2016)duncan()
function that computes the Dissimilarity Index
based on Duncan & Duncan
(1955)atkinson()
function that computes the Atkinson Index
based on Atkinson
(1970)bell()
function that computes the aspatial
racial/ethnic Isolation Index (II) based on Shevky & Williams (1949;
ISBN-13:978-0-837-15637-8) and Bell (1954)white()
function that computes the aspatial
racial/ethnic Correlation Ratio based on Bell (1954) and White (1986)sudano()
function that computes the aspatial
racial/ethnic Location Quotient based on Merton (1939) and Sudano et
al. (2013)bemanian_beyer()
function that computes the aspatial
racial/ethnic Local Exposure and Isolation metric based on Bemanian &
Beyer (2017)Compute the spatial RI values (2006-2010 5-year ACS) for North
Carolina, U.S.A., census tracts. This metric is based on Anthopolos et
al. (2011) that assessed the racial isolation of the population that
identifies as non-Hispanic or Latino, Black or African American alone.
Multiple racial/ethnic subgroups are available in the
anthopolos()
function, including:
ACS table source | racial/ethnic subgroup | character for subgroup argument |
---|---|---|
B03002_002 | not Hispanic or Latino | NHoL |
B03002_003 | not Hispanic or Latino, white alone | NHoLW |
B03002_004 | not Hispanic or Latino, Black or African American alone | NHoLB |
B03002_005 | not Hispanic or Latino, American Indian and Alaska Native alone | NHoLAIAN |
B03002_006 | not Hispanic or Latino, Asian alone | NHoLA |
B03002_007 | not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone | NHoLNHOPI |
B03002_008 | not Hispanic or Latino, some other race alone | NHoLSOR |
B03002_009 | not Hispanic or Latino, two or more races | NHoLTOMR |
B03002_010 | not Hispanic or Latino, two races including some other race | NHoLTRiSOR |
B03002_011 | not Hispanic or Latino, two races excluding some other race, and three or more races | NHoLTReSOR |
B03002_012 | Hispanic or Latino | HoL |
B03002_013 | Hispanic or Latino, white alone | HoLW |
B03002_014 | Hispanic or Latino, Black or African American alone | HoLB |
B03002_015 | Hispanic or Latino, American Indian and Alaska Native alone | HoLAIAN |
B03002_016 | Hispanic or Latino, Asian alone | HoLA |
B03002_017 | Hispanic or Latino, Native Hawaiian and other Pacific Islander alone | HoLNHOPI |
B03002_018 | Hispanic or Latino, some other race alone | HoLSOR |
B03002_019 | Hispanic or Latino, two or more races | HoLTOMR |
B03002_020 | Hispanic or Latino, two races including some other race | HoLTRiSOR |
B03002_021 | Hispanic or Latino, two races excluding some other race, and three or more races | HoLTReSOR |
A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., Not Hispanic or Latino, Black or African American alone) will have an RI value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not Not Hispanic or Latino, Black or African American alone) will have an RI value close to 0.
<- ndi::anthopolos(state = "NC", year = 2010, subgroup = "NHoLB")
anthopolos2010NC
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
tract2010NC # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010NC$GEO_ID, 10)
tract2010NC
# Obtain the 2010 counties from the "tigris" package
<- tigris::counties(state = "NC", year = 2010, cb = TRUE)
county2010NC
# Join the RI values to the census tract geometry
<- dplyr::left_join(tract2010NC, anthopolos2010NC$ri, by = "GEOID") NC2010anthopolos
# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
::ggplot() +
ggplot2::geom_sf(data = NC2010anthopolos,
ggplot2::aes(fill = RI),
ggplot2size = 0.05,
color = "transparent") +
::geom_sf(data = county2010NC,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
ggplot2subtitle = "NC census tracts (not corrected for edge effects)")
The current version of the ndi
package does not correct
for edge effects (e.g., census geographies along the specified spatial
extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have
few neighboring census geographies, and RI values in these census
geographies may be unstable. A stop-gap solution for the former source
of edge effect is to compute the RI for neighboring census geographies
(i.e., the states bordering a study area of interest) and then use the
estimates of the study area of interest.
# Compute RI for all census tracts in neighboring states
<- ndi::anthopolos(state = c("GA", "NC", "SC", "TN", "VA"),
anthopolos2010GNSTV year = 2010, subgroup = "NHoLB")
# Crop to only North Carolina, U.S.A. census tracts
<- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% anthopolos2010NC$ri$GEOID, ]
anthopolos2010NCe
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
tract2010NC # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010NC$GEO_ID, 10)
tract2010NC
# Obtain the 2010 counties from the "tigris" package
<- tigris::counties(state = "NC", year = 2010, cb = TRUE)
county2010NC
# Join the RI values to the census tract geometry
<- dplyr::left_join(tract2010NC, anthopolos2010NCe, by = "GEOID") edgeNC2010anthopolos
# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
::ggplot() +
ggplot2::geom_sf(data = edgeNC2010anthopolos,
ggplot2::aes(fill = RI),
ggplot2size = 0.05,
color = "transparent") +
::geom_sf(data = county2010NC,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
ggplot2subtitle = "NC census tracts (corrected for interstate edge effects)")
Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for
North Carolina, U.S.A., census tracts. This metric is based on Bravo et al. (2021)
that assessed the educational isolation of the population without a
four-year college degree. Multiple educational attainment categories are
available in the bravo()
function, including:
ACS table source | educational attainment category | character for subgroup argument |
---|---|---|
B06009_002 | less than high school graduate | LtHS |
B06009_003 | high school graduate (includes equivalency) | HSGiE |
B06009_004 | some college or associate’s degree | SCoAD |
B06009_005 | Bachelor’s degree | BD |
B06009_006 | graduate or professional degree | GoPD |
Note: The ACS-5 data (2005-2009) uses the “B15002” question.
A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a four-year college degree or more) will have an EI (Bravo) value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., with a four-year college degree) will have an EI (Bravo) value close to 0.
<- ndi::bravo(state = "NC", year = 2010, subgroup = c("LtHS", "HSGiE", "SCoAD"))
bravo2010NC
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
tract2010NC # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010NC$GEO_ID, 10)
tract2010NC
# Obtain the 2010 counties from the "tigris" package
<- tigris::counties(state = "NC", year = 2010, cb = TRUE)
county2010NC
# Join the EI (Bravo) values to the census tract geometry
<- dplyr::left_join(tract2010NC, bravo2010NC$ei, by = "GEOID") NC2010bravo
# Visualize the EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
::ggplot() +
ggplot2::geom_sf(data = NC2010bravo,
ggplot2::aes(fill = EI),
ggplot2size = 0.05,
color = "transparent") +
::geom_sf(data = county2010NC,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Educational Isolation Index (Bravo), without a four-year college degree",
ggplot2subtitle = "NC census tracts (not corrected for edge effects)")
Can correct one source of edge effect in the same manner as shown for the RI metric.
Retrieve the aspatial Gini Index values (2006-2010 5-year ACS) for
North Carolina, U.S.A., census tracts. This metric is based on Gini (1921), and the
gini()
function retrieves the estimate from the ACS-5.
According to the U.S. Census Bureau: “The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.”
<- ndi::gini(state = "NC", year = 2010)
gini2010NC
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
tract2010NC # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010NC$GEO_ID, 10)
tract2010NC
# Obtain the 2010 counties from the "tigris" package
<- tigris::counties(state = "NC", year = 2010, cb = TRUE)
county2010NC
# Join the Gini Index values to the census tract geometry
<- dplyr::left_join(tract2010NC, gini2010NC$gini, by = "GEOID") NC2010gini
# Visualize the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
::ggplot() +
ggplot2::geom_sf(data = NC2010gini,
ggplot2::aes(fill = gini),
ggplot2size = 0.05,
color = "transparent") +
::geom_sf(data = county2010NC,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Gini Index",
ggplot2subtitle = "NC census tracts")
Compute the aspatial Index of Concentration at the Extremes values
(2006-2010 5-year ACS) for Wayne County, Michigan, U.S.A., census
tracts. Wayne County is the home of Detroit, Michigan, a highly
segregated city in the U.S. This metric is based on Feldman et
al. (2015) and Krieger et
al. (2016) who expanded the metric designed by Massey in a chapter
of Booth & Crouter
(2001) initially designed for residential segregation. The
krieger()
function computes five ICE metrics using the
following ACS-5 groups:
ACS table group | ICE metric | Comparison |
---|---|---|
B19001 | Income, “ICE_inc” | 80th income percentile vs. 20th income percentile |
B15002 | Education, “ICE_edu” | less than high school vs. four-year college degree or more |
B03002 | Race/Ethnicity, “ICE_rewb” | 80th income percentile vs. 20th income percentile |
B19001 & B19001B & B19001H | Income and race/ethnicity combined, “ICE_wbinc” | white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile |
B19001 & B19001H | Income and race/ethnicity combined, “ICE_wpcinc” | white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile |
ICE metrics can range in value from −1 (most deprived) to 1 (most privileged). A value of 0 can thus represent two possibilities: (1) none of the residents are in the most privileged or most deprived categories, or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups.
<- krieger(state = "MI", county = "Wayne", year = 2010)
ice2020WC
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "MI", county = "Wayne", year = 2010, cb = TRUE)
tract2010WC # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010WC$GEO_ID, 10)
tract2010WC
# Join the ICE values to the census tract geometry
<- dplyr::left_join(tract2010WC, ice2020WC$ice, by = "GEOID") ice2020WC
# Plot ICE for Income
::ggplot() +
ggplot2::geom_sf(data = ice2020WC,
ggplot2::aes(fill = ICE_inc),
ggplot2color = "white",
size = 0.05) +
::theme_bw() +
ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates")+
::ggtitle("Index of Concentration at the Extremes\nIncome (Krieger)",
ggplot2subtitle = "80th income percentile vs. 20th income percentile")
# Plot ICE for Education
::ggplot() +
ggplot2::geom_sf(data = ice2020WC,
ggplot2::aes(fill = ICE_edu),
ggplot2color = "white",
size = 0.05) +
::theme_bw() +
ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates")+
::ggtitle("Index of Concentration at the Extremes\nEducation (Krieger)",
ggplot2subtitle = "less than high school vs. four-year college degree or more")
# Plot ICE for Race/Ethnicity
::ggplot() +
ggplot2::geom_sf(data = ice2020WC,
ggplot2::aes(fill = ICE_rewb),
ggplot2color = "white",
size = 0.05) +
::theme_bw() +
ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates")+
::ggtitle("Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)",
ggplot2subtitle = "white non-Hispanic vs. black non-Hispanic")
# Plot ICE for Income and Race/Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. black (including Hispanic) in 20th income percentile
::ggplot() +
ggplot2::geom_sf(data = ice2020WC,
ggplot2::aes(fill = ICE_wbinc),
ggplot2color = "white",
size = 0.05) +
::theme_bw() +
ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates")+
::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)",
ggplot2subtitle = "white non-Hispanic in 80th inc ptcl vs. black alone in 20th inc pctl")
# Plot ICE for Income and Race/Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
::ggplot() +
ggplot2::geom_sf(data = ice2020WC,
ggplot2::aes(fill = ICE_wpcinc),
ggplot2color = "white",
size = 0.05) +
::theme_bw() +
ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates")+
::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)",
ggplot2subtitle = "white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl")