I am working through the process of using Census data with the tidycensus package. I have been copying and modifying the examples to get a sense of how I can use this data.
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -82.01433 ymin: 27.86185 xmax: -81.82854 ymax: 28.17202
Geodetic CRS: NAD83
GEOID NAME variable estimate
1 12105015102 Census Tract 151.02, Polk County, Florida B19013_001 83750
2 12105010402 Census Tract 104.02, Polk County, Florida B19013_001 50721
3 12105010800 Census Tract 108, Polk County, Florida B19013_001 49265
4 12105014905 Census Tract 149.05, Polk County, Florida B19013_001 62784
5 12105011838 Census Tract 118.38, Polk County, Florida B19013_001 89300
6 12105012303 Census Tract 123.03, Polk County, Florida B19013_001 65094
moe geometry
1 6800 MULTIPOLYGON (((-81.84696 2...
2 6762 MULTIPOLYGON (((-81.95061 2...
3 18923 MULTIPOLYGON (((-81.97355 2...
4 7841 MULTIPOLYGON (((-82.01411 2...
5 16974 MULTIPOLYGON (((-81.92359 2...
6 20419 MULTIPOLYGON (((-81.95816 2...
options(scipen = 999)

polk_p <- polk %>%
  ggplot(aes(fill = estimate)) +
  geom_sf(color = NA) +
  scale_fill_viridis_c(option = "magma", labels = dollar_format()) +
  labs(
    title = "Household Income: Estimates",
    subtitle = "Polk County, Florida",
    caption = "Data source: US Census Bureau population estimates & tidycensus R package"
  ) +
  theme_void()

polk_p
I needed help from ChatGPT to format the legend in dollars; here is the link to the assistance log. Please note that the variable names changed.
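For reference, the dollar formatting in the plot code above comes from the scales package: `dollar_format()` returns a function that turns numbers into dollar strings, and passing it as `labels` applies it to the legend breaks. Here is a minimal sketch of that behavior on its own, assuming scales is installed. The `to_dollars` and `dollar_labels` names are only for illustration.

```r
library(scales)

# dollar_format() returns a labelling function, not a formatted string
to_dollars <- dollar_format()

# Estimates copied from the first three tracts in the output above
estimates <- c(83750, 50721, 49265)

# Apply the labeller the same way the fill scale applies it to legend breaks
dollar_labels <- to_dollars(estimates)
print(dollar_labels)
```

Recent versions of scales also provide `label_dollar()`, which I believe is the newer name for the same helper.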
Let's make the plot interactive.
plotly::ggplotly(polk_p)
racevars <- c(
  White = "P2_005N",
  Black = "P2_006N",
  Asian = "P2_008N",
  Hispanic = "P2_002N"
)
### This is an interesting technique, Python-ish

polk_race <- get_decennial(
  geography = "tract",
  variables = racevars,
  state = "FL",
  county = "Polk County",
  geometry = TRUE,
  summary_var = "P2_001N", # this is the total population in the census tract
  year = 2020,
  sumfile = "pl"
)
Getting data from the 2020 decennial Census
Using the PL 94-171 Redistricting Data Summary File
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
This message is displayed once per session.
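The named vector passed to `variables` is what makes the result readable: tidycensus uses the names (White, Black, and so on) as the values of the `variable` column instead of the raw Census codes. Here is a small base R sketch of how a named vector behaves, which is roughly what a dict does in Python. The `code_to_label` vector is a hypothetical helper added only for illustration.

```r
# Same named vector as in the get_decennial() call above
racevars <- c(
  White = "P2_005N",
  Black = "P2_006N",
  Asian = "P2_008N",
  Hispanic = "P2_002N"
)

# The names are the friendly labels; the values are the Census variable codes
print(names(racevars))
print(unname(racevars))

# Look up a code by its label, much like indexing a Python dict
print(racevars[["Hispanic"]])

# Reverse lookup: build a code -> label vector (illustrative helper)
code_to_label <- setNames(names(racevars), racevars)
print(code_to_label[["P2_006N"]])
```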
head(polk_race)
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -82.0564 ymin: 27.99637 xmax: -81.97668 ymax: 28.08173
Geodetic CRS: NAD83
# A tibble: 6 × 6
GEOID NAME variable value summary_value geometry
<chr> <chr> <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
1 12105012004 Census Tra… White 1686 2556 (((-82.05595 28.03673, -…
2 12105012004 Census Tra… Black 147 2556 (((-82.05595 28.03673, -…
3 12105012004 Census Tra… Asian 42 2556 (((-82.05595 28.03673, -…
4 12105012004 Census Tra… Hispanic 555 2556 (((-82.05595 28.03673, -…
5 12105012001 Census Tra… White 2825 5798 (((-82.0564 28.08038, -8…
6 12105012001 Census Tra… Black 1460 5798 (((-82.0564 28.08038, -8…
polk_race_p <- polk_race %>%
  mutate(percent = 100 * (value / summary_value)) %>%
  ggplot(aes(fill = percent)) +
  facet_wrap(~variable) +
  geom_sf(color = NA) +
  theme_void() +
  scale_fill_viridis_c() +
  labs(
    fill = "% of population\n(2020 Census)",
    title = "Race Population Estimates",
    subtitle = "Polk County, Florida",
    caption = "Data source: US Census Bureau population estimates & tidycensus R package"
  )
plotly::ggplotly(polk_race_p)
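To check what the `mutate()` step is doing, here is the same percent calculation worked by hand in base R for one tract, using the four rows for GEOID 12105012004 shown in the `head(polk_race)` output. The `tract` data frame is a hand-copied stand-in, not something tidycensus returns.

```r
# Four rows for tract 12105012004, copied from the head(polk_race) output
tract <- data.frame(
  variable = c("White", "Black", "Asian", "Hispanic"),
  value = c(1686, 147, 42, 555),
  summary_value = 2556  # total population of the tract (P2_001N)
)

# Same calculation as the mutate() step: share of the tract's total population
tract$percent <- 100 * (tract$value / tract$summary_value)
print(tract)

# The four groups do not have to sum to 100, because table P2
# contains other categories that were not requested
print(sum(tract$percent))
```

Because `summary_value` repeats the tract total on every row, the division works row by row without any grouping.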
Conclusion
It may also be useful to show the Census Tract for additional exploration.
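One possible way to do that, which I have not tried on the Polk County maps yet, is to map the tract name to a `text` aesthetic and ask `ggplotly()` to include it in the tooltip. The sketch below uses the North Carolina counties shapefile that ships with sf as stand-in data, so the `nc`, `nc_p`, and `nc_widget` objects and the `BIR74` and `NAME` columns come from that example file rather than from the Census pull.

```r
library(sf)
library(ggplot2)
library(plotly)

# Stand-in data: the North Carolina counties shapefile bundled with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# `text` is not a ggplot2 aesthetic; ggplotly() reads it for the hover label
nc_p <- ggplot(nc, aes(fill = BIR74, text = NAME)) +
  geom_sf(color = NA) +
  scale_fill_viridis_c() +
  theme_void()

# Show both the name and the fill value when hovering over a polygon
nc_widget <- ggplotly(nc_p, tooltip = c("text", "fill"))
nc_widget
```

For the income map, the equivalent change would presumably be `aes(fill = estimate, text = NAME)`, since `NAME` already holds the tract label.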
About
Kevin is a nonprofit data professional operating out of Lakeland, Florida.
My expertise is helping nonprofits collect, manage and analyze their program data.