This vignette shows a full spatial epidemiology workflow using
pk_neighbors() to build spatial weights, test for spatial
autocorrelation, and identify local clusters across Pakistan’s
administrative districts.
Build spatial weights
pk_neighbors() returns three elements in a named
list:
-
w$nb: thespdepneighbours list -
w$listw: row-standardised spatial weights, ready forspdep -
w$data: thesfobject used to build the weights (see note below)
districts <- get_districts()
w <- pk_neighbors(districts)Always use
w$datafor subsequent analysis and mapping, not the originaldistrictsobject. This ensures row alignment between the spatial weights and your data, particularly when units have been excluded via thedisputedargument.
Disputed unit handling
Gilgit-Baltistan (GB) and Azad Jammu & Kashmir (AJK) are present
in the OCHA/HDX boundary data and share polygon edges with KP and
Punjab. Under the default (disputed = "include") they
participate in the neighbour graph as normal administrative units.
The disputed argument lets analysts explicitly control
whether these units enter the spatial weights structure, which is useful
for sensitivity analysis when results may be influenced by their
inclusion:
# Default: all units included
w_all <- pk_neighbors(districts)
# Exclude both GB and AJK
w_excl <- pk_neighbors(districts, disputed = "exclude_both")
# Exclude only Gilgit-Baltistan
w_no_gb <- pk_neighbors(districts, disputed = "exclude_gb")
# Exclude only Azad Jammu & Kashmir
w_no_ajk <- pk_neighbors(districts, disputed = "exclude_ajk")When any units are excluded, w$data contains the
subsetted sf object with those units removed. Always pass
w$data to downstream functions:
w_excl <- pk_neighbors(districts, disputed = "exclude_both")
# Use w_excl$data — not the original districts object
moran.test(w_excl$data$area_km2, w_excl$listw)Global Moran’s I (synthetic data)
The example below generates spatially autocorrelated synthetic data
to demonstrate the workflow. Replace
districts$synthetic_rate with your own variable — disease
incidence rates, prevalence estimates, and so on.
set.seed(2023)
n <- nrow(w$data)
rho <- 0.6
W_mat <- listw2mat(w$listw)
epsilon <- rnorm(n, mean = 50, sd = 10)
y <- as.numeric(solve(diag(n) - rho * W_mat) %*% epsilon)
w$data$synthetic_rate <- y
# Test for spatial autocorrelation
moran_result <- moran.test(w$data$synthetic_rate, w$listw)
print(moran_result)A significant Moran’s I indicates that districts with similar rates tend to cluster spatially rather than being arranged at random.
Local Indicators of Spatial Association (LISA)
LISA identifies local clusters (High-High, Low-Low) and spatial outliers:
lisa <- localmoran(w$data$synthetic_rate, w$listw)
w$data$lisa_i <- lisa[, 1] # local I statistic
w$data$lisa_p <- lisa[, 5] # p-value
w$data <- w$data |>
mutate(
cluster = case_when(
lisa_p > 0.05 ~ "Not significant",
synthetic_rate > mean(synthetic_rate) & lisa_i > 0 ~ "High-High",
synthetic_rate < mean(synthetic_rate) & lisa_i > 0 ~ "Low-Low",
lisa_i < 0 ~ "Outlier",
TRUE ~ "Other"
)
)
ggplot2::ggplot(w$data) +
ggplot2::geom_sf(ggplot2::aes(fill = cluster)) +
ggplot2::scale_fill_manual(values = c(
"High-High" = "#d7191c",
"Low-Low" = "#2c7bb6",
"Outlier" = "#fdae61",
"Not significant" = "#f0f0f0",
"Other" = "#cccccc"
)) +
ggplot2::theme_void() +
ggplot2::labs(title = "LISA Cluster Map", fill = "Cluster type")Hotspot detection (Getis-Ord Gi*)
Getis-Ord Gi* identifies statistically significant hotspots and coldspots based on the concentration of high or low values:
gi_star <- localG(w$data$synthetic_rate, w$listw)
w$data$gi_star <- as.numeric(gi_star)
w$data$hotspot <- case_when(
w$data$gi_star > 1.96 ~ "Hotspot",
w$data$gi_star < -1.96 ~ "Coldspot",
TRUE ~ "Not significant"
)
pk_map(w$data, fill = "hotspot", title = "Hotspots (p < 0.05)")Sensitivity analysis: disputed units
To assess how much your results depend on the inclusion of GB and AJK, compare Moran’s I under each treatment:
w_all <- pk_neighbors(districts)
w_excl <- pk_neighbors(districts, disputed = "exclude_both")
# Attach the same variable to each subsetted dataset before comparing
moran_all <- moran.test(w_all$data$area_km2, w_all$listw)
moran_excl <- moran.test(w_excl$data$area_km2, w_excl$listw)
cat("Moran's I (all units): ", round(moran_all$estimate[1], 4), "\n")
cat("Moran's I (GB + AJK excl):", round(moran_excl$estimate[1], 4), "\n")If results differ materially, report both and document your choice of boundary treatment explicitly in your methods section.
Complete workflow
# 1. Get data
districts <- get_districts()
# 2. Build spatial weights
w <- pk_neighbors(districts)
# or, for sensitivity: pk_neighbors(districts, disputed = "exclude_both")
# 3. Attach your case data — always join to w$data, not districts
# w$data <- pk_join(w$data, your_case_data, by = "district_code")
# 4. Calculate rates
# w$data <- w$data |>
# mutate(rate = cases / population * 100000)
# 5. Test for spatial autocorrelation
# moran.test(w$data$rate, w$listw)
# 6. Identify clusters
# lisa <- localmoran(w$data$rate, w$listw)
# w$data$lisa_cluster <- attr(lisa, "quadr")$mean
# 7. Map results
# pk_map(w$data, fill = "lisa_cluster")References
- Bivand, R. S., Pebesma, E., & Gomez-Rubio, V. (2013). Applied spatial data analysis with R. Springer.
- OCHA Pakistan. (2023). COD-AB Administrative Boundaries. https://data.humdata.org/dataset/cod-ab-pak