In a recent post (see below), I emphasized Python, SQL, and Cloud as core skills for GIS professionals today. I still think these are critical skills, but there is one thing I would fix about how I positioned them.
However, some pointed out that I seemed to overlook the fundamentals: spatial thinking, methodologies, and the foundational practices that underpin our field.
Let me assure you, these are integral to everything we do. I tend to assume that the folks who take the time to read these newsletters and posts already have some knowledge of them, but it is important to talk about them directly and show how you can learn them even if you didn't go through a traditional geospatial education.
Let’s start with spatial thinking
I think the best way to begin this section is with the fundamentals, so let's open with a quote that many of us already know but that bears repeating, Tobler's first law of geography:
Everything is related to everything else, but near things are more related than distant things
I would argue that Tobler's second law is also quite relevant here, one that is not spoken of enough and that actually lays a foundation for many of the spatial analyses we perform on a regular basis:
The phenomenon external to a geographic area of interest affects what goes on inside.
Of course, there are other laws of geography, or proposed laws, that have some applicability in different cases, but for me these two stand apart. They are time-tested and apply to nearly every spatial analysis we will discuss going forward.
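As a toy illustration of the first law, consider a smooth, made-up surface sampled at random points: pairs of nearby points have more similar values than distant pairs. Everything here (the surface, the 30-unit "near" threshold) is invented purely to show the idea.

```python
import random

random.seed(7)

# A smooth, spatially autocorrelated toy surface: value depends on location
def value(x, y):
    return x * 0.5 + y * 0.3

# Random sample points in a 100 x 100 study area
pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(500)]

# Compare the average value difference for near pairs vs distant pairs
near_diffs, far_diffs = [], []
for (x1, y1), (x2, y2) in zip(pts[::2], pts[1::2]):
    d = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    diff = abs(value(x1, y1) - value(x2, y2))
    (near_diffs if d < 30 else far_diffs).append(diff)

near_avg = sum(near_diffs) / len(near_diffs)
far_avg = sum(far_diffs) / len(far_diffs)
print(near_avg < far_avg)  # near things are more related than distant things
```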
At its core, spatial thinking is about understanding how objects relate to each other in space. It involves analyzing patterns, proximities, and contexts. Concepts like scale, topology, and spatial relationships form the bedrock of geospatial analysis. For example:
- Proximity Analysis: How close are certain populations to essential services?
- Spatial Relationships: Which areas are most vulnerable to flooding based on terrain and rainfall patterns?
These are the kinds of questions spatial thinking empowers us to answer. It's about seeing the world not just as data points but as interconnected systems. This is integral to anything you do with spatial data: which spatial joins you perform, which spatial relationships and predicates you apply, how you handle intersections and overlays, and which types of data you use.
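To make the proximity question concrete, here is a minimal sketch in plain Python; the coordinates and the 5 km threshold are made up for illustration. A real workflow would use PostGIS or GeoPandas, but the underlying idea is the same:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points in kilometers."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical service location and population points as (lon, lat)
clinic = (-73.99, 40.75)
homes = [(-73.98, 40.75), (-73.90, 40.70), (-74.00, 40.76)]

# Proximity analysis: how many homes are within 5 km of the clinic?
within_5km = [h for h in homes if haversine_km(*clinic, *h) <= 5.0]
print(len(within_5km))
```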
This video from Dr. Luc Anselin of the University of Chicago covers some of the fundamental aspects of spatial analysis, and I highly recommend it to anyone trying to understand this topic in deeper context.
Some other key points from this video that I think about quite often when approaching a problem:
- Spatial Context: understanding the effect that your neighbors have on an observation, and vice versa
- Spatial Support Problem: the scales of your data do not match (e.g., ZIP codes versus block groups)
- Spatial Scale of Observations: behavior does not match the unit of observation (you know what neighborhood you live in, but not which block group)
- Spatial Spillover: activity at one location impacts costs at other locations (closing a road raises costs for a distribution center)
- Spatial Multiplier: a successful store impacts not just that store, but nearby stores as well
- Spatial Decay: the influence of an observation weakens as you move farther away from it
While there are many more concepts, models, and theories to consider, these form the foundation of spatial thinking for me.
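As a tiny sketch of the spatial decay idea in code (the distances here are invented), inverse-distance weights give nearer neighbors more influence, and row-standardizing them so they sum to 1 is a common convention for spatial weights matrices:

```python
# Spatial decay as inverse-distance weights: nearer neighbors get more
# influence. Distances (in km) from one observation to its four
# neighbors are invented for illustration.
distances = [1.0, 2.0, 4.0, 8.0]

# Raw inverse-distance weights: influence decays as 1/d
raw = [1.0 / d for d in distances]

# Row-standardize so the weights sum to 1, as is common for the spatial
# weights used in autocorrelation statistics like Moran's I
total = sum(raw)
weights = [w / total for w in raw]

print([round(w, 3) for w in weights])  # the nearest neighbor dominates
```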
While spatial thinking gives us the questions, modern tools help us answer them. Python, SQL, and cloud technologies make it possible to operationalize spatial concepts at a scale and speed previously unimaginable.
Putting it into practice with code
The marriage of modern tools like Python, SQL, and cloud computing with traditional GIS methods provides an unparalleled opportunity to elevate spatial analysis. To bridge the gap between methods and tools, we must understand how the foundational principles of GIS can be expressed through modern analytical practices.
Spatial Thinking as the Core
Spatial thinking is the foundation of GIS. Concepts like proximity, connectivity, scale, and patterns are integral to spatial analysis. The methods developed over decades, such as spatial joins, buffering, and network analysis, are still relevant—but modern tools allow us to scale and automate these processes. Consider a classic example:
SELECT shelter_id,
COUNT(*) AS people_within_range
FROM shelters
JOIN population_points
ON ST_DWithin(shelters.geom, population_points.geom, 5000)
GROUP BY shelter_id;
This query, using spatial SQL, not only captures the method but scales it to large datasets with hundreds of thousands of records. However, it is quite simple. Here is another example query that finds polygons touching a target polygon and creates a weight based on the distance between the polygon centroids:
WITH neighbor_analysis AS (
SELECT
target.id AS target_id,
neighbor.id AS neighbor_id,
ST_Distance(ST_Centroid(target.geom), ST_Centroid(neighbor.geom)) AS distance,
1 / NULLIF(ST_Distance(ST_Centroid(target.geom), ST_Centroid(neighbor.geom)), 0) AS distance_weight
FROM
polygons AS target
JOIN
polygons AS neighbor
ON
ST_Touches(target.geom, neighbor.geom)
WHERE
target.id = :target_polygon_id -- Replace with the ID of your target polygon
)
SELECT
target_id,
neighbor_id,
distance,
distance_weight
FROM
neighbor_analysis
ORDER BY
distance ASC;
The methods we learn in GIS—such as spatial autocorrelation or interpolation—can and should be translated into modern tools. Let’s consider two examples:
- Spatial Autocorrelation: understanding how spatial data is related across a region. For instance, Moran's I measures spatial clustering.
import geopandas as gpd
from esda.moran import Moran
from libpysal import weights
# Read a polygon layer and build Queen contiguity weights
gdf = gpd.read_file("data.shp")
w = weights.Queen.from_dataframe(gdf)
# Compute Moran's I for the attribute of interest
moran = Moran(gdf["attribute"], w)
print(moran.I)
- Interpolation: estimating unknown values at specific locations based on surrounding data points.
import numpy as np
from scipy.interpolate import griddata
# `data` is assumed to be a table (e.g., a pandas DataFrame) with
# 'x', 'y', and 'value' columns of observations
points = np.column_stack([data['x'], data['y']])
values = data['value']
# Interpolate onto a regular 100 x 100 grid
grid_x, grid_y = np.mgrid[0:100:100j, 0:100:100j]
grid_z = griddata(points, values, (grid_x, grid_y), method='cubic')
By embedding these principles into reproducible code, you ensure that your methods are not only accurate but scalable.
Methods in the Context of Big Data
With modern data scales, the need for optimized algorithms and frameworks becomes essential. Here’s how the fundamentals integrate into big data contexts:
- Spatial Joins at Scale: traditional GIS tools handle spatial joins efficiently for small datasets, but larger datasets demand systems like PostGIS or cloud-based engines (e.g., BigQuery, Wherobots, DuckDB).
SELECT a.id, COUNT(b.id)
FROM large_buildings a
JOIN population_points b
ON ST_Intersects(a.geom, b.geom)
GROUP BY a.id;
The same pattern scales out in a distributed engine. Here is a Wherobots/Apache Sedona-style nearest-neighbor join in PySpark (`db` is assumed to be a SparkSession with Sedona registered):
from pyspark.sql import functions as f
# Define the target and neighbor tables
target_table = "buildings"
neighbor_table = "points_of_interest"
# Perform an ST_KNN join to find the 5 nearest neighbors of each target
knn_df = db.table(target_table).alias("target").join(
db.table(neighbor_table).alias("neighbor"),
f.expr("ST_KNN(target.geometry, neighbor.geometry, 5)"), # k = 5
"inner"
).select(
f.col("target.id").alias("target_id"),
f.col("neighbor.id").alias("neighbor_id"),
f.col("neighbor.geometry"),
f.expr("ST_Distance(target.geometry, neighbor.geometry)").alias("distance")
)
# Show results
knn_df.show()
- Network Analysis for Urban Planning: routing over a road network, for example with pgRouting:
SELECT *
FROM pgr_dijkstra(
'SELECT id, source, target, cost FROM road_network',
start_vertex, end_vertex -- source and destination node IDs
);
And here, using Apache Sedona's Python API, places are aggregated into H3 cells and the Getis-Ord Gi* hot spot statistic is computed over the resulting hexagons (`places_df`, `region`, and `neighbor_search_radius_degrees` are assumed to be defined upstream):
from pyspark.sql import functions as f
from sedona.sql.st_constructors import ST_GeomFromText
from sedona.sql.st_functions import ST_H3ToGeom
from sedona.sql.st_predicates import ST_Intersects
from sedona.stats.hotspot_detection.getis_ord import g_local
from sedona.stats.weighting import add_binary_distance_band_column
# Optionally clip the places to a region of interest (a WKT string)
if region is not None:
    places_df = places_df.filter(ST_Intersects(ST_GeomFromText(f.lit(region)), f.col("geometry"))).repartition(100)
# Aggregate places into H3 cells and build hexagon geometries
hexes_df = (
places_df
.groupBy(f.col("h3Cell"))
.agg(f.count("*").alias("num_places"))
.withColumn("geometry", ST_H3ToGeom(f.array(f.col("h3Cell")))[0])
)
# Generate the Getis-Ord Gi* statistic using a binary distance band
gi_df = g_local(
add_binary_distance_band_column(
hexes_df,
neighbor_search_radius_degrees,
include_self=True,
),
"num_places",
"weights",
star=True
).cache()
# Show results, most significant hot spots first
gi_df.drop("weights", "h3Cell", "geometry").orderBy(f.col("P").asc()).show()
Why This is Liberating
The true freedom of modern GIS lies in its capacity to align methods with tools that scale, automate, and integrate. By expressing methods as code:
- Reproducibility: once written, a script or query can be reused or adapted, reducing manual effort.
- Scalability: operations that once required manual intervention or limited tools can now handle millions of records in seconds.
- Interoperability: working with SQL or Python integrates spatial thinking with broader data workflows, eliminating silos.
From Fundamentals to Applications: Scaling Spatial Analytics
As we move from the core methods and principles to practical applications, the focus shifts to bridging the gap between theoretical understanding and real-world execution. This involves not only applying GIS methods but also integrating them seamlessly into workflows that leverage modern tools for enhanced performance and scalability.
Similarly, moving beyond desktop GIS to cloud environments enables scalability for large datasets. For instance, spatial joins involving billions of records can be efficiently handled using cloud-based platforms for parallel computing. SQL queries can process and analyze spatial relationships at scales that were previously impractical with desktop tools, significantly enhancing the efficiency of large-scale projects.
Case Study: Urban Heat Island Effect
Understanding urban heat islands exemplifies the intersection of traditional methods and modern tools. By combining remotely sensed data, such as satellite imagery for land surface temperatures, with in-situ measurements from weather stations, analysts can explore temperature patterns across urban landscapes.
The analysis workflow begins with raster processing, where thermal imagery is processed using Python libraries like Rasterio. This data can then be complemented by vector overlay analysis, using GeoPandas to integrate weather station point data with thermal rasters. The outcome of such an analysis is the generation of heat maps that visualize temperature anomalies, providing critical insights for urban planning and mitigation strategies.
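Here is a rough sketch of that workflow using a toy NumPy grid in place of real thermal imagery; a real pipeline would read a GeoTIFF with Rasterio and station points with GeoPandas, and convert lon/lat to pixel coordinates with the raster's affine transform. All values below are invented.

```python
import numpy as np

# Toy stand-in for a land-surface-temperature raster: a 100 x 100 grid
# with a warm "urban core" in the center (values are made up)
lst = np.full((100, 100), 20.0)
yy, xx = np.mgrid[0:100, 0:100]
lst += 8.0 * np.exp(-(((yy - 50) ** 2 + (xx - 50) ** 2) / (2 * 15.0**2)))

# Hypothetical weather stations as (row, col) pixel indices; real code
# would derive these from lon/lat via the raster's transform
stations = [(50, 50), (10, 10), (90, 50)]

# Sample the raster at each station and compute the anomaly relative to
# the scene mean -- the essence of a heat-island comparison
scene_mean = lst.mean()
anomalies = {s: float(lst[s] - scene_mean) for s in stations}
print(anomalies)  # the central station stands out as a heat island
```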
While modern tools offer immense possibilities, they come with challenges that require careful attention. One major challenge is performance optimization. SQL queries or Python scripts can become bottlenecks, especially with large datasets. Solutions like using indexes in databases such as PostGIS or employing optimized libraries like Dask or Apache Sedona for parallel processing can mitigate these issues effectively.
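To see why indexing helps, here is a toy grid index in plain Python: bucketing points into cells means a radius query only inspects nearby cells instead of every point, which is the same intuition behind a spatial index like PostGIS's GiST. The data and cell size are invented for illustration.

```python
from collections import defaultdict
from math import hypot

def build_grid_index(points, cell_size):
    """Bucket (x, y) points into square cells keyed by integer cell coords."""
    index = defaultdict(list)
    for p in points:
        index[(int(p[0] // cell_size), int(p[1] // cell_size))].append(p)
    return index

def query_radius(index, cell_size, center, radius):
    """Return points within `radius` of `center`, checking only nearby cells."""
    cx, cy = int(center[0] // cell_size), int(center[1] // cell_size)
    reach = int(radius // cell_size) + 1
    hits = []
    for gx in range(cx - reach, cx + reach + 1):
        for gy in range(cy - reach, cy + reach + 1):
            for p in index.get((gx, gy), []):
                if hypot(p[0] - center[0], p[1] - center[1]) <= radius:
                    hits.append(p)
    return hits

# A 100 x 100 lattice of points, indexed into 10-unit cells
points = [(float(x), float(y)) for x in range(100) for y in range(100)]
index = build_grid_index(points, cell_size=10.0)
print(len(query_radius(index, 10.0, (50.0, 50.0), 5.0)))
```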
Interoperability is another common challenge, as integrating diverse tools—from desktop GIS to cloud systems—often involves compatibility issues. Standardizing workflows with interoperable file formats like GeoParquet or Cloud Optimized GeoTIFF ensures smooth transitions between tools and platforms.
A third challenge lies in the skill gap among GIS professionals, many of whom may lack coding expertise. Tools with graphical interfaces, such as QGIS, offer a user-friendly entry point. These can be combined with Python scripting to gradually introduce coding workflows, empowering GIS professionals to expand their capabilities.
I think it is important to have an understanding of how spatial analysis and spatial relationships are integral to how you analyze the world. There will always be a need for a strong basis of academic research and understanding around these statistical methods. If we didn't have people thinking about these problems, we wouldn't be applying them in the technology we use.
I do, however, believe there is an increasing amount of open information you can use to understand these techniques. Of course, a college program will accelerate that learning, but if you are simply looking to practice the technology or apply this to your own technical work, you can get a baseline understanding of how these things work on your own.
You should, however, understand how these things work if you are doing this type of work. Understanding how the world relates to itself, and how things relate in space, is a core tenet of geography, one that has unfortunately become decreasingly common within our educational system.
So I hope this serves as a primer into some of these topics and helps you think about them if you are coming from the technology perspective. And if this was just a refresher, well, I hope you enjoyed it nonetheless. With that, we can get back to more regular programming on how to perform technical modern GIS.