Entering Geospatial Machine Learning with GeoPandas
The field of artificial intelligence (AI) has progressed rapidly in recent years, matching or, in some cases, surpassing human accuracy at tasks such as image recognition, reading comprehension, and text translation. The intersection of AI and GIS is creating opportunities that were not possible before. AI, machine learning, and deep learning can help, for example, to increase crop yields through precision agriculture, to understand crime patterns, and to predict when the next big storm will hit so that we are better equipped to handle it.
What is GeoPandas?
GeoPandas is an open-source library that enables the use and manipulation of geospatial data in Python. It extends the data types used in pandas with two geometry-aware counterparts, GeoSeries and GeoDataFrame, which support a wide range of geometric operations. GeoPandas is built on top of shapely for its geometric operations, and its array-based geometry columns make it fast enough for many machine learning pipelines that require large geospatial datasets.
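To make the two data types concrete, here is a minimal sketch that builds a GeoSeries and a GeoDataFrame by hand. The shapes, coordinates, and column names are made up purely for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# A GeoSeries is a pandas Series whose values are shapely geometries.
points = gpd.GeoSeries([Point(0, 0), Point(3, 4)])

# A GeoDataFrame is a pandas DataFrame with a designated geometry column.
# The "name" column and the two polygons are illustrative placeholders.
gdf = gpd.GeoDataFrame(
    {"name": ["unit_square", "rectangle"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (5, 0), (5, 2), (2, 2)]),
    ],
)

print(type(gdf.geometry))      # the geometry column is itself a GeoSeries
print(gdf.geom_type.tolist())  # geometry type of each row
```

Everything you already know from pandas (filtering, grouping, merging) still works on the non-geometry columns.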
Geospatial concepts
Common geospatial data formats
There are two common geospatial file formats that you need to be familiar with: Shapefile (.shp) and GeoJSON (.geojson).
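Shapefile is a vector data format developed and maintained by a company called Esri. It stores the geometry of each feature (points, lines, or polygons) together with its attributes, but it does not store topology. A shapefile is in practice a set of sibling files (.shp, .shx, .dbf, and optionally others such as .prj) that must be kept together.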
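GeoJSON is a JSON-based format that stores geometry information (geometry type and coordinates) alongside the typical attributes of each object (index, name, etc.). Under the current specification (RFC 7946), coordinates are expressed as WGS 84 longitude and latitude.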
Once you load either of these formats with GeoPandas, the library creates a GeoDataFrame: a DataFrame with an additional geometry column.
Introduction to basic geometric attributes
Now that we have some idea of what geospatial data looks like and how to import it with GeoPandas, we can try a few basic attributes and methods to cement our understanding.
Area
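From the geometry column, we can measure areas, provided the geometries are of type POLYGON or MULTIPOLYGON; lines and points have no area, so their area is reported as zero. The result is expressed in the squared units of the coordinate reference system, so a projected CRS is needed for meaningful values.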
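A minimal sketch of the area calculation, assuming toy polygons in a projected CRS whose units are metres (EPSG:32631 is chosen arbitrarily here):

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy polygons; the coordinates and the CRS are assumptions for illustration.
gdf = gpd.GeoDataFrame(
    {"name": ["square", "rectangle"]},
    geometry=[
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),    # 10 x 10
        Polygon([(20, 0), (50, 0), (50, 20), (20, 20)]),  # 30 x 20
    ],
    crs="EPSG:32631",
)

# .area returns one value per row, in squared CRS units (square metres here).
gdf["area"] = gdf.area
print(gdf[["name", "area"]])
```

If your data is in latitude and longitude, reproject it first with to_crs() before reading off areas.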
Polygon Boundary
Since our geometry is of type polygon or multipolygon, we can extract the boundary lines of the objects with the boundary attribute. This is useful when, say, we want to measure the perimeter of the polygons.
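The following sketch extracts the boundaries of two toy polygons and measures their perimeters. The shapes and the CRS are illustrative assumptions.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy polygons in an arbitrarily chosen projected CRS (units: metres).
gdf = gpd.GeoDataFrame(
    {"name": ["square", "rectangle"]},
    geometry=[
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),    # 10 x 10
        Polygon([(20, 0), (50, 0), (50, 20), (20, 20)]),  # 30 x 20
    ],
    crs="EPSG:32631",
)

# .boundary returns the outline of each polygon as a line geometry.
boundaries = gdf.boundary
print(boundaries)

# The length of the outline is the perimeter. For polygons, gdf.length
# gives the same value directly.
gdf["perimeter"] = boundaries.length
print(gdf[["name", "perimeter"]])
```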
Centroid
To find the centroid of each polygon, use the centroid attribute of the GeoDataFrame (gdf).
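For instance, with the same kind of toy polygons as before (coordinates and CRS are assumptions), the centroids can be computed like this:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy polygons in an arbitrarily chosen projected CRS.
gdf = gpd.GeoDataFrame(
    {"name": ["square", "rectangle"]},
    geometry=[
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
        Polygon([(20, 0), (50, 0), (50, 20), (20, 20)]),
    ],
    crs="EPSG:32631",
)

# .centroid returns a GeoSeries of Point geometries, one per row.
centroids = gdf.centroid
print(centroids)

# The x and y coordinates of each point are available as plain Series.
print(centroids.x.tolist(), centroids.y.tolist())
```

GeoPandas warns when centroids are computed in a geographic (latitude/longitude) CRS, which is another reason to work in a projected one.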
Distance
Now that we know the positions of the centroids, we can find the distance between points using the distance() method.
You can then apply aggregate functions to find the mean, maximum, or minimum distance.
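Here is a small sketch that measures the distance from the first centroid to every centroid and then aggregates the results. The three squares are made up so that the distances come out as round numbers, and the CRS is again an arbitrary projected one.

```python
import geopandas as gpd
from shapely.geometry import Polygon


def square(x, y, size=2):
    """Axis-aligned square with its lower-left corner at (x, y)."""
    return Polygon([(x, y), (x + size, y), (x + size, y + size), (x, y + size)])


# Illustrative data: centroids land at (1, 1), (4, 5), and (13, 6).
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[square(0, 0), square(3, 4), square(12, 5)],
    crs="EPSG:32631",
)

centroids = gdf.centroid
reference = centroids.iloc[0]  # a single shapely Point

# distance() compares every row against the reference geometry and returns
# a pandas Series in CRS units (metres here).
distances = centroids.distance(reference)
print(distances.tolist())

# Because the result is an ordinary Series, the usual aggregates apply.
print(distances.mean(), distances.max(), distances.min())
```

Passing another GeoSeries instead of a single point computes row-by-row distances between the two series.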