No results found

Your search did not match any results.

We suggest you try the following to help find what you’re looking for:

  • Check the spelling of your keyword search.
  • Use synonyms for the keyword you typed, for example, try “application” instead of “software.”
  • Try one of the popular searches shown below.
  • Start a new search.
Trending Questions
 

How to perform spatial data queries and geolocation analysis with GeoPandas

Use pandas, GeoPandas, and other Python libraries to manipulate and take actions on geospatial data.

By Yuli Vasiliev | June 2021


How to perform spatial data queries and geolocation analysis with GeoPandas

What does the Internet of Things (IoT) look like in the back end? IoT refers to the network of devices that are outfitted with sensors to collect data and that use software to communicate this data to a common server, and exchange that data with other devices, without human intervention.

One IoT scenario might have a transport company use drivers’ mobile phones to collect geolocation telemetry from the company’s vehicles through an app, without distracting the drivers. With that technology in place, the company could add functionality, such as a geofencing alert that automatically notifies a driver (or customers) when they enter or leave a certain area or neighborhood. The IoT software could also implement more advanced logic, such as monitoring the speed of a track within a tracked area or the dwell time, thus performing time-series analysis on spatial data.

Of course, the toolset you might choose to implement such a geofencing application depends on the programming language you’re using. If you’re a Python user, GeoPandas—an open source library combining the analytical power of the pandas library with spatial capabilities—is an ideal choice. This article illustrates how you might use two GeoPandas data structures, GeoDataFrame and GeoSeries, to run queries against spatial data and perform analysis on it.

A geofencing example

Geofencing means more than monitoring and analyzing restricted area movements. You might build a geofencing application that will send notifications to its users when they approach places of potential interest. For example, visitors of a metropolis, such as New York City, might benefit from an application that will notify them when they are near a place that is worth a visit, such as a museum or monument.

The implementation of such an application requires you to delineate the places of potential interest, defining a set of polygons on the map. The application is also assumed to constantly receive real-time coordinates of its users so that it can determine if a user is near or within one of the specified areas. In the context of this application, the current location of a user is considered a circle with a certain radius and centered at the point of the actual user location. The application checks to see whether the circle intersects any of the specified areas.

Note: Using a circle rather than just a point allows the application to determine not only when a user is within an area but also when the user is near it, to send a notification in advance, as illustrated in Figure 1.

A spatial object is coming close to a specified area.

Figure 1. A spatial object is coming close to a specified area.

The user of this tourist application might also want to know how much time was spent exploring each attraction or the distance walked during this time. To generate these statistics, the application is supposed to analyze the entire time series of the user’s movements. The rest of this article covers how you can implement spatial data queries and analysis operations for such an application in Python using GeoPandas and some other libraries designed to work with geospatial data. For comparison purposes, you’ll also see how you can implement these same tasks with SQL.

Preparing the working environment

To follow this article’s examples, you need to have Python 3.7 or newer installed on your machine. The examples also require the GeoPandas, Shapely, and geog libraries installed in your Python environment. You can install them with the pip command, as follows:


pip install geopandas
pip install geog

The Shapely library will be installed implicitly with the first pip command, since Shapely is one of the GeoPandas dependencies. You also need to have access to an instance of Oracle Database, which is used to create several sample tables by executing the geopandas_article.sql file located at the end of this article.

Coding the spatial objects

You need a way to define spatial objects of those New York landmarks programmatically. That means defining several polygons by connecting certain points specified by longitude and latitude coordinates. Also, you’ll need to define circles of a certain radius defined in meters, with the center specified in the geographic coordinate system.

The following code snippet creates two polygons and stores them in a GeoDataFrame (the primary data structure in GeoPandas). The first polygon encloses the location of the Museum of Modern Art (MoMA) in Manhattan, New York City. The second encloses the location of the Metropolitan Museum of Art of New York City.


import pandas as pd
import geopandas as gpd 
df = pd.DataFrame(
    {'Place': ['Museum of Modern Art (MoMA)', 'Metropolitan Museum of Art'],
     'Location': ['Midtown Manhattan, New York City', 'Central Park, New York City'],
     'Area': ['POLYGON ((-73.97773 40.76208, -73.97834 40.76144, -73.97723 40.76095, -73.97661 40.76162, -73.97773 40.76208))',
              'POLYGON ((-73.96440 40.78093, -73.96485 40.77850, -73.96271 40.77820, -73.96192 40.78087, -73.96440 40.78093))']})
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.GeoSeries.from_wkt(df.Area, crs = 'epsg:4326'))

The next script defines a 100-meter-radius circle around a user’s location, described as a point. For this example, I’ve chosen a point close to MoMA.

Technically, the objective is to create a circle-like polygon defined by a certain number of points given along the circle. To create such a polygon, use NumPy’s linspace() function and then the propagate() function from the geog library. NumPy is a package for scientific and mathematical computing with Python.

In this example, linspace() generates a sequence of 10 evenly spaced numbers over the [0,360] interval (360 is the number of degrees in a circle), thus calculating the angles between 10 equidistant points on the circle. Then, given the latitude/longitude coordinates of the circle center point, propagate() calculates the coordinates for each of those points, which are then used to create the circle-like polygon.


import numpy as np
import geog
import shapely.geometry

point = shapely.geometry.Point([-73.97929, 40.76188])
num_points = 10
radius = 100  #radius in meters
angles = np.linspace(0, 360, num_points)
polygon_array = geog.propagate(point, angles, radius)
circle = shapely.geometry.Polygon(polygon_array)

The next step is to check if the newly created circle intersects with any of the areas stored in the gdf GeoDataFrame created earlier. For that, use the intersects() method of the gdf['geometry'] GeoSeries.

 
print(gdf['geometry'].intersects(circle))

The output should be

 
0    True
1    False

This output means the boundary or interior of the circle intersects in some way with those of the polygon found first in the gdf['geometry'] GeoSeries. Simply put, the user in that circle is near MoMA. This can be explicitly seen with the following code:


print(gdf[gdf['geometry'].intersects(circle)==True]['Place'].to_string(index=False))

The following is the output:


Museum of Modern Art (MoMA)

The same task in SQL. How might you arrive at the same result using a database schema containing the places table? The SDO_UTIL.CIRCLE_POLYGON() SQL function provides the solution by creating a polyline approximation of a circle in geodetic coordinate systems. This function can be invoked with several different sets of parameters. The simplest format requires only the following four parameters: center_longitude, center_latitude, radius, and arc_tolerance. To create the circle just created in Python, issue the following SQL query:


SELECT SDO_UTIL.CIRCLE_POLYGON(-73.97929, 40.76188, 100, 5) FROM DUAL;

Use the SDO_RELATE() SQL function to check whether the above circle intersects with any of the polygon areas stored in the places table.


SELECT p.place
    FROM places p
    WHERE SDO_RELATE(p.area,
            (SELECT SDO_UTIL.CIRCLE_POLYGON(-73.97929, 40.76188, 100, 5) FROM DUAL),
              'mask=anyinteract'
    ) = 'TRUE';

The query should return the following output:


PLACE
--------------------------------------------------
Museum of Modern Art (MoMA)

This information can be included in a notification message sent to the user who has approached the given place. The script in the following section illustrates how you can generate such messages against the time series of a user’s movements.

Processing the time series of spatial object movements

Imagine you are in New York, moving along West 53rd Street from 7th Avenue to Park Avenue (refer back to Figure 1). Your current spatial position is recorded every 10 seconds. The following script illustrates how to process a time series to check when you are near a place of potential interest. The script first puts the spatial data representing your movements into a GeoSeries and then checks each position for intersection with any of the areas found in the gdf GeoDataFrame.


#spatial data representing your movements
tm = ['20/05/21 11:20:30', '20/05/21 11:20:40', '20/05/21 11:20:50', '20/05/21 11:21:00']
lat = [40.762173257210215, 40.7618785089327, 40.76081063947411, 40.760245289872934]
long = [-73.97999636680773, -73.97929144267268, -73.97679709568447, -73.9753968165023]

#putting the spatial data of your movements into a GeoSeries
points=gpd.GeoSeries(gpd.points_from_xy(long, lat), index = tm, crs = 'epsg:4326')

#generating 100-meter-radius circles for each position and checking them for intersection with the areas of potential interest
num_points = 10
d = 100  # meters
angles = np.linspace(0, 360, num_points)

#defining a Series to be used for storing the chronicle of the user's presence in predefined areas
inarea=pd.Series(index = tm)
for idx, point in points.items():
  polygon_array = geog.propagate(point, angles, d)
  circle = shapely.geometry.Polygon(polygon_array)
  place = gdf[gdf['geometry'].intersects(circle)==True]['Place']
  if not place.empty:
    print(idx, "You're near", place.to_string(index=False)) 
    inarea.loc[idx] = place.to_string(index=False)

This is the result.


20/05/21 11:20:40 You're near Museum of Modern Art (MoMA)
20/05/21 11:20:50 You're near Museum of Modern Art (MoMA)

The same task in SQL. To return the same result set with a query to the article’s sample database, issue the following join statement against the places and movements tables.


SELECT TO_CHAR(m.tm,'dd/mm/yy hh24:mi:ss') time, 'You''re near' msg, p.place FROM places p 
INNER JOIN movements m
ON 
SDO_RELATE(p.area,
            (SELECT SDO_UTIL.CIRCLE_POLYGON(m.area, 100, 5) FROM DUAL),
              'mask=anyinteract'
           ) = 'TRUE';

Here is the output.


TIME		  MSG	      PLACE
----------------- ----------- --------------------------------------------------
20/05/21 11:20:40 You're near Museum of Modern Art (MoMA)
20/05/21 11:20:50 You're near Museum of Modern Art (MoMA)

Performing analysis operations on spatial data

GeoPandas is still pandas, meaning that everything you can do with pandas objects you can do with GeoPandas objects as well. For example, you can merge two GeoSeries into a single GeoDataFrame. You can also merge a GeoSeries with a Series, which produces a GeoDataFrame.

In the New York example, here is how to merge the points GeoSeries and inarea Series into a GeoDataFrame.

 
gdf_movements = pd.concat([points, inarea], axis=1).rename(columns={0:"geometry", 1:"inarea"})

The newly created GeoDataFrame will look like this.


                                        point                       inarea
20/05/21 11:20:30  POINT (-73.98000 40.76217)                          NaN
20/05/21 11:20:40  POINT (-73.97929 40.76188)  Museum of Modern Art (MoMA)
20/05/21 11:20:50  POINT (-73.97680 40.76081)  Museum of Modern Art (MoMA)
20/05/21 11:21:00  POINT (-73.97540 40.76025)                          NaN

You can then use this GeoDataFrame for further analysis of the article’s data. For example, you might want to calculate the approximate distance being covered in each area. For that, use the distance() method of GeoDataFrame, checking the distance of each point to a previous point. Before you can do that, however, you must re-project the representation of point geometries in the GeoDataFrame to the coordinate system specific to the desired geographic area, such as the New York Long Island EPSG:32118 metric coordinate system. If you want the results in feet instead of meters, specify epsg : 2263 instead.


gdf_movements = gdf_movements.to_crs('epsg:32118')

Check the distance of each point to a previous point with the gdf_movements.distance() method, sending the results to a new column.


gdf_movements['dist'] = gdf_movements.distance(gdf_movements.shift(1))

To view the results, print the newly created column along with the inarea column.


print(gdf_movements[['dist','inarea']])

Her are the results.


                         dist                       inarea
20/05/21 11:20:30         NaN                          NaN
20/05/21 11:20:40   67.927585  Museum of Modern Art (MoMA)
20/05/21 11:20:50  241.706323  Museum of Modern Art (MoMA)
20/05/21 11:21:00  133.871772                          NaN

Built on top of pandas, GeoPandas supports grouping and aggregating data, using the same methods. In the following code, perform grouping by the inarea column:


print(gdf_movements.groupby(inarea).sum())

This should give the approximate distance being covered in each area. Since there is only a single area in this example, there will be a single row in the output.


                                   dist
Museum of Modern Art (MoMA)  309.633908

The same task in SQL. Use the following code:


CREATE VIEW time_place_v AS
(SELECT TO_CHAR(m.tm,'dd/mm/yy hh24:mi:ss') time, m.area area, p.place place FROM places p 
RIGHT JOIN movements m
ON 
SDO_RELATE(p.area,
            (SELECT SDO_UTIL.CIRCLE_POLYGON(m.area, 100, 5) FROM DUAL),
              'mask=anyinteract'
           ) = 'TRUE');

SELECT place, SUM(prev_p) FROM
  (SELECT
      time, place, prev_p
   FROM time_place_v 
   MATCH_RECOGNIZE (
      ORDER BY time
      MEASURES
         NVL(SDO_GEOM.SDO_DISTANCE(area, PREV(area), 0.005), 0) AS prev_p
      ALL ROWS PER MATCH
      PATTERN (A)
      DEFINE
         A AS (1=1)
  ))
WHERE place IS NOT NULL
GROUP BY place;

The generated output should look as follows:


PLACE						   SUM(PREV_P)
-------------------------------------------------- -----------
Museum of Modern Art (MoMA)			    309.544797

Conclusion

GeoPandas is a powerful tool when it comes to querying and analyzing spatial data, such as time, location, and motion data from Internet of Things applications. Built on top of the pandas library for Python, GeoPandas lets you perform all those operations on data for which pandas has become so popular. Thus, GeoPandas supports grouping, filtering, aggregations, and analytics, using the same functions you can use with pandas data structures. Moreover, you can merge GeoPandas data structures with each other and pandas data structures, thus creating new data structures needed for further analysis.

The geopandas_article.sql file

Run this script in a SQL tool such as SQL*Plus or Oracle SQL Developer to create database objects for the article’s examples. Uncomment the following DROP TABLE commands when you rerun the script.


-- DROP TABLE places;
-- DROP TABLE movements;

CREATE TABLE places(
  place_id NUMBER PRIMARY KEY, 
  place VARCHAR2(50) not null,
  location VARCHAR2(100) not null,
  area SDO_GEOMETRY
);

INSERT INTO places VALUES(
  1,
  'Museum of Modern Art (MoMA)',
  'Midtown Manhattan, New York City',
  SDO_GEOMETRY(
    2003,
    8307,
    NULL,
    SDO_ELEM_INFO_ARRAY(1,1003,1), 
    SDO_ORDINATE_ARRAY(-73.97773, 40.76208, -73.97834, 40.76144, -73.97723, 40.76095, -73.97661, 40.76162, -73.97773, 40.76208))); 

INSERT INTO places VALUES(
  2,
  'Metropolitan Museum of Art',
  'Central Park, New York City',
  SDO_GEOMETRY(
    2003,
    8307,
    NULL,
    SDO_ELEM_INFO_ARRAY(1,1003,1), 
    SDO_ORDINATE_ARRAY(-73.96440, 40.78093, -73.96485, 40.77850, -73.96271, 40.77820, -73.96192, 40.78087, -73.96440, 40.78093)));

CREATE TABLE movements(
  tm DATE, 
  usr VARCHAR2(50) not null,
  area SDO_GEOMETRY
);

INSERT INTO movements VALUES(TO_DATE('20/05/21 11:20:30','dd/mm/yy hh24:mi:ss'), 'john', SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-73.97999, 40.76217, NULL),NULL,NULL)); 

INSERT INTO movements VALUES(TO_DATE('20/05/21 11:20:40','dd/mm/yy hh24:mi:ss'), 'john', SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-73.97929, 40.76187, NULL),NULL,NULL)); 

INSERT INTO movements VALUES(TO_DATE('20/05/21 11:20:50','dd/mm/yy hh24:mi:ss'), 'john', SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-73.97679, 40.76081, NULL),NULL,NULL)); 

INSERT INTO movements VALUES(TO_DATE('20/05/21 11:21:00','dd/mm/yy hh24:mi:ss'), 'john', SDO_GEOMETRY(2001, 8307, SDO_POINT_TYPE(-73.97539, 40.760245, NULL),NULL,NULL)); 

COMMIT; 

Dig deeper

Illustration: Wes Rowell

Yuli Vasiliev is a programmer, freelance writer, and consultant. He is the author of Natural Language Processing with Python and spaCy.