Interoperability Guide: Apache Sedona with GeoPandas and Shapely

License: CC BY-SA 4.0

Preface

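Apache Sedona is a distributed engine for processing large-scale spatial data, while GeoPandas and Shapely are the most widely used spatial libraries in the Python ecosystem. This article shows how to move data between Sedona and these two libraries so that both can be used in the same workflow.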

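Version compatibility

Check version compatibility before using the interoperability features:

Sedona versions before 1.6.0 only support Shapely 1.x.
To use Shapely 2.x, use Sedona 1.6.0 or later.
With Sedona < 1.6.0, keep GeoPandas at 0.11.1 or earlier and Shapely at 1.8.5 or earlier, because newer GeoPandas releases install Shapely 2.0 automatically.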
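The rules above can be sketched as a small check that runs before any Sedona code. This is a minimal illustration using only the standard library: the helper names (parse_version, check_compatibility, installed) are hypothetical, and the distribution names passed to importlib.metadata are assumed to be the PyPI names apache-sedona, shapely and geopandas.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.8.5' or '2.0.1.post1' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.6' compares equal to '1.6.0'
    parts = (parts + [0, 0, 0])[:3]
    return tuple(parts)


def check_compatibility(sedona, shapely, geopandas=None):
    """Return a list of problems according to the rules listed above."""
    problems = []
    if parse_version(sedona) < (1, 6, 0):
        if parse_version(shapely) > (1, 8, 5):
            problems.append("Shapely must be <= 1.8.5 for Sedona < 1.6.0")
        if geopandas is not None and parse_version(geopandas) > (0, 11, 1):
            problems.append("GeoPandas must be <= 0.11.1 for Sedona < 1.6.0")
    return problems


def installed(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Check the current environment (skipped if Sedona or Shapely is missing)
sedona_v = installed("apache-sedona")
shapely_v = installed("shapely")
if sedona_v and shapely_v:
    for problem in check_compatibility(sedona_v, shapely_v, installed("geopandas")):
        print("Incompatible:", problem)
```

An empty result means that none of the rules above is violated; it does not guarantee that every combination of versions works.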

Interoperating with GeoPandas

From GeoPandas to a Sedona DataFrame

A GeoPandas GeoDataFrame can be passed directly to createDataFrame to obtain a Sedona DataFrame:

import geopandas as gpd
from sedona.spark import *

# Initialize the Sedona context
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Read a shapefile with GeoPandas
gdf = gpd.read_file("gis_osm_pois_free_1.shp")

# Convert to a Sedona DataFrame and display it
sedona.createDataFrame(gdf).show()

The output includes the geometry column, for example:

+---------+----+-----------+--------------------+--------------------+
|   osm_id|code|     fclass|                name|            geometry|
+---------+----+-----------+--------------------+--------------------+
| 26860257|2422|  camp_site|            de Kroon|POINT (15.3393145...|

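From a Sedona DataFrame to GeoPandas

The reverse conversion is just as simple: call toPandas() on the Sedona DataFrame, wrap the result in a GeoDataFrame, and plot it. Note that toPandas() collects every row to the driver, so this approach suits results that fit in memory.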

# Read the CSV data and register a temporary view
counties = sedona.read.option("delimiter", "|").option("header", "true").csv("counties.csv")
counties.createOrReplaceTempView("county")

# Parse the WKT column into a geometry column
counties_geom = sedona.sql("SELECT *, ST_GeomFromWKT(geom) AS geometry FROM county")

# Convert to a pandas DataFrame, then to a GeoDataFrame
df = counties_geom.toPandas()
gdf = gpd.GeoDataFrame(df, geometry="geometry")

# Draw a choropleth map (scheme='quantiles' requires the mapclassify package)
gdf.plot(
    figsize=(10, 8),
    column="value",
    legend=True,
    cmap='YlOrBr',
    scheme='quantiles',
    edgecolor='lightgray'
)
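As a client-side counterpart to the ST_GeomFromWKT call above, the sketch below parses a WKT string with Shapely alone, which can be handy for inspecting a single value without starting Spark. The polygon is a made-up unit square for illustration, not a row from counties.csv.

```python
from shapely import wkt

# Hypothetical sample value in the same format as a WKT column
sample_wkt = "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"

# Parse the text into a Shapely geometry object
geom = wkt.loads(sample_wkt)

print(geom.geom_type)  # geometry class name
print(geom.area)       # planar area in coordinate units
print(geom.wkt)        # serialize back to WKT
```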

Interoperating with Shapely

Sedona can convert to and from the following Shapely geometry types:

| Geometry type | Supported |
|--------------------|-----------|
| Point | ✓ |
| MultiPoint | ✓ |
| LineString | ✓ |
| MultiLineString | ✓ |
| Polygon | ✓ |
| MultiPolygon | ✓ |
| GeometryCollection | ✓ |

Basic setup

First, define a schema that contains a geometry field:

from pyspark.sql.types import IntegerType, StructField, StructType
from sedona.sql.types import GeometryType

schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("geom", GeometryType(), False)
])

Examples for each geometry type

1. Point

from shapely.geometry import Point

data = [[1, Point(21.0, 52.0)]]
gdf = sedona.createDataFrame(data, schema)
gdf.show()

2. MultiPoint

from shapely.geometry import MultiPoint

data = [[1, MultiPoint([[19.511463, 51.765158], [19.446408, 51.779752]])]]
gdf = sedona.createDataFrame(data, schema)

3. LineString

from shapely.geometry import LineString

line = [(40, 40), (30, 30), (40, 20), (30, 10)]
data = [[1, LineString(line)]]
gdf = sedona.createDataFrame(data, schema)

4. MultiLineString

from shapely.geometry import MultiLineString

line1 = [(10, 10), (20, 20), (10, 40)]
line2 = [(40, 40), (30, 30), (40, 20), (30, 10)]
data = [[1, MultiLineString([line1, line2])]]
gdf = sedona.createDataFrame(data, schema)

5. Polygon

from shapely.geometry import Polygon

polygon = Polygon([
    [19.51121, 51.76426],
    [19.51056, 51.76583],
    [19.51216, 51.76599],
    [19.51280, 51.76448],
    [19.51121, 51.76426]
])
data = [[1, polygon]]
gdf = sedona.createDataFrame(data, schema)

6. MultiPolygon

from shapely.geometry import MultiPolygon

exterior_p1 = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
interior_p1 = [(1, 1), (1, 1.5), (1.5, 1.5), (1.5, 1), (1, 1)]
exterior_p2 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

polygons = [Polygon(exterior_p1, [interior_p1]), Polygon(exterior_p2)]
data = [[1, MultiPolygon(polygons)]]
gdf = sedona.createDataFrame(data, schema)

7. GeometryCollection

# Reuses exterior_p1, interior_p1 and exterior_p2 from the MultiPolygon example
from shapely.geometry import GeometryCollection, Point, LineString, Polygon

geoms = [
    Polygon(exterior_p1, [interior_p1]),
    Polygon(exterior_p2),
    Point(1, 1),
    LineString([(0, 0), (1, 1), (2, 2)])
]
data = [[1, GeometryCollection(geoms)]]
gdf = sedona.createDataFrame(data, schema)
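The seven examples can also be gathered into one list of rows. The sketch below uses only Shapely, so it runs without a Spark session; the variable names geometries and rows are illustrative. It prints each geometry's type as a quick sanity check before the rows are handed to Sedona.

```python
from shapely.geometry import (
    GeometryCollection,
    LineString,
    MultiLineString,
    MultiPoint,
    MultiPolygon,
    Point,
    Polygon,
)

# Rings reused by the Polygon and MultiPolygon entries
exterior_p1 = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
interior_p1 = [(1, 1), (1, 1.5), (1.5, 1.5), (1.5, 1), (1, 1)]
exterior_p2 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

# One geometry of each supported type, using the coordinates from the examples
geometries = [
    Point(21.0, 52.0),
    MultiPoint([[19.511463, 51.765158], [19.446408, 51.779752]]),
    LineString([(40, 40), (30, 30), (40, 20), (30, 10)]),
    MultiLineString([
        [(10, 10), (20, 20), (10, 40)],
        [(40, 40), (30, 30), (40, 20), (30, 10)],
    ]),
    Polygon(exterior_p1, [interior_p1]),
    MultiPolygon([Polygon(exterior_p1, [interior_p1]), Polygon(exterior_p2)]),
    GeometryCollection([Point(1, 1), LineString([(0, 0), (1, 1), (2, 2)])]),
]

# Rows in the [id, geom] layout used throughout the examples
rows = [[row_id, geom] for row_id, geom in enumerate(geometries, start=1)]

for row_id, geom in rows:
    print(row_id, geom.geom_type)
```

Assuming the sedona session and the schema from the basic setup are available, these rows should be usable as sedona.createDataFrame(rows, schema), since the geometry column is not tied to a single geometry type.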