Converting Shapefiles to GeoPackage with GeoPandas
Converting Shapefiles to GeoPackage with GeoPandas is a direct I/O operation: load the shapefile into a GeoDataFrame using gpd.read_file(), then export it with gdf.to_file(driver="GPKG"). For production workloads, explicitly specify engine="pyogrio" to get vectorized reads and writes through GDAL and considerably higher throughput than the legacy Fiona backend.
Production-Ready Conversion Script
The following function accepts an explicit source encoding, derives a SQLite-safe layer name, and relies on GDAL's default R-tree spatial index when writing. It is designed for batch ETL, CLI wrappers, or serverless spatial pipelines.
import logging
from pathlib import Path

import geopandas as gpd

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


def convert_shp_to_geopackage(
    input_shp: str | Path,
    output_gpkg: str | Path,
    layer_name: str | None = None,
    engine: str = "pyogrio",
    encoding: str = "utf-8",
) -> None:
    """Convert one shapefile into a layer of a GeoPackage file."""
    input_path = Path(input_shp)
    output_path = Path(output_gpkg)

    if not input_path.exists():
        raise FileNotFoundError(f"Shapefile not found: {input_path}")

    # 1. Load with explicit engine and encoding
    gdf = gpd.read_file(input_path, engine=engine, encoding=encoding)

    # 2. Sanitize layer name for SQLite/GPKG compliance
    if layer_name is None:
        layer_name = input_path.stem
    # Replace spaces and hyphens with underscores and enforce lowercase
    layer_name = layer_name.replace(" ", "_").replace("-", "_").lower()

    # 3. Export to GeoPackage; GDAL builds the GPKG R-tree spatial index by default
    gdf.to_file(
        output_path,
        driver="GPKG",
        layer=layer_name,
        engine=engine,
        index=False,  # don't write the DataFrame index as a column
    )
    logging.info(f"Converted {input_path.name} → {output_path.name}:{layer_name} ({len(gdf)} features)")


# Example:
# convert_shp_to_geopackage("data/field_survey.shp", "data/survey_archive.gpkg")
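The layer-name step in the script only rewrites spaces and hyphens. A stricter variant is sketched below. The names sanitize_layer_name and fallback are hypothetical, the ASCII-only character set is a deliberately conservative choice, and prefixing names that start with a digit is an assumption made for convenience in raw SQL rather than a GeoPackage requirement.

```python
import re


def sanitize_layer_name(raw: str, fallback: str = "layer") -> str:
    """Reduce an arbitrary file stem to a lowercase ASCII identifier."""
    # Collapse every run of characters outside [0-9A-Za-z_] into one underscore
    name = re.sub(r"[^0-9A-Za-z_]+", "_", raw).strip("_").lower()
    if not name:
        # Nothing usable survived (e.g. the stem was only punctuation)
        return fallback
    if name[0].isdigit():
        # Assumption: avoid a leading digit so the name needs no quoting in SQL
        name = f"l_{name}"
    return name


print(sanitize_layer_name("Field Survey-2024 (v2)"))
```

The result could be passed as layer_name to convert_shp_to_geopackage in place of the inline replace() chain.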
Why Migrate from Shapefile to GeoPackage?
Shapefiles remain ubiquitous but suffer from architectural constraints that break modern data pipelines. GeoPackage resolves these by packaging geometry, attributes, and metadata into a single, standards-compliant SQLite database. Key advantages include:
- Single-File Architecture: Eliminates the fragile .shp/.shx/.dbf/.prj quartet. No more missing projection files or orphaned attribute tables.
- Size & Precision Limits: Shapefiles cap each .shp/.dbf component at 2 GB and truncate field names to 10 characters. GeoPackage supports terabyte-scale datasets and long Unicode field names.
- Mixed Geometry & Transactions: Unlike Shapefiles, which enforce a single geometry type per file, a GeoPackage layer can hold mixed geometry types, and writes are ACID transactions because the container is SQLite. Locking applies to the whole database file rather than to individual rows, so the format suits one writer with many concurrent readers.
- Native Spatial Indexing: The R-tree index lives inside the same file and is created by GDAL by default on write, accelerating bounding-box queries and spatial joins without external sidecar files.
For teams building offline-first mobile applications or edge-deployed GIS tools, this consolidation reduces sync complexity and prevents silent schema drift during data handoffs.
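Because a GeoPackage is an ordinary SQLite database, its layer registry can be inspected with nothing but the Python standard library. The sketch below reads the gpkg_contents table defined by the GeoPackage standard; list_gpkg_layers is a hypothetical helper name, and the file is opened read-only so a mistyped path does not create an empty database.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_gpkg_layers(gpkg_path: str) -> list:
    """Return (table_name, data_type, srs_id) for each entry in gpkg_contents."""
    # mode=ro makes SQLite fail instead of silently creating a missing file
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return conn.execute(
            "SELECT table_name, data_type, srs_id "
            "FROM gpkg_contents ORDER BY table_name"
        ).fetchall()


# Usage (path is illustrative):
# for table, kind, srs in list_gpkg_layers("data/survey_archive.gpkg"):
#     print(table, kind, srs)
```

Vector layers appear with a data_type of "features", which makes this a quick post-conversion sanity check.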
Engine Selection & I/O Configuration
GeoPandas abstracts GDAL/OGR translation, but the underlying I/O engine dictates performance and compatibility.
- pyogrio (Recommended): The default engine since GeoPandas 1.0 and an opt-in engine in earlier releases. It reads and writes whole layers as arrays instead of building one Python object per feature, which typically delivers 3–10× speedups on datasets exceeding 100k features. CRS handling and GeoPackage specification support follow the GDAL version it is built against.
- fiona (Legacy Fallback): Still functional but slower due to feature-by-feature iteration. Use only when maintaining compatibility with older Python/GDAL stacks or when pyogrio encounters niche geometry type edge cases.
When configuring the conversion, always verify your environment’s GDAL version. The OGC GeoPackage Standard defines strict compliance requirements for metadata tables and spatial reference systems. GeoPandas handles most of this automatically, but explicit engine="pyogrio" ensures deterministic behavior across CI runners, Docker containers, and field laptops.
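One way to make the engine choice explicit at startup is to probe for the installed backends before any I/O happens. This is a minimal sketch: pick_engine is a hypothetical helper, and it only checks that a module can be imported, not that its GDAL build is healthy.

```python
from importlib.util import find_spec


def pick_engine(preferred: tuple = ("pyogrio", "fiona")) -> str:
    """Return the first importable engine name from `preferred`."""
    for name in preferred:
        if find_spec(name) is not None:
            return name
    raise ImportError(f"None of these I/O engines are installed: {preferred}")


# Usage (assumes geopandas is installed; the path is illustrative):
# engine = pick_engine()
# gdf = gpd.read_file("data/field_survey.shp", engine=engine)
# if engine == "pyogrio":
#     import pyogrio
#     print(pyogrio.__gdal_version__)  # GDAL version tuple pyogrio was built against
```

Logging the selected engine and GDAL version once per run makes differences between CI runners and field machines easier to trace.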
Integrating into Automated ETL Pipelines
Embedding spatial conversion into automated workflows requires more than a simple script call. Production pipelines should validate geometry, enforce CRS consistency, and handle batch processing efficiently.
def batch_convert_directory(shp_dir: str, gpkg_dir: str) -> None:
    Path(gpkg_dir).mkdir(parents=True, exist_ok=True)
    for shp in Path(shp_dir).glob("*.shp"):
        try:
            convert_shp_to_geopackage(shp, Path(gpkg_dir) / f"{shp.stem}.gpkg")
        except Exception as e:
            # Log and continue so one bad file does not abort the batch
            logging.error(f"Failed {shp.name}: {e}")
When scaling this pattern, memory management becomes critical. GeoPandas loads entire datasets into RAM. For multi-gigabyte shapefiles, read incrementally with pyogrio.open_arrow() (Arrow record batches) or page through the data using read_dataframe(..., skip_features=, max_features=), or leverage Dask-GeoPandas for distributed processing. Additionally, always validate coordinate reference systems before export. Mismatched or undefined CRS values will propagate into the GeoPackage metadata table, breaking downstream spatial joins.
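The paging approach can be sketched as follows, assuming pyogrio is installed and the source driver reports a feature count. The names page_ranges and convert_in_pages and the default page size are hypothetical. The CRS check reflects the advice above: the function refuses to export when the source has no CRS.

```python
from typing import Iterator


def page_ranges(total: int, page_size: int) -> Iterator[tuple]:
    """Yield (skip_features, max_features) pairs that cover `total` rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for skip in range(0, total, page_size):
        yield skip, min(page_size, total - skip)


def convert_in_pages(input_shp: str, output_gpkg: str, layer: str,
                     page_size: int = 100_000) -> None:
    """Copy a shapefile into a GeoPackage layer one page at a time."""
    import pyogrio  # imported lazily so page_ranges() works without it

    info = pyogrio.read_info(input_shp, force_feature_count=True)
    if info["crs"] is None:
        raise ValueError(f"{input_shp} has no CRS; assign one before export")

    for skip, count in page_ranges(info["features"], page_size):
        gdf = pyogrio.read_dataframe(
            input_shp, skip_features=skip, max_features=count
        )
        # The first page creates the layer; later pages append to it
        pyogrio.write_dataframe(
            gdf, output_gpkg, layer=layer, driver="GPKG", append=skip > 0
        )


print(list(page_ranges(250_000, 100_000)))
```

Peak memory is then bounded by the page size rather than by the size of the source file, at the cost of one extra pass to count features.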
Understanding how Python Integration & Database Workflows manage spatial I/O prevents silent failures during schema migration. By standardizing on GeoPackage as the canonical intermediate format, teams eliminate Shapefile fragmentation and streamline handoffs between Python data engineers, mobile developers, and GIS analysts. For deeper architectural patterns around spatial database orchestration, review the GeoPandas & GeoPackage Integration reference.
Troubleshooting Common Conversion Errors
| Symptom | Root Cause | Resolution |
|---|---|---|
| DriverError: Could not open datasource | Missing .shx or corrupted .dbf | Ensure all companion files exist in the same directory. Use ogrinfo to validate integrity. |
| UnicodeDecodeError | Legacy Windows-1252 encoding in .dbf | Pass encoding="latin1" or encoding="cp1252" to read_file(). |
| CRSWarning: No CRS defined | Missing .prj file | Assign the CRS the data was actually captured in before export, e.g. gdf.set_crs("EPSG:4326", inplace=True). |
| SQLite error: table already exists | Layer name collision in target .gpkg | GeoPackage supports multiple layers per file. Choose a different layer_name, append to the existing layer with mode="a", or delete the old .gpkg before rewriting it. |
| pyogrio import fails | Missing GDAL binaries or mismatched wheels | Install via conda install -c conda-forge pyogrio gdal, or use pip install pyogrio, whose binary wheels bundle GDAL on common platforms. |
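The companion-file check from the first row of the table can be automated before a batch run. This is a minimal sketch using only the standard library; missing_sidecars is a hypothetical helper, and it assumes lowercase extensions, which matters on case-sensitive file systems.

```python
from pathlib import Path

# .shx and .dbf are needed to read the data; .prj carries the CRS
SIDECAR_EXTENSIONS = (".shx", ".dbf", ".prj")


def missing_sidecars(shp_path: str) -> list:
    """Return the companion extensions that are absent next to a .shp file."""
    base = Path(shp_path)
    return [
        ext for ext in SIDECAR_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]


# Usage (path is illustrative):
# problems = missing_sidecars("data/field_survey.shp")
# if problems:
#     print(f"Missing companion files: {problems}")
```

A missing .prj is reported here as well, since it is the usual cause of the undefined-CRS warning listed above.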
For comprehensive I/O parameter documentation, consult the official GeoPandas I/O Guide. When deploying to constrained edge devices, strip unnecessary columns before export, for example with gdf[["geometry", "id", "status"]], to minimize database footprint and accelerate mobile sync operations.