Converting Shapefiles to GeoPackage with GeoPandas

Converting a Shapefile to GeoPackage with GeoPandas is a direct I/O operation: load the shapefile into a GeoDataFrame with gpd.read_file(), then export it with gdf.to_file(driver="GPKG"). For production workloads, pass engine="pyogrio" explicitly. It moves whole columns through GDAL's C API instead of iterating feature by feature in Python, which makes it considerably faster than the legacy Fiona backend.

Production-Ready Conversion Script

The following function handles encoding normalization, SQLite-safe layer naming, and spatial index generation. It is designed for batch ETL, CLI wrappers, or serverless spatial pipelines.

python
import logging
from pathlib import Path

import geopandas as gpd

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

def convert_shp_to_geopackage(
    input_shp: str | Path, 
    output_gpkg: str | Path, 
    layer_name: str | None = None, 
    engine: str = "pyogrio",
    encoding: str = "utf-8"
) -> None:
    input_path = Path(input_shp)
    output_path = Path(output_gpkg)
    
    if not input_path.exists():
        raise FileNotFoundError(f"Shapefile not found: {input_path}")
        
    # 1. Load with explicit engine and encoding
    gdf = gpd.read_file(input_path, engine=engine, encoding=encoding)
    
    # 2. Sanitize layer name for SQLite/GPKG compliance
    if layer_name is None:
        layer_name = input_path.stem
    # Replace spaces and hyphens with underscores and enforce lowercase
    layer_name = layer_name.replace(" ", "_").replace("-", "_").lower()
    
    # 3. Export to GeoPackage with R-tree spatial index
    gdf.to_file(
        output_path, 
        driver="GPKG", 
        layer=layer_name, 
        engine=engine,
        index=False  # don't write the DataFrame index as a column;
                     # GDAL builds the GPKG R-tree spatial index by default
    )
    logging.info(f"Converted {input_path.name} -> {output_path.name}:{layer_name} ({len(gdf)} features)")

# Example:
# convert_shp_to_geopackage("data/field_survey.shp", "data/survey_archive.gpkg")
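
The replace() chain in the function above only handles spaces and hyphens. A stricter variant could look like the sketch below. The helper name sanitize_layer_name, the "layer" fallback, and the "t_" prefix are illustrative assumptions, and the lowercase-ASCII rule is a conservative convention rather than something the GeoPackage specification requires.

```python
import re


def sanitize_layer_name(name: str, fallback: str = "layer") -> str:
    """Reduce a name to lowercase ASCII letters, digits, and underscores.

    Hypothetical helper: a conservative naming convention, not a rule
    imposed by GeoPackage itself.
    """
    # Collapse every run of characters outside [0-9a-z] into one underscore,
    # then drop underscores left dangling at either end.
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")
    if not cleaned:
        return fallback
    # Avoid a leading digit so the name never needs quoting in SQL.
    if cleaned[0].isdigit():
        cleaned = f"t_{cleaned}"
    return cleaned


print(sanitize_layer_name("Field Survey-2024 (v2)"))
```

Inside convert_shp_to_geopackage, the result of such a helper would replace the value assigned to layer_name before the call to to_file().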

Why Migrate from Shapefile to GeoPackage?

Shapefiles remain ubiquitous but suffer from architectural constraints that break modern data pipelines. GeoPackage resolves these by packaging geometry, attributes, and metadata into a single, standards-compliant SQLite database. Key advantages include:

  • Single-File Architecture: Eliminates the fragile .shp/.shx/.dbf/.prj quartet. No more missing projection files or corrupted attribute tables.
  • Size & Naming Limits: The .shp and .dbf components are each capped at 2 GB, and .dbf field names are truncated to 10 characters. GeoPackage supports terabyte-scale datasets and long Unicode column names.
  • Mixed Geometry & Transactions: Unlike Shapefiles, which enforce a single geometry type per file, GeoPackage allows a layer to hold mixed geometries and inherits ACID transactions from SQLite. SQLite locks at the database-file level rather than per row, so concurrent writers are serialized.
  • Native Spatial Indexing: GeoPackage layers written by GDAL include an R-tree spatial index by default. The index lives inside the same file and is kept current by triggers, accelerating bounding-box queries and spatial joins without external sidecar files.

For teams building offline-first mobile applications or edge-deployed GIS tools, this consolidation reduces sync complexity and prevents silent schema drift during data handoffs.

Engine Selection & I/O Configuration

GeoPandas abstracts GDAL/OGR translation, but the underlying I/O engine dictates performance and compatibility.

  • pyogrio (Recommended): The default engine since GeoPandas 1.0 and an opt-in engine in earlier releases. It transfers data as NumPy arrays rather than per-feature Python objects, which typically yields a several-fold speedup on large datasets (the exact gain depends on the data and hardware). CRS handling and the GeoPackage version written come from the underlying GDAL build.
  • fiona (Legacy Fallback): Still functional but slower due to row-by-row iteration. Use only when maintaining compatibility with older Python/GDAL stacks or when pyogrio encounters niche geometry type edge cases.

When configuring the conversion, always verify your environment’s GDAL version. The OGC GeoPackage Standard defines strict compliance requirements for metadata tables and spatial reference systems. GeoPandas handles most of this automatically, but explicit engine="pyogrio" ensures deterministic behavior across CI runners, Docker containers, and field laptops.
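
One way to keep engine selection explicit while still tolerating environments that lack pyogrio is to probe for the installed backends at startup. The following is a minimal sketch using only the standard library. The helper name pick_engine and the preference order are assumptions made for illustration.

```python
from importlib.util import find_spec


def pick_engine(preferred: tuple[str, ...] = ("pyogrio", "fiona")) -> str:
    """Return the first importable I/O backend from `preferred`.

    Hypothetical helper: it only checks that the package can be found,
    not that its GDAL bindings load successfully.
    """
    for name in preferred:
        if find_spec(name) is not None:
            return name
    raise ImportError(f"None of the I/O engines {preferred} are installed")


# Example: pass the result straight to read_file()/to_file().
# engine = pick_engine()
# gdf = gpd.read_file("data/field_survey.shp", engine=engine)
```

Logging the chosen engine alongside the output of geopandas.show_versions(), which reports the GDAL version in use, makes differences between CI runners and laptops easier to trace.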

Integrating into Automated ETL Pipelines

Embedding spatial conversion into automated workflows requires more than a simple script call. Production pipelines should validate geometry, enforce CRS consistency, and handle batch processing efficiently.

python
def batch_convert_directory(shp_dir: str, gpkg_dir: str) -> None:
    Path(gpkg_dir).mkdir(parents=True, exist_ok=True)
    for shp in Path(shp_dir).glob("*.shp"):
        try:
            convert_shp_to_geopackage(shp, Path(gpkg_dir) / f"{shp.stem}.gpkg")
        except Exception as e:
            logging.error(f"Failed {shp.name}: {e}")
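
The batch loop above converts whatever it finds without the validation the introduction calls for. A pre-export guard could be sketched as follows. The function name check_before_export and its expected_crs parameter are hypothetical, and the function relies only on the crs and geometry attributes of a GeoDataFrame.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    import geopandas as gpd


def check_before_export(gdf: gpd.GeoDataFrame, expected_crs: str | None = None) -> int:
    """Raise on a missing or unexpected CRS; return the count of invalid geometries."""
    if gdf.crs is None:
        raise ValueError("No CRS defined; assign the correct one with set_crs() first")
    # A pyproj CRS compares equal to strings such as "EPSG:4326".
    if expected_crs is not None and gdf.crs != expected_crs:
        raise ValueError(f"Expected CRS {expected_crs}, found {gdf.crs}")
    # is_valid is a boolean Series; inverting and summing counts the failures.
    return int((~gdf.geometry.is_valid).sum())
```

Calling this between read_file() and to_file() lets the existing except block in batch_convert_directory log a missing CRS as a failure instead of letting it pass silently into the output.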

When scaling this pattern, memory management becomes critical, because gpd.read_file() loads the entire dataset into RAM. For multi-gigabyte shapefiles, read incrementally with pyogrio.open_arrow() (Arrow record batches), page through the data with pyogrio.read_dataframe(..., skip_features=..., max_features=...), or use Dask-GeoPandas for distributed processing.

Always validate the coordinate reference system before export as well. A mismatched or undefined CRS propagates into the GeoPackage metadata tables and breaks downstream spatial joins.

Understanding how Python Integration & Database Workflows manage spatial I/O prevents silent failures during schema migration. By standardizing on GeoPackage as the canonical intermediate format, teams eliminate Shapefile fragmentation and streamline handoffs between Python data engineers, mobile developers, and GIS analysts. For deeper architectural patterns around spatial database orchestration, review the GeoPandas & GeoPackage Integration reference.

Troubleshooting Common Conversion Errors

| Symptom | Root Cause | Resolution |
| --- | --- | --- |
| DriverError: Could not open datasource (DataSourceError under pyogrio) | Missing .shx or corrupted .dbf | Ensure all companion files sit in the same directory as the .shp. Use ogrinfo to validate integrity. |
| UnicodeDecodeError | Legacy Windows-1252 encoding in .dbf | Pass encoding="latin1" or encoding="cp1252" to read_file(). |
| CRSWarning: No CRS defined | Missing .prj file | Assign the dataset's actual CRS before export, e.g. gdf.set_crs("EPSG:4326", inplace=True) for WGS 84 data. |
| SQLite error: table already exists | Layer name collision in target .gpkg | GeoPackage supports multiple layers per file. Pass a different layer_name to add a new layer, or mode="a" to append to the existing one. Delete the .gpkg first if a full rebuild is intended. |
| pyogrio import fails | Missing GDAL binaries or mismatched wheels | Install with conda install -c conda-forge pyogrio, or with pip install pyogrio (the PyPI binary wheels bundle GDAL). |

For comprehensive I/O parameter documentation, consult the official GeoPandas I/O Guide. When deploying to constrained edge devices, strip unnecessary columns before export using gdf[["geometry", "id", "status"]] to minimize database footprint and accelerate mobile sync operations.