The Cloud-Native Geospatial Forum (CNG) features an interesting interview with Kyle Barron, cloud Engineer at Development Seed and, among other roles, developer behind the Python-based “big data”-capable geovisualization library Lonboard1. The interview was published back in December, but if you haven’t read it yet, it is still well worth your time.
In the interview, Kyle covers and touches upon, among others:
- Parquet and GeoParquet and their characteristics as well as suitable use-cases for these formats
- GeoArrow, a very fast and efficient in-memory representation of geospatial data
- stac-geoparquet, a converter between STAC items, GeoParquet and other formats
- Apache Iceberg, an open table format for analytics applications
These days, Parquet and GeoParquet are talked about quite often among geospatial professionals. But take note also of GeoArrow. While GeoArrow plays particularly well with GeoParquet, it is not tied to that format and can be used to improve geospatial workflows that involve other formats, too.
From Kyle:
Historically, GeoPandas and GDAL didn’t have a way to share a collection of data at a binary level. So GDAL (via the fiona driver) would effectively create GeoJSON Python objects for GeoPandas to consume. This is horribly inefficient. Since GDAL 3.6, GDAL has supported reading into GeoArrow and since GDAL 3.8 it has supported writing from GeoArrow.
GDAL’s adoption of GeoArrow made it 23x faster to read FlatGeobuf and GeoPackage into GeoPandas.
Be it through GDAL or dozens of other tools such as FME: You may very well have already benefitted from the work around #cloudnativegeo and, in particular, GeoArrow.
Footnotes
As to the name: you know, like “longboard”, but with “lon(gitude)”↩︎