Writing ESRI File Geodatabase text fields with fixed length using Python

Question 1

I am reading a layer that contains text fields and writing it to an ESRI File GDB. GeoPandas stores standard text fields as object dtype columns. When the written GDB is read back, the text fields have an unlimited length (65536, to be exact). However, I want to write the file so that these fields have a width limit of 255 characters or less.

I have tried several solutions such as:

import pandas as pd

gdf = gdf.convert_dtypes(convert_string=True)
gdf["text_col"] = gdf["text_col"].astype(str)
gdf["text_col"] = pd.Series(gdf["text_col"], dtype="S255")

The only approach so far that changed the field length was converting the column to the "S255" dtype. However, this writes the column as byte strings, so the values end up wrapped in b'', which makes them unsuitable for analysis.
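A minimal sketch of what the "S255" cast does to the values, using only pandas (the three sample strings are made up): each value becomes a bytes object, and str() of a bytes object carries the b'' prefix, which is presumably where the prefix in the written column comes from. Decoding restores ordinary strings, but then the declared width is gone again, so the dtype alone cannot carry the field width.

```python
import pandas as pd

# Ordinary text column: pandas/GeoPandas keep these as Python str in an object column.
names = pd.Series(["Location 1", "Location 2", "Location 3"], dtype=object)

# Casting to a fixed-width NumPy bytes dtype turns every value into a bytes object.
as_bytes = names.astype("S255")

# Rendering a bytes value as text keeps the b'' prefix, e.g. str(b"x") == "b'x'".
print(str(bytes(as_bytes.iloc[0])))  # b'Location 1'

# Decoding gives back plain str values, but the 255 width is lost again.
decoded = as_bytes.map(lambda b: b.decode("utf-8"))
print(decoded.tolist())  # ['Location 1', 'Location 2', 'Location 3']
```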

So how can I use GeoPandas to write text columns to an ESRI FileGDB layer with a type of char(255) (or similar), so that the width is visible when inspecting the field metadata in QGIS or ArcGIS?

GeoPandas version: 1.0.1

Snippet to reproduce:

import geopandas as gpd
from shapely.geometry import Point
data = {
 'name': ['Location 1', 'Location 2', 'Location 3'],
 'description': [
 'This is a description for location 1.',
 'This is a description for location 2.',
 'This is a description for location 3.'
 ],
 'geometry': [Point(4.895168, 52.370216), Point(4.904139, 52.367573), Point(4.899431, 52.379189)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")
gdf.to_file("test.gdb", driver='OpenFileGDB')

Question 2

To the FGDB, "text" means CLOB (2 GiB limit). All varchar fields are variable-length, not "fixed".

Question 3

To follow up on what has already been stated in the comments: the file geodatabase text field is implemented as varchar/nvarchar. A 65536-length field uses no more space than a 255-length field (assuming the text itself doesn't exceed 255 characters), so why bother forcing it back to 255? There is nothing special about Esri's default of 255; it is mostly an interoperability relic from the past.
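The caveat above (the text itself must not exceed 255 characters) is easy to check before narrowing any field. A minimal sketch, assuming the data is in a pandas or GeoPandas frame; the helper name overlong_text_columns is hypothetical. It counts characters with len(); whether a given driver enforces the width in characters or in bytes is an assumption not verified here for FileGDB.

```python
import pandas as pd

def overlong_text_columns(df: pd.DataFrame, limit: int = 255) -> dict:
    """Hypothetical helper: map each column whose longest str value exceeds `limit` to that length."""
    result = {}
    for col in df.columns:
        # Character length of every str value; anything else (None, numbers, geometries) counts as 0.
        lengths = df[col].map(lambda v: len(v) if isinstance(v, str) else 0)
        longest = int(lengths.max()) if len(lengths) else 0
        if longest > limit:
            result[col] = longest
    return result

# Made-up example data: one description is 300 characters long.
df = pd.DataFrame({
    "name": ["Location 1", "Location 2"],
    "description": ["x" * 300, None],
    "count": [1, 2],
})
print(overlong_text_columns(df))  # {'description': 300}
```

An empty result means a 255 width would not cut off any existing value.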

Question 4

You can control the output datatypes by passing a schema to gdf.to_file.

schema: dict, default None

If specified, the schema dictionary is passed to Fiona to better control how the file is written. If None, GeoPandas will determine the schema based on each column’s dtype. Not supported for the "pyogrio" engine.

This works out of the box in geopandas < 1.0, where the default I/O engine is Fiona.

So you can declare your text fields in the schema as str:255 to limit the field width. See the Fiona documentation on schemas for more information.

However, the default I/O engine in geopandas >= 1.0 is pyogrio, which doesn't support schemas.

So, as you are using geopandas >= 1.0, you need to tell it to use the Fiona I/O engine, either:

by setting gpd.options.io_engine = "fiona" for the entire script, or
by passing engine="fiona" to gdf.to_file, if you want to use pyogrio in most cases (e.g. because of its better performance) and only use Fiona when you need to specify a schema.

Here is an example:

import geopandas as gpd
from shapely.geometry import Point
gpd.options.io_engine = "fiona" # sets it for the whole script
data = {
 'name': ['Location 1', 'Location 2', 'Location 3'],
 'description': [
 'This is a description for location 1.',
 'This is a description for location 2.',
 'This is a description for location 3.'
 ],
 'geometry': [Point(4.895168, 52.370216), Point(4.904139, 52.367573), Point(4.899431, 52.379189)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")
# Start from the schema GeoPandas would infer, then give every text field a width of 255
schema = gpd.io.file.infer_schema(gdf)
for col, dtype in schema["properties"].items():
    if dtype == "str":
        schema["properties"][col] = "str:255"
gdf.to_file("test.gdb", layer="test", driver='OpenFileGDB', schema=schema)
# Or
# gdf.to_file("test.gdb", layer="test", engine="fiona", driver='OpenFileGDB', schema=schema)
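The schema-editing loop in this example can also be written as a small helper that leaves the inferred schema untouched. A minimal sketch, assuming a Fiona-style schema dict with a "properties" mapping of field name to type string, as returned by gpd.io.file.infer_schema; the name with_str_width is hypothetical. Unlike the loop, it also overrides text fields that already carry an explicit width such as "str:80".

```python
def with_str_width(schema: dict, width: int = 255) -> dict:
    """Hypothetical helper: copy a Fiona-style schema, giving every text field the type str:<width>."""
    properties = {
        # "str" (no width) and "str:N" (explicit width) are both text types in Fiona's notation.
        name: (f"str:{width}" if ftype == "str" or ftype.startswith("str:") else ftype)
        for name, ftype in schema["properties"].items()
    }
    # Shallow copy so the caller's schema (e.g. the inferred one) is left as it was.
    return {**schema, "properties": properties}

# Made-up schema in the shape infer_schema returns.
inferred = {"geometry": "Point", "properties": {"name": "str", "description": "str:80", "count": "int"}}
print(with_str_width(inferred)["properties"])
# {'name': 'str:255', 'description': 'str:255', 'count': 'int'}
```

In the script above this would replace the loop with schema = with_str_width(gpd.io.file.infer_schema(gdf)), followed by the same gdf.to_file(..., schema=schema) call using the Fiona engine.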

Checking the output with GDAL:

ogrinfo -so -al test.gdb
INFO: Open of `test.gdb'
 using driver `OpenFileGDB' successful.
Layer name: test
Geometry: Point
Feature Count: 3
Extent: (4.895168, 52.367573) - (4.904139, 52.379189)
Layer SRS WKT:
GEOGCRS["WGS 84",
snip...
 ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
FID Column = OBJECTID
Geometry Column = SHAPE
name: String (255.0)
description: String (255.0)
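To check the widths from a script rather than by eye, the field lines of this summary can be parsed. A minimal sketch, assuming the "name: Type (width.precision)" layout shown above; ogrinfo's text output is not a stable interface and may differ between GDAL versions (for example with extra NOT NULL annotations), and parse_field_widths is a hypothetical name.

```python
import re

# Matches field lines such as "name: String (255.0)"; anything after the parentheses is ignored.
FIELD_LINE = re.compile(r"^\s*([^\s:]+): (\w+) \((\d+)\.(\d+)\)")

def parse_field_widths(ogrinfo_text: str) -> dict:
    """Hypothetical helper: map field name -> (OGR type, width) from `ogrinfo -so` output."""
    widths = {}
    for line in ogrinfo_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            name, ogr_type, width, _precision = match.groups()
            widths[name] = (ogr_type, int(width))
    return widths

# Excerpt of the summary shown above.
sample = """\
Layer name: test
Geometry: Point
Feature Count: 3
Extent: (4.895168, 52.367573) - (4.904139, 52.379189)
Data axis to CRS axis mapping: 2,1
FID Column = OBJECTID
Geometry Column = SHAPE
name: String (255.0)
description: String (255.0)
"""
print(parse_field_widths(sample))
# {'name': ('String', 255), 'description': ('String', 255)}
```

The text could come from subprocess.run(["ogrinfo", "-so", "-al", "test.gdb"], capture_output=True, text=True).stdout.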

Question 5

This worked! Thanks a lot.

Accepted answer by user2856, 2024-12-19 10:06:08Z

Commented by PythonStudent on Dec 20, 2024 at 07:16:01 UTC
