I am reading and writing a layer to an ESRI File GDB that contains text fields. Standard text fields are stored as objects by GeoPandas. When the GDB is read the text fields have an unlimited length (or 65536 to be exact). However, I want to write the file so it has a field width character limit of 255 or less.
I have tried several solutions such as:
gdf = gdf.convert_dtypes(convert_string=True)
gdf["text_col"] = gdf["text_col"].astype(str)
gdf["text_col"] = pd.Series(gdf["text_col"], dtype="S255")
The only solution so far that changed the field length is converting the column to the "S255" dtype. However, this writes the column as byte strings, which adds b'' around the column values and makes them unsuitable for analysis.
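The b'' wrapper most likely appears because each value is now a Python bytes object, and converting bytes to text with str() yields its repr rather than its contents. A minimal sketch in plain Python (no GeoPandas involved; the variable names are illustrative only):

```python
# A bytes value, comparable to what a NumPy "S" (byte-string) dtype holds.
raw = "Location 1".encode("utf-8")

# Stringifying bytes gives the repr, including the b'' wrapper.
as_text = str(raw)
print(as_text)  # b'Location 1'

# Decoding recovers the original text.
decoded = raw.decode("utf-8")
print(decoded)  # Location 1
```

Decoding restores the text, but the values are then ordinary strings again, so this presumably does not keep the 255-character field width on its own.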
So how can I write text columns using GeoPandas to an ESRI FileGDB layer with a dtype of char(255) (or similar) that is visible when inspecting the field metadata in QGIS or ArcGIS?
GeoPandas version: 1.0.1
Snippet to replicate.
import geopandas as gpd
from shapely.geometry import Point
data = {
    'name': ['Location 1', 'Location 2', 'Location 3'],
    'description': [
        'This is a description for location 1.',
        'This is a description for location 2.',
        'This is a description for location 3.'
    ],
    'geometry': [Point(4.895168, 52.370216), Point(4.904139, 52.367573), Point(4.899431, 52.379189)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")
gdf.to_file("test.gdb", driver='OpenFileGDB')
- "text", to FGDB, is CLOB (2GiB limit). All varchar fields are variable length, not "fixed". – Vince, Dec 19, 2024 at 12:28
- To follow up on what has already been stated in the comments, the file geodatabase text field is implemented as varchar/nvarchar. There is no more space used up by a 65536-length field than a 255-length field (assuming text itself doesn't go beyond 255), so why bother forcing it back to 255? There is nothing special about Esri's default being 255, it is mostly an interoperability relic from the past. – bixb0012, Jan 4, 2025 at 20:47
1 Answer
You can control the output datatypes by passing a schema to gdf.to_file.
schema: dict, default None
If specified, the schema dictionary is passed to Fiona to better control how the file is written. If None, GeoPandas will determine the schema based on each column’s dtype. Not supported for the "pyogrio" engine.
This works in geopandas < 1.0, as the default io engine there is fiona.
So you can specify your text field in the schema as str:255 to limit the field width. More info on schemas is available in the Fiona documentation.
However, the default io engine in geopandas >=1.0 is pyogrio, which doesn't support schemas.
So, as you are using geopandas >=1.0, you need to specify that you want to use the fiona io engine in your script. Either set gpd.options.io_engine = "fiona" to apply it to the entire script, or pass engine="fiona" to gdf.to_file if you want to use pyogrio for most cases (e.g. because of its better performance) and only use fiona when you need to specify a schema.
Here is an example:
import geopandas as gpd
from shapely.geometry import Point
gpd.options.io_engine = "fiona" # sets it for the whole script
data = {
    'name': ['Location 1', 'Location 2', 'Location 3'],
    'description': [
        'This is a description for location 1.',
        'This is a description for location 2.',
        'This is a description for location 3.'
    ],
    'geometry': [Point(4.895168, 52.370216), Point(4.904139, 52.367573), Point(4.899431, 52.379189)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")
schema = gpd.io.file.infer_schema(gdf)
for col, dtype in schema["properties"].items():
    if dtype == "str":
        schema["properties"][col] = "str:255"
gdf.to_file("test.gdb", layer="test", driver='OpenFileGDB', schema=schema)
# Or
# gdf.to_file("test.gdb", layer="test", engine="fiona", driver='OpenFileGDB', schema=schema)
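If only some columns need a limit, or the widths differ per column, the loop above could be wrapped in a small helper. The sketch below is plain Python operating on a Fiona-style schema dictionary of the form {"geometry": ..., "properties": {...}}. The name with_text_widths and its parameters are hypothetical, not part of GeoPandas or Fiona, and the inferred dictionary is a hand-written stand-in for what infer_schema returns.

```python
from copy import deepcopy


def with_text_widths(schema, default_width=255, widths=None):
    """Return a copy of a Fiona-style schema with widths on plain "str" fields.

    widths: optional {column: width} overrides (hypothetical parameter).
    """
    widths = widths or {}
    result = deepcopy(schema)  # leave the caller's schema untouched
    for col, dtype in result["properties"].items():
        if dtype == "str":  # only plain text fields that have no width yet
            result["properties"][col] = f"str:{widths.get(col, default_width)}"
    return result


# Hand-written stand-in for the output of gpd.io.file.infer_schema(gdf).
inferred = {
    "geometry": "Point",
    "properties": {"name": "str", "description": "str", "count": "int"},
}
limited = with_text_widths(inferred, widths={"name": 50})
print(limited["properties"])
```

The result could then be passed as schema=limited to gdf.to_file together with engine="fiona", in the same way as in the example above.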
Checking output with GDAL:
ogrinfo -so -al test.gdb
INFO: Open of `test.gdb'
using driver `OpenFileGDB' successful.
Layer name: test
Geometry: Point
Feature Count: 3
Extent: (4.895168, 52.367573) - (4.904139, 52.379189)
Layer SRS WKT:
GEOGCRS["WGS 84",
snip...
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
FID Column = OBJECTID
Geometry Column = SHAPE
name: String (255.0)
description: String (255.0)
- This worked! Thanks a lot. – PythonStudent, Dec 20, 2024 at 7:16