I am importing csv files into a gdb and need to change a couple of field types during the import. I am trying to use the field mapping in TableToTable_conversion to change the field type as the table gets imported.
I am confused about how to identify the field that needs to be updated. When setting up the field mapping I am using fm.addInputField, which requires a field name, but because the gdb table does not exist yet (it has not been imported), I can't define the field to input.
Should the approach be to use the csv rows to define which field I want to change, or am I missing something completely? Is there an easier way to change a field type upon import rather than using Field Mappings?
4 Answers
I don't know how sophisticated the QA/QC operations you run on the input .csv files are (what if a user has added a row with a string in a column that you expect to be an integer?). If you target ArcGIS 10.4+, I would recommend using the pandas Python package to read the .csv file and cast the columns to the proper types, so you don't have to deal with the cast errors yourself. When you are done, you can always export the resulting data frame to an output .csv in a user temp folder using the tempfile module.
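A minimal sketch of that pandas approach, assuming the column names used in the sample data further down (they are not from the original question), could look like this:

import tempfile
import pandas as pd

# Read the csv, forcing the problem columns to be read as strings
df = pd.read_csv(r"C:\GIS\Temp\data.csv", dtype={"ID": str, "FieldInt": str})

# Cast/clean columns here as required, e.g. coerce bad integers to NaN:
# df["FieldInt"] = pd.to_numeric(df["FieldInt"], errors="coerce")

# Write the cleaned frame to a csv in the user's temp folder
out_csv = tempfile.NamedTemporaryFile(suffix=".csv", delete=False).name
df.to_csv(out_csv, index=False)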
If you are only interested in getting your columns right (without actually checking whether all rows would qualify), I suggest converting the .csv file into an in_memory table first.
Say you have a .csv file with these rows:

ID    FieldInt    FieldStr    FieldDate
1     10          Value1      2018-02-12
2     20          Value2      2018-02-14
3a    20a         Value3      2018-02-16
You would like all of the fields to be of string type. If you convert this .csv into a table using arcpy.TableToTable_conversion, ArcGIS decides to cast the ID and FieldInt fields to Integer, and the values that could not be cast are now just null.
You will not be able to restore the null values, but you can still move the data that is left into columns of the right type. You create a new table with the fields found in the .csv file, using the data types you need:
- Create an empty geodatabase table.
- Add fields with the necessary types (using arcpy.AddField_management).
- Convert the source .csv into a temp table in_memory\data.
- Append with arcpy.Append_management(src, target), moving the data from the temp table into the production one (see the sketch below).
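Here is a minimal sketch of those four steps; the paths, output table name and field list are placeholder assumptions based on the sample data above, not values from the original post:

import arcpy

src_csv = r"C:\GIS\Temp\data.csv"
out_gdb = r"C:\GIS\Temp\sample.gdb"

# 1. Create an empty geodatabase table
target = arcpy.CreateTable_management(out_gdb, "data_clean").getOutput(0)

# 2. Add fields with the types you actually want (all Text here)
for name in ("ID", "FieldInt", "FieldStr", "FieldDate"):
    arcpy.AddField_management(target, name, "TEXT", field_length=255)

# 3. Convert the source .csv into a temporary in_memory table
temp = arcpy.TableToTable_conversion(src_csv, "in_memory", "data").getOutput(0)

# 4. Append the temp table into the production table; NO_TEST matches
#    fields by name instead of requiring identical schemas
arcpy.Append_management(temp, target, "NO_TEST")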
Even if you had a field map in place, the situation I'm describing above would make it impossible to import all the data correctly. Try running the TableToTable tool in the ArcMap UI yourself.
arcpy.TableToTable_conversion(in_rows="C:/GIS/Temp/data.csv", out_path="C:/GIS/Temp/ArcGISHomeFolder/sample.gdb", out_name="trick1", where_clause="", field_mapping='ID "ID" true true false 4 Text 0 0 ,First,#,C:\GIS\Temp\data.csv,ID,-1,-1;FieldInt "FieldInt" true true false 4 Text 0 0 ,First,#,C:\GIS\Temp\data.csv,FieldInt,-1,-1;FieldStr "FieldStr" true true false 8000 Text 0 0 ,First,#,C:\GIS\Temp\data.csv,FieldStr,-1,-1;FieldDate "FieldDate" true true false 20 Text 0 0 ,First,#,C:\GIS\Temp\data.csv,FieldDate,-1,-1', config_keyword="")
Even after you've specified all the fields to be of Text type, the last row is not loaded (only null values are present).
PS. A dirty workaround I've seen in someone's code was to put a top row in the .csv file with values of the type one wanted to have, and then delete that row after the data import was done. This could be done using Python's csv module and then arcpy.da.UpdateCursor to delete the first row.
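A rough sketch of that workaround (the file path, dummy values and imported table name are illustrative assumptions; file modes are Python 2 / ArcMap style):

import csv
import arcpy

src_csv = r"C:\GIS\Temp\data.csv"

# Prepend a dummy row whose values force the desired (string) interpretation
with open(src_csv, 'rb') as f:
    rows = list(csv.reader(f))
header, data = rows[0], rows[1:]
dummy = ['x0', 'x0', 'x', 'x']  # one obviously-string value per column
with open(src_csv, 'wb') as f:
    csv.writer(f).writerows([header, dummy] + data)

# ... run TableToTable_conversion on src_csv here ...

# Afterwards, delete the dummy row from the imported table
with arcpy.da.UpdateCursor('imported_table', ['ID']) as cursor:
    for row in cursor:
        if row[0] == 'x0':
            cursor.deleteRow()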
-
Thanks for the suggestions - in this case the data doesn't need QA/QC, I just can't lose any data, even if it is invalid. This is why, in the case of an ID for example, I need it to import as a string so I don't lose something like 3a, which is of course what was happening when TableToTable_conversion changed a string field to an integer field. I did manage to get the Field Mappings to work so I will post that as an answer now. My new problem is the leading 0s being lost on the import into the string field, for example 01008 becomes 1008, which the field mapping doesn't solve. – JS24, Mar 27, 2018 at 10:10
Just to follow up, I got the field mappings to work based on the csv fields, using this below:
import csv
import arcpy

input_csv = 'mydata.csv'

# Read the csv header row to get the field names
with open(input_csv, 'rb') as f:
    d_reader = csv.DictReader(f)
    headers = d_reader.fieldnames

# Build a field map that outputs every csv field as Text
fms = arcpy.FieldMappings()
for header in headers:
    fm = arcpy.FieldMap()
    fm.addInputField(input_csv, header)
    newField = fm.outputField
    newField.type = "Text"
    newField.length = 8000
    fm.outputField = newField
    fms.addFieldMap(fm)

# outputgdb and fnametable are the target geodatabase and output table name
arcpy.TableToTable_conversion(input_csv, outputgdb, fnametable, field_mapping=fms)
This doesn't prevent leading zeros from being lost in the transition from int to text field, for example 01008 becomes 1008, so I am still working on that.
In response to the issue:
This doesn't prevent leading zeros from being lost in the transition from int to text field, for example 01008 becomes 1008, so I am still working on that.
To keep the leading 0s, add a letter to the front of the value prior to import, then remove it once imported.
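For example, a sketch of that approach (the column name, paths and imported table name are assumptions; file modes are Python 2 / ArcMap style):

import csv
import arcpy

src_csv = r"C:\GIS\Temp\data.csv"
tmp_csv = r"C:\GIS\Temp\data_prefixed.csv"

# Prefix the values so "01008" is read as the string "X01008"
with open(src_csv, 'rb') as fin, open(tmp_csv, 'wb') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row['ID'] = 'X' + row['ID']
        writer.writerow(row)

# ... import tmp_csv with TableToTable_conversion ...

# Strip the prefix again in the imported table
with arcpy.da.UpdateCursor('imported_table', ['ID']) as cursor:
    for row in cursor:
        if row[0] and row[0].startswith('X'):
            cursor.updateRow([row[0][1:]])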
Not sure if that fits your requirement, but by creating a "schema.ini" text file in the same folder as your csv you can specify the field data types.
Have a look at this question for more detail: How to auto-create a schema.ini file for a .csv?
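For example, a schema.ini next to data.csv that forces every column to text could look like this (the column names follow the sample data above and are assumptions):

[data.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=ID Text
Col2=FieldInt Text
Col3=FieldStr Text
Col4=FieldDate Text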
-
I came across that as a fix, but it won't work in this case because the script will be used by a number of different users who obtain the csvs themselves and store them locally. – JS24, Feb 12, 2018 at 15:10