I have a shapefile with an XML metadata file. Most of the elements are pre-populated from the previous edition's XML file. However, I am trying to update the current edition automatically within a larger script.
I cannot add any new modules to my Python installation (for example, the metadata module mentioned in other questions), because I am using a locked-down work computer on which I cannot install anything.
The three elements that need updating are the title, the revision date and the edition date.
I have managed to update these using the tree and an index to find the tags.
However, how do I search for a metadata element using its tag name? For example, the edition date tag is "resEdDate".
In addition, will an element's index in the metadata tree ever change if or when additional information is added? Or do the trees have a set format, and therefore the same indexes? I'm worried that any update to ArcGIS metadata will change the tree, and therefore the indexes of the tags mentioned above, and cause errors in the script.
The script below shows the issue. Both datasets use the ArcGIS metadata format, yet the same tags have different indexes. For example, the MCMS Polygon dataset's edition date is at root[4][0][0], whereas the MCMS Exclusion Zones edition date is at root[3][0][0].
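One way to see why the indexes differ is to print the tag path of every element in each file and compare the two lists. A minimal sketch using only the standard library; the element_paths helper and the cut-down sample document are hypothetical, standing in for a real .shp.xml file:

```python
import xml.etree.ElementTree as ET


def element_paths(elem, prefix=""):
    """Yield (path, element) for every descendant of elem, in document order.

    Paths are tag names joined with "/", relative to elem,
    e.g. "dataIdInfo/idCitation/resEdDate".
    """
    for child in elem:
        path = prefix + "/" + child.tag if prefix else child.tag
        yield path, child
        # recurse without "yield from" so this also runs on Python 2.7
        for item in element_paths(child, path):
            yield item


# Hypothetical cut-down metadata; for a real file use ET.parse(fpath).getroot()
sample = """<metadata>
  <Esri><CreaDate>20160101</CreaDate></Esri>
  <dataIdInfo>
    <idCitation>
      <resEdDate>2015-09-01</resEdDate>
      <resTitle>MCMS (polygon)</resTitle>
    </idCitation>
  </dataIdInfo>
</metadata>"""

root = ET.fromstring(sample)
for path, elem in element_paths(root):
    print(path)
```

For the sample this prints Esri, Esri/CreaDate, dataIdInfo, dataIdInfo/idCitation, dataIdInfo/idCitation/resEdDate and dataIdInfo/idCitation/resTitle, one per line. Running it on both of your files and comparing the output should show which extra element shifts the positions; each printed path can also be passed straight to root.find().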
My script so far is:
import arcpy, os, sys, datetime
import xml
import xml.etree.ElementTree as ET
ws = arcpy.env.workspace = r"path/to/folder"
today = datetime.date.today()
date = today.strftime("%Y%m%d")
#Update the MCMS polygon metadata titles, update date and edition date
for f in os.listdir(ws):
    #Find the polygon xml file
    if f.endswith("A.shp.xml"):
        fpath = os.path.join(ws, f)
        #Identify the metadata tree
        tree = ET.parse(fpath)
        root = tree.getroot()
        #Set the title and date variables to the relevant metadata tag index
        editiondate = root[4][0][0]
        reviseddate = root[4][0][5][0]
        title = root[4][0][7]
        #Update the tags with the new data
        editiondate.text = today.strftime("%Y-%m-%d")
        reviseddate.text = today.strftime("%Y-%m-%d") + "T00:00:00"
        title.text = "MCMS (polygon)"
        #Write the updates to the xml file
        tree.write(fpath)
#Update the MCMS exclusion metadata titles, update date and edition date
for f in os.listdir(ws):
    #Find the exclusion zone xml file
    if f.endswith("Zones.shp.xml"):
        fpath = os.path.join(ws, f)
        #Identify the metadata tree
        tree = ET.parse(fpath)
        root = tree.getroot()
        #Set the title and date variables to the relevant metadata tag index
        editiondate = root[3][0][0]
        reviseddate = root[3][0][5][0]
        title = root[3][0][7]
        #Update the tags with the new data
        editiondate.text = today.strftime("%Y-%m-%d")
        reviseddate.text = today.strftime("%Y-%m-%d") + "T00:00:00"
        title.text = "MCMS Exclusion Zones"
        #Write the updates to the xml file
        tree.write(fpath)
Can you post what the XML file looks like? It is hard to troubleshoot without knowing the structure of the tree. (crmackey, Sep 2, 2016)
The tree uses the ArcGIS metadata format. Is it always consistent across different geometry types? At the moment I'm using indexes to locate tags, but how do I search for them instead? The structure may change over time, which would break the indexes in the future. (MacroZED, Sep 3, 2016)
I have added additional information and a screenshot of the metadata tree. (MacroZED, Sep 5, 2016)
Thanks for posting your XML code. I have added an answer below. (crmackey, Sep 6, 2016)
Answer
I have some comments and suggestions for you. You mention that you do not have admin privileges on your machine, so you cannot install the metadata module. Good news: you don't necessarily need admin rights to install Python packages. You do for binary installers, but most modules can be installed with pip. You can install pip itself (for example with get-pip.py) so that it lives in your C:\Python27\ArcGIS10.x\Scripts folder, and pip install --user installs packages into your per-user site-packages folder.
You can also download Python packages manually and place the modules in a folder on Python's module search path (sys.path), such as C:\Python27\ArcGIS10.x\Lib\site-packages. The drawback is that you could be missing some dependencies; that is where pip is the better option, as it installs the dependencies as well.
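If a module is copied by hand somewhere other than site-packages, its folder can be added to sys.path at the top of the script that imports it. A minimal sketch; add_module_folder is a hypothetical helper name:

```python
import os
import sys


def add_module_folder(folder):
    """Put folder at the front of sys.path so modules saved there can be imported.

    This only affects the current Python session and needs no admin rights.
    Returns the absolute path that was added.
    """
    folder = os.path.abspath(folder)
    if folder not in sys.path:  # avoid adding the same folder twice
        sys.path.insert(0, folder)
    return folder
```

For example, call add_module_folder(r"\\server\share\python_modules") before import xmlhelper (the share path is a placeholder). Setting a PYTHONPATH environment variable on your user account has the same effect for every session.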
With all that said, I have never used the metadata module, but I believe the built-in xml module will do everything you need. A while back I built a wrapper with convenience methods for working with XML files (see below). You can try it to see if it helps.
As for hard-coding indices for the metadata in your script, I would avoid it. I am not certain whether ArcGIS will add elements to the metadata in future, but if anything is added or deleted, it could easily shift the indices your script relies on. It is best to get at the elements by name, using the xml.etree.ElementTree.Element.find() or xml.etree.ElementTree.Element.findall() methods.
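Applied to your script, find() with a path of tag names can replace the index lookups. A minimal sketch using only the standard library; update_metadata is a hypothetical helper, and the three element paths are an assumption based on where resEdDate, reviseDate and resTitle usually sit in ArcGIS-format metadata (under dataIdInfo/idCitation), so check them against your own files:

```python
import datetime
import xml.etree.ElementTree as ET

# ASSUMPTION: typical ArcGIS-format locations, relative to the <metadata> root.
# Verify these against your own .shp.xml files before relying on them.
EDITION_DATE_PATH = "dataIdInfo/idCitation/resEdDate"
REVISE_DATE_PATH = "dataIdInfo/idCitation/date/reviseDate"
TITLE_PATH = "dataIdInfo/idCitation/resTitle"


def update_metadata(xml_path, title, day=None):
    """Update edition date, revise date and title by element path, not index."""
    day = day or datetime.date.today()
    stamp = day.strftime("%Y-%m-%d")
    tree = ET.parse(xml_path)
    root = tree.getroot()
    updates = [
        (EDITION_DATE_PATH, stamp),
        (REVISE_DATE_PATH, stamp + "T00:00:00"),
        (TITLE_PATH, title),
    ]
    for path, text in updates:
        elem = root.find(path)
        if elem is None:
            # nothing has been written yet, so the file is left unchanged
            raise ValueError("%s not found in %s" % (path, xml_path))
        elem.text = text
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```

For example, update_metadata(fpath, "MCMS (polygon)") for the polygon file and update_metadata(fpath, "MCMS Exclusion Zones") for the exclusion zones file. Because the lookups are by name, it no longer matters whether dataIdInfo is root[3] or root[4]. If a tag could be anywhere in the document, root.find(".//resEdDate") searches all descendants, and findall() returns every match rather than only the first.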
Here is the wrapper I built for working with XML files:
from xml.etree.ElementTree import ElementTree, Element, SubElement, Comment, tostring, parse, fromstring, fromstringlist
from xml.dom import minidom
from xml.sax.saxutils import escape, unescape
import copy
import os
import codecs
# escape table for the special XML characters
HTML = {
    '"': '&quot;',
    "'": '&apos;',
    ">": '&gt;',
    "<": '&lt;',
}
HTML_UNESC = {v: k for k, v in HTML.items()}
class BaseXML(object):
    def __init__(self, xml_file):
        """base class for xml files
        xml_file -- path to an xml file, an xml string, or a list of xml strings
        """
        self.document = xml_file
        if isinstance(xml_file, list):
            # we have a list of strings; fromstringlist returns an Element, so wrap it in a tree
            self.tree = ElementTree(fromstringlist(xml_file))
        elif isinstance(xml_file, basestring) and not os.path.isfile(xml_file) and '<' in xml_file:
            # we have an xml string; fromstring returns an Element, so wrap it in a tree
            self.tree = ElementTree(fromstring(xml_file))
        elif os.path.exists(xml_file):
            self.tree = parse(self.document)
        else:
            raise IOError('Invalid Input for XML file')
        # only a file path has a parent directory
        if isinstance(xml_file, basestring) and os.path.isfile(xml_file):
            self.directory = os.path.dirname(xml_file)
        else:
            self.directory = None
        self.root = self.tree.getroot()
        self.parent_map = {}
        # make static copy (works for file, string and list input)
        self._backup = copy.deepcopy(self.root)
        # initialize parent map
        self.updateParentMap()
    @staticmethod
    def iterElm(root, tag_name=None, childrenOnly=True, **kwargs):
        """return generator for tree
        Optional:
        tag_name -- name of tag
        childrenOnly -- if True (default), do not yield root itself
        kwargs -- optional key word args to filter by tag attributes
        """
        for tag in root.iter(tag_name):
            if all([tag.get(k) == v for k, v in kwargs.items()]):
                if childrenOnly and tag != root:
                    yield tag
                elif not childrenOnly:
                    yield tag
    def elmHasTags(self, root, tag, **kwargs):
        """tests if there are valid tags
        tag -- name of tag to check for
        """
        gen = self.iterElm(root, tag, **kwargs)
        try:
            next(gen)
            return True
        except StopIteration:
            return False
    def findChild(self, parent, child_name, **kwargs):
        """find first child anywhere under parent element
        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        for c in self.iterElm(parent, child_name, **kwargs):
            return c
    def findChildren(self, parent, child_name, **kwargs):
        """find all children anywhere under parent element,
        returns a list of elements.
        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        return [c for c in self.iterElm(parent, child_name, **kwargs)]
    def validateElm(self, elm, elm_name=None, **kwargs):
        """validates whether input is an Element name or Element object. If it
        is an Element name, it will return the Element object with that name and
        any additional key word args
        Required:
        elm -- element name or Element object
        elm_name -- name of Element.tag, only used if elm is a string.
        Optional:
        kwargs -- keyword argument filters, required if elm is a string
        """
        if isinstance(elm, Element):
            return elm
        elif isinstance(elm, basestring):
            return self.getElm(elm_name, **kwargs)
    def updateParentMap(self):
        """updates the parent_map dictionary"""
        self.parent_map = {c: p for p in self.tree.iter() for c in p}
    def countParents(self, elm, parent_name, **kwargs):
        """Count the number of parents an element has of a certain name, does
        hierarchical search
        Required:
        elm -- child element for which to search parents
        parent_name -- name of parent tag
        Optional:
        kwargs -- keyword argument filters
        """
        count = 0
        parent = self.getParent(elm, parent_name, **kwargs)
        while parent is not None:
            count += 1
            parent = self.getParent(parent, parent_name, **kwargs)
        return count
    def getParent(self, child, parent_name=None, **kwargs):
        """get parent element by tag name or first parent
        Required:
        child -- child element for which to find parent
        Optional:
        parent_name -- name of parent tag; if None, returns the direct parent
        kwargs -- optional key word args to filter by tag attributes
        """
        parent = self.parent_map.get(child)
        if parent is None:
            return None
        if parent_name is None:
            return parent
        else:
            if parent.tag == parent_name and all([parent.get(k) == v for k, v in kwargs.items()]):
                return parent
            else:
                return self.getParent(parent, parent_name, **kwargs)
    def elmHasParentOfName(self, child, parent_name=None, **kwargs):
        """checks if a child element has a parent of an input name
        Required:
        child -- child element for which to find parent
        Optional:
        parent_name -- name of parent tag
        kwargs -- optional key word args to filter by tag attributes
        """
        return self.getParent(child, parent_name, **kwargs) is not None
    def getElm(self, tag_name, root=None, **kwargs):
        """get first tag by name and kwargs filter
        Required:
        tag_name -- name of tag
        Optional:
        root -- root element to start with, defaults to the ElementTree
        kwargs -- optional key word args to filter by tag attributes
        """
        for tag in self.iterTags(tag_name, root=root, **kwargs):
            return tag
    def findChildrenWithKeys(self, elm, tag_name=None, keys=[]):
        """finds children of a parent Element of a specific tag and/or if that element has
        attributes matching the names found in input keys list
        Required:
        elm -- root element
        Optional: (should implement one or both of these)
        tag_name -- name of tags to search for
        keys -- list of attribute keys to check for
        """
        if isinstance(keys, basestring):
            keys = [keys]
        return [c for c in self.iterChildren(elm, tag_name) if c is not None and all(map(lambda k: k in c.keys(), keys))]
    @staticmethod
    def prettify(elem):
        """Return a pretty-printed XML string for the Element."""
        rough_string = tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        pretty = reparsed.toprettyxml(indent=" ").split('\n')
        return '\n'.join([l for l in pretty if l.strip()])
    def iterTags(self, tag_name=None, root=None, **kwargs):
        """return generator for tree
        Optional:
        tag_name -- name of tag
        root -- optional root tag to start from, if None specified defaults
        to the ElementTree
        kwargs -- optional key word args to filter by tag attributes
        """
        if isinstance(root, Element):
            return self.iterElm(root, tag_name, **kwargs)
        else:
            return self.iterElm(self.tree, tag_name, **kwargs)
    @staticmethod
    def iterChildren(parent, tag=None, childrenOnly=True, **kwargs):
        """iterate all children of an element based on **kwargs filter
        Required:
        parent -- element for which to search children
        Optional:
        tag -- name of tag for filter
        childrenOnly -- return children only, if false, iterator will start
        at parent
        kwargs -- optional key word args to filter by tag attributes
        """
        for elm in parent.iter(tag):
            if all([elm.get(k) == v for k, v in kwargs.items()]):
                if childrenOnly and elm != parent:
                    yield elm
                elif not childrenOnly:
                    yield elm
    def hasTags(self, tag_name, root=None, **kwargs):
        """tests if there are valid tags
        tag_name -- name of tag to check for
        root -- optional element to search under, defaults to the ElementTree
        """
        gen = self.iterTags(tag_name, root=root, **kwargs)
        try:
            next(gen)
            return True
        except StopIteration:
            return False
    def addElm(self, tag_name, attrib={}, root=None, update_map=True):
        """add SubElement to the document root or an existing element
        Required:
        tag_name -- name of new element
        Optional:
        attrib -- dictionary of attributes for new element
        root -- parent element for which to add element. If none specified,
        element will be added to the document root.
        update_map -- option to update parent map, you may want to disable this
        when making many changes during an iterative process. Default is True.
        """
        if root is None:
            root = self.root
        sub = SubElement(root, tag_name, attrib)
        if update_map:
            self.updateParentMap()
        return sub
    def restore(self):
        """reverts all changes back to the state the document was in
        when this class was initialized
        """
        self.__init__(self.document)
    def save(self):
        """saves the changes"""
        with codecs.open(self.document, 'w', 'utf-8') as f:
            f.write(self.prettify(self.root))
    def __iter__(self):
        """create generator"""
        for elm in iter(self.tree.iter()):
            yield elm
To use this, save it as something like xmlhelper.py in your C:\Python27\ArcGIS10.x\Lib\site-packages folder (or, better yet, on a network share it can be imported from). You can then make the updates above with something like the following:
import arcpy  # used only to set the workspace below
import xmlhelper  # make sure this is importable; if not, use sys.path.append(r'path_to_module_parent_folder') first
import os
import datetime
import glob
ws = arcpy.env.workspace = r"path/to/folder"
today = datetime.date.today()
# find all metadata files in path
for f in glob.glob(os.path.join(ws, '*.shp.xml')):
    # use the wrapper here
    doc = xmlhelper.BaseXML(f)
    # get edition date and set it
    editiondate = doc.getElm('resEdDate')
    editiondate.text = today.strftime("%Y-%m-%d")
    # update revise date
    revisedate = doc.getElm('reviseDate')
    revisedate.text = today.strftime("%Y-%m-%d") + "T00:00:00"
    # update title
    title = doc.getElm('resTitle')  # should be the same, regardless of shapefile?
    # may want to be a little more explicit with the if statement here...
    if f.endswith('A.shp.xml'):
        title.text = 'MCMS (polygon)'
    elif f.endswith('Zones.shp.xml'):
        title.text = 'MCMS Exclusion Zones'
    # save it
    doc.save()
print('done')
To make sure it is working, I would first try it on a copy of the data on your desktop. If that works, you can run it against your production data. The code above is untested.
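Since only the .shp.xml files are edited, one way to set up that test is to copy just those files into a scratch folder and point ws at it. A minimal sketch; copy_for_testing and any folder names you pass to it are hypothetical:

```python
import glob
import os
import shutil


def copy_for_testing(src_folder, dst_folder, pattern="*.shp.xml"):
    """Copy files matching pattern from src_folder into dst_folder.

    Returns the sorted list of copied paths. Files of the same name that
    already exist in dst_folder are overwritten.
    """
    if not os.path.isdir(dst_folder):
        os.makedirs(dst_folder)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_folder, pattern))):
        dst = os.path.join(dst_folder, os.path.basename(src))
        shutil.copy2(src, dst)  # copy2 also keeps the file timestamps
        copied.append(dst)
    return copied
```

Then set ws to the scratch folder, run the script, and inspect the copies before touching the originals.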
I've tried to implement your wrapper, but when I call doc.save() I get the following error:
Runtime error
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "C:\Python27\ArcGIS10.2\lib\site-packages\xmlhelper.py", line 277, in save
    with codecs.open(self.document, 'w', 'utf-8') as f:
NameError: global name 'codecs' is not defined
(MacroZED, Sep 7, 2016)
Oops. When I copied the code I missed the first line, which is import codecs. Once that module is imported in xmlhelper.py, it should work. I have edited the answer to include the codecs import. (crmackey, Sep 8, 2016)
Thanks for that. It worked a charm. I had actually tried importing that module in my own script; I didn't realise I needed to import it in the xmlhelper module itself. Cheers (MacroZED, Sep 8, 2016)
Glad it worked for you; I use this wrapper for all things XML. I did edit the question, though, to remove the arcpy tag, as this has more to do with general Python programming against an ArcGIS metadata XML file. (crmackey, Sep 8, 2016)
Hi mate, I'm trying to use this wrapper again, but I'm having trouble finding a specific element. For example, the element "linkage" is used multiple times. I'm trying to locate one specific linkage element, but getElm only finds the first one. (MacroZED, Jan 13, 2017)
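For repeated tags, one option is findall() to get every match, or find() with a longer path that pins down the parent. In the wrapper itself, findChildren(parent, 'linkage') returns all matches and getElm('linkage', root=some_parent) limits the search to one branch. A minimal sketch with plain ElementTree; the sample document and its parent tag names are illustrative, not copied from a real ArcGIS file:

```python
import xml.etree.ElementTree as ET

# Illustrative document with two <linkage> elements under different parents
sample = """<metadata>
  <dataIdInfo><idCitation><citOnlineRes>
    <linkage>http://example.com/a</linkage>
  </citOnlineRes></idCitation></dataIdInfo>
  <distInfo><distTranOps><onLineSrc>
    <linkage>http://example.com/b</linkage>
  </onLineSrc></distTranOps></distInfo>
</metadata>"""
root = ET.fromstring(sample)

# Every <linkage> anywhere in the document, in document order
all_links = root.findall(".//linkage")
for link in all_links:
    print(link.text)

# Only the <linkage> in one branch, selected by its full path from the root
dist_link = root.find("distInfo/distTranOps/onLineSrc/linkage")

# Every <linkage> whose direct parent is <citOnlineRes>, wherever that sits
under_citation = root.findall(".//citOnlineRes/linkage")
```

Selecting by parent path keeps the lookup stable even if more linkage elements are added elsewhere, which picking by position in all_links would not.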