
I have a shapefile with an XML metadata file. Most of the elements are pre-populated from the previous edition's XML file. However, I am trying to update the current edition automatically within a larger script.

I cannot add new modules to my Python installation (for example, the metadata module mentioned in other questions) because I am using a locked-down work computer on which I cannot install anything.

The three elements that need updating are the title, the revision date and the edition date.

I have managed to update these using the tree and an index to find the tags.

However, how do I search for a metadata element by its tag name instead? For example, the edition date tag is "resEdDate".

In addition, will the metadata tree indexes ever change if/when additional information is added? Or does the tree have a set format, and therefore the same indexes? I'm worried that any update to the ArcGIS metadata will change the tree, and therefore the indexes of the tags mentioned above, and cause errors in the script.

The script below shows the issue. Both datasets use the ArcGIS metadata format, yet the same tags have different indexes. For example, the MCMS Polygon dataset uses the index root[4][0][0] for the edition date, whereas the MCMS Exclusion Zone edition date has the index root[3][0][0].

[Screenshots: metadata tree]

My script so far is:

```python
import arcpy, os, sys, datetime
import xml
import xml.etree.ElementTree as ET

ws = arcpy.env.workspace = r"path/to/folder"
today = datetime.date.today()
date = today.strftime("%Y%m%d")

# Update the MCMS polygon metadata titles, update date and edition date
for f in os.listdir(ws):
    # Find the polygon xml file
    if f.endswith("A.shp.xml"):
        fpath = os.path.join(ws, f)
        # Identify the metadata tree
        tree = ET.parse(fpath)
        root = tree.getroot()
        # Set the title and date variables to the relevant metadata tag index
        editiondate = root[4][0][0]
        reviseddate = root[4][0][5][0]
        title = root[4][0][7]
        # Update the tags with the new data
        editiondate.text = today.strftime("%Y-%m-%d")
        reviseddate.text = today.strftime("%Y-%m-%d") + "T00:00:00"
        title.text = "MCMS (polygon)"
        # Write the updates to the xml file
        tree.write(fpath)

# Update the MCMS exclusion metadata titles, update date and edition date
for f in os.listdir(ws):
    # Find the exclusion zone xml file
    if f.endswith("Zones.shp.xml"):
        fpath = os.path.join(ws, f)
        # Identify the metadata tree
        tree = ET.parse(fpath)
        root = tree.getroot()
        # Set the title and date variables to the relevant metadata tag index
        editiondate = root[3][0][0]
        reviseddate = root[3][0][5][0]
        title = root[3][0][7]
        # Update the tags with the new data
        editiondate.text = today.strftime("%Y-%m-%d")
        reviseddate.text = today.strftime("%Y-%m-%d") + "T00:00:00"
        title.text = "MCMS Exclusion Zones"
        # Write the updates to the xml file
        tree.write(fpath)
```
PolyGeo
asked Sep 2, 2016 at 9:19
  • Can you post what the XML file looks like? It is hard to troubleshoot without knowing the structure of the tree. Commented Sep 2, 2016 at 15:23
  • The structure of the tree is the ArcGIS format XML. Is it always consistent across different geometry types? At the moment I'm using indexes to locate tags, but how do I search for them instead? The structure may change over time, which would break the indexes in the future. Commented Sep 3, 2016 at 7:54
  • I have added additional information and a screenshot of the metadata tree. Commented Sep 5, 2016 at 10:31
  • Thanks for posting your XML code. I have added an answer below. Commented Sep 6, 2016 at 16:10

1 Answer

I have some comments/suggestions for you. You mention that you do not have admin privileges on your machine, so you cannot install the metadata module. The good news is that you don't need admin rights to install most Python packages/modules. You do need them for a binary installer; however, most modules can be installed using pip. You can download pip and put it in your C:\Python27\ArcGIS10.x\Scripts folder.

You can also download Python packages manually and place the modules somewhere on your PYTHONPATH, such as C:\Python27\ArcGIS10.x\Lib\site-packages. The drawback is that you could be missing some dependencies; that is where pip is the better option, as it should install all dependencies as well.
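If you can't write to site-packages either, a downloaded pure-Python module only needs to sit in some folder that is on `sys.path` at import time. A minimal sketch, assuming a folder you do have write access to; the helper name `add_module_folder` is hypothetical, not part of any library:

```python
import os
import sys


def add_module_folder(folder):
    """Put a user-writable folder at the front of sys.path so that modules
    saved there (e.g. a downloaded pure-Python package) can be imported.
    Adding the same folder twice is a no-op."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder
```

Usage would be something like `add_module_folder(r"\\server\share\python_modules")` (a hypothetical path) followed by a normal `import`. This only helps for pure-Python modules whose dependencies are already available; compiled extensions and missing dependencies are exactly where pip is the better option.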

With all that said, I have never used the metadata module, but I believe the built-in xml module will do everything you need. I built a wrapper a while back with convenience methods for working with XML files (see below). You can try it to see if it helps.

As for hard-coding indices for the metadata elements in your script, I would avoid it. I am not certain whether ArcGIS will add elements to the metadata in the future, but if anything is added or deleted, it could shift the indices your script currently relies on. It is best to get at the elements by name, using the xml.etree.ElementTree.Element.find() or xml.etree.ElementTree.Element.findall() methods.
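Here is a minimal sketch of that idea with plain ElementTree. The element paths (`dataIdInfo/idCitation/...`) are an assumption inferred from the tag names and indexes in the question, and the two sample documents are simplified stand-ins (the `<placeholder/>` siblings are made up) in which `dataIdInfo` sits at index 4 and index 3 respectively. The helpers `find_required` and `update_citation` are hypothetical names:

```python
import datetime
import xml.etree.ElementTree as ET

# Paths are relative to the <metadata> root element. They assume the layout
# implied by the question; check them against your own .shp.xml files.
EDITION_DATE = "dataIdInfo/idCitation/resEdDate"
REVISE_DATE = "dataIdInfo/idCitation/date/reviseDate"
TITLE = "dataIdInfo/idCitation/resTitle"


def find_required(root, path):
    """Return the first element matching path, or raise if it is missing."""
    elm = root.find(path)
    if elm is None:
        # Failing loudly is safer than silently editing the wrong element.
        raise ValueError("element not found: %s" % path)
    return elm


def update_citation(root, title, day):
    """Update edition date, revise date and title by tag name, not by index."""
    find_required(root, EDITION_DATE).text = day.strftime("%Y-%m-%d")
    find_required(root, REVISE_DATE).text = day.strftime("%Y-%m-%d") + "T00:00:00"
    find_required(root, TITLE).text = title


# Simplified documents where dataIdInfo is child 4 and child 3 of the root,
# like the polygon and exclusion zone files in the question.
POLYGON = ("<metadata><Esri/><placeholder/><placeholder/><placeholder/>"
           "<dataIdInfo><idCitation><resEdDate/><date><reviseDate/></date>"
           "<resTitle/></idCitation></dataIdInfo></metadata>")
EXCLUSION = ("<metadata><Esri/><placeholder/><placeholder/>"
             "<dataIdInfo><idCitation><resEdDate/><date><reviseDate/></date>"
             "<resTitle/></idCitation></dataIdInfo></metadata>")

for xml_text, new_title in [(POLYGON, "MCMS (polygon)"),
                            (EXCLUSION, "MCMS Exclusion Zones")]:
    root = ET.fromstring(xml_text)
    update_citation(root, new_title, datetime.date.today())
    print(root.find(TITLE).text + " -> " + root.find(EDITION_DATE).text)
```

The same paths work for both documents even though the indexes differ, and if a future ArcGIS version moved or renamed an element, `find_required` would raise instead of quietly writing into whatever now sits at that index. For real files you would use `ET.parse(fpath)` and `tree.getroot()` as in the question, then `tree.write(fpath)`.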

Here is the wrapper I built for working with xml files:

```python
from xml.etree.ElementTree import ElementTree, Element, SubElement, Comment, tostring, parse, fromstring, fromstringlist
from xml.dom import minidom
from xml.sax.saxutils import escape, unescape
import os
import codecs
import copy

HTML = {
    '"': '&quot;',
    "'": '&apos;',
    ">": '&gt;',
    "<": '&lt;',
    }
HTML_UNESC = {v: k for k, v in HTML.iteritems()}


class BaseXML(object):
    def __init__(self, xml_file):
        """base class for xml files

        xml_file -- path to an XML file, an XML string, or a list of XML strings
        """
        self.document = xml_file
        self.directory = None
        if isinstance(xml_file, list):
            # we have a list of strings; fromstringlist() returns an Element,
            # so wrap it in an ElementTree
            self.tree = ElementTree(fromstringlist(xml_file))
        elif isinstance(xml_file, basestring) and not os.path.isfile(xml_file) and '<' in xml_file:
            # we have an XML string
            self.tree = ElementTree(fromstring(xml_file))
        elif os.path.exists(xml_file):
            self.tree = parse(self.document)
            self.directory = os.path.dirname(self.document)
        else:
            raise IOError('Invalid Input for XML file')
        self.root = self.tree.getroot()
        self.parent_map = {}
        # make static copy of the tree as it was when loaded
        self._backup = copy.deepcopy(self.root)
        # initialize parent map
        self.updateParentMap()

    @staticmethod
    def iterElm(root, tag_name=None, childrenOnly=True, **kwargs):
        """return generator for tree
        Optional:
            tag_name -- name of tag
            childrenOnly -- if True, do not yield root itself
            kwargs -- optional key word args to filter by tag attributes
        """
        for tag in root.iter(tag_name):
            if all([tag.get(k) == v for k, v in kwargs.iteritems()]):
                if childrenOnly and tag != root:
                    yield tag
                elif not childrenOnly:
                    yield tag

    def elmHasTags(self, root, tag, **kwargs):
        """tests if there are valid tags
        tag -- name of tag to check for
        """
        gen = self.iterElm(root, tag, **kwargs)
        try:
            next(gen)
            return True
        except StopIteration:
            return False

    def findChild(self, parent, child_name, **kwargs):
        """find first child anywhere under parent element
        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        for c in self.iterElm(parent, child_name, **kwargs):
            return c

    def findChildren(self, parent, child_name, **kwargs):
        """find all children anywhere under parent element,
        returns a list of elements.
        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        return [c for c in self.iterElm(parent, child_name, **kwargs)]

    def validateElm(self, elm, elm_name=None, **kwargs):
        """validates whether input is an Element name or Element object. If it
        is an Element name, it will return the Element object with that name and
        any additional key word args
        Required:
            elm -- element name or Element object
            elm_name -- name of Element.tag, only used if elm is a string.
        Optional:
            kwargs -- keyword argument filters, required if elm is a string
        """
        if isinstance(elm, Element):
            return elm
        elif isinstance(elm, basestring):
            return self.getElm(elm_name, **kwargs)

    def updateParentMap(self):
        """updates the parent_map dictionary ({child: parent})"""
        self.parent_map = {c: p for p in self.tree.iter() for c in p}

    def countParents(self, elm, parent_name, **kwargs):
        """Count the number of parents an element has of a certain name, does
        hierarchical search
        Required:
            elm -- child element for which to search parents
            parent_name -- name of parent tag
        Optional:
            kwargs -- keyword argument filters
        """
        count = 0
        parent = self.getParent(elm, parent_name, **kwargs)
        while parent is not None:
            count += 1
            parent = self.getParent(parent, parent_name, **kwargs)
        return count

    def getParent(self, child, parent_name=None, **kwargs):
        """get parent element by tag name or first parent
        Required:
            child -- child element for which to find parent
        Optional:
            parent_name -- name of parent tag; if None, return the direct parent
            kwargs -- optional key word args to filter by tag attributes
        """
        parent = self.parent_map.get(child)
        if parent is None:
            return None
        if parent_name is None:
            return parent
        else:
            if parent.tag == parent_name and all([parent.get(k) == v for k, v in kwargs.iteritems()]):
                return parent
            else:
                return self.getParent(parent, parent_name, **kwargs)

    def elmHasParentOfName(self, child, parent_name=None, **kwargs):
        """checks if a child element has a parent of an input name
        Required:
            child -- child element for which to find parent
        Optional:
            parent_name -- name of parent tag
            kwargs -- optional key word args to filter by tag attributes
        """
        return self.getParent(child, parent_name, **kwargs) is not None

    def getElm(self, tag_name, root=None, **kwargs):
        """get first tag by name and kwargs filter
        Required:
            tag_name -- name of tag
        Optional:
            root -- root element to start with, defaults to the ElementTree
            kwargs -- optional key word args to filter by tag attributes
        """
        for tag in self.iterTags(tag_name, root=root, **kwargs):
            return tag

    def findChildrenWithKeys(self, elm, tag_name=None, keys=[]):
        """finds children of a parent Element of a specific tag and/or if that element has
        attributes matching the names found in input keys list
        Required:
            elm -- root element
        Optional: (should implement one or both of these)
            tag_name -- name of tags to search for
            keys -- list of attribute keys to check for
        """
        if isinstance(keys, basestring):
            keys = [keys]
        return [c for c in self.iterChildren(elm, tag_name) if c is not None and all(map(lambda k: k in c.keys(), keys))]

    @staticmethod
    def prettify(elem):
        """Return a pretty-printed XML string for the Element."""
        rough_string = tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        pretty = reparsed.toprettyxml(indent=" ").split('\n')
        return '\n'.join([l for l in pretty if l.strip()])

    def iterTags(self, tag_name=None, root=None, **kwargs):
        """return generator for tree
        Optional:
            tag_name -- name of tag
            root -- optional root tag to start from, if None specified defaults
                to the ElementTree
            kwargs -- optional key word args to filter by tag attributes
        """
        if isinstance(root, Element):
            return self.iterElm(root, tag_name, **kwargs)
        else:
            return self.iterElm(self.tree, tag_name, **kwargs)

    @staticmethod
    def iterChildren(parent, tag=None, childrenOnly=True, **kwargs):
        """iterate all children of an element based on **kwargs filter
        Required:
            parent -- element for which to search children
        Optional:
            tag -- name of tag for filter
            childrenOnly -- return children only, if false, iterator will start
                at parent
            kwargs -- optional key word args to filter by tag attributes
        """
        for elm in parent.iter(tag):
            if all([elm.get(k) == v for k, v in kwargs.iteritems()]):
                if childrenOnly and elm != parent:
                    yield elm
                elif not childrenOnly:
                    yield elm

    def hasTags(self, tag_name, root=None, **kwargs):
        """tests if there are valid tags
        tag_name -- name of tag to check for
        root -- optional element to search under, defaults to the ElementTree
        """
        gen = self.iterTags(tag_name, root=root, **kwargs)
        try:
            next(gen)
            return True
        except StopIteration:
            return False

    def addElm(self, tag_name, attrib={}, root=None, update_map=True):
        """add SubElement to site or existing element
        Required:
            tag_name -- name of new element
        Optional:
            attrib -- dictionary of attributes for new element
            root -- parent element for which to add element. If none specified,
                element will be added to the document root.
            update_map -- option to update parent map, you may want to disable this
                when making many changes during an iterative process. Default is True.
        """
        if root is None:
            root = self.root
        sub = SubElement(root, tag_name, attrib)
        if update_map:
            self.updateParentMap()
        return sub

    def restore(self):
        """reverts all changes back to the state the document was in
        when this class was initialized
        """
        self.__init__(self.document)

    def save(self):
        """saves the changes"""
        with codecs.open(self.document, 'w', 'utf-8') as f:
            f.write(self.prettify(self.root))

    def __iter__(self):
        """iterate over every element in the tree"""
        return self.tree.iter()
```

To use this, save it as something like xmlhelper.py in your C:\Python27\ArcGIS10.x\Lib\site-packages folder (or, better yet, on a network share it can be imported from). To perform the updates from your question, you can then do something like the following:

```python
import arcpy
import xmlhelper  # make sure this is importable; if not, use sys.path.append(r'path_to_module_parent_folder') first
import os
import datetime
import glob

ws = arcpy.env.workspace = r"path/to/folder"
today = datetime.date.today()

# find all metadata files in path
for f in glob.glob(os.path.join(ws, '*.shp.xml')):
    # use the wrapper here
    doc = xmlhelper.BaseXML(f)
    # get edition date and set it
    editiondate = doc.getElm('resEdDate')
    editiondate.text = today.strftime("%Y-%m-%d")
    # update revise date
    revisedate = doc.getElm('reviseDate')
    revisedate.text = today.strftime("%Y-%m-%d") + "T00:00:00"
    # update title
    title = doc.getElm('resTitle')  # should be the same, regardless of shapefile?
    # may want to be a little more explicit with the if statement here....
    if f.endswith('A.shp.xml'):
        title.text = 'MCMS (polygon)'
    elif f.endswith('Zones.shp.xml'):
        title.text = 'MCMS Exclusion Zones'
    # save it
    doc.save()
print('done')
```

To make sure it works, I would first try it on a copy of the data (on your desktop, for example). If it works, you can then run it against your production data. Note that the code above is untested.
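Because both scripts only rewrite the .shp.xml files, one way to keep a safety net is to copy those files aside before running the update. A minimal sketch; the helper name `backup_metadata` and the backup folder are hypothetical:

```python
import glob
import os
import shutil


def backup_metadata(ws, backup_dir):
    """Copy every *.shp.xml file in ws into backup_dir.

    Returns the list of copied file paths. Only the top level of ws is
    searched, so a backup folder inside ws is not copied into itself.
    """
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    copied = []
    for src in glob.glob(os.path.join(ws, "*.shp.xml")):
        dst = os.path.join(backup_dir, os.path.basename(src))
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied
```

Calling something like `backup_metadata(ws, os.path.join(ws, "metadata_backup"))` before the update loop lets you restore the original metadata by copying the files back if the results look wrong.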

PolyGeo
answered Sep 6, 2016 at 16:10
  • I've tried to implement your xmlhelper wrapper, but when I call doc.save() I get the following error: Runtime error Traceback (most recent call last): File "<string>", line 6, in <module> File "C:\Python27\ArcGIS10.2\lib\site-packages\xmlhelper.py", line 277, in save with codecs.open(self.document, 'w', 'utf-8') as f: NameError: global name 'codecs' is not defined. Commented Sep 7, 2016 at 12:54
  • Oh, oops. When I copied the code I missed the first line, which is import codecs. Once you import that module, it should work. I just edited the answer to include importing codecs. Commented Sep 8, 2016 at 13:13
  • Thanks for that. It worked a charm. I actually tried to import that module into my script; I didn't clock that I needed to import it into the xmlhelper module. Cheers Commented Sep 8, 2016 at 13:49
  • Glad it worked for you; I use this wrapper for all things XML. I did edit the question, though, to remove the arcpy tags, as this has more to do with general Python programming against an ArcGIS metadata XML file. Commented Sep 8, 2016 at 16:28
  • Hi mate, I'm trying to use this wrapper again, but I'm having trouble finding a specific element. For example, the element "linkage" is used multiple times. I'm trying to locate one specific linkage element, but getElm only finds the first one. Commented Jan 13, 2017 at 10:55
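When a tag such as linkage occurs several times, one option is to narrow the search by its ancestors with an ElementTree path, or to collect all matches with findall() and filter them. A minimal sketch with plain ElementTree; the elements surrounding linkage in the sample (mdContact, cntOnlineRes, distInfo, onLineSrc, and so on) are assumptions for illustration, so check where your target linkage actually sits in your own file:

```python
import xml.etree.ElementTree as ET

# Assumed, simplified structure: 'linkage' appears under two different parents.
SAMPLE = (
    "<metadata>"
    "<mdContact><rpCntInfo><cntOnlineRes>"
    "<linkage>http://example.com/contact</linkage>"
    "</cntOnlineRes></rpCntInfo></mdContact>"
    "<distInfo><distTranOps><onLineSrc>"
    "<linkage>http://example.com/download</linkage>"
    "</onLineSrc></distTranOps></distInfo>"
    "</metadata>"
)

root = ET.fromstring(SAMPLE)

# 1) Every linkage element in the document, in document order.
all_links = root.findall(".//linkage")

# 2) Only linkage elements whose direct parent is <onLineSrc>.
download_links = root.findall(".//onLineSrc/linkage")

# 3) ElementTree elements do not know their parent, so build a
#    {child: parent} dict (the same idea as the wrapper's parent_map)
#    and filter on the parent's tag.
parent_of = {child: parent for parent in root.iter() for child in parent}
contact_links = [e for e in all_links if parent_of[e].tag == "cntOnlineRes"]

for e in download_links:
    print(e.text)
```

With the wrapper from the answer, something like `doc.findChildren(doc.getElm('onLineSrc'), 'linkage')` should give the same kind of narrowing, since findChildren returns every matching tag under the given parent rather than only the first.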
