I have a number of XML files (Atlassian Confluence space exports, FYI) that I need to parse through to replace certain strings, but only on lines that begin with a set string.
Here is an example of a line that I need to parse and change:
<ac:structured-macro ac:name="jira" ac:schema-version="1" ac:macro-id="4dacac64-1234-8dd4-badd-acdfddf208d4"><ac:parameter ac:name="server">Jira Server Name</ac:parameter><ac:parameter ac:name="columns">key,summary,assignee,reporter,status</ac:parameter><ac:parameter ac:name="maximumIssues">20</ac:parameter><ac:parameter ac:name="jqlQuery">project = SOMECODE and issuetype = IssueType and "SomeOtherThing" != ExtraThing </ac:parameter><ac:parameter ac:name="serverId">6adef236-9999-66a6-72bc-e87a4cc03c47</ac:parameter></ac:structured-macro>
The two things that I need to replace are the values of:
<ac:parameter ac:name="serverId">
<ac:parameter ac:name="server">
All lines that need to have values replaced start with:
<ac:structured-macro ac:name="jira"
There are multiple different values of "serverId" and "server" that will need to be matched and replaced.
I have tried to use sed with regex, but can't work out how to add regex like (.*?) to grab all the different values of 'serverId' and 'server'.
These two lines of sed are the closest I've got, but once again, I need to be able to 'fuzzy match' the server value (ie. 'Jira Server Name'), and the server Id value (ie. 6adef236-9999-66a6-72bc-e87a4cc03c47)
sed -i -e '/<ac:structured-macro ac:name="jira"/s~\(<ac:parameter ac:name="server">\)Jira Server Name\(</ac:parameter>\)~\1Replacement Server Name\2~' entities.xml
sed -i -e '/<ac:structured-macro ac:name="jira"/s~\(<ac:parameter ac:name="serverId">\)6adef236-9999-66a6-72bc-e87a4cc03c47\(</ac:parameter>\)~\16c3d2a6e-1234-95cb-33f3-b11a8ff01c44\2~' entities.xml
How can I 'wildcard' the serverID and server name values in my sed commands, so that I can match and replace multiple different ids/names?
Is sed the right tool for the job? Is there another/easier way to do this?
Actual XML file to be considered:
<jira>
<object class="BodyContent" package="com.atlassian.confluence.core">
<id name="id">12334762</id>
<property name="body"><![CDATA[<h1>Links</h1><p><ac:structured-macro ac:name="jira" ac:schema-version="1" ac:macro-id="a0d29f31-1212-4234-abcd-9ba23456f8cf"><ac:parameter ac:name="server">JIRA SERVER NAME</ac:parameter>6123450c-1234-acdb-8123-33333397828b</ac:parameter><ac:parameter ac:name="key">ABC-272</ac:parameter></ac:structured-macro></p><h1>Files</h1><p><ac:structured-macro ac:name="attachments" ac:schema-version="1" ac:macro-id="4f911234-1234-1234-1234-12345aad9b77" /></p>]]></property>
<property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">443449761</id>
</property>
<property name="bodyType">2</property>
</object>
<object class="OutgoingLink" package="com.atlassian.confluence.links">
<id name="id">931112345</id>
<property name="destinationPageTitle"><![CDATA[some thing]]></property>
<property name="lowerDestinationPageTitle"><![CDATA[some thing]]></property>
<property name="destinationSpaceKey"><![CDATA[https]]></property>
<property name="lowerDestinationSpaceKey"><![CDATA[https]]></property>
<property name="sourceContent" class="Page" package="com.atlassian.confluence.pages"><id name="id">943325975</id>
</property>
<property name="creator" class="ConfluenceUserImpl" package="com.atlassian.confluence.user"><id name="key"><![CDATA[1234567890]]></id>
</property>
<property name="creationDate">2018-10-10 07:02:45.817</property>
<property name="lastModifier" class="ConfluenceUserImpl" package="com.atlassian.confluence.user"><id name="key"><![CDATA[1234567890]]></id>
</property>
<property name="lastModificationDate">2018-10-10 07:02:45.817</property>
</object>
<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">123457845</id>
<property name="hibernateVersion">10</property>
<property name="title"><![CDATA[20170428 - somehing]]></property>
<property name="lowerTitle"><![CDATA[20170428 - somehing]]></property>
<collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">1234567</id>
</element>
</collection>
<collection name="contentProperties" class="java.util.Collection"><element class="ContentProperty" package="com.atlassian.confluence.content"><id name="id">1234567</id>
</element>
<element class="ContentProperty" package="com.atlassian.confluence.content"><id name="id">123456748</id>
</element>
<element class="ContentProperty" package="com.atlassian.confluence.content"><id name="id">123456749</id>
</element>
<element class="ContentProperty" package="com.atlassian.confluence.content"><id name="id">123456750</id>
</element>
<element class="ContentProperty" package="com.atlassian.confluence.content"><id name="id">123456751</id>
</element>
</collection>
<property name="version">1</property>
<property name="creator" class="ConfluenceUserImpl" package="com.atlassian.confluence.user"><id name="key"><![CDATA[1234567890]]></id>
</property>
<property name="creationDate">2018-09-20 04:52:30.727</property>
<property name="lastModifier" class="ConfluenceUserImpl" package="com.atlassian.confluence.user"><id name="key"><![CDATA[1234567890]]></id>
</property>
<property name="lastModificationDate">2018-09-20 04:57:08.072</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="originalVersion" class="Page" package="com.atlassian.confluence.pages"><id name="id">878936102</id>
</property>
<property name="originalVersionId">878936102</property>
<property name="contentStatus"><![CDATA[current]]></property>
<property name="position">2143289343</property>
</object>
<object class="BodyContent" package="com.atlassian.confluence.core">
<id name="id">443449754</id>
<property name="body"><![CDATA[<h1>Links</h1><p><ac:structured-macro ac:name="jira" ac:schema-version="1" ac:macro-id="a0d29f31-acdb-1234-1234-12345ffff8cf"><ac:parameter ac:name="server">JIRA SERVER NAME</ac:parameter>6123450c-1234-acdb-8123-33333397828b</ac:parameter><ac:parameter ac:name="key">ABC-272</ac:parameter></ac:structured-macro></p><h1>Files</h1><p><ac:structured-macro ac:name="attachments" ac:schema-version="1" ac:macro-id="abcd1236-1234-1234-1234-abcd3aa12345" /></p>]]></property>
<property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">443613611</id>
</property>
<property name="bodyType">2</property>
</object>
</jira>
1 Answer
If I understand your problem correctly, you simply need a pattern that matches only the name or the id. As each field runs up to the next opening <, this should be simple:
sed -i -e '/<ac:structured-macro ac:name="jira"/s~\(<ac:parameter ac:name="server">\)[^<]*\(</ac:parameter>\)~\1Replacement Server Name\2~' entities.xml
with [^<]* matching all characters except for <. The same change applies to the serverId command: swap the literal id for [^<]*.
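For comparison, here is a minimal Python sketch of the same idea, in case sed is not a hard requirement. Like the sed command, it assumes the macro and its parameters sit on one line. The names replace_param and MACRO_MARKER are made up for this illustration. One difference: the sed s command without the g flag changes only the first match on a line, while re.sub changes every match.

```python
import re

# Only lines containing this marker are touched, mirroring the sed address.
MACRO_MARKER = '<ac:structured-macro ac:name="jira"'

def replace_param(line, name, new_value):
    """Replace the text of <ac:parameter ac:name="NAME">...</ac:parameter>."""
    if MACRO_MARKER not in line:
        return line
    # [^<]* matches the old value, whatever it is, up to the next "<".
    pattern = r'(<ac:parameter ac:name="%s">)[^<]*(</ac:parameter>)' % re.escape(name)
    # A function replacement avoids surprises if new_value contains backslashes.
    return re.sub(pattern, lambda m: m.group(1) + new_value + m.group(2), line)

line = ('<ac:structured-macro ac:name="jira">'
        '<ac:parameter ac:name="server">Jira Server Name</ac:parameter>'
        '<ac:parameter ac:name="serverId">6adef236-9999-66a6-72bc-e87a4cc03c47</ac:parameter>'
        '</ac:structured-macro>')
line = replace_param(line, "server", "Replacement Server Name")
line = replace_param(line, "serverId", "6c3d2a6e-1234-95cb-33f3-b11a8ff01c44")
print(line)
```

Because the pattern includes the closing quote and > after the parameter name, "server" does not accidentally match "serverId".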
Please note that this will only work for input like this, where the macro and its parameters are on one line. In general, XML files may contain line breaks or other constructs that will break any script that is not aware of XML syntax. In that case it is better to use something like a Python script with import xml.etree.ElementTree.
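A rough sketch of that XML-aware route, under a few assumptions: the macro text lives in the CDATA of <property name="body"> elements as in the sample in the question, every jira macro has an explicit </ac:structured-macro> closing tag (not self-closing), and the serverId parameter looks like the one in the first example line. The helper names are hypothetical. ElementTree hands back the CDATA content as plain text, so the macro itself is still handled with a regular expression, but re.DOTALL lets it span line breaks.

```python
import re
import xml.etree.ElementTree as ET

# One whole jira macro, possibly spread over several lines.
JIRA_MACRO = re.compile(
    r'<ac:structured-macro ac:name="jira".*?</ac:structured-macro>', re.DOTALL)

def rewrite_macro(match, replacements):
    """Swap parameter values inside a single matched jira macro."""
    macro = match.group(0)
    for name, value in replacements.items():
        param = re.compile(
            r'(<ac:parameter ac:name="%s">)[^<]*(</ac:parameter>)' % re.escape(name))
        macro = param.sub(lambda m, v=value: m.group(1) + v + m.group(2), macro)
    return macro

def rewrite_export(xml_text, replacements):
    """Return the export with jira macro parameters replaced in every body."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.get("name") == "body" and prop.text:
            prop.text = JIRA_MACRO.sub(
                lambda m: rewrite_macro(m, replacements), prop.text)
    return ET.tostring(root, encoding="unicode")

sample = '''<jira>
<object class="BodyContent" package="com.atlassian.confluence.core">
<property name="body"><![CDATA[<p><ac:structured-macro ac:name="jira" ac:schema-version="1">
<ac:parameter ac:name="server">JIRA SERVER NAME</ac:parameter>
<ac:parameter ac:name="serverId">6123450c-1234-acdb-8123-33333397828b</ac:parameter>
</ac:structured-macro></p>]]></property>
</object>
</jira>'''

out = rewrite_export(sample, {
    "server": "Replacement Server Name",
    "serverId": "6c3d2a6e-1234-95cb-33f3-b11a8ff01c44",
})
print(out)
```

One caveat: ElementTree writes the body back with escaped characters (&lt; and so on) rather than as a CDATA section. That is equivalent XML, but I have not checked whether a Confluence import accepts it, so try it on a copy first.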
I'll give it a shot! Thanks for that! – davodinkum, Apr 29, 2019 at 6:59
... sed you’re running. (2) Part of your problem is that sed doesn’t support non-greedy regular expressions (*?). (3) Can you possibly rework your problem into something with shorter strings, to make the question more manageable? (4) Please give a clearer statement of your objective. Do you have lists of old and new values for server name and serverID? ... (Cont’d)
... xmlstarlet. I'll try to get you an example later today. In the meantime, please can you post a slightly larger block of the XML? Do feel free to remove large or sensitive textual content, but leave the tags as best you can.