
I have HTML code stored in the database, and I want to read it as XML.

My code:

http://rextester.com/RMEHO89992

This is an example of the HTML code I have:

<div>
  <section>
    <h4>
      <span> A </span>
    </h4>
    <ul>
      <li>
        <span> Ab</span>
        AD
        <span> AC </span>
      </li>
      <li>
        <span> Ag</span>
        <span> AL </span>
      </li>
    </ul>
    <h4>
      <span> B </span>
    </h4>
    <ul>
      <li>
        <span> Bb</span>
        BD
        <span> BC </span>
      </li>
      <li>
        <span> Bg</span>
        <span> BL </span>
      </li>
    </ul>
  </section>
</div>

And this is an example of the output I need:

Category  Selection  Value
--------  ---------  -----
A         Ab         AD
A         Ag         AL
B         Bb         BD
B         Bg         BL

I need to get the value inside the <h4> tag as Category, the first <span> of each <li> as Selection, and the rest of the values in that <li> as a concatenated string (Value).
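To make the rule concrete, here is a minimal Python sketch (standard library only, not T-SQL) that applies it to the sample markup. It walks the children of <section> in document order, remembers the most recent <h4> as the category, and splits each <li> into its first text piece and the rest. It assumes the HTML is well-formed XML and that every <ul> comes after its <h4>; the helper names are hypothetical. Applied literally, the rule yields AD AC, not AD, for the first row.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question (whitespace trimmed; it is well-formed XML).
HTML = """<div><section>
<h4><span> A </span></h4>
<ul>
  <li><span> Ab</span> AD <span> AC </span></li>
  <li><span> Ag</span><span> AL </span></li>
</ul>
<h4><span> B </span></h4>
<ul>
  <li><span> Bb</span> BD <span> BC </span></li>
  <li><span> Bg</span><span> BL </span></li>
</ul>
</section></div>"""


def text_parts(el):
    """All non-blank text pieces of an element, inside a <span> or not."""
    return [t.strip() for t in el.itertext() if t.strip()]


def rows(markup):
    section = ET.fromstring(markup).find("section")
    category = None
    result = []
    # Iterating over <section> yields its child elements in document order,
    # so each <ul> belongs to the last <h4> seen before it.
    for child in section:
        if child.tag == "h4":
            category = " ".join(text_parts(child))
        elif child.tag == "ul":
            for li in child.findall("li"):
                parts = text_parts(li)
                if parts:  # skip empty list items
                    result.append((category, parts[0], " ".join(parts[1:])))
    return result


for row in rows(HTML):
    print(row)
```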

I've tried the following query:

SELECT 
 ( isnull(t.v.value('(h4/span/span[1]/text())[1]','nvarchar(max)'),'') 
 + isnull(t.v.value('(h4/span/text())[1]','nvarchar(max)'),'')
 + isnull(t.v.value('(h4/span/span[2]/text())[2]','nvarchar(max)'),'')
 ) AS [Category],
 ( isnull(c.g.value('(span[1]/text())[1]','nvarchar(max)'),'')
 + isnull(c.g.value('(span[1]/span/text())[1]','nvarchar(max)'),'')
 + isnull(c.g.value('(span[1]/text())[2]','nvarchar(max)'),'')
 ) AS [Selection],
 ( isnull(c.g.value('(span[2]/text())[1]','nvarchar(max)'),'')
 + isnull(c.g.value('(span[2]/span/text())[1]','nvarchar(max)'),'')
 + isnull(c.g.value('(span[2]/text())[2]','nvarchar(max)'),'')
 ) AS [Value]
FROM @htmlXML.nodes('div/section') as t(v)
CROSS APPLY t.v.nodes('./ul/li') AS c(g) 

and this one:

SELECT 
 t.v.value('.','nvarchar(max)')
 ,
 --( isnull(t.v.value('(h4/span/span[1]/text())[1]','nvarchar(max)'),'')+isnull(t.v.value('(h4/span/text())[1]','nvarchar(max)'),'')+isnull(t.v.value('(h4/span/span[2]/text())[2]','nvarchar(max)'),''))AS [Category],
 ( isnull(c.g.value('(span[1]/text())[1]','nvarchar(max)'),'')+isnull(c.g.value('(span[1]/span/text())[1]','nvarchar(max)'),'')+isnull(c.g.value('(span[1]/text())[2]','nvarchar(max)'),''))AS [Selection]
 ,
 ( isnull(c.g.value('(span[2]/text())[1]','nvarchar(max)'),'')+isnull(c.g.value('(span[2]/span/text())[1]','nvarchar(max)'),'')+isnull(c.g.value('(span[2]/text())[2]','nvarchar(max)'),''))AS [Value]
 FROM @htmlXML.nodes('div/section/h4/span') as t(v)
 CROSS APPLY @htmlXML.nodes('div/section/ul/li') AS c(g)

But it doesn't relate each category to its own list: every category is paired with every list item, and it doesn't collect all the values together (AD and BD are missing).
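In the second query both nodes() calls start from @htmlXML itself rather than from a node returned by the first call, so the CROSS APPLY behaves like a cross join: every <h4> span meets every <li>, giving 2 × 4 = 8 rows. A small Python sketch of that pairing, with hypothetical hard-coded lists standing in for the two node sets:

```python
from itertools import product

# Hypothetical stand-ins for the two independent node sets in the second query.
categories = ["A", "B"]  # div/section/h4/span
items = [("Ab", "AC"), ("Ag", "AL"), ("Bb", "BC"), ("Bg", "BL")]  # (span[1], span[2]) of each div/section/ul/li

# Nothing ties an <li> to its own <h4>, so every category is combined with every item.
rows = [(cat, sel, val) for (sel, val), cat in product(items, categories)]

for row in rows:
    print(row)
```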

Category  Selection  Value
--------  ---------  -----
A         Ab         AC
B         Ab         AC
A         Ag         AL
B         Ag         AL
A         Bb         BC
B         Bb         BC
A         Bg         BL
B         Bg         BL

There can be N categories, and the values may or may not be inside <span> tags. How can I get all the categories with their corresponding values? Alternatively, I could get two result sets like these:

Category  h4 number
--------  ---------
A         1
B         2

(1 means the first <h4>, 2 means the second <h4>.)

ul number  Selection  Value
---------  ---------  -----
1          Ab         AD
1          Ag         AL
2          Bb         BD
2          Bg         BL

but I can't work out how to relate the ul number column to the h4 number column.
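The numbering idea can work if each <h4> is followed by exactly one <ul>: number the <h4> elements and the <ul> elements separately in document order, and the Nth <ul> then belongs to the Nth <h4>. A minimal Python sketch of that join under exactly that assumption (the markup is a cut-down, hypothetical version of the sample):

```python
import xml.etree.ElementTree as ET

# Cut-down sample: exactly one <ul> per <h4>, in the same order.
HTML = ("<div><section>"
        "<h4><span>A</span></h4><ul><li><span>Ab</span>AD</li><li><span>Ag</span>AL</li></ul>"
        "<h4><span>B</span></h4><ul><li><span>Bb</span>BD</li><li><span>Bg</span>BL</li></ul>"
        "</section></div>")

section = ET.fromstring(HTML).find("section")

# Result set 1: (h4 number, category), numbered 1, 2, ... in document order.
h4_table = [(n, "".join(h4.itertext()).strip())
            for n, h4 in enumerate(section.findall("h4"), start=1)]

# Result set 2: (ul number, selection, value), numbering the <ul> elements the same way.
ul_table = []
for n, ul in enumerate(section.findall("ul"), start=1):
    for li in ul.findall("li"):
        parts = [t.strip() for t in li.itertext() if t.strip()]
        ul_table.append((n, parts[0], " ".join(parts[1:])))

# Join on h4 number = ul number.
category_by_number = dict(h4_table)
joined = [(category_by_number[n], sel, val) for n, sel, val in ul_table]

for row in joined:
    print(row)
```

If a category could have zero or several <ul> elements, the ordinals no longer line up and this join breaks.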

asked Jan 28, 2017 at 12:38
  • Are you sure the expected result is correct? Shouldn't it be AD AC for the first row in the third column? Commented Jan 28, 2017 at 16:08
  • I am trying to establish communication between nodes h4 and ul. Commented Jan 28, 2017 at 20:00

2 Answers


This is not exactly elegant, but it seems to do the job.

DECLARE @X XML = REPLACE(REPLACE(@S, '<h4>', '<foo><h4>'), '</ul>', '</ul></foo>')
SELECT Category = x.value('../../h4[1]/span[1]', 'varchar(10)'),
 Selection = x.value('descendant-or-self::text()[1]', 'varchar(10)'),
 Value = REPLACE(
 REPLACE(
 REPLACE(
 LTRIM(
 RTRIM(
 REPLACE(
 REPLACE(
 CAST(x.x.query('fn:data(descendant-or-self::text()[fn:position() > 1])') AS VARCHAR(MAX))
 , char(10), '')
 , char(13), '')
 )
 )
 , ' ', ' |')
 , '| ', '')
 , '|', '')
FROM @X.nodes('div/section/foo/ul/li') x(x)
ORDER BY Category,
 Selection

Which returns

+----------+-----------+-------+
| Category | Selection | Value |
+----------+-----------+-------+
| A | Ab | AD AC |
| A | Ag | AL |
| B | Bb | BD BC |
| B | Bg | BL |
+----------+-----------+-------+

I'm assuming this is what you want, since the desired-results table in the question does not show the "rest of the values as a concatenated string".
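The key step is the string replacement: it wraps each <h4> together with the <ul> that follows it in a new <foo> element, so that from an <li> the path ../../h4 reaches its own category. A short Python sketch of the same idea (ElementTree, not T-SQL; <foo> is just the answer's placeholder name). Because it is a plain text substitution, it assumes every <h4> is followed by exactly one <ul> before the next <h4>.

```python
import xml.etree.ElementTree as ET

S = ("<div><section>"
     "<h4><span> A </span></h4>"
     "<ul><li><span> Ab</span> AD <span> AC </span></li><li><span> Ag</span><span> AL </span></li></ul>"
     "<h4><span> B </span></h4>"
     "<ul><li><span> Bb</span> BD <span> BC </span></li><li><span> Bg</span><span> BL </span></li></ul>"
     "</section></div>")

# Same substitution as the answer: <foo> becomes a shared parent of each h4/ul pair.
X = S.replace("<h4>", "<foo><h4>").replace("</ul>", "</ul></foo>")

rows = []
for foo in ET.fromstring(X).findall("section/foo"):
    # ElementTree has no parent axis, so walk down from <foo> instead of up with ../..
    category = "".join(foo.find("h4").itertext()).strip()
    for li in foo.findall("ul/li"):
        parts = [t.strip() for t in li.itertext() if t.strip()]
        rows.append((category, parts[0], " ".join(parts[1:])))

for row in rows:
    print(row)
```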

answered Jan 28, 2017 at 17:43

I am trying to establish communication between nodes h4 and ul.

You can use the << and >> operators to check whether a node comes before or after another node in document order. Combine that with a positional predicate, [1], to get the first matching node, also in document order.

select H4.X.value('(span/text())[1]', 'varchar(10)') as Section,
 UL.X.query('.') as UL
from @X.nodes('/div/section/h4') as H4(X)
 cross apply H4.X.nodes('(let $h4 := . (: Save current h4 node :)
 return /div/section/ul[$h4 << .])[1]') as UL(X);

rextester:

<< and >> are called node order comparison operators.
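ElementTree has no << operator, but Element.iter() walks a tree in document order, so an element's index in that walk can stand in for its document-order position. A rough Python sketch of ul[$h4 << .][1] on that basis (the markup and the precedes helper are hypothetical, not part of any library):

```python
import xml.etree.ElementTree as ET

X = ("<div><section>"
     "<h4><span>A</span></h4><ul><li>a1</li><li>a2</li></ul>"
     "<h4><span>B</span></h4><ul><li>b1</li></ul>"
     "</section></div>")

root = ET.fromstring(X)
# Position of every element in document order (pre-order traversal).
position = {el: i for i, el in enumerate(root.iter())}


def precedes(a, b):
    """Rough stand-in for the XQuery node comparison `a << b`."""
    return position[a] < position[b]


uls = root.findall("section/ul")
pairs = []
for h4 in root.findall("section/h4"):
    # ul[$h4 << .][1]: keep the uls after this h4, then take the first one.
    following = [ul for ul in uls if precedes(h4, ul)]
    first_ul = following[0] if following else None
    items = [li.text for li in first_ul.findall("li")] if first_ul is not None else []
    pairs.append(("".join(h4.itertext()), items))

for pair in pairs:
    print(pair)
```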

If you have an XML fragment like this:

<N1>1</N1>
<N2>2</N2>
<N3>3</N3>
<N4>4</N4>
<N5>5</N5>

you can get all nodes before the first occurrence of N3 with this query:

select @X.query('/*[. << /N3[1]]');

Result:

<N1>1</N1>
<N2>2</N2>

/* gives you all root-level nodes. The expression enclosed in [] is a predicate: . is the current node, and /N3[1] is the first N3 node at the root level in document order. So of all the root nodes, you keep only those that precede the first N3.

Here is almost the same query, except that you get the nodes that follow the first N3 node:

select @X.query('/*[. >> /N3[1]]');

Result:

<N4>4</N4>
<N5>5</N5>

To get only the first node after the first N3 node, add the predicate [1]:

select @X.query('/*[. >> /N3[1]][1]');

Result:

<N4>4</N4>
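The three queries on the N1 to N5 fragment can be mirrored in Python with the same document-order idea. ElementTree only parses a single root element, so this sketch wraps the fragment in a hypothetical <root> element; because the N nodes are siblings, their index in the child list is their document order.

```python
import xml.etree.ElementTree as ET

fragment = "<N1>1</N1><N2>2</N2><N3>3</N3><N4>4</N4><N5>5</N5>"
root = ET.fromstring("<root>" + fragment + "</root>")  # hypothetical wrapper element

children = list(root)         # like /*
n3 = root.find("N3")          # like /N3[1]
n3_pos = children.index(n3)

before = [el.tag for el in children[:n3_pos]]        # /*[. << /N3[1]]
after = [el.tag for el in children[n3_pos + 1:]]     # /*[. >> /N3[1]]
first_after = after[:1]                              # /*[. >> /N3[1]][1]

print(before, after, first_after)
```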
Andriy M
answered Jan 29, 2017 at 9:12
