Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit dfc144e

Browse files
committed
提交代码
1 parent a42c8cf commit dfc144e

File tree

2 files changed

+98
-0
lines changed

2 files changed

+98
-0
lines changed

‎xianhuan/README.md‎

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,8 @@ Python技术 公众号文章代码库
1010

1111
## 实例代码
1212

13+
[肝了三天三夜,一文道尽Python的xpath解析!](https://github.com/JustDoPython/python-examples/tree/master/xianhuan/xpath):肝了三天三夜,一文道尽Python的xpath解析!
14+
1315
[丢弃Tkinter,这款GUI神器值得拥有!](https://github.com/JustDoPython/python-examples/tree/master/xianhuan/gooey):丢弃Tkinter,这款GUI神器值得拥有!
1416

1517
[知乎热门:如何提高爬虫速度?](https://github.com/JustDoPython/python-examples/tree/master/xianhuan/spiderspeed):知乎热门:如何提高爬虫速度?

‎xianhuan/xpath/xpathstudy.py‎

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
#!/usr/bin/env python3
2+
# -*- coding: utf-8 -*-
3+
"""
4+
@author: 闲欢
5+
"""
6+
from lxml import etree
7+
8+
text = '''
9+
<div>
10+
<ul id='ultest'>
11+
<li class="item-0"><a href="link1.html">first item</a></li>
12+
<li class="item-1"><a href="link2.html">second item</a></li>
13+
<li class="item-inactive"><a href="link3.html">third item</a></li>
14+
<li class="item-1"><a href="link4.html"><span>fourth item</span></a></li>
15+
<li class="item-0"><a href="link5.html">fifth item</a> # 注意,此处缺少一个 </li> 闭合标签
16+
</ul>
17+
</div>
18+
'''
19+
20+
# 调用HTML类进行初始化,这样就成功构造了一个XPath解析对象。
21+
page = etree.HTML(text)
22+
print(type(page))
23+
print(etree.tostring(page))
24+
25+
# nodename
26+
print(page.xpath("ul"))
27+
28+
# /
29+
print(page.xpath("/html"))
30+
31+
# //
32+
print(page.xpath("//li"))
33+
34+
# .
35+
ul = page.xpath("//ul")
36+
print(ul)
37+
print(ul[0].xpath("."))
38+
print(ul[0].xpath("./li"))
39+
40+
# ..
41+
print(ul[0].xpath(".."))
42+
43+
# @
44+
print(ul[0].xpath("@id"))
45+
46+
# 谓语
47+
# 第三个li标签
48+
print(page.xpath('//ul/li[3]'))
49+
# 最后一个li标签
50+
print(page.xpath('//ul/li[last()]'))
51+
# 倒数第二个li标签
52+
print(page.xpath('//ul/li[last()-1]'))
53+
# 序号小于3的li标签
54+
print(page.xpath('//ul/li[position()<3]'))
55+
# 有class属性的li标签
56+
print(page.xpath('//li[@class]'))
57+
# class属性为item-inactive的li标签
58+
print(page.xpath("//li[@class='item-inactive']"))
59+
60+
61+
# 获取文本
62+
# text()
63+
print(page.xpath('//ul/li/a/text()'))
64+
# string()
65+
print(page.xpath('string(//ul)'))
66+
67+
# 通配符
68+
print(page.xpath('//li/*'))
69+
print(page.xpath('//li/@*'))
70+
71+
# |
72+
print(page.xpath("//li|//a"))
73+
74+
# 函数
75+
# contains
76+
print(page.xpath("//*[contains(@class, 'item-inactive')]"))
77+
78+
# starts-with
79+
print(page.xpath("//*[starts-with(@class, 'item-inactive')]"))
80+
81+
82+
# 节点轴
83+
# ancestor轴
84+
print(page.xpath('//li[1]/ancestor::*'))
85+
# attribute轴
86+
print(page.xpath('//li[1]/attribute::*'))
87+
# child轴
88+
print(page.xpath('//li[1]/child::a[@href="link1.html"]'))
89+
# descendant轴
90+
print(page.xpath('//li[4]/descendant::span'))
91+
# following轴
92+
print(page.xpath('//li[4]/following::*[2]'))
93+
# following-sibling轴
94+
print(page.xpath('//li[4]/following-sibling::*'))
95+
96+

0 commit comments

Comments
(0)

AltStyle によって変換されたページ (->オリジナル) /