Commit aca3b71

add data extraction basics

1 parent cfc6df8 commit aca3b71

File tree

1 file changed

+78

-0

lines changed

数据提取概念和数据的分类.md

`数据提取概念和数据的分类.md`

Lines changed: 78 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,78 @@`
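	`1`	`+# Data Extraction`
	`2`	`+## Introduction`
	`3`	`+> Extract the data you want from data retrieved over the network.`
	`4`	`+`
	`5`	`+## Overview`
	`6`	`+- Data extraction concepts and data classification`
	`7`	+- Extracting data with the `json` module
	`8`	`+- Extracting data with regular expressions`
	`9`	+- Extracting data with `xpath`
	`10`	+- Extracting data with `beautifulsoup`
	`11`	+- Converting between `json` and `csv` data
	`12`	`+`
	`13`	`+## Data extraction concepts and data classification`
	`14`	`+`
	`15`	`+### What is data extraction`
	`16`	`+> Simply put, data extraction is the process of pulling the data we want out of a response.`
	`17`	`+`
	`18`	`+### Types of data`
	`19`	`+`
	`20`	`+#### Structured data`
	`21`	`+- Data types`
	`22`	`+`
	`23`	`+- json format data`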
	`24`	+```json
	`25`	`+{`
	`26`	`+    "name":"hello",`
	`27`	`+    "age":18,`
	`28`	`+    "parents":{`
	`29`	`+        "mother":"妈妈",`
	`30`	`+        "father":"爸爸"`
	`31`	`+    }`
	`32`	`+}`
	`33`	+```
	`34`	`+- xml format data`
	`35`	+```xml
	`36`	`+<bookstore>`
	`37`	`+    <book category="COOKING">`
	`38`	`+        <title lang="en">Everyday Italian</title>`
	`39`	`+        <author>Giada De Laurentiis</author>`
	`40`	`+        <year>2005</year>`
	`41`	`+        <price>30.00</price>`
	`42`	`+    </book>`
	`43`	`+    <book category="CHILDREN">`
	`44`	`+        <title lang="en">Harry Potter</title>`
	`45`	`+        <author>J K. Rowling</author>`
	`46`	`+        <year>2005</year>`
	`47`	`+        <price>29.99</price>`
	`48`	`+    </book>`
	`49`	`+    <book category="WEB">`
	`50`	`+        <title lang="en">Learning XML</title>`
	`51`	`+        <author>Erik T. Ray</author>`
	`52`	`+        <year>2003</year>`
	`53`	`+        <price>39.95</price>`
	`54`	`+    </book>`
	`55`	`+</bookstore>`
	`56`	+```
	`57`	`+`
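	`58`	`+- How to process it`
	`59`	`+> Convert it directly into Python data types with the json module or a similar parser`
	`60`	`+`
	`61`	`+#### Unstructured data`
	`62`	`+- Data types`
	`63`	`+    - html format data`
	`64`	`+    - word format data`
	`65`	`+    - and so on`
	`66`	`+- How to process it`
	`67`	+> Extract the data with tools such as `regular expressions`, `xpath`, or `beautifulsoup`
	`68`	`+`
	`69`	`+### Summary`
	`70`	`+`
	`71`	`+- Data extraction means pulling the data you want out of data retrieved over the network`
	`72`	`+- Types of data`
	`73`	`+    - Structured data`
	`74`	`+        - json`
	`75`	`+        - xml`
	`76`	`+    - Unstructured data`
	`77`	`+        - html`
	`78`	`+        - word`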
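
The two processing routes described in the added file can be sketched in Python as follows. This is a minimal illustration and not part of the commit: the function names, the shortened XML sample, and the `HTML_TEXT` snippet are assumptions made for the example. The file only mentions the `json` module "or similar" for structured data, so the use of the standard-library `xml.etree.ElementTree` parser for the XML case is also an assumption.

```python
import json
import re
import xml.etree.ElementTree as ET

# Sample inputs. JSON_TEXT mirrors the json example in the file; XML_TEXT is a
# shortened version of its bookstore example; HTML_TEXT is a made-up snippet.
JSON_TEXT = '{"name": "hello", "age": 18, "parents": {"mother": "妈妈", "father": "爸爸"}}'

XML_TEXT = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

HTML_TEXT = '<html><body><a href="/page/1">first</a> <a href="/page/2">second</a></body></html>'


def extract_from_json(text):
    """Structured data: json.loads converts the text into Python dicts and scalars."""
    data = json.loads(text)
    return data["name"], data["parents"]["mother"]


def extract_from_xml(text):
    """Structured data: parse into an element tree, then read attributes and child text."""
    root = ET.fromstring(text)
    return [(book.get("category"), book.findtext("title")) for book in root.findall("book")]


def extract_from_html(text):
    """Unstructured data: capture every link target with a regular expression."""
    return re.findall(r'<a href="([^"]+)">', text)


if __name__ == "__main__":
    print(extract_from_json(JSON_TEXT))
    print(extract_from_xml(XML_TEXT))
    print(extract_from_html(HTML_TEXT))
```

The regular expression here only matches the exact `<a href="...">` shape used in the sample. For real pages, a parser-based tool such as the `xpath` or `beautifulsoup` approaches listed in the overview is usually more robust.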

0 commit comments
