Cha 2 -编写你的第一个网络爬虫.ipynb #2

Open, opened on Jan 31, 2019

Description

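When I try to run the code below, the interpreter raises an error. I wonder whether this is because I am on an English-language system that cannot encode Chinese. Any advice would be appreciated.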
import requests
from bs4 import BeautifulSoup  # import BeautifulSoup from the bs4 library

link = "http://www.santostang.com/"
headers = {'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "html.parser")  # parse the returned HTML with BeautifulSoup
title = soup.find("h1", class_="post-title").a.text.strip()
print(title)

with open('title_test.txt', "a+") as f:
    f.write(title)
    f.close()  # redundant: the with block already closes the file

===================================================================
4.3 Scraping by simulating a browser with selenium

UnicodeEncodeError Traceback (most recent call last)
in
11
12 with open('title_test.txt', "a+") as f:
---> 13 f.write(title)
14 f.close()

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-5: character maps to <undefined>
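
The `cp1252.py` frame in the traceback suggests that `open()` fell back to the Windows locale encoding (cp1252), which has no mapping for Chinese characters. One likely fix is to pass `encoding="utf-8"` explicitly when opening the file. Below is a minimal sketch under that assumption. The helper names `append_text` and `read_text` and the `sample_title` string are hypothetical stand-ins for illustration (the real title would come from the scraped page), and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile


def append_text(path, text):
    """Append text to path, always encoding it as UTF-8 instead of the locale default."""
    with open(path, "a+", encoding="utf-8") as f:
        f.write(text)
    # No f.close() here: the with block closes the file on exit.


def read_text(path):
    """Read the file back using the same explicit encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical stand-in for the scraped title (assumption, not the real page content).
sample_title = "第四章 动态网页抓取"

demo_path = os.path.join(tempfile.mkdtemp(), "title_test.txt")
append_text(demo_path, sample_title)
print(read_text(demo_path) == sample_title)

# For comparison: cp1252 cannot represent Chinese characters, which is
# consistent with the 'charmap' error in the traceback above.
try:
    sample_title.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
```

In the original snippet this would amount to changing the last block to `with open('title_test.txt', "a+", encoding="utf-8") as f:`. Whatever reads the file later should open it with the same encoding.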


