
[Help] Go crawler cannot get the cookie from the BYR forum (北邮人论坛)

freezer-glp · 1506 views
#### Background

For the past couple of days I have been trying to write a Go crawler for the BYR forum (北邮人论坛, bbs.byr.cn). The goal is to log in, save the cookie, and send that cookie with every later request. The material I found recommends `net/http/cookiejar`.

Login currently succeeds and returns the success JSON. However, no post-login cookie is captured, so a direct GET of a post body afterwards fails with **"您未登录,请登录后继续操作"** ("You are not logged in; please log in to continue").

Could anyone point out what is going wrong here?

#### Implementation

```
package main

import (
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"strings"
)

func main() {
	// init cookiejar
	var cookieJar *cookiejar.Jar
	cookieJar, _ = cookiejar.New(nil)

	// init client with cookiejar
	httpClient := &http.Client{
		Jar: cookieJar,
	}

	// login param
	postValues := url.Values{}
	postValues.Set("id", "ID")
	postValues.Set("passwd", "PWD")
	postValues.Set("s-mode", "0")
	postValues.Set("CookieDate", "3")

	// request for login
	httpReq, _ := http.NewRequest("POST", "https://bbs.byr.cn/user/ajax_login.json", strings.NewReader(postValues.Encode()))
	httpReq.Header.Set("Content-Type", "application/x-www-form-urlencoded; param=value")
	httpReq.Header.Add("X-Requested-With", "XMLHttpRequest")
	httpReq.Header.Add("Connection", "keep-alive")
	httpReq.Header.Add("User-Agent", "Mozilla/5.0")
	httpReq.Header.Add("Referer", "https://bbs.byr.cn")
	httpReq.Header.Add("Accept", "application/json, text/javascript, */*; q=0.01")
	httpReq.Header.Add("authority", "bbs.byr.cn")

	// for nginx/1.10 (an empty TLSNextProto map disables HTTP/2)
	httpClient.Transport = &http.Transport{
		TLSNextProto: make(map[string]func(authority string, c *tls.Conn) http.RoundTripper),
	}

	// login
	httpResp, _ := httpClient.Do(httpReq)
	fmt.Printf("req cookies: %s \n", httpReq.Cookies())
	fmt.Printf("resp cookies: %s \n", httpResp.Cookies())

	// request to get article content
	httpReq1, _ := http.NewRequest("GET", "https://bbs.byr.cn/article/Golang/842", nil)
	httpReq1.Header.Add("X-Requested-With", "XMLHttpRequest")
	httpResp1, _ := httpClient.Do(httpReq1)
	body, _ := ioutil.ReadAll(httpResp1.Body)
	fmt.Println(string(body))
}
```

Output (note that the cookies are empty):

```
req cookies: []
resp cookies: []
(... omitted ...)
<h5>产生错误的可能原因:</h5><ul><li><samp class="ico-pos-dot"></samp>您未登录,请登录后继续操作</li>
(... omitted ...)
```

**This has been bothering me for a long time; any pointers would be much appreciated.**
Capture the cookies manually and add them to the request yourself.
#2
More comments
Does it work with other websites?
#1
It looks like Go's `Set-Cookie` parser treats `[` as an invalid character, so `httpResp.Cookies()` returns an empty slice. Here is the `Set-Cookie` header taken from a full dump of `httpResp`:

    "Set-Cookie":[]string{
        "nforum[UTMPUSERID]=guest; path=/; domain=bbs.byr.cn",
        "nforum[UTMPKEY]=21970208; path=/; domain=bbs.byr.cn",
        "nforum[UTMPNUM]=29282; path=/; domain=bbs.byr.cn",
        "nforum[UTMPUSERID]=guest; path=/; domain=bbs.byr.cn",
        "nforum[UTMPKEY]=21970208; path=/; domain=bbs.byr.cn",
        "nforum[UTMPNUM]=29282; path=/; domain=bbs.byr.cn"}

Go parses cookies according to RFC 6265 (<http://tools.ietf.org/html/rfc6265>):

    cookie-pair  = cookie-name "=" cookie-value
    cookie-name  = token
    cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
    cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                     ; US-ASCII characters excluding CTLs,
                     ; whitespace DQUOTE, comma, semicolon,
                     ; and backslash
    token        = 1*<any CHAR except CTLs or separators>
    separators   = "(" | ")" | "<" | ">" | "@"
                 | "," | ";" | ":" | "\" | <">
                 | "/" | "[" | "]" | "?" | "="
                 | "{" | "}" | SP | HT

Because `cookie-name` must be a `token`, and `[` and `]` are both `separators`, a name such as `nforum[UTMPUSERID]` is not a valid cookie name under this grammar.
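The behaviour described in this reply can be reproduced offline, without touching the forum. The sketch below builds an `http.Response` by hand and compares what `Cookies()` returns for a bracketed name and for a token-only name. `parsedCookieNames` is a hypothetical helper, and `nforum_UTMPUSERID` is a made-up name used purely for contrast; the forum does not send it.

```go
package main

import (
	"fmt"
	"net/http"
)

// parsedCookieNames returns the names of the cookies that net/http accepts
// from the given raw Set-Cookie lines.
func parsedCookieNames(setCookie ...string) []string {
	resp := &http.Response{Header: http.Header{}}
	for _, line := range setCookie {
		resp.Header.Add("Set-Cookie", line)
	}
	var names []string
	for _, c := range resp.Cookies() {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	// Name contains "[" and "]", which are separators, so it is not a token.
	bracketed := parsedCookieNames("nforum[UTMPUSERID]=guest; path=/; domain=bbs.byr.cn")
	fmt.Println(len(bracketed)) // prints: 0

	// Hypothetical token-only name, for contrast.
	plain := parsedCookieNames("nforum_UTMPUSERID=guest; path=/; domain=bbs.byr.cn")
	fmt.Println(plain) // prints: [nforum_UTMPUSERID]
}
```

Since `Cookies()` yields nothing for the bracketed name, a `cookiejar.Jar` attached to the client has nothing to store for it either, which is consistent with the empty `resp cookies: []` line in the original output.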
#3
