homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Exception raised when an NNTP overview field is absent
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jelie, pitrou, r.david.murray, rhettinger
Priority: normal Keywords: patch

Created on 2010年11月01日 19:58 by jelie, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
nntpover.patch pitrou, 2010年11月02日 22:52
nntpover2.patch pitrou, 2010年11月02日 23:19
Messages (12)
msg120158 - (view) Author: Julien ÉLIE (jelie) Date: 2010年11月01日 19:58
Following the first example of the documentation:
import nntplib
s = nntplib.NNTP('news.trigofacile.com')
resp, count, first, last, name = s.group('fr.comp.lang.python')
print('Group', name, 'has', count, 'articles, range', first, 'to', last)
resp, overviews = s.over((last - 9, last))
for id, over in overviews:
 print(id, nntplib.decode_header(over['subject']))
s.quit()
An exception is raised:
"OVER/XOVER response doesn't include names of additional headers"
I believe the issue comes from the fact that the source code does not
handle the case described in Section 8.3.2 of RFC 3977:
 For all fields, the value is processed by first removing all CRLF
 pairs (that is, undoing any folding and removing the terminating
 CRLF) and then replacing each TAB with a single space. If there is
 no such header in the article, no such metadata item, or no header or
 item stored in the database for that article, the corresponding field
 MUST be empty.
 Example of a successful retrieval of overview information for a range
 of articles:
 [C] GROUP misc.test
 [S] 211 1234 3000234 3002322 misc.test
 [C] OVER 3000234-3000240
 [S] 224 Overview information follows
 [S] 3000234|I am just a test article|"Demo User"
 <nobody@example.com>|6 Oct 1998 04:38:40 -0500|
 <45223423@example.com>|<45454@example.net>|1234|
 17|Xref: news.example.com misc.test:3000363
 [S] 3000235|Another test article|nobody@nowhere.to
 (Demo User)|6 Oct 1998 04:38:45 -0500|<45223425@to.to>||
 4818|37||Distribution: fi
 [S] 3000238|Re: I am just a test article|somebody@elsewhere.to|
 7 Oct 1998 11:38:40 +1200|<kfwer3v@elsewhere.to>|
 <45223423@to.to>|9234|51
 [S] .
 Note the missing "References" and Xref headers in the second line,
 the missing trailing fields in the first and last lines, and that
 there are only results for those articles that still exist.
Also please note that nntplib should also work in case the database is not consistent. Some news servers might be broken and do not follow the MUST NOT...
 The LIST OVERVIEW.FMT command SHOULD list all the fields for which
 the database is consistent at that moment. It MAY omit such fields
 (for example, if it is not known whether the database is consistent
 or inconsistent). It MUST NOT include fields for which the database
 is inconsistent or that are not stored in the database. Therefore,
 if a header appears in the LIST OVERVIEW.FMT output but not in the
 OVER output for a given article, that header does not appear in the
 article (similarly for metadata items).
msg120272 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年11月02日 22:41
I am wondering how to return the corresponding information. Should the field be totally absent from the returned dictionary, should it map to the empty string, or should it map to None?
I'm leaning towards the latter (map to None), but perhaps the empty string is better?
msg120275 - (view) Author: Julien ÉLIE (jelie) Date: 2010年11月02日 22:45
The empty string would mean the header exists, and is empty (though not RFC-compliant).
For instance:
"User-Agent: \r\n"
I believe None is better.
msg120278 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年11月02日 22:52
Here is a patch for returning None on absent fields.
(works with trigofacile.com)
msg120282 - (view) Author: Julien ÉLIE (jelie) Date: 2010年11月02日 23:05
OK, thanks.
By the way, why is the token stripped?
 token = token[len(h):].lstrip(" ")
"X-Header: test \r\n" in an header is kept in the overview as-is.
I do not see why " test " should not be the value returned.
Also, with:
 token = token or None
"X-Header: \r\n" becomes None if I understand how the source code works... Yet, it is a real '', not None.
msg120283 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年11月02日 23:08
> OK, thanks.
> By the way, why is the token stripped?
> token = token[len(h):].lstrip(" ")
> 
> "X-Header: test \r\n" in an header is kept in the overview as-is.
> I do not see why " test " should not be the value returned.
It's a simple way of handling "Xref: foo" and returning "foo" rather
than " foo". If spaces are supposed to be significant I can just strip
the first one, though.
> Also, with:
> token = token or None
> 
> "X-Header: \r\n" becomes None if I understand how the source code
> works... Yet, it is a real '', not None.
Er, so you're disagreeing with your previous message? Or am I missing
something? :)
msg120286 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年11月02日 23:18
Here is a patch trying to better handle whitespace. Would it be ok for you?
msg120287 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年11月02日 23:19
Oops, sorry.
msg120299 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010年11月03日 02:00
My conclusion in working on the email package is that only the first space after the ':', if it exists, should be stripped. That is, even though the RFC (for email) reads as if the space after the colon is part of the value, in practice it is part of the delimiter, but is optional (and almost always present, in email). Whether additional leading spaces are significant depends on why they are there. Since they are an unusual case, I would choose to preserve them on the theory that someone might care, and that someone who doesn't care can strip them.
msg120332 - (view) Author: Julien ÉLIE (jelie) Date: 2010年11月03日 17:55
> Er, so you're disagreeing with your previous message?
> Or am I missing something? :)
I was saying that if an empty string is returned, then it means that the header exists and is empty. An example was "User-Agent: \r\n".
And my remark "I believe None is better." concerned your initial question "Should the field be totally absent [...]" regarding how to deal with a header that does not exist.
Therefore, "User-Agent: \r\n" becomes a real '', not None. None is only when the User-Agent: header field is absent from the headers.
> Here is a patch trying to better handle whitespace.
> Would it be ok for you?
Yes Antoine, thanks!
msg120333 - (view) Author: Julien ÉLIE (jelie) Date: 2010年11月03日 18:01
> My conclusion in working on the email package is that only
> the first space after the ':', if it exists, should be stripped.
> That is, even though the RFC (for email) reads as if the space
> after the colon is part of the value, in practice it is part
> of the delimiter, but is optional (and almost always present,
> in email).
That is why the RFC (for netnews) explicitly mentions that the space after the colon is not part of the value. See the grammar for OVER in RFC 3977:
 hdr-n-content = [(header-name ":" / metadata-name) SP hdr-content]
So yes, only the first space should be stripped.
msg120335 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年11月03日 18:19
Ok, committed in r86139.
History
Date User Action Args
2022年04月11日 14:57:08adminsetgithub: 54490
2010年11月03日 18:19:27pitrousetstatus: open -> closed
resolution: fixed
messages: + msg120335

stage: resolved
2010年11月03日 18:01:08jeliesetmessages: + msg120333
2010年11月03日 17:55:54jeliesetmessages: + msg120332
2010年11月03日 02:30:10rhettingersetmessages: - msg120300
2010年11月03日 02:13:20rhettingersetnosy: + rhettinger
messages: + msg120300
2010年11月03日 02:00:49r.david.murraysetnosy: + r.david.murray
messages: + msg120299
2010年11月02日 23:19:31pitrousetfiles: + nntpover2.patch

messages: + msg120287
2010年11月02日 23:19:20pitrousetfiles: - nntpover2.patch
2010年11月02日 23:18:46pitrousetfiles: + nntpover2.patch

messages: + msg120286
2010年11月02日 23:08:37pitrousetmessages: + msg120283
2010年11月02日 23:05:08jeliesetmessages: + msg120282
2010年11月02日 22:52:33pitrousetfiles: + nntpover.patch
keywords: + patch
messages: + msg120278
2010年11月02日 22:45:10jeliesetmessages: + msg120275
2010年11月02日 22:41:23pitrousetmessages: + msg120272
2010年11月01日 20:02:43pitrousetnosy: + pitrou
2010年11月01日 19:58:22jeliecreate

AltStyle によって変換されたページ (->オリジナル) /