
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: csv.Sniffer.sniff doesn't set up the dialect properly for a csv created with dialect=csv.excel_tab and containing quote (") char
Type: behavior Stage:
Components: Versions: Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Antoon.Pardon, GhislainHivon, dmi.baranov
Priority: normal Keywords:

Created on 2013-04-24 15:06 by GhislainHivon, last changed 2022-04-11 14:57 by admin.

Files
File name | Uploaded | Description
csv_sniffing_excel_tab.py | GhislainHivon, 2013-04-24 15:06 | Example of sniffing csv with dialect=csv.excel_tab and quote in data
Messages (3)
msg187709 - Author: Ghislain Hivon (GhislainHivon) Date: 2013-04-24 15:06
When sniffing the dialect of a file created by the csv module with dialect=csv.excel_tab, if one of the rows contains a quote character ("), the delimiter is set to ' ' instead of '\t'.
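The following sketch reproduces the report in Python 3. The row is hypothetical and is modelled on the line quoted in msg214800; the attached csv_sniffing_excel_tab.py is not reproduced here. Because the file is written with csv.excel_tab, a tab precedes the quoted field. Even so, the report says the sniffed delimiter comes out as a space. The sketch opens the file in text mode with newline='', which the csv module expects in Python 3.

```python
import csv
import os
import tempfile

# Hypothetical sample row, modelled on the line quoted later in this issue.
# The last field contains quote characters, so the writer has to quote it.
test_data = [["273", "MVREGR1", "ByEuPo", 'Baryton "Euphonium" populaire']]

path = os.path.join(tempfile.mkdtemp(), "excel_tab.csv")

# Python 3: open in text mode with newline=''. Opening with "wb" (the Python 2
# idiom) makes writerows() raise a TypeError, because it writes str, not bytes.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f, dialect=csv.excel_tab).writerows(test_data)

with open(path, newline="", encoding="utf-8") as f:
    sample = f.read()

dialect = csv.Sniffer().sniff(sample)
print(repr(sample))
# According to this report, affected versions print ' ' rather than '\t'.
print(repr(dialect.delimiter))
```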
msg214800 - Author: Antoon Pardon (Antoon.Pardon) Date: 2014-03-25 09:30
I had a look at this and have the following remarks.
1) The file csv_sniffing_excel_tab.py no longer works with Python 3.3. It now produces the following traceback:
Traceback (most recent call last):
  File "csv_sniffing_excel_tab.py", line 36, in <module>
    create_file()
  File "csv_sniffing_excel_tab.py", line 23, in create_file
    writer.writerows(test_data)
TypeError: 'str' does not support the buffer interface
2) The problem seems to be in the _guess_quote_and_delimiter method. If _guess_delimiter is always called instead, the sniffer gives the correct result.
3) As far as I understand, the problem is the first regular expression:
(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)
Now suppose we have a line such as the following:
273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"
The delim group matches the space after Baryton, the space group matches nothing, and the quote group matches the first ". The ungrouped .*? then matches "Euphonium" (quotes included). After that, the (?P=quote) backreference matches " again and the (?P=delim) backreference matches the space before populaire.
So the sniffer ends up with the wrong delimiter.
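The walkthrough above can be checked directly with the re module. The sketch below uses the first pattern exactly as quoted, compiled with re.DOTALL | re.MULTILINE as CPython's csv.py does. It assumes that the ':' separators in the example line stand for tab characters.

```python
import re

# First pattern tried by csv.Sniffer._guess_quote_and_delimiter, as quoted above.
PATTERN = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'

# The example line, assuming the ':' separators stand for tab characters.
line = '273\tMVREGR1\tByEuPo\t"Baryton ""Euphonium"" populaire"\n'

regexp = re.compile(PATTERN, re.DOTALL | re.MULTILINE)
for m in regexp.finditer(line):
    # The tab before the quoted field cannot start a match: no '"' followed by
    # a tab occurs later. The only match starts at the space after "Baryton".
    print(repr(m.group("delim")), repr(m.group("quote")), repr(m.group(0)))
```

The single match has a space as its delim group and spans ` ""Euphonium"" `. That space is therefore the only delimiter candidate the sniffer counts.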
msg215031 - Author: Antoon Pardon (Antoon.Pardon) Date: 2014-03-28 10:04
I have attached a patch (against 2.7) that seems to make the test work.
The patch prevents the delim group from matching a space.
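The following sketch illustrates the idea behind the patch. The pattern is a hypothetical reconstruction that adds a space to the excluded characters of the delim group; the attached patch itself is not shown here and may differ in detail.

```python
import re

# Hypothetical reconstruction of the patched first pattern: a space is added
# to the characters the delim group may not match.
PATCHED = r'(?P<delim>[^\w\n"\' ])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
regexp = re.compile(PATCHED, re.DOTALL | re.MULTILINE)

def delims(data):
    """Return the delim group of every match of the patched pattern."""
    return [m.group("delim") for m in regexp.finditer(data)]

# The problem line from the issue (tabs assumed): a space can no longer be picked.
print(delims('273\tMVREGR1\tByEuPo\t"Baryton ""Euphonium"" populaire"\n'))
# A tab-delimited quoted field is still recognised.
print(delims('x\t"a b"\ty\n'))
```

When this first pattern finds nothing, CPython's _guess_quote_and_delimiter moves on to its remaining patterns. If none of those match either, sniff falls back to _guess_delimiter.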
History
Date | User | Action | Args
2022-04-11 14:57:44 | admin | set | github: 62029
2014-03-28 10:04:37 | Antoon.Pardon | set | messages: + msg215031
2014-03-25 09:30:30 | Antoon.Pardon | set | nosy: + Antoon.Pardon; messages: + msg214800
2013-04-30 21:19:18 | dmi.baranov | set | nosy: + dmi.baranov
2013-04-24 15:06:00 | GhislainHivon | create |
