UTF-8 encoding issues in Python on Windows

I am processing a file with Python on Windows, and writing the result to CSV fails with errors such as "surrogates not allowed".

Sample text from the document:

Ten states led by Texas Attorney General Ken Paxton (R) filed an antitrust lawsuit against Google on Wednesday, alleging the tech giant illegally sought to suppress competition and reap massive profits from targeted advertisements placed across the Web.
The lawsuit — filed in a Texas federal court and backed exclusively by Republicans — strikes at the heart of Google’s lucrative business in connecting those who seek to buy online ads with the websites that sell them. Paxton and his GOP allies contend that Google relied on a mix of improper tactics to force its ad tools on publishers and solidify its pole position as a "middleman" in the invisible transactions that power much of the Web.
Online advertising is expected to generate $42 billion in revenue this year for Google, which captures a third of all digital ad spending, according to an October projection from the firm eMarketer. Google’s vast reach led Texas and other state attorneys general to conclude in their lawsuit that the tech giant essentially had built the "largest electronic trading market in existence," operating ad systems that are not unlike trades on a stock exchange.

Code1:

return_doc.to_csv(path, index=False)

Error1: UnicodeEncodeError: 'utf-8' codec can't encode character '\udc9d' in position 168: surrogates not allowed
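A lone surrogate such as '\udc9d' usually means some bytes were decoded with errors='surrogateescape', which maps each undecodable byte 0xNN to U+DCNN. The question does not show how the file was read, so the sketch below simply assumes the document is UTF-8 but was decoded as cp1252 with surrogateescape. Under that assumption it reproduces both Error1 and Error2: the closing quote ” is E2 80 9D in UTF-8, and 0x9D is undefined in cp1252.

```python
# Assumption: the document is UTF-8 but was decoded as cp1252 with
# errors="surrogateescape"; the question does not show the reading code.
raw = "Google\u2019s \u201cmiddleman\u201d".encode("utf-8")  # hypothetical sample bytes

text = raw.decode("cp1252", errors="surrogateescape")
# 0x9D (last byte of the closing quote) has no cp1252 mapping,
# so surrogateescape turns it into the lone surrogate U+DC9D.
print("\udc9d" in text)

reasons = {}
for codec in ("utf-8", "cp1252"):
    try:
        text.encode(codec)  # strict error handling, the to_csv default
    except UnicodeEncodeError as exc:
        reasons[codec] = exc.reason
        print(exc)  # mirrors Error1 (utf-8) and Error2 (cp1252)
```

If this matches what happens in the real code, the surrogate is a symptom of the read step, not of to_csv itself.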

Code2:

return_doc.to_csv(path, index=False, encoding='cp1252')

Error2: UnicodeEncodeError: 'charmap' codec can't encode character '\udc9d' in position 168: character maps to <undefined>

Code3:

return_doc.to_csv(path, index=False, encoding='ISO 8859-15')

Error3: UnicodeEncodeError: 'charmap' codec can't encode character '\u201d' in position 14: character maps to <undefined>
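Error3 is a different failure: ISO 8859-15 (Latin-9) has no code point for the curly quotes, whereas cp1252 maps them into its 0x80 to 0x9F range. A quick check, using Python's iso8859_15 codec name for the same encoding:

```python
# U+201D RIGHT DOUBLE QUOTATION MARK, the character named in Error3.
quote = "\u201d"

in_cp1252 = quote.encode("cp1252")  # cp1252 maps it to the single byte 0x94
print(in_cp1252)

latin9_reason = None
try:
    quote.encode("iso8859_15")  # Latin-9 has no code point for it
except UnicodeEncodeError as exc:
    latin9_reason = exc.reason
print(latin9_reason)
```

So cp1252 can represent the curly quotes; in Error2 it is only the surrogate '\udc9d' that it rejects.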

I then tried Code4:

return_doc.to_csv(path, index=False, encoding='cp1252', errors='replace')

This changed the text from

"The actions harm every person in America," Paxton said in a video statement preceding the case, which asked a judge to consider "structural" remedies that could theoretically include forcing a breakup of the company.

into

“The actions harm every person in America,ï¿½? Paxton said in a video statement preceding the case, which asked a judge to consider “structuralï¿½? remedies that could theoretically include forcing a breakup of the company.

I don't want the text to change.
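errors='replace' only hides the problem: every character the target codec cannot represent is swapped for ?, so the original character is lost. The ï¿½ debris is what U+FFFD REPLACEMENT CHARACTER looks like when its UTF-8 bytes are displayed as cp1252; that this is where it comes from here is an assumption, since the question does not show how the CSV was viewed.

```python
# errors="replace" substitutes "?" for anything cp1252 cannot encode,
# so the original character is discarded.
lossy = "America,\udc9d".encode("cp1252", errors="replace")
print(lossy)

# U+FFFD REPLACEMENT CHARACTER is EF BF BD in UTF-8; those three bytes,
# displayed as cp1252, read as three separate Latin-1 characters.
mojibake = "\ufffd".encode("utf-8").decode("cp1252")
print(mojibake)
```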

Please suggest a solution that avoids the error without changing the text.
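A minimal sketch of a lossless route, assuming the source file really is UTF-8 (the question does not confirm this): decode it as UTF-8 in the first place, or, if strings already carry surrogate escapes from a cp1252 surrogateescape decode, reverse that decode to recover the original bytes. The helper name repair_surrogates is hypothetical.

```python
def repair_surrogates(text: str) -> str:
    """Undo a cp1252 + surrogateescape decode and re-decode the bytes as UTF-8.

    Hypothetical helper: it assumes the bytes behind `text` were UTF-8.
    """
    raw = text.encode("cp1252", errors="surrogateescape")
    return raw.decode("utf-8")


# Bytes of the closing quote in the sample text, decoded the wrong way...
broken = b"America,\xe2\x80\x9d".decode("cp1252", errors="surrogateescape")
fixed = repair_surrogates(broken)
print(fixed)

# ...versus decoding as UTF-8 from the start, which never creates surrogates.
print(b"America,\xe2\x80\x9d".decode("utf-8") == fixed)
```

With clean strings, return_doc.to_csv(path, index=False, encoding='utf-8') (pandas' default encoding) should write the text unchanged; 'utf-8-sig' additionally writes a BOM, which helps Excel on Windows recognise the file as UTF-8.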

  • That is the default. I'm puzzled that you had to do that: it should not be necessary, unless some code that you ran changed it. Commented Feb 26, 2021 at 12:14
  • I think on Windows it is not using UTF-8, since Windows has its own default encoding. Commented Mar 1, 2021 at 5:28
  • Not so. Python's standard encoding for `stdin` and `stdout` on Windows is UTF-8. And the Windows-1252 encoding would not give the error "surrogates not allowed". Commented Mar 1, 2021 at 7:40
