I am using a variety of tools to regularly prepare data for the web. One stage requires me to open a CSV in Excel, make changes and save the file.
Is there a way to force Excel to accept UTF-8 encoding, and to save its files with that encoding?
-
2To clarify, I would like to change a setting so that Excel assumes an input file is UTF-8 by default. As mentioned below, LibreOffice/OpenOffice does this.Dizzley– Dizzley2015年10月30日 12:02:17 +00:00Commented Oct 30, 2015 at 12:02
-
If you don't actually have Unicode characters to deal with, I'd consider using a middleware to convert all CSVs to ASCII and then open in Excel.Vassile– Vassile2016年02月28日 07:54:12 +00:00Commented Feb 28, 2016 at 7:54
-
1Related question here but not exactly the same.hhh– hhh2017年01月27日 08:09:50 +00:00Commented Jan 27, 2017 at 8:09
8 Answers 8
I answered a similar question at Default character encoding for Excel Text Wizard?.
I found my answer at Changing default text import origin type in Excel.
- Close Excel, if it is open.
- Open the Registry Editor.
- Navigate to HKEY_CURRENT_USER → Software → Microsoft → Office → ▒▒ → Excel → Options, where ▒▒ is your version of Office, mostly likely the largest number you see there.
- Right-click an empty space on the right side and select New → DWORD.
- Name the item DefaultCPG, and press Enter to save.
- Right-click on DefaultCPG and select Modify.
- Set the Base to Decimal.
- For Value data, enter 65001 to set your default to UTF-8. For some other encoding, use the code page identifier, which you can find in the Text Import Wizard in Excel or in this list.
- Click OK.
Like Vasille says in the comment to this question, if your file is not actually in UTF-8 format, you may technically want to convert the characters within the file to the encoding you want before opening in Excel. For my purposes, though, UTF-8 does a good enough job of displaying non-corrupted characters.
Not working? Make sure you set Base to Decimal (Step 7).
Screenshot of Edit DWORD 32-bit value showing setting base to decimal
-
9Are there any reasons not to do this? Why is this not the default behavior of Excel? --- My team often deals with datasets containing foreign characters, and I'm wondering if there is any reason to not perform this action on each of our machines.user1318135– user13181352017年08月23日 20:43:27 +00:00Commented Aug 23, 2017 at 20:43
-
7This unfortunately did not work on my machine. Files still open with default encoding (Win 7, Office 2016).user1318135– user13181352017年08月23日 20:46:24 +00:00Commented Aug 23, 2017 at 20:46
-
10Doesn't work on mine also, Win 10 Office 2016Phuah Yee Keat– Phuah Yee Keat2018年06月21日 03:46:11 +00:00Commented Jun 21, 2018 at 3:46
-
5No dice. Excel behaves as stupidly as ever. One would expect an obvious feature like this to be accessible via a user preference in software as mature as Excel.Aaron Bramson– Aaron Bramson2018年09月18日 09:22:26 +00:00Commented Sep 18, 2018 at 9:22
-
1@Robert H. Is there a way to achieve this on a Mac?alex– alex2021年04月29日 12:36:48 +00:00Commented Apr 29, 2021 at 12:36
It seems that Byte Order Mark is required by Microsoft Office software.
Using Notepad++, convert the CSV using menu: Encoding -> Convert to UTF8-BOM.
Using the
sedUnix utility, available in cmder or Git for Windows. See Adding BOM to UTF-8 files.sed -i '1s/^/\xef\xbb\xbf/' file.csv
- Tested with Excel 2016
- This is what avwtp meant in their answer.
-
Setting the BOM in Notepad++ works perfectly fine. Is there a command-line solution for Windows, like at CMD-Prompt or Powershell?snahl– snahl2021年07月12日 20:14:10 +00:00Commented Jul 12, 2021 at 20:14
-
An answer for Python: f = open(out_filename, "w") #set the BOM to UTF-8. Required for importing csv to Excel f.write('\ufeff') # Close the file f.close()snahl– snahl2021年07月13日 14:05:27 +00:00Commented Jul 13, 2021 at 14:05
-
Setting the BOM was the fastest and easiest. ThanksOrsu– Orsu2021年11月17日 14:06:24 +00:00Commented Nov 17, 2021 at 14:06
One easy way to change excel ANSI encoding to UTF-8 is the open the .csv file in notepad then select File > Save As. Now at the bottom you will see encoding is set to ANSI change it to UTF-8 and save the file as a new file and then you're done.
-
I will experiment with this. One problem I have is opening files consistently in the correct encoding so I do not get data in two encodings in a single results file. Then it becomes hard to fix. I really wish Excel allowed me a better way to choose input file encoding.Dizzley– Dizzley2015年05月06日 14:59:33 +00:00Commented May 6, 2015 at 14:59
I solved similar problem before. Unsuccessfully, but you can use LibreOffice, which is UTF-8 in default.
-
LibreOffice is very handy for this. However, sometimes I have VBA macros to run. Thanks LluserDizzley– Dizzley2015年05月06日 14:56:56 +00:00Commented May 6, 2015 at 14:56
You need to use the File > Import option and start with blank document and specify UTF-8
but this is far from optimal to make this a default setting for all files, anyway it is unnecessary to rotate the files around Google Drive or LibreOffice. The defaults are just badly chosen and the disability to change that is irritating.
Related
How to set character encoding when opening Excel but does not find the option to change the defaults such that all files are automatically opened with UTF-8 instead of Macintosh format in OSX
How to change default file option to UTF-8 away from Macintosh file option in Apple.SE.
There's an Excel addin available here to work with Unicode CSV files that should help you.
Here's the developer Jaimon Mathew's note:
Excel treats .csv files as text files and will replace all Unicode characters with "?" when saved normally. If you want to preserve the Unicode characters, you would need to save the file as "Unicode text (*.txt)", which is a Tab delimited file. Since I couldn’t find any existing solution to preserve Unicode characters in CSV format, I thought I’ll give it a go in creating an Excel Addin to do just that.
1
It's not the best solution, but it's an option: upload your Excel file to Google drive, open it with Google Tabs and download as a csv file. It worked for me.
-
5This does not answer the question how to change the encoding of Excel document.Ramhound– Ramhound2015年10月02日 12:01:16 +00:00Commented Oct 2, 2015 at 12:01
-
Of course, when you open the CSV in Excel you lose the cell formatting so some numeric ID strings get converted to integers.Dizzley– Dizzley2015年10月30日 12:00:49 +00:00Commented Oct 30, 2015 at 12:00
Assuming you have a unix shell available (Cygwin), I append to the front of the csv file a little 3 char unicode ID (UTF-8) and then it reads fine into Excel and more importantly it is properly processed as input using VB Excel automation.
UTF-8.txt
EFBBBF << hex edit into a 3 char file.
cat UTF-8.txt file-to-use.csv> new_file.csv
-
1(1) Can you describe how/why this works? (2) Can you describe exactly how to create this
UTF-8.txtfile?Scott - Слава Україні– Scott - Слава Україні2019年04月30日 00:41:55 +00:00Commented Apr 30, 2019 at 0:41 -
I suggest using a hex editor and typing in the 3 character sequence. Why it works, no idea. I figured out this solution by taking an existing UTF-8 csv file, renaming it to txt, importing it into Excel (like suggested in other threads), saving it as CSV with the Unicode plugin (other thread) and then doing a binary diff of the two files. The only difference was the 3 character starting sequence.avwtp– avwtp2019年04月30日 19:08:13 +00:00Commented Apr 30, 2019 at 19:08
-
See this thread why it works [link] (stackoverflow.com/questions/6002256/…)avwtp– avwtp2019年04月30日 19:22:06 +00:00Commented Apr 30, 2019 at 19:22
-
You must log in to answer this question.
Explore related questions
See similar questions with these tags.