Python: dumpdata output cannot be loaded back with loaddata. UnicodeDecodeError

Question 1

I have been using Python 2.7, Django 1.5 and PostgreSQL 9.2 for two weeks and had never used them before. Everything is freshly installed on my Windows 7 machine, so it should have default settings. Django generates the tables in my database correctly, and everything appears to work. I am able to dump data from my database by running:

manage.py dumpdata > test.json

or

manage.py dumpdata --indent 4 > test.json

The resulting JSON file looks as it should.

Then, I truncate some tables and try to load them from the JSON file with:

python manage.py loaddata --database=T2 test.json   (or the same command without the database name)

I got the following error:

"UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte"

If I open the test.json file in notepad, save it as utf8 and try again, then I get:

"No JSON object could be decoded"

The file still looks OK, not empty.

By the way, when I open the JSON file with Notepad, it offers to save it as Unicode. My database has UTF8 encoding. Please advise. Thank you.

Question 2

Do not use Notepad to modify the code

Question 3

Please show the output of print(repr(open('test.json', 'rb').read(4)))
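A minimal sketch of what that four-byte check is looking for, assuming Python 3. The helper name `sniff_bom` and the labels it returns are hypothetical and not part of Django; it only recognises files that start with a byte order mark.

```python
import codecs

# Known byte order marks. UTF-32 is listed first because its little-endian
# BOM begins with the same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head):
    """Return a label for the BOM that the bytes in `head` start with, or None."""
    for bom, label in _BOMS:
        if head.startswith(bom):
            return label
    return None
```

Passing it the first four bytes of the fixture, read in 'rb' mode, would show whether the file begins with the 0xff 0xfe pair reported in the traceback.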

Question 4

What worked for me is following these steps:

- Open the file in regular notepad
- Select save as
- Select encoding "UTF-8" (Not "UTF-8 (With BOM)")
- Save the file.

Now you can use loaddata.

However, this only works for files that are small enough for notepad to open.
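For files too large to open in an editor, the same re-save can be sketched in Python 3. This assumes the dump is UTF-16 with a byte order mark (as the 0xff at position 0 suggests); the function name `reencode_to_utf8` and the paths are hypothetical.

```python
import shutil


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16",
                     chunk_size=1024 * 1024):
    """Stream src_path into dst_path as UTF-8 without a BOM.

    The "utf-16" codec reads the BOM to pick the byte order and does not
    pass it on, and the "utf-8" codec never writes one.
    """
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Because the copy runs in chunks, the whole file never has to fit in memory at once.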

Question 5

achieved in notepad++ by setting utf-8 via Encoding -> UTF-8, then saving

Question 6

Works in VSCode too

Question 7

0xff in position 0 looks like the start of a little-endian UTF-16 byte order marker to me. Notepad's "Unicode" save mode is little-endian UTF-16, so that makes sense if you saved your json from Notepad after creating it. Notepad will keep the byte order marker even in utf-8, which could plausibly cause loaddata to fail to parse it.

If you don't have your un-edited json still handy, you'll need to remove the BOM - personally I'd use emacs, but another answer suggested this stand-alone Windows .exe:

http://www.bryntyounce.com/filebomdetector.htm
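A small sketch of why a leftover BOM in a UTF-8 file can plausibly break parsing, using only the standard `json` module in Python 3 (not Django's loader itself). The fixture content and the helper `parses` are made up for illustration.

```python
import json

payload = u'[{"model": "app.item", "pk": 1}]'  # hypothetical fixture content

# A file saved as "UTF-8 with BOM" starts with the bytes EF BB BF.
with_bom = b"\xef\xbb\xbf" + payload.encode("utf-8")


def parses(text):
    """Return True if `text` is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


# Decoding as plain "utf-8" keeps the BOM as a leading U+FEFF character.
plain = parses(with_bom.decode("utf-8"))
# The "utf-8-sig" codec drops a leading BOM before the parser sees the text.
stripped = parses(with_bom.decode("utf-8-sig"))
```

Here `plain` ends up False and `stripped` ends up True, which matches the "No JSON object could be decoded" message seen after re-saving in Notepad.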

Question 8

Peter, thank you for your reply. I cannot use emacs since I have Windows 7. I installed the utility you suggested and ran it. Indeed, it shows that all files but one doctored by Notepad are UTF-16. However, after running the utility I still get the same "UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte"

Question 9

Step 1: convert to UTF-8. Step 2: Remove the BOM.
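Those two steps could be sketched as a single function in Python 3. The name `to_plain_utf8` is hypothetical, and the sketch assumes the input is either UTF-16 with a BOM or UTF-8 with or without one.

```python
import codecs


def to_plain_utf8(raw):
    """Return the bytes in `raw` re-encoded as UTF-8 without a BOM."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Step 1: the BOM selects the byte order and is consumed by the codec.
        text = raw.decode("utf-16")
    else:
        # Step 2: "utf-8-sig" drops a leading UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")
    return text.encode("utf-8")
```

The result can be written back with a file opened in 'wb' mode, so no further encoding step touches the bytes.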

Question 10

"I cannot use emacs since I have Windows7": Yes, you can. gnu.org/software/emacs/download.html

Question 11

After some research, I found the solution. In my case, the datadump.json file was the problem.

- Open the file in Notepad
- Click the "Save as" option
- In the encoding section at the bottom, select "UTF-8"
- Save the file.

Now you can try running the command again. You are good to go :)

(The original post attached three screenshots here: Notepad, Save as, and UTF-8.)

Question 12

On Windows, running your standard dumpdata command with -Xutf8 has always solved this problem for me:

python -Xutf8 manage.py dumpdata app.mymodel > app/fixtures/mymodel.json

Here is an article for reference: https://dev.to/methane/python-use-utf-8-mode-on-windows-212i
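To see whether UTF-8 mode is actually in effect for a given interpreter, a small check along these lines may help. It assumes Python 3.7 or later (where `sys.flags.utf8_mode` exists); the function name `describe_text_encoding` is hypothetical, and the sketch only reports settings, it does not change them.

```python
import locale
import sys


def describe_text_encoding():
    """Report the settings that influence how Python encodes text output."""
    return {
        # True when started with -Xutf8 or with PYTHONUTF8=1 in the environment.
        "utf8_mode": bool(getattr(sys.flags, "utf8_mode", 0)),
        # Encoding Python uses when writing text to standard output.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Default encoding for open() when none is given.
        "preferred_encoding": locale.getpreferredencoding(False),
    }
```

Printing the result once with and once without -Xutf8 shows what the flag changes on a particular machine.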

Question 13

I found one way to solve this issue: manually write out a new binary JSON file with the following code. 'rb' stands for "read binary" and 'wb' for "write binary".

First, go to shell:

python manage.py shell

Second, rewrite the test.json to a binary file:

with open('path/to/test.json', 'rb') as f:
    data = f.read()
with open('newfile.json', 'wb') as newdata:
    newdata.write(data)
exit()

Then you can load the file:

python manage.py loaddata newfile.json

The code above works for me. I hope it helps you as well.

Question 14

I encountered the same problem when loading data. It is an encoding problem. Install Notepad++ and change the encoding to UTF-8.

In the lower right corner you can see the current encoding. If it is not UTF-8, you can change it to UTF-8 from the Encoding menu.

This solution worked for me.


Question 15

If you are using a newer version of Windows 10, you can use Notepad to change the encoding from UTF-16 to UTF-8 simply by saving the file again and selecting the encoding option in the save dialog. See the example image below.

Question 16

Please can you link to the image

Question 17

I wonder why Django's manage.py dumpdata saves the file as UTF-16 to begin with. Does anyone know?

Accepted answer: the Notepad steps under Question 4 above, by Ducktown (score 36), answered 2020-01-22 10:05:37Z, edited Jan 27, 2020. The comments under Question 5 and Question 6 are by andyw (2021-08-06) and Psddp (2023-12-31).
