
I am trying to generate some JSON data from txt files.

The txt files are produced from books by OCR, which makes them both precious (I can't just change the characters I don't like, since they could be important) and unreliable (the OCR could have gone wrong, or the author could have used symbols that break my code).

So far, I have this:

output_folder = Path(output_folder)

value = json.loads('{"nome": "' + file_name[:len(file_name)-4] + '", "testu": "' + (Path(filename).read_text()) + '"}')
path = output_folder / (file_name[:len(file_name)-4] + "_opare.json")
with path.open(mode="w+") as working_file:
    working_file.write("[" + str(value) + "]")
    working_file.close()

This raises json.decoder.JSONDecodeError: Invalid control character, which I understand to be caused by my book starting (yes) with a ' (a quote).

I've read about string literals, which seem relevant to my case, but I didn't understand how I could use them.

What can I do?

Thanks

asked Mar 31, 2021 at 22:13
3
  • Probably the worst option would be to read the text word by word inside a try/except and throw out the words that raise an exception, but I think that would certainly work. Commented Mar 31, 2021 at 22:29
  • There are a lot of things that may or may not be a problem here. It would help if we could be sure of the exact content of your source file (at a binary level, not just what you think the text is). It's important to make sure you know the encoding of the file. That said, you should not try to build JSON data this way (work the other way around, as in @LuizFerraz's answer). As for "string literals", I think you are confused as to what that means. A string literal is just a string that appears "literal"ly in your code. For example, '{"nome": "', or "[". Commented Mar 31, 2021 at 22:57
  • Yeah, the difficulty here is that I don't know what the books will be (this is the batch processing phase). I followed @LuizFerraz's answer, which is correct. As for the literals, I hoped they would help me escape all the problematic chars, but json.dump() already does that. Commented Apr 1, 2021 at 6:10
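
To illustrate the point made in these comments, here is a minimal sketch of why gluing file contents into a JSON string is fragile. The sample text is made up (it is not from the asker's books). In this sample the leading single quote parses fine on its own, while a raw newline inside the string is what json.loads rejects; json.dumps escapes it instead.

```python
import json

# Made-up OCR text: starts with a single quote and contains a raw newline.
ocr_text = "'Twas a dark night.\nThe end."

# A leading single quote alone is legal inside a JSON string.
quote_only = '{"testu": "' + "'Twas" + '"}'
parsed_quote_only = json.loads(quote_only)

# Hand-built JSON as in the question: the newline lands unescaped
# inside the string, which the default (strict) parser refuses.
hand_built = '{"testu": "' + ocr_text + '"}'
try:
    json.loads(hand_built)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# Letting the json module do the escaping avoids the problem.
escaped = json.dumps({"testu": ocr_text})
round_trip = json.loads(escaped)

print(error_message)
print(escaped)
```

The same reasoning should apply to any other character that needs escaping in JSON, such as double quotes or backslashes in the book text.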

2 Answers

3

Why would you build a JSON string just to parse it again? You can just create a dictionary:

value = {
    "nome": file_name[:len(file_name)-4],
    "testu": Path(filename).read_text(),
}
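
As a rough sketch of the same idea wrapped in a function: build_record is a hypothetical helper name, and UTF-8 is only an assumption about how the OCR files are encoded. It uses Path.stem instead of slicing off the last four characters, which gives the same result for a ".txt" file name.

```python
import tempfile
from pathlib import Path


def build_record(txt_path):
    """Return the dictionary for one book (hypothetical helper)."""
    txt_path = Path(txt_path)
    return {
        # File name without its extension, e.g. "book" for "book.txt".
        "nome": txt_path.stem,
        # Assumption: the OCR output is UTF-8; change this if it is not.
        "testu": txt_path.read_text(encoding="utf-8"),
    }


# Usage with a throwaway sample file standing in for a real book.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "book.txt"
    sample.write_text("'Quoted start\nsecond line", encoding="utf-8")
    record = build_record(sample)

print(record)
```

The text is stored in the dictionary exactly as read; no escaping is needed at this stage, because nothing is being parsed.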
Pedro Lobito
answered Mar 31, 2021 at 22:29

1 Comment

You are absolutely right, why did I do that? It's not exactly the way I was going, but you are 100% correct. Thank you.
0

Reading between the lines, the JSONDecodeError doesn't actually come from this code, does it? It comes from the code that's reading your file later.

You can't write a dict to a JSON file using str(value). Python's dict-to-string conversion uses single quotes, which are not legal in JSON. You need to convert it back to JSON:

with path.open(mode="w+") as working_file:
    json.dump([value], working_file)
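
A small sketch of the difference, using a made-up value and a temporary folder rather than the asker's real paths. The explicit encoding="utf-8" and ensure_ascii=False are my own assumptions (they keep non-ASCII characters readable in the output file) and are not required for the fix itself.

```python
import json
import tempfile
from pathlib import Path

# Made-up record standing in for one book.
value = {"nome": "book", "testu": "'Quoted start\nsecond line"}

# str() gives Python's repr of the dict, which is not JSON.
as_str = str(value)
try:
    json.loads(as_str)
    str_is_valid_json = True
except json.JSONDecodeError:
    str_is_valid_json = False

# json.dumps() gives text that json.loads() accepts.
as_json = json.dumps([value])

# Write with json.dump(), then read the file back to check it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "book_opare.json"
    with path.open(mode="w", encoding="utf-8") as working_file:
        json.dump([value], working_file, ensure_ascii=False)
    with path.open(encoding="utf-8") as working_file:
        loaded = json.load(working_file)

print(str_is_valid_json)
print(loaded == [value])
```

Since the with block closes the file on exit, no explicit working_file.close() is needed.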
answered Mar 31, 2021 at 22:28
