I have a problem encoding this character with json_encode():

http://www.fileformat.info/info/unicode/char/92/index.htm

At first it gave me the error JSON_ERROR_UTF8, which means

'Malformed UTF-8 characters, possibly incorrectly encoded'

So I tried calling utf8_encode() before json_encode().

Now it returns this result: '\u0092'

Then I found this function:

 function jsonRemoveUnicodeSequences($struct) {
 return preg_replace("/\\\\u([a-f0-9]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", json_encode($struct));
 }

With it the character shows up, but together with an extra one:

Â’

I also tried htmlentities() followed by html_entity_decode(), with no result.

asked May 12, 2015 at 16:25
  • What is your input encoding? Can you convert to utf8 before json_encode? Commented May 12, 2015 at 16:26
  • @Halcyon My input is an object. I use this function to UTF-8 encode it: function utf8ize($mixed) { if (is_array($mixed) ) { foreach ($mixed as $key => $value) { $mixed[$key] = utf8ize($value); } } else if (is_object($mixed)) { foreach ($mixed as $key => $value) { $mixed->$key = utf8ize($value); } } else if (is_string ($mixed)) { return utf8_encode($mixed); } return $mixed; } Commented May 12, 2015 at 16:29
  • Why not simply json_encode(iconv('UCS-4LE','UTF-8', $text))? Commented May 12, 2015 at 17:52
  • That produces the error 'Detected an incomplete multibyte character in input string', which led me to this article, stackoverflow.com/questions/26092388/…, which has the function I have been looking for. Commented May 13, 2015 at 1:03
  • I found a helpful function here: stackoverflow.com/a/29667430/3479609 Commented May 13, 2015 at 1:07

2 Answers

json_encode() requires input that is

  • null
  • integer, float, boolean
  • string encoded as UTF-8
  • objects implementing JsonSerializable
  • arrays of JSON-encodable values
  • stdClass instances whose properties are JSON-encodable values

So, if you have a string, you must first transcode it to UTF-8. The correct tool for that is the iconv library, but you need to know which encoding the string currently has in order to correctly transcode it.
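As a rough illustration of the point above, the following sketch shows json_encode() rejecting a non-UTF-8 string and succeeding after transcoding. The sample string and the choice of Windows-1252 as the source encoding are assumptions made for the example; the real source encoding has to be known.

```php
<?php
// Assumption: the source bytes are Windows-1252, where 0x92 is an apostrophe.
$raw = "It\x92s";

// A lone 0x92 byte is not valid UTF-8, so json_encode() returns false.
$failed  = json_encode($raw);
$error   = json_last_error();      // JSON_ERROR_UTF8
$message = json_last_error_msg();  // human-readable description of the error

// Transcode from the known source encoding first, then encode.
$utf8 = iconv('Windows-1252', 'UTF-8', $raw);
$json = json_encode($utf8);
```

Checking the return value and json_last_error() right after the call matters, because the next json_encode() call resets the error state.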

Your approach to recursively transcode arrays or objects should work, but I'd strongly suggest not using anything but UTF-8 internally. If you have an interface where you have to accept different encodings, validate and reject immediately and use UTF-8 onwards. Similarly, when replying, keep UTF-8 until the last possible point where you can still signal encoding problems.
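One possible way to "validate and reject immediately" is sketched below. The helper name assertUtf8() and the exception type are hypothetical choices for this example, not something from the question; only string values are checked, array keys and property names are not.

```php
<?php
// Hypothetical helper: throws as soon as a string that is not valid UTF-8 is found.
function assertUtf8($value): void
{
    if (is_string($value)) {
        if (!mb_check_encoding($value, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
    } elseif (is_array($value) || is_object($value)) {
        // foreach visits array elements and public object properties.
        foreach ($value as $item) {
            assertUtf8($item);
        }
    }
}

// Valid UTF-8 passes silently.
assertUtf8(['name' => "It\xE2\x80\x99s"]);

// A Windows-1252 apostrophe (0x92) is rejected at the boundary.
try {
    assertUtf8(['name' => "It\x92s"]);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```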

answered May 12, 2015 at 18:09

2 Comments

The weird thing is that it is stored in the database as utf8_general_ci.
I have UTF-8 stored in a MariaDB with utf8 collation, too. This works and you don't have to do anything to make it work for JSON either. There must be something else that you are doing. Sit down and create a minimal example, starting with creating the DB table and finally writing the content to it. Anything else is just guessing. Also, your question is not completely clear as to what you see (quote the exact content, don't paraphrase!) and what you expected to see instead.
If you look at the link you included to the character U+0092, it is a control character, and it is also known as PRIVATE USE TWO. Its existence in your string means that your string is almost certainly not a UTF-8 string. Instead, it is probably a Windows-specific encoding, likely Windows-1252 if your text is English, in which 0x92 is a "smart quote" apostrophe, also known as a right single quotation mark. The Unicode equivalent of this character is U+2019.
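To make the difference concrete, here is a small sketch that decodes the same byte under both interpretations. utf8_encode() is documented as converting from ISO-8859-1, so the ISO-8859-1 call below stands in for it; the variable names are only illustrative.

```php
<?php
$byte = "\x92";

// Interpreted as ISO-8859-1 (what utf8_encode() assumes): control character U+0092.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

// Interpreted as Windows-1252: right single quotation mark U+2019.
$asCp1252 = iconv('Windows-1252', 'UTF-8', $byte);

$jsonLatin1 = json_encode($asLatin1); // "\u0092", as seen in the question
$jsonCp1252 = json_encode($asCp1252); // "\u2019"
```

The ISO-8859-1 result is the two bytes C2 92, which a Windows-1252 viewer would display as 'Â’'. That would be consistent with the output described in the question.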

Thus your data source is not giving you UTF-8 text. Either you can fix the source data to be UTF-8 encoded, or you can convert the text you receive. For example, the output of

echo iconv('Windows-1252','UTF-8', "\x92");

is

’

which is probably what you want. However, you need to make sure that all of your input uses the same encoding. If some of your data is UTF-8 and some is Windows-1252, the above iconv call will properly handle Windows-1252 encoded apostrophes, but it will convert UTF-8 encoded apostrophes to

â€™
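If the input really is a mix of the two encodings and the source cannot be fixed, one workaround is to convert only the strings that are not already valid UTF-8. This is a heuristic sketch, not a guaranteed fix: the helper name toUtf8() and the Windows-1252 fallback are assumptions, and a Windows-1252 string that happens to form valid UTF-8 byte sequences would be left unconverted.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise assume the fallback encoding.
function toUtf8(string $text, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, converting again would double-encode it
    }
    $converted = iconv($fallback, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert input from $fallback");
    }
    return $converted;
}

$fromCp1252 = toUtf8("It\x92s");         // Windows-1252 apostrophe
$fromUtf8   = toUtf8("It\xE2\x80\x99s"); // UTF-8 apostrophe, left untouched
```

Both calls end up with the same UTF-8 string, so json_encode() handles either input the same way.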
answered May 13, 2015 at 18:01
