I encountered an issue where the quotes in the JSON output from a MySQL query are not being escaped. I executed the following command to extract data from a WordPress database and save it into a file:
echo "SELECT JSON_OBJECT('id', ID, 'title', post_title, 'body', post_content) FROM wp_posts LIMIT 1;" | mysql -u root -p blog_gslin_org > blog.output
After entering the password and checking the content of blog.output
, I noticed that the quotes inside the JSON values are not escaped, which could potentially lead to parsing errors when the JSON is consumed by other applications. Here's a snippet of the output:
{"id": 2, "title": "關於我 (about me)", "body": "這個 Blog 主要是偏技術方面的資訊 (以及各種雜七雜八的抱怨文章),另外有幾個 Blog 是其他方面的:\\r\\n\\r\\n<ul>\\r\\n\\t<li><a href=\\"http://blog.gslin.info/\\" rel=\\"tag\\">blog.gslin.info</a>:<del datetime=\\"2007年07月14日T09:49:18+00:00\\">跟課業有關的 (包括實驗室研究的東西)。</del>改放 ACG 相關的資訊。</li>\\r\\n\\t<li><a href=\\"http://blog.gslin.net/\\" rel=\\"tag\\">blog.gslin.net</a>:跟網路有關的。</li>\\r\\n</ul>\\r\\n\\r\\n除了 Blog 外,你可以在這些地方找到我:\\r\\n\\r\\n<ul>\\r\\n\\t<li><a href=\\"https://abpe.org/@gslin\\">Mastodon</a></li>\\r\\n\\t<li><a href=\\"https://twitter.com/gslin\\">Twitter</a></li>\\r\\n\\t<li><a href=\\"https://www.facebook.com/gslin\\">Facebook</a></li>\\r\\n\\t<li><a href=\\"https://www.plurk.com/gslin\\">Plurk</a></li>\\r\\n\\t<li><a href=\\"https://www.instagram.com/gslin\\">Instagram</a></li>\\r\\n\\t<li><a href=\\"https://www.flickr.com/photos/gslin\\">Flickr</a></li>\\r\\n\\t<li><a href=\\"https://www.linkedin.com/in/gslin/\\">Linkedin</a></li>\\r\\n</ul>\\r\\n\\r\\n關於我的連絡的方法:gslin at gslin.com (主要)、darkkiller at gmail.com (也是要)、gslin at gslin.org (備用)。"}
As you can see, the URLs within the body field include unescaped quotes. I'm curious if there's a way to ensure that the output from MySQL correctly escapes these characters to produce valid JSON that can be safely parsed by any JSON parser.
Is there a specific flag or option I should use with my MySQL query, or is there a recommended approach to handling this type of data extraction and conversion to JSON?
Edit: a minimal example as requested:
$ mysql --version
mysql Ver 8.0.35-27 for Linux on x86_64 (Percona Server (GPL), Release '27', Revision '2f8eeab2'$)
$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 8.0.35-27 Percona Server (GPL), Release '27', Revision '2f8eeab2'$
Copyright (c) 2009-2023 Percona LLC and/or its affiliates
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.35-27 |
+-----------+
1 row in set (0.00 sec)
Create a table and add data:
$ mysql -u root -p test
mysql> CREATE TABLE post (id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, content LONGTEXT NOT NULL);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO post (content) VALUES ('<a href="https://www.example.com/">www.example.com</a>');
Query OK, 1 row affected (0.00 sec)
Then check it:
mysql> SELECT * FROM post;
+----+--------------------------------------------------------+
| id | content |
+----+--------------------------------------------------------+
| 1 | <a href="https://www.example.com/">www.example.com</a> |
+----+--------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT JSON_OBJECT("content", content) FROM post;
+-------------------------------------------------------------------------+
| JSON_OBJECT("content", content) |
+-------------------------------------------------------------------------+
| {"content": "<a href=\"https://www.example.com/\">www.example.com</a>"} |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)
Then, in shell (tail -n +2
here is to skip the first line):
$ echo "SELECT JSON_OBJECT('content', content) FROM post;" | mysql -u root -p test | tail -n +2
Enter password:
{"content": "<a href=\\"https://www.example.com/\\">www.example.com</a>"}
Validate via jq
:
$ echo "SELECT JSON_OBJECT('content', content) FROM post;" | mysql -u root -p test | tail -n +2 | jq .
Enter password:
parse error: Invalid numeric literal at line 1, column 30
-
I see all the double-quote characters in the example you show are escaped, unless they are JSON string delimiters. Can you please be more specific about what you want the output should be? It would also help to make a minimum reproducible example so we don't have to search in such a long string for the characters that might need escaping.Bill Karwin– Bill Karwin2024年02月17日 21:27:11 +00:00Commented Feb 17, 2024 at 21:27
-
I noticed that the quotes inside the JSON values are not escaped, which could potentially lead to parsing errors when the JSON is consumed by other applications. Here's a snippet of the output dbfiddle.uk/GwewxDfR JSON_VALID tells that provided JSON is valid, i.e. all dquote chars are escaped correctly. You forget about double-escaping.Akina– Akina2024年02月17日 22:34:46 +00:00Commented Feb 17, 2024 at 22:34
-
@BillKarwin thanks, I have edited to create a minimal example.Gea-Suan Lin– Gea-Suan Lin2024年02月18日 03:33:26 +00:00Commented Feb 18, 2024 at 3:33
1 Answer 1
So the problem is that the quotes are double-escaped.
{"content": "<a href=\\"https:...
^^ two backslashes
The output of JSON_OBJECT() contains a single backslash before literal quote characters. Then the output of the mysql
client adds another backslash in front of the sole backslash.
You can control this with the --raw
(or -r
) option to the mysql client.
https://dev.mysql.com/doc/refman/8.0/en/mysql-command-options.html#option_mysql_raw
For nontabular output (such as is produced in batch mode or when the
--batch
or--silent
option is given), special characters are escaped in the output so they can be identified easily. Newline, tab, NUL, and backslash are written as\n
,\t
,0円
, and\\
. The--raw
option disables this character escaping.
First I test without --raw
which results in the same error you got:
$ echo "SELECT JSON_UNQUOTE(JSON_OBJECT('content', content)) FROM post;" | mysql -b -N test | jq .
jq: parse error: Invalid numeric literal at line 1, column 30
Then I add the --raw
option (I used -r
for short):
$ echo "SELECT JSON_UNQUOTE(JSON_OBJECT('content', content)) FROM post;" | mysql -b -N -r test | jq .
{
"content": "<a href=\"https://www.example.com/\">www.example.com</a>"
}