At work they use a format similar to JSON but without quotes that looks like this:
{foo:{qux:1,quux:0}, bar:{}}
The reason they don't just use JSON is that the C# package Newtonsoft.Json
can deserialize this as if it were regular JSON. I need to read it in PHP, but json_decode
is not as forgiving.
So here's my attempt at a simple parser:
<?php
namespace Foo\Bar;
class NotJsonParser
{
const STEP_NAME = 0;
const STEP_VALUE = 1;
/**
* @param $string
* @return array
*/
public static function parseNotJSON($string)
{
$generator = self::stringIterator($string);
$data = self::parser($generator);
return $data[''];
}
/**
* @param \Generator $generator
* @return array
*/
private static function parser(\Generator $generator)
{
$data = [];
$step = self::STEP_NAME;
$name = '';
$value = '';
while ($generator->valid()) {
$i = $generator->current();
switch ($i) {
case ' ':
case "\n":
// Skip whitespace. A bare "continue" inside a switch acts like "break" and warns since PHP 7.3.
break;
case '{':
$generator->next();
$value = self::parser($generator);
$data[$name] = $value;
$step = self::STEP_NAME;
$name = '';
$value = '';
break;
case '}':
if ($name) {
$data[$name] = $value;
}
return $data;
case ',':
if ($name) {
$data[$name] = $value;
}
$step = self::STEP_NAME;
$name = '';
$value = '';
break;
case ':':
$step = self::STEP_VALUE;
break;
default:
if ($step === self::STEP_NAME) {
$name .= $i;
} else {
$value .= $i;
}
}
$generator->next();
}
return $data;
}
/**
* @param string $str
* @return \Generator
*/
private static function stringIterator($str)
{
for ($i = 0; $i < strlen($str); $i++) {
yield $str[$i];
}
}
}
And here's the usage:
>>> $result = \Foo\Bar\NotJsonParser::parseNotJSON("{foo:{qux:1,quux:0}, bar:{}}");
=> [
"foo" => [
"qux" => "1",
"quux" => "0",
],
"bar" => [],
]
How could I improve this? I know it really lacks error handling. I don't mind that the numbers stay as strings. Also, the format never goes more than 2 levels deep. Whitespace between tokens is not significant, but there shouldn't be whitespace inside the keys (e.g. {foo bar: baz} should be an error).
Also, how should I encode it back from an array to a string? I was thinking of just using json_encode and then removing the quote characters.
1 Answer
The following workaround will wrap your keys and values with double quotes. Such hacks will always be vulnerable to edge cases. To avoid sprinting down a rabbit hole of possibilities (I can think of a few cases off the top of my head: 1. keys/values already containing quotes, 2. declared empty/null keys; there will be more), I'll just provide a solution for your sample input.
For now, I'm using \w to ensure that the space before bar is not included. There are several ways to do this, but I would need intimate knowledge of your project data to develop the expression that I feel is "best / most robust".
If you discover any fringe cases that break this simple regex pattern, please update your question and I can create a patch.
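One fringe case is already named in the question: {foo bar: baz} should be an error, yet the \w-based pattern used in the code below would quote foo bar as a single key and decode it without complaint. A minimal sketch of a stricter variant follows. The function name decodeNotJson is hypothetical, and the pattern is a variation that also excludes whitespace from each match; it assumes keys and values never legitimately contain whitespace.

```php
<?php
// Hypothetical helper (not part of the original answer): rejects tokens that
// contain inner whitespace, then quotes the remaining tokens and decodes them.
function decodeNotJson(string $input): array
{
    // A word character, whitespace, then another word character with no
    // delimiter in between, e.g. the "foo bar" in {foo bar: baz}.
    if (preg_match('~\w\s+\w~', $input)) {
        throw new InvalidArgumentException('Whitespace inside a key or value');
    }

    // Variation on the quoting pattern: \s is excluded so that whitespace
    // around a token never ends up inside the quotes.
    $quoted = preg_replace('~\w[^:{},\s]*~', '"$0"', $input);
    $decoded = json_decode($quoted, true);

    // json_decode() returns null when the quoted string is not valid JSON.
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Input could not be decoded');
    }

    return $decoded;
}
```

With the sample input this returns the same array as the plain regex approach, while {foo bar: baz} throws instead of silently producing a "foo bar" key.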
Code:
$unquoted_json = <<<NOTJSON
{foo:{qux:1,quux:0}, bar:{}}
NOTJSON;
$quoted_json = preg_replace('~\w[^:{},]*~', '"$0"', $unquoted_json);
$array = json_decode($quoted_json, true);
var_export($array);
echo "\n---\n";
echo json_encode($array);
Output:
array (
'foo' =>
array (
'qux' => '1',
'quux' => '0',
),
'bar' =>
array (
),
)
---
{"foo":{"qux":"1","quux":"0"},"bar":[]}
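Note that the empty bar comes back as [] rather than {}. For encoding back to the unquoted format, the idea from the question (json_encode, then remove the quotes) could be sketched as follows. The function name encodeNotJson is hypothetical, and the sketch assumes that no key or value contains a double quote or any character that json_encode would escape.

```php
<?php
// Hypothetical helper (not part of the original answer): encodes a nested
// array back to the unquoted format by stripping the quotes json_encode() adds.
function encodeNotJson(array $data): string
{
    // JSON_FORCE_OBJECT makes empty arrays come out as {} instead of [].
    $json = json_encode($data, JSON_FORCE_OBJECT);

    // Assumption: no key or value contains a double quote of its own.
    return str_replace('"', '', $json);
}

echo encodeNotJson(['foo' => ['qux' => '1', 'quux' => '0'], 'bar' => []]);
// {foo:{qux:1,quux:0},bar:{}}
```

JSON_FORCE_OBJECT also turns list-style arrays into objects with numeric keys, which is acceptable only if the format never contains indexed arrays.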
- That's an interesting take on it. I'll try it and see if it is better than the other approach. Thanks. – solarc, Jun 3, 2018 at 21:06
- The difficult thing about fabricated sample input is that we don't have a lot of certainty that the fabricated data is a true indicator of the quality of characters/data that can potentially exist in your actual project. Does your project data only use letters and numbers as key/value substrings? Might you have floats? Indexed arrays? Quotes? Might your keys/values contain one of the delimiting characters ({}:,)? Please improve your question by further clarifying the range of known/expected formats that the data may have. I want to see you find resolution. Some feedback would help me. – mickmackusa, Jun 6, 2018 at 22:13