
At work they use a format similar to JSON but without quotes that looks like this:

{foo:{qux:1,quux:0}, bar:{}}

They don't use strict JSON because the C# package Newtonsoft.Json deserializes this format as if it were regular JSON. I need to read it in PHP, but json_decode is not as forgiving.
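This quick check, which uses only built-in functions, confirms the problem. PHP's json_decode rejects the sample outright because JSON requires quoted keys:

```php
<?php
// The sample input from the question: keys and values are not quoted.
$notJson = '{foo:{qux:1,quux:0}, bar:{}}';

$decoded = json_decode($notJson, true);

var_dump($decoded);               // NULL: unquoted keys are not valid JSON
echo json_last_error_msg(), "\n"; // Syntax error
```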

So here's my attempt at a simple parser:

<?php
namespace Foo\Bar;

class NotJsonParser
{
    const STEP_NAME = 0;
    const STEP_VALUE = 1;

    /**
     * @param string $string
     * @return array
     */
    public static function parseNotJSON($string)
    {
        $generator = self::stringIterator($string);
        $data = self::parser($generator);
        // The outermost "{" is stored under an empty key.
        return $data[''];
    }

    /**
     * @param \Generator $generator
     * @return array
     */
    private static function parser(\Generator $generator)
    {
        $data = [];
        $step = self::STEP_NAME;
        $name = '';
        $value = '';
        while ($generator->valid()) {
            $i = $generator->current();
            switch ($i) {
                case ' ':
                case "\t":
                case "\r":
                case "\n":
                    // Skip whitespace. Inside a switch, PHP treats `continue` like `break`
                    // (PHP 7.3+ warns about it), so `break` states the intent explicitly.
                    break;
                case '{':
                    $generator->next();
                    $value = self::parser($generator);
                    $data[$name] = $value;
                    $step = self::STEP_NAME;
                    $name = '';
                    $value = '';
                    break;
                case '}':
                    // Compare with '' so that a key named "0" is not dropped as falsy.
                    if ($name !== '') {
                        $data[$name] = $value;
                    }
                    return $data;
                case ',':
                    if ($name !== '') {
                        $data[$name] = $value;
                    }
                    $step = self::STEP_NAME;
                    $name = '';
                    $value = '';
                    break;
                case ':':
                    $step = self::STEP_VALUE;
                    break;
                default:
                    if ($step === self::STEP_NAME) {
                        $name .= $i;
                    } else {
                        $value .= $i;
                    }
            }
            $generator->next();
        }
        return $data;
    }

    /**
     * @param string $str
     * @return \Generator
     */
    private static function stringIterator($str)
    {
        $length = strlen($str);
        for ($i = 0; $i < $length; $i++) {
            yield $str[$i];
        }
    }
}

And here's the usage:

>>> $result = \Foo\Bar\NotJsonParser::parseNotJSON("{foo:{qux:1,quux:0}, bar:{}}");
=> [
     "foo" => [
       "qux" => "1",
       "quux" => "0",
     ],
     "bar" => [],
   ]

How could I improve this? I know it lacks error handling. I don't mind the numbers staying as strings. The format never goes more than two levels deep. Whitespace between tokens is not significant, but there shouldn't be whitespace inside keys (e.g. {foo bar: baz} should be an error).
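One way to add error handling and to reject whitespace inside keys is to replace the character-by-character state machine with a small recursive-descent parser. The sketch below illustrates the idea and is not a drop-in replacement. The class name StrictNotJsonParser is hypothetical. The sketch throws UnexpectedValueException with a byte offset and assumes that keys and values are bare tokens without whitespace or any of the {}:, characters. It also rejects empty keys, empty values and trailing commas. Values stay strings, as in the original.

```php
<?php
// Hypothetical strict parser for the unquoted format (a sketch, not the asker's class).
// Assumes keys and values are bare tokens without whitespace or any of {}:, characters.
final class StrictNotJsonParser
{
    private $str;
    private $len;
    private $pos = 0;

    private function __construct(string $str)
    {
        $this->str = $str;
        $this->len = strlen($str);
    }

    public static function parse(string $str): array
    {
        $parser = new self($str);
        $result = $parser->parseObject();
        $parser->skipWhitespace();
        if ($parser->pos !== $parser->len) {
            throw $parser->error('Unexpected trailing input');
        }
        return $result;
    }

    // object := '{' [ token ':' (object | token) { ',' token ':' (object | token) } ] '}'
    private function parseObject(): array
    {
        $this->expect('{');
        $data = [];
        $this->skipWhitespace();
        if ($this->peek() === '}') {
            $this->pos++;
            return $data;
        }
        while (true) {
            $name = $this->parseToken();
            $this->expect(':');
            $this->skipWhitespace();
            $data[$name] = $this->peek() === '{' ? $this->parseObject() : $this->parseToken();
            $this->skipWhitespace();
            $next = $this->peek();
            if ($next === '}') {
                $this->pos++;
                return $data;
            }
            if ($next !== ',') {
                throw $this->error("Expected ',' or '}'");
            }
            $this->pos++;
        }
    }

    // Reads a bare key or value; whitespace ends the token, so "foo bar" is not one key.
    private function parseToken(): string
    {
        $this->skipWhitespace();
        $start = $this->pos;
        while ($this->pos < $this->len && strpos(":,{} \t\r\n", $this->str[$this->pos]) === false) {
            $this->pos++;
        }
        if ($this->pos === $start) {
            throw $this->error('Expected a key or value');
        }
        return substr($this->str, $start, $this->pos - $start);
    }

    private function expect(string $char): void
    {
        $this->skipWhitespace();
        if ($this->peek() !== $char) {
            throw $this->error("Expected '$char'");
        }
        $this->pos++;
    }

    private function skipWhitespace(): void
    {
        while ($this->pos < $this->len && strpos(" \t\r\n", $this->str[$this->pos]) !== false) {
            $this->pos++;
        }
    }

    private function peek(): string
    {
        return $this->pos < $this->len ? $this->str[$this->pos] : '';
    }

    private function error(string $message): \UnexpectedValueException
    {
        return new \UnexpectedValueException("$message at offset {$this->pos}");
    }
}

var_export(StrictNotJsonParser::parse('{foo:{qux:1,quux:0}, bar:{}}'));
echo "\n";
try {
    StrictNotJsonParser::parse('{foo bar: baz}');
} catch (\UnexpectedValueException $e) {
    echo $e->getMessage(), "\n"; // Expected ':' at offset 5
}
```

parseToken stops at whitespace, so `{foo bar: baz}` fails at the colon check. The original loop skips every space and silently produces the key `foobar` instead.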

Also, how should I encode it back from an array to a string? I was thinking of just using json_encode and then removing the quote characters.
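Stripping quotes from json_encode output works for the sample, but it has limits. json_encode turns an empty array (or any list) into `[...]` rather than `{...}`, escapes "/" as "\/" by default, and escapes embedded quotes with a backslash. As a result, the stripped string is not always valid in this format. A hand-written encoder avoids those surprises. This is a minimal sketch: encodeNotJson and checkBareToken are hypothetical names. It assumes every key and scalar value is a non-empty token without whitespace or any of the {}:, characters. Anything else throws InvalidArgumentException, because the format has no way to quote it.

```php
<?php
// Hypothetical encoder (not part of the asker's code): writes a nested array back
// in the unquoted format. Keys and scalar values are emitted verbatim.
function encodeNotJson(array $data): string
{
    $parts = [];
    foreach ($data as $key => $value) {
        $key = (string) $key;
        checkBareToken($key);
        if (is_array($value)) {
            $parts[] = $key . ':' . encodeNotJson($value);
        } else {
            $value = (string) $value;
            checkBareToken($value);
            $parts[] = $key . ':' . $value;
        }
    }
    // Always emit braces, so an empty array becomes {} rather than [].
    return '{' . implode(',', $parts) . '}';
}

// Rejects tokens that could not be read back without quotes.
function checkBareToken(string $token): void
{
    if ($token === '' || preg_match('~[\s:,{}]~', $token)) {
        throw new \InvalidArgumentException("Cannot encode '$token' without quotes");
    }
}

echo encodeNotJson(['foo' => ['qux' => '1', 'quux' => '0'], 'bar' => []]), "\n"; // {foo:{qux:1,quux:0},bar:{}}
```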

asked May 15, 2018 at 5:58

1 Answer


The following workaround wraps your keys and values in double quotes. Hacks like this will always be vulnerable to edge cases. A few come to mind immediately: 1. keys/values that already contain quotes, 2. declared empty/null keys, and there will be more. To avoid going down that rabbit hole, I'll only provide a solution for your sample input.

For now, I'm using \w so that each match starts at a word character, which keeps the space before bar out of the quoted string. There are several ways to do this, but I would need intimate knowledge of your project data to write the expression that I feel is "best / most robust".

If you discover any fringe cases that break this simple regex pattern, please update your question and I can create a patch.
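One edge case is worth knowing about. Because `[^:{},]*` also matches spaces, the pattern accepts the input the question says should be an error: `{foo bar: baz}` becomes the key 'foo bar'. In the same way, a space before a colon ends up inside the key. One possible tweak is to add `\s` to the negated class, so that json_decode rejects the result. This tweak assumes that keys and values never legitimately contain whitespace. The following sketch compares both patterns:

```php
<?php
// The answer's pattern, and a hypothetical variant that also stops at whitespace.
$answerPattern = '~\w[^:{},]*~';
$noSpacePattern = '~\w[^:{},\s]*~';

$input = '{foo bar: baz}'; // the question says this should be an error

$loose = json_decode(preg_replace($answerPattern, '"0ドル"', $input), true);
$strict = json_decode(preg_replace($noSpacePattern, '"0ドル"', $input), true);

var_export($loose);  // accepted, with the key 'foo bar'
echo "\n";
var_export($strict); // NULL, so the input is rejected
echo "\n";
```

The trade-off is that values containing spaces (a sentence, for example) are also rejected. This approach only fits data that really is limited to bare tokens.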

Code:

$unquoted_json = <<<NOTJSON
{foo:{qux:1,quux:0}, bar:{}}
NOTJSON;
$quoted_json = preg_replace('~\w[^:{},]*~', '"0ドル"', $unquoted_json);
$array = json_decode($quoted_json, true);
var_export($array);
echo "\n---\n";
echo json_encode($array);

Output:

array (
  'foo' => 
  array (
    'qux' => '1',
    'quux' => '0',
  ),
  'bar' => 
  array (
  ),
)
---
{"foo":{"qux":"1","quux":"0"},"bar":[]}
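In the last line, the empty bar object comes back as [] because json_encode cannot tell that an empty PHP array was an object. If you write the format back with json_encode, the built-in JSON_FORCE_OBJECT flag keeps it as {}. The flag also turns any list into an object, which is harmless only if the format never uses lists. A small check:

```php
<?php
// Decoded result from the sample input above.
$array = ['foo' => ['qux' => '1', 'quux' => '0'], 'bar' => []];

echo json_encode($array), "\n";                    // {"foo":{"qux":"1","quux":"0"},"bar":[]}
echo json_encode($array, JSON_FORCE_OBJECT), "\n"; // {"foo":{"qux":"1","quux":"0"},"bar":{}}
```

Stripping the quotes from the second string yields {foo:{qux:1,quux:0},bar:{}}. This holds only as long as no key or value contains characters that json_encode escapes.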
answered May 31, 2018 at 23:15
  • That's an interesting take on it. I'll try it and see if it is better than the other approach. Thanks. (Commented Jun 3, 2018 at 21:06)
  • The difficult thing about fabricated sample input is that we can't be sure it is representative of the characters and data that can actually occur in your project. Does your project data use only letters and numbers in keys and values? Might you have floats? Indexed arrays? Quotes? Might your keys or values contain one of the delimiting characters ({}:,)? Please improve your question by clarifying the range of formats your data may have. I want to see you find a resolution, and some feedback would help me. (Commented Jun 6, 2018 at 22:13)
