A module to make JSON in Python easier

Question 1

I recently wrote the livejson module as a way to make working with JSON in Python easier. The idea of the module is that you initialize a livejson.File, and then you can interact with your JSON file with the same interface as a Python dict, except your changes are reflected in the file in realtime.

I'd like to know how I can improve this.

Is my design good, in terms of things like naming, OOP design, etc.?
Are there ways I could improve performance without decreasing reliability? I implemented the context manager with a concept of "grouped writes" specifically so that repetitive operations could achieve the performance of the native objects.
Should I be concerned about thread safety?

The code

Here's the code. I've tried to document it well with comments and docstrings, and also tried to follow PEP8 so I hope it's readable:

Imports:

"""A module implementing a pseudo-dict class which is bound to a JSON file.
As you change the contents of the dict, the JSON file will be updated in
real-time. Magic.
"""
import collections
import os
import json

The code begins with some helper functions and generic base classes:

# MISC HELPERS
def _initfile(path, data="dict"):
 """Initialize an empty JSON file."""
 data = {} if data.lower() == "dict" else []
 # The file will need to be created if it doesn't exist
 if not os.path.exists(path): # The file doesn't exist
 # Raise exception if the directory that should contain the file doesn't
 # exist
 dirname = os.path.dirname(path)
 if dirname and not os.path.exists(dirname):
 raise IOError(
 ("Could not initialize empty JSON file in non-existant "
 "directory '{}'").format(os.path.dirname(path))
 )
 # Write an empty file there
 with open(path, "w") as f:
 json.dump(data, f)
 return True
 elif len(open(path, "r").read()) == 0: # The file is empty
 with open(path, "w") as f:
 json.dump(data, f)
 else: # The file exists and contains content
 return False
class _ObjectBase(object):
 """Class inherited by most things.
 Implements the lowest common denominator for all emulating classes.
 """
 def __getitem__(self, key):
 out = self.data[key]
 # Nesting
 if isinstance(out, (list, dict)):
 # If it's the top level, we can use [] for the path
 pathInData = self.pathInData if hasattr(self, "pathInData") else []
 newPathInData = pathInData + [key]
 # The top level, i.e. the File class, not a nested class. If we're
 # already the top level, just use self.
 toplevel = self.base if hasattr(self, "base") else self
 nestClass = _NestedList if isinstance(out, list) else _NestedDict
 return nestClass(toplevel, newPathInData)
 # Not a list or a dict, don't worry about it
 else:
 return out
 def __len__(self):
 return len(self.data)
 # Methods not-required by the ABC
 def __str__(self):
 return str(self.data)
 def __repr__(self):
 return repr(self.data)
 # MISC
 def _checkType(self, key):
 """Make sure the type of a key is appropriate."""
 pass

One of the major challenges I faced was easily supporting nested JSON structures, with the easy syntax of my_file["a"]["b"] = "c". It was difficult to make these cases behave properly because __setitem__ is not called in this case. To overcome this, I set up simple "nesting" classes to replace Python lists and dicts in nested situations. These mainly just listen for __setitem__ and tell the top-level class to update the file's contents.

# NESTING CLASSES
class _NestedBase(_ObjectBase):
 """Inherited by _NestedDict and _NestedList, implements methods common
 between them. Takes arguments 'fileobj' which specifies the parent File
 object, and 'pathToThis' which specifies where in the JSON file this object
 exists (as a list).
 """
 def __init__(self, fileobj, pathToThis):
 self.pathInData = pathToThis
 self.base = fileobj
 @property
 def data(self):
 # Start with the top-level data
 d = self.base.data
 # Navigate through the object to find where self.pathInData points
 for i in self.pathInData:
 d = d[i]
 # And return the result
 return d
 def __setitem__(self, key, value):
 self._checkType(key)
 # Store the whole data
 data = self.base.data
 # Iterate through and find the right part of the data
 d = data
 for i in self.pathInData:
 d = d[i]
 # It is passed by reference, so modifying the found object modifies
 # the whole thing
 d[key] = value
 # Update the whole file with the modification
 self.base.set_data(data)
 def __delitem__(self, key):
 # See __setitem__ for details on how this works
 data = self.base.data
 d = data
 for i in self.pathInData:
 d = d[i]
 del d[key]
 self.base.set_data(data)
class _NestedDict(_NestedBase, collections.MutableMapping):
 """A pseudo-dict class to replace vanilla dicts inside a livejson.File.
 This "watches" for changes made to its content, then tells
 the base livejson.File instance to update itself so that the file always
 reflects the changes you've made.
 This class is what allows for nested calls like this
 >>> f = livejson.File("myfile.json")
 >>> f["a"]["b"]["c"] = "d"
 to update the file.
 """
 def __iter__(self):
 return iter(self.data)
 def _checkType(self, key):
 if not isinstance(key, str):
 raise TypeError("JSON only supports strings for keys, not '{}'. {}"
 .format(type(key).__name__, "Try using a list for"
 " storing numeric keys" if
 isinstance(key, int) else ""))
class _NestedList(_NestedBase, collections.MutableSequence):
 """A pseudo-list class to replace vanilla lists inside a livejson.File.
 This "watches" for changes made to its content, then tells
 the base livejson.File instance to update itself so that the file always
 reflects the changes you've made.
 This class is what allows for nested calls involving lists like this:
 >>> f = livejson.File("myfile.json")
 >>> f["a"].append("foo")
 to update the file.
 """
 def insert(self, index, value):
 # See _NestedBase.__setitem__ for details on how this works
 data = self.base.data
 d = data
 for i in self.pathInData:
 d = d[i]
 d.insert(index, value)
 self.base.set_data(data)

Here we have the main element of the module, the classes which envelop everything else:

# THE MAIN CLASSES
class _BaseFile(_ObjectBase):
 """Class inherited by DictFile and ListFile.
 This implements all the required methods common between
 collections.MutableMapping and collections.MutableSequence."""
 def __init__(self, path, pretty=False, sort_keys=False):
 self.path = path
 self.path = path
 self.pretty = pretty
 self.sort_keys = sort_keys
 self.indent = 2 # Default indentation level
 _initfile(self.path,
 "list" if isinstance(self, ListFile) else "dict")
 def _data(self):
 """A simpler version of data to avoid infinite recursion in some cases.
 Don't use this.
 """
 if self.is_caching:
 return self.cache
 with open(self.path, "r") as f:
 return json.load(f)
 @property
 def data(self):
 """Get a vanilla dict object to represent the file."""
 # Update type in case it's changed
 self._updateType()
 # And return
 return self._data()
 def __setitem__(self, key, value):
 self._checkType(key)
 data = self.data
 data[key] = value
 self.set_data(data)
 def __delitem__(self, key):
 data = self.data
 del data[key]
 self.set_data(data)
 def _updateType(self):
 """Make sure that the class behaves like the data structure that it
 is, so that we don't get a ListFile trying to represent a dict."""
 data = self._data()
 # Change type if needed
 if isinstance(data, dict) and isinstance(self, ListFile):
 self.__class__ = DictFile
 elif isinstance(data, list) and isinstance(self, DictFile):
 self.__class__ = ListFile
 # Bonus features!
 def set_data(self, data):
 """Overwrite the file with new data. You probably shouldn't do
 this yourself, it's easy to screw up your whole file with this."""
 if self.is_caching:
 self.cache = data
 else:
 fcontents = self.file_contents
 with open(self.path, "w") as f:
 try:
 # Write the file. Keep user settings about indentation, etc
 indent = self.indent if self.pretty else None
 json.dump(data, f, sort_keys=self.sort_keys, indent=indent)
 except Exception as e:
 # Rollback to prevent data loss
 f.seek(0)
 f.truncate()
 f.write(fcontents)
 # And re-raise the exception
 raise e
 self._updateType()
 def remove(self):
 """Delete the file from the disk completely."""
 os.remove(self.path)
 @property
 def file_contents(self):
 """Get the raw file contents of the file."""
 with open(self.path, "r") as f:
 return f.read()
 # Grouped writes
 @property
 def is_caching(self):
 """Returns a boolean value describing whether a grouped write is
 underway.
 """
 return hasattr(self, "cache")
 def __enter__(self):
 self.cache = self.data
 return self # This enables using "as"
 def __exit__(self, *args):
 # We have to write manually here because __setitem__ is set up to write
 # to cache, not to file
 with open(self.path, "w") as f:
 json.dump(self.cache, f)
 del self.cache
class DictFile(_BaseFile, collections.MutableMapping):
 """A class emulating Python's dict that will update a JSON file as it is
 modified.
 """
 def __iter__(self):
 return iter(self.data)
 def _checkType(self, key):
 if not isinstance(key, str):
 raise TypeError("JSON only supports strings for keys, not '{}'. {}"
 .format(type(key).__name__, "Try using a list for"
 " storing numeric keys" if
 isinstance(key, int) else ""))
class ListFile(_BaseFile, collections.MutableSequence):
 """A class emulating a Python list that will update a JSON file as it is
 modified. Use this class directly when creating a new file if you want the
 base object to be an array.
 """
 def insert(self, index, value):
 data = self.data
 data.insert(index, value)
 self.set_data(data)
 def clear(self):
 # Under Python 3, this method is already in place. I've implemented it
 # myself to maximize compatibility with Python 2. Note that the
 # docstring here is stolen from Python 3.
 """L.clear() -> None -- remove all items from L."""
 self.set_data([])

And finally, this is what a user will directly initialize (most of the time). File is a class so that I can have staticmethods on it, but it really behaves like a factory function since it immediately changes self.__class__. Initializing a File() will give you either a ListFile or a DictFile object.

class File(object):
 """The main interface of livejson. Emulates a list or a dict, updating a
 JSON file in real-time as it is modified.
 This will be automatically replaced with either a ListFile or as
 DictFile based on the contents of your file (a DictFile is the default when
 creating a new file).
 """
 def __init__(self, path, pretty=False, sort_keys=True, indent=2):
 # When creating a blank JSON file, it's better to make the top-level an
 # Object ("dict" in Python), rather than an Array ("list" in python),
 # because that's the case for most JSON files.
 self.path = path
 self.pretty = pretty
 self.sort_keys = sort_keys
 self.indent = indent
 _initfile(self.path)
 with open(self.path, "r") as f:
 data = json.load(f)
 if isinstance(data, dict):
 self.__class__ = DictFile
 elif isinstance(data, list):
 self.__class__ = ListFile
 @staticmethod
 def with_data(path, data):
 """Initialize a new file that starts out with some data. Pass data
 as a list, dict, or JSON string.
 """
 # De-jsonize data if necessary
 if isinstance(data, str):
 data = json.loads(data)
 # Make sure this is really a new file
 if os.path.exists(path):
 raise ValueError("File exists, not overwriting data. Use "
 "'set_data' if you really want to do this.")
 else:
 f = File(path)
 f.set_data(data)
 return f
# Aliases for backwards-compatibility
Database = File
ListDatabase = ListFile
DictDatabase = DictFile

Usage

Here's example usage in case that's helpful to critique my API design:

Basic usage:

import livejson
f = livejson.File("test.json")
f["a"] = "b"
# That's it, the file has been written to!

Usage as a context manager:

import livejson
with livejson.File("test.json") as f:
 f["a"] = "b"

If I could get some feedback on this that would be great.

Question 2

From a first look pretty clean code, and you also presented it in a nice fashion, so off to a good start.

I'll go with your questions first:

Naming looks mostly good, though I'd wager that at some point generic names with "Base" in it will make it harder to understand than necessary - unfortunately I don't have a good suggestion as to what other names would be more helpful.
Any unspecific question about performance should get the answer "take a profiler and look" IMO. Really, there's more to worry about than raw numbers, including the fact that this is Python. For starts the regular json module also isn't the fastest. However correctness comes first, so
not only is thread safety a concern, access from multiple processes is a concern even more so. The one pattern that should really be used here at least is to first create a temporary file with the to-be-written content, then atomically moving that to the destination. I'm not sure what's the canonical source for this, but this blog post is a nice start. The set_data method is particularly bad in that respect.

Okay, so the code is almost PEP8 compatible, but there's still a lot of camel case in there. IMO that's fine as long as it's consistent.

One note:

Replacing self.__class__ ... that works? Even if it did that looks extremely fishy, c.f. this SO post for example. I share the sentiment that a different mechanism would be better - if I see ...File() as a constructor call I definitely don't expect a subclass of that to be the object in question. Not doing "clever" things will save people minutes to hours of debugging time, and this is definitely too clever. Just replace it with a regular function like with livejson.mapped("test.json") as f: or so and make it clear that you get a File subclass as a result would be enough for me.

Question 3

hmm, I read that SO post, but it doesn't seem like many of those points apply to my case. It works fine in my unit tests, which cover 100% of my code. Can you think of a scenario in my specific case when this might make debugging harder or confuse users? I feel like for my specific case, in which the class is replaced immediately, before the user can do anything with the class, it's roughly equivalent to a factory function. Could you elaborate on some downsides for my specific case? Otherwise, I agree completely with you, thanks a lot :D

Question 4

Yes, I get your point and yes it's roughly equivalent; the coverage and testing is really great btw., so then I'd maybe downgrade it and say "if you document this behaviour clearly" it should be fine.

ferada ferada 11.4k25 silver badges65 bronze badges · Accepted Answer · 2016-07-05 20:10:29Z

From a first look pretty clean code, and you also presented it in a nice fashion, so off to a good start.

I'll go with your questions first:

Naming looks mostly good, though I'd wager that at some point generic names with "Base" in it will make it harder to understand than necessary - unfortunately I don't have a good suggestion as to what other names would be more helpful.
Any unspecific question about performance should get the answer "take a profiler and look" IMO. Really, there's more to worry about than raw numbers, including the fact that this is Python. For starts the regular json module also isn't the fastest. However correctness comes first, so
not only is thread safety a concern, access from multiple processes is a concern even more so. The one pattern that should really be used here at least is to first create a temporary file with the to-be-written content, then atomically moving that to the destination. I'm not sure what's the canonical source for this, but this blog post is a nice start. The set_data method is particularly bad in that respect.

Okay, so the code is almost PEP8 compatible, but there's still a lot of camel case in there. IMO that's fine as long as it's consistent.

One note:

Replacing self.__class__ ... that works? Even if it did that looks extremely fishy, c.f. this SO post for example. I share the sentiment that a different mechanism would be better - if I see ...File() as a constructor call I definitely don't expect a subclass of that to be the object in question. Not doing "clever" things will save people minutes to hours of debugging time, and this is definitely too clever. Just replace it with a regular function like with livejson.mapped("test.json") as f: or so and make it clear that you get a File subclass as a result would be enough for me.

hmm, I read that SO post, but it doesn't seem like many of those points apply to my case. It works fine in my unit tests, which cover 100% of my code. Can you think of a scenario in my specific case when this might make debugging harder or confuse users? I feel like for my specific case, in which the class is replaced immediately, before the user can do anything with the class, it's roughly equivalent to a factory function. Could you elaborate on some downsides for my specific case? Otherwise, I agree completely with you, thanks a lot :D
Yes, I get your point and yes it's roughly equivalent; the coverage and testing is really great btw., so then I'd maybe downgrade it and say "if you document this behaviour clearly" it should be fine.

Stack Exchange Network

A module to make JSON in Python easier

The code

Usage

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

A module to make JSON in Python easier

The code

Usage

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions