I am developing a python package that needs to be able to read/write from/to multiple formats. E.g. foo
format and bar
format. I am trying to contain the functions relating to each format in a single file, as below. I would also like the user interface to be along the lines of:
c = my_package.io.read_foo(...)
# do stuff
c.write_to_bar(...)
Structure
my_class.py
: contains main class of my package
from io.write import WriteMixin
class MyClass(WriteMixin):
...
io/write.py
:
import foo, bar
class WriteMixin:
# register write functions, e.g.:
def write_to_foo(self, ...):
foo.write_foo(...)
def write_to_bar(self, ...):
bar.write_bar(...)
io/__init__.py
:
from foo import read_foo
from bar import read_bar
io/foo.py
:
from my_class import MyClass
def read_foo():
# ... read/parse foo into MyClass
return MyClass(foos)
def write_foo():
pass
# other functions that deal with foo, might be used in read_foo, write_foo
def foo_func1():
pass
...
io/bar.py
:
from my_class import MyClass
def read_bar():
# ... read/parse bar into MyClass
return MyClass(bars)
def write_bar():
pass
# other functions that deal with bar, might be used in read_bar, write_bar
def bar_func1():
pass
...
This however results in a circular import (my_class
<-- io
<-- my_class
). I know I can split the read/write files, or importing the class within the function, but is there is a way to keep foo.py
and bar.py
as is, without any "hacky" fixes?
More generally, is there an accepted/recommended structure for i/o submodules in python, reading and writing multiple formats and using OOP?
2 Answers 2
It sounds like you need double dispatch: the ability to choose
which function to call based on the types of two arguments, not just one.
Python only provides single dispatch (dispatching on type of the self
argument), so for double dispatch, you need to
write your own dispatcher, one way or another. I don't know of a standard or recommended Pythonic way to solve this. So, this solution will be
a little bit hacky, but hopefully not enough to cause problems.
my_class.py
:
from io.write import WriteMixin
class MyClass(WriteMixin):
...
io/write.py
:
# no need to import foo or bar now
class WriteMixin:
writer_table = {} # (class, data_format): write_func
@classmethod
add_writer(cls, clas, data_format, write_func):
cls.writer_table[(clas, data_format)] = write_func
def write_to(self, data_format): # data_format is Foo, Bar, etc.
write_func = self.writer_table[(self.__class__, data_format)]
write_func(self)
io/foo.py
import io.write
from my_class import MyClass
from other_class import OtherClass
...
class Foo:
pass
def read_foo(...):
# read data, figure out what class it is
if ...:
...
return MyClass(...)
elif ...:
...
return OtherClass(...)
...
def write_myclass_foo(...):
...
io.write.WriteMixin.add_writer(MyClass, Foo, write_myclass_foo)
def write_otherclass_foo(...):
...
io.write.WriteMixin.add_writer(OtherClass, Foo, write_otherclass_foo)
io/bar.py
import io.write
from my_class import MyClass
from other_class import OtherClass
...
class Bar:
pass
def read_bar(...):
# read data, figure out what class it is
if ...:
...
return MyClass(...)
elif ...:
...
return OtherClass(...)
...
def write_myclass_bar(...):
...
io.write.WriteMixin.add_writer(MyClass, Bar, write_myclass_bar)
def write_otherclass_bar(...):
...
io.write.WriteMixin.add_writer(OtherClass, Bar, write_otherclass_bar)
Now in your main code, you can say:
obj = io.read_foo(...) # the class of obj is determined at run-time
# do stuff
obj.write_to(Bar) # This will call the writer appropriate to obj
You'll probably need to add a function parameter to tell what file to read from or write to.
If you wanted to get really fancy, you could write a decorator that, if you
wrap a write_
method's definition with it, will automatically call
WriteMixin.add_writer
with the correct arguments. There is a nonstandard Python
library multipledispatch
that provides decorators similar this, but I'm not sure they fit your needs in this case.
I haven't tested the above code, but it might put you on the right track. The initialization code that registers the write_
functions might be considered a bit hacky
because you have to specify the types of their parameters when you call add_writer
,
which introduces the possibility of error. But even the decorators in multipledispatch
have this problem. You could avoid this problem by adding type hinting and writing code that reads the type hints, but that may be much more trouble than it's worth.
-
This does not solve the problem at all, as you have coventiently replaced deserialization with ellipsis. And moved serialization elsewhere.Basilevs– Basilevs05/20/2025 09:00:40Commented May 20 at 9:00
-
@Basilevs Oops, you're right! I completely misunderstood the problem. Thanks for explaining the downvote. I just posted a complete rewrite.Ben Kovitz– Ben Kovitz05/23/2025 00:48:37Commented May 23 at 0:48
I don't understand this interface to be honest:
c = my_package.io.read_foo(...)
# do stuff
c.write_to_bar(...)
Why would c
be aware of foo
and bar
? Will you add 20 methods to that class for 20 (de)serializers in the future? It would be much cleaner and elegant to have something along the lines:
c: MyClass
with FooReader(input_stream) as reader:
c = reader.read(MyClass) # <-- Note the passed type here
with BarWriter(output_stream) as writer:
writer.write(c) # <-- but not here, it can be deduced from c
don't you think? The context managers are here only because those classes take ownership of the passed streams and should probably gracefully close them, although YMMV.
And then there are at least three options (from what I suggest the most to the least):
- You add a registration mechanism on each
Reader/Writer
which tells it how to (de)serialize itself. Basically a method onReader/Writer
class (or better: on a builder) that accepts a type and callback accepting a stream and either accepting or returning an instance of that type. Something like this:
# IReader/IWriter are common interfaces defined by you
# which BarReader and FooWriter implement
def _read_my_class(reader: IReader, input_stream: BinaryIO) -> MyClass:
# implement deserialization
BarReader.register(MyClass, _read_my_class)
def _write_my_class(writer: IWriter, output_stream: BinaryIO, instance: MyClass):
# implement serialization
FooWriter.register(MyClass, _write_my_class)
# Note: "register" can and should be a generic method.
In a more sophisticated context you can even use a visitor pattern 1 on dataclasses. Of course you need a common interface for all Readers
and all Writers
(likely with generic read/write
methods). Note that you need to pass IReader/IWriter
in case of recursive (de)serialization.
Here Reader/Writer
are essentially dicts that map types to concrete read/write
methods.
Reader/Writer
is simply aware of all the classes it needs to work with (which is ok for small number of classes) and implements the mechanism internally.- You add a magic method on each class that converts it to well-understood, (de)serializable format, e.g. into a dict, list, string or number, etc. And then
Reader/Writer
only works with those basic types. This is the simplest solution, but incurs a non-trivial overhead if those objects are big. And is a leaky abstraction, although not a big leak.
Either way MyClass
stays as a normal dataclass, it does not have to be even aware of (de)serializers. And even if you use the visitor pattern, it is only aware of the interface, not particular implementations. Which as a side effect avoids the circular dependency issue.
The keywords here are: separation of concerns. And depend on abstract interfaces instead of concrete classes.
1 It's a shame that you cannot extend existing Python classes in a clean, simple, safe and non-hacky way, like in Rust.
-
@Basilevs I don't even know what "in-memory format" is supposed to mean, "binding to serialization format" is even more cryptic. And more importantly: how is that relevant? The read/write methods can be implemented for example as one-line
json.load
andjson.dump
calls, depending on needs. The "reflection" is of course necessary when implementing any (de)serialization mechanism. This of course can literally mean simply going manually through class fields, it doesn't require any sophisticated class scan. Which nevertheless can be done, I don't see anything special or problematic about it.freakish– freakish05/23/2025 06:52:42Commented May 23 at 6:52 -
There's of course a missing implementation of other methods. But basically reader/writer is just a dictionary that maps type to correct read/write method. Is that "heavy" use of reflection? And even if it is (whatever that means), then so what?freakish– freakish05/23/2025 07:05:25Commented May 23 at 7:05
-
@Basilevs I'm not saying JSON is the only format. You can literally implement any format you want with this. If order of fields is relevant then this needs to be configured per class representation anyway. And there are various ways to do this. I don't see any problem with extending my design to support this or anything else.freakish– freakish05/23/2025 10:44:16Commented May 23 at 10:44
data = mylib.foo.read(); ...; mylib.bar.write(data)
. This avoids dependencies between the different formats.