How to structure a read/write submodule in OOP Python

Question 1

I am developing a Python package that needs to read from and write to multiple formats, e.g. a foo format and a bar format. I am trying to keep the functions relating to each format in a single file, as below. I would also like the user interface to be along the lines of:

c = my_package.io.read_foo(...)
# do stuff
c.write_to_bar(...)

Structure

my_class.py: contains main class of my package

from io.write import WriteMixin
class MyClass(WriteMixin):
    ...

io/write.py:

import foo, bar
class WriteMixin:
    # register write functions, e.g.:
    def write_to_foo(self, ...):
        foo.write_foo(...)
    def write_to_bar(self, ...):
        bar.write_bar(...)

io/__init__.py:

from foo import read_foo
from bar import read_bar

io/foo.py:

from my_class import MyClass
def read_foo():
    # ... read/parse foo into MyClass
    return MyClass(foos)
def write_foo():
    pass
# other functions that deal with foo, might be used in read_foo, write_foo
def foo_func1():
    pass
...

io/bar.py:

from my_class import MyClass
def read_bar():
    # ... read/parse bar into MyClass
    return MyClass(bars)
def write_bar():
    pass
# other functions that deal with bar, might be used in read_bar, write_bar
def bar_func1():
    pass
...

This, however, results in a circular import (my_class <-- io <-- my_class). I know I can split the read/write files or import the class inside the function, but is there a way to keep foo.py and bar.py as they are, without any "hacky" fixes?

More generally, is there an accepted/recommended structure for I/O submodules in Python that read and write multiple formats and use OOP?

Comment

Lazy imports are by far the easiest way to solve this. An alternative could be to remove your mixin and use a more procedural design instead. Roughly: data = mylib.foo.read(); ...; mylib.bar.write(data). This avoids dependencies between the different formats.
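A minimal single-file sketch of the procedural design, assuming a hypothetical MyClass with one `values` field and two made-up text layouts standing in for foo and bar (none of this is from the question). In a real package the functions would live in mylib/foo.py and mylib/bar.py; the lazy-import variant is shown only as a comment because it needs that package layout.

```python
import io
from dataclasses import dataclass
from typing import TextIO


@dataclass
class MyClass:
    # Hypothetical data container; it knows nothing about any file format.
    values: list[int]


# --- would live in mylib/foo.py: one integer per line (made-up layout) ---
def read_foo(stream: TextIO) -> MyClass:
    return MyClass([int(line) for line in stream if line.strip()])


def write_foo(data: MyClass, stream: TextIO) -> None:
    stream.writelines(f"{v}\n" for v in data.values)


# --- would live in mylib/bar.py: comma-separated integers (made-up layout) ---
def read_bar(stream: TextIO) -> MyClass:
    text = stream.read().strip()
    return MyClass([int(v) for v in text.split(",")] if text else [])


def write_bar(data: MyClass, stream: TextIO) -> None:
    stream.write(",".join(str(v) for v in data.values))


# Lazy-import variant, if the method-style interface is kept:
#
#     class MyClass:
#         def write_to_bar(self, stream):
#             from mylib import bar  # resolved when the method runs, not at module load
#             bar.write_bar(self, stream)

data = read_foo(io.StringIO("1\n2\n3\n"))
out = io.StringIO()
write_bar(data, out)
print(out.getvalue())  # 1,2,3
```

The format modules import MyClass, but MyClass imports nothing from them, so the dependency only runs one way.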


Ben Kovitz · Answer 1 · 2025-05-20 07:31:52Z

It sounds like you need double dispatch: the ability to choose which function to call based on the types of two arguments, not just one. Python only provides single dispatch (dispatching on type of the self argument), so for double dispatch, you need to write your own dispatcher, one way or another. I don't know of a standard or recommended Pythonic way to solve this. So, this solution will be a little bit hacky, but hopefully not enough to cause problems.

my_class.py:

from io.write import WriteMixin
class MyClass(WriteMixin):
    ...

io/write.py:

# no need to import foo or bar now
class WriteMixin:
    writer_table = {}  # (class, data_format): write_func
    @classmethod
    def add_writer(cls, clas, data_format, write_func):
        cls.writer_table[(clas, data_format)] = write_func
    def write_to(self, data_format):  # data_format is Foo, Bar, etc.
        write_func = self.writer_table[(self.__class__, data_format)]
        write_func(self)

io/foo.py

import io.write
from my_class import MyClass
from other_class import OtherClass
...
class Foo:
    pass
def read_foo(...):
    # read data, figure out what class it is
    if ...:
        ...
        return MyClass(...)
    elif ...:
        ...
        return OtherClass(...)
    ...
def write_myclass_foo(...):
    ...
io.write.WriteMixin.add_writer(MyClass, Foo, write_myclass_foo)
def write_otherclass_foo(...):
    ...
io.write.WriteMixin.add_writer(OtherClass, Foo, write_otherclass_foo)

io/bar.py

import io.write
from my_class import MyClass
from other_class import OtherClass
...
class Bar:
    pass
def read_bar(...):
    # read data, figure out what class it is
    if ...:
        ...
        return MyClass(...)
    elif ...:
        ...
        return OtherClass(...)
    ...
def write_myclass_bar(...):
    ...
io.write.WriteMixin.add_writer(MyClass, Bar, write_myclass_bar)
def write_otherclass_bar(...):
    ...
io.write.WriteMixin.add_writer(OtherClass, Bar, write_otherclass_bar)

Now in your main code, you can say:

obj = io.read_foo(...) # the class of obj is determined at run-time
# do stuff
obj.write_to(Bar) # This will call the writer appropriate to obj

You'll probably need to add a function parameter to tell what file to read from or write to.
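For reference, here is a single-file sketch of the dispatch table with that extra destination parameter added. The `stream` argument, the toy `values` field, the output layouts and the TypeError on a missing entry are assumptions for illustration; in the package layout above the pieces would sit in io/write.py, my_class.py, io/foo.py and io/bar.py.

```python
from io import StringIO
from typing import Callable, TextIO


class WriteMixin:
    # Maps (object class, format marker class) -> writer function.
    writer_table: dict[tuple[type, type], Callable] = {}

    @classmethod
    def add_writer(cls, clas: type, data_format: type, write_func: Callable) -> None:
        cls.writer_table[(clas, data_format)] = write_func

    def write_to(self, data_format: type, stream: TextIO) -> None:
        try:
            write_func = self.writer_table[(type(self), data_format)]
        except KeyError:
            raise TypeError(
                f"no writer registered for {type(self).__name__} -> {data_format.__name__}"
            ) from None
        write_func(self, stream)


class MyClass(WriteMixin):
    def __init__(self, values):
        self.values = list(values)


class Foo:
    """Format marker for the (made-up) foo layout."""


class Bar:
    """Format marker for the (made-up) bar layout."""


def write_myclass_foo(obj: MyClass, stream: TextIO) -> None:
    stream.write("\n".join(map(str, obj.values)))


def write_myclass_bar(obj: MyClass, stream: TextIO) -> None:
    stream.write(",".join(map(str, obj.values)))


WriteMixin.add_writer(MyClass, Foo, write_myclass_foo)
WriteMixin.add_writer(MyClass, Bar, write_myclass_bar)

out = StringIO()
MyClass([1, 2, 3]).write_to(Bar, out)
print(out.getvalue())  # 1,2,3
```

The lookup uses the exact class of the object, so a subclass of MyClass would need its own entries unless the lookup is extended to walk the class hierarchy.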

If you wanted to get really fancy, you could write a decorator that, when applied to a write_ function's definition, automatically calls WriteMixin.add_writer with the correct arguments. There is a third-party Python library, multipledispatch, that provides decorators similar to this, but I'm not sure they fit your needs in this case.
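Such a decorator might look like the sketch below. `writer_for` and the module-level `WRITERS` table are hypothetical names, not part of multipledispatch or the standard library; the table plays the role of WriteMixin.writer_table.

```python
from typing import Callable

WRITERS: dict[tuple[type, type], Callable] = {}


def writer_for(clas: type, data_format: type) -> Callable[[Callable], Callable]:
    """Decorator factory: register the decorated function for (clas, data_format)."""
    def decorator(write_func: Callable) -> Callable:
        WRITERS[(clas, data_format)] = write_func
        return write_func  # the function itself is left unchanged
    return decorator


class MyClass:
    def __init__(self, values):
        self.values = list(values)


class Bar:
    """Format marker."""


@writer_for(MyClass, Bar)
def write_myclass_bar(obj: MyClass) -> str:
    return ",".join(str(v) for v in obj.values)


obj = MyClass([1, 2, 3])
print(WRITERS[(type(obj), Bar)](obj))  # 1,2,3
```

Registration now sits directly on the function definition, although the two types still have to be spelled out by hand.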

I haven't tested the above code, but it might put you on the right track. The initialization code that registers the write_ functions might be considered a bit hacky because you have to specify the types of their parameters when you call add_writer, which introduces the possibility of error. But even the decorators in multipledispatch have this problem. You could avoid this problem by adding type hinting and writing code that reads the type hints, but that may be much more trouble than it's worth.
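Reading the types from annotations could be sketched with `typing.get_type_hints`. The convention that the first parameter is the object and the second is an instance of the format marker class is an assumption made for this example, as are the names `register_writer`, `write_to` and `WRITERS`.

```python
import inspect
from typing import Callable, get_type_hints

WRITERS: dict[tuple[type, type], Callable] = {}


def register_writer(write_func: Callable) -> Callable:
    """Register write_func under the annotated types of its first two parameters."""
    hints = get_type_hints(write_func)
    params = list(inspect.signature(write_func).parameters)
    if len(params) < 2:
        raise TypeError(f"{write_func.__name__} needs (obj, fmt) parameters")
    try:
        key = (hints[params[0]], hints[params[1]])
    except KeyError as exc:
        raise TypeError(
            f"{write_func.__name__}: parameter {exc} has no type hint"
        ) from None
    WRITERS[key] = write_func
    return write_func


class MyClass:
    def __init__(self, values):
        self.values = list(values)


class Bar:
    """Format marker; an instance is passed so the parameter can be annotated."""


@register_writer
def write_myclass_bar(obj: MyClass, fmt: Bar) -> str:
    return ",".join(str(v) for v in obj.values)


def write_to(obj, fmt) -> str:
    # Dispatch on the runtime types of both arguments.
    return WRITERS[(type(obj), type(fmt))](obj, fmt)


print(write_to(MyClass([1, 2, 3]), Bar()))  # 1,2,3
```

The types are now stated once, in the signature, at the cost of the extra inspection code.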

This does not solve the problem at all, as you have conveniently replaced deserialization with an ellipsis and moved serialization elsewhere.
@Basilevs Oops, you're right! I completely misunderstood the problem. Thanks for explaining the downvote. I just posted a complete rewrite.

freakish · Answer 2 · 2025-05-20 09:03:58Z

I don't understand this interface to be honest:

c = my_package.io.read_foo(...)
# do stuff
c.write_to_bar(...)

Why would c be aware of foo and bar? Will you add 20 methods to that class for 20 (de)serializers in the future? It would be much cleaner and more elegant to have something along these lines:

c: MyClass
with FooReader(input_stream) as reader:
    c = reader.read(MyClass)  # <-- Note the passed type here
with BarWriter(output_stream) as writer:
    writer.write(c)  # <-- but not here, it can be deduced from c

don't you think? The context managers are here only because those classes take ownership of the passed streams and should probably gracefully close them, although YMMV.

And then there are at least three options (from most to least recommended):

1. You add a registration mechanism on each Reader/Writer which tells it how to (de)serialize a given type. Basically a method on the Reader/Writer class (or better: on a builder) that accepts a type and a callback; the callback accepts a stream and either accepts or returns an instance of that type. Something like this:

# IReader/IWriter are common interfaces defined by you
# which BarReader and FooWriter implement
def _read_my_class(reader: IReader, input_stream: BinaryIO) -> MyClass:
    ...  # implement deserialization
BarReader.register(MyClass, _read_my_class)
def _write_my_class(writer: IWriter, output_stream: BinaryIO, instance: MyClass):
    ...  # implement serialization
FooWriter.register(MyClass, _write_my_class)
# Note: "register" can and should be a generic method.

In a more sophisticated context you can even use a visitor pattern ¹ on dataclasses. Of course you need a common interface for all Readers and all Writers (likely with generic read/write methods). Note that you need to pass IReader/IWriter in case of recursive (de)serialization.

Here Reader/Writer are essentially dicts that map types to concrete read/write methods.

2. Reader/Writer is simply aware of all the classes it needs to work with (which is OK for a small number of classes) and implements the mechanism internally.

3. You add a magic method on each class that converts it to a well-understood, (de)serializable format, e.g. a dict, list, string or number. Reader/Writer then only works with those basic types. This is the simplest solution, but it incurs a non-trivial overhead if those objects are big, and it is a leaky abstraction, although not a big leak.
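A compact, self-contained sketch of the registration option follows. It simplifies the callback signature from the snippet above by dropping the IReader/IWriter argument, so it does not cover recursive (de)serialization. The method bodies, the `_StreamOwner` helper, the `Point` dataclass and both text layouts are assumptions for illustration.

```python
import io
from dataclasses import dataclass
from typing import Callable, TextIO


@dataclass
class Point:
    # Plain data class; it imports nothing from the reader/writer code.
    x: int
    y: int


class _StreamOwner:
    """Shared context-manager behaviour: close the owned stream on exit."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def __enter__(self):
        return self

    def __exit__(self, *exc_info) -> None:
        self._stream.close()


class FooReader(_StreamOwner):
    _readers: dict[type, Callable[..., object]] = {}

    @classmethod
    def register(cls, typ: type, func: Callable[..., object]) -> None:
        cls._readers[typ] = func

    def read(self, typ: type):
        return self._readers[typ](self._stream)


class BarWriter(_StreamOwner):
    _writers: dict[type, Callable[..., None]] = {}

    @classmethod
    def register(cls, typ: type, func: Callable[..., None]) -> None:
        cls._writers[typ] = func

    def write(self, instance: object) -> None:
        # The type is deduced from the instance.
        self._writers[type(instance)](self._stream, instance)


def _read_point_foo(stream: TextIO) -> Point:
    x, y = stream.read().split()  # made-up foo layout: "x y"
    return Point(int(x), int(y))


def _write_point_bar(stream: TextIO, p: Point) -> None:
    stream.write(f"x={p.x};y={p.y}")  # made-up bar layout


FooReader.register(Point, _read_point_foo)
BarWriter.register(Point, _write_point_bar)

with FooReader(io.StringIO("3 4")) as reader:
    p = reader.read(Point)

out = io.StringIO()
with BarWriter(out) as writer:
    writer.write(p)
    result = out.getvalue()  # read before the writer closes the stream
print(result)  # x=3;y=4
```

Adding a new format means adding a new Reader/Writer pair and its registrations; Point itself never changes.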

Either way MyClass stays a normal dataclass; it does not even have to be aware of (de)serializers. And even if you use the visitor pattern, it is only aware of the interface, not particular implementations, which as a side effect avoids the circular dependency issue.
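The option of converting to basic types first can be sketched with the standard library alone. Here `dataclasses.asdict` stands in for the per-class magic method and JSON stands in for the target format; both choices are assumptions, as are the `Point` class and the `write_json`/`read_json` names.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import TextIO, Type, TypeVar

T = TypeVar("T")


@dataclass
class Point:
    x: int
    y: int


def write_json(instance, stream: TextIO) -> None:
    # The writer only ever sees dicts, lists, strings and numbers.
    json.dump(asdict(instance), stream)


def read_json(typ: Type[T], stream: TextIO) -> T:
    # Works for flat dataclasses; nested ones would need extra handling.
    return typ(**json.load(stream))


buf = io.StringIO()
write_json(Point(3, 4), buf)
print(buf.getvalue())  # {"x": 3, "y": 4}

buf.seek(0)
print(read_json(Point, buf))  # Point(x=3, y=4)
```

`asdict` builds a full copy of the object as plain containers before anything is written, which is the overhead mentioned above for large objects.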

The key ideas here are separation of concerns and depending on abstract interfaces instead of concrete classes.

¹ It's a shame that you cannot extend existing Python classes in a clean, simple, safe and non-hacky way, like in Rust.

@Basilevs I don't even know what "in-memory format" is supposed to mean, "binding to serialization format" is even more cryptic. And more importantly: how is that relevant? The read/write methods can be implemented for example as one-line json.load and json.dump calls, depending on needs. The "reflection" is of course necessary when implementing any (de)serialization mechanism. This of course can literally mean simply going manually through class fields, it doesn't require any sophisticated class scan. Which nevertheless can be done, I don't see anything special or problematic about it.
There are of course other methods whose implementation is missing. But basically a reader/writer is just a dictionary that maps a type to the correct read/write method. Is that "heavy" use of reflection? And even if it is (whatever that means), then so what?
@Basilevs I'm not saying JSON is the only format. You can literally implement any format you want with this. If order of fields is relevant then this needs to be configured per class representation anyway. And there are various ways to do this. I don't see any problem with extending my design to support this or anything else.
