Unlike other inventory management systems, where items are fungible, in this system they are not supposed to be. Each item is serialized and tracked from the printing press onward. We are trying to move away from keeping track of it manually. Due to the volume (more than 4 million documents per year) and the intrinsic characteristics of the serialization (incrementing integers), we came up with the following model:
class Documents(models.Model):
    document_type = models.IntegerField(choices=DocumentsStatusHistory.DocumentTypes.choices)
    series = models.PositiveIntegerField()
    serial_range = IntegerRangeField()
    located_at = models.ForeignKey(Warehouse, on_delete=models.PROTECT)  # on_delete is required; PROTECT assumed
where the document type describes an arbitrary, predefined type, the series changes to indicate the printing cycle, and serial_range is the "compressed" serial number. Due to a compliance requirement for full tracking records of the movement of that inventory per individual document, we adopted the system that was already in use and that seemed most familiar to our would-be users. Since it is not expected that the entire inventory moves in bulk, but rather a subset of it at each specific warehouse, for either use or distribution, we need to slice and splice the serial_range value, which would "pop out" that range. This is what we came up with:
document_record_to_splice = Documents.objects.get(id=1)
# For this example, I'm presuming that id=1 has the correct series and document
# type, and that the new range is contained by the object being sliced/spliced.
if document_record_to_splice.serial_range == new_serial_range:
    # We return early if the range is equal to the entire record
    # (i.e. new_range == existing_range).
    return document_record_to_splice
if Documents.objects.filter(serial_range__overlaps=new_serial_range).count() > 1:
    # We also raise an exception if the range straddles two separate records.
    raise Exception
# Here is where everything starts
orig_bin = document_record_to_splice
# We keep a copy of all of the fields of the original
o_serial_range = orig_bin.serial_range
document_type = orig_bin.document_type
series = orig_bin.series
located_at = orig_bin.located_at
o_document_status = orig_bin.document_status
# We create the new range objects that are not part of the intersection of the
# ranges. Something to improve here is that we presume that the intersection
# doesn't match any of the bounds of the existing object.
from psycopg.types.range import Range

# int4range values come back canonicalized to '[)' bounds, so the two
# remainders can reuse the bounds of the spliced-out range directly.
lower_serial_range = Range(o_serial_range.lower, new_serial_range.lower)
upper_serial_range = Range(new_serial_range.upper, o_serial_range.upper)
# We delete the original object, since we need to make sure that all records
# have a continuous set of adjacent ranges for any document type or series,
# and then save the pieces that are not part of the transferred range.
orig_bin.delete()
for n_serial_range in [lower_serial_range, upper_serial_range]:
    orig_bin.pk = None
    orig_bin.serial_range = n_serial_range
    orig_bin.full_clean()
    orig_bin.save()
# We create our new record, with a copy of all the information that we
# cached about it before.
new_bin = Documents(
    document_type=document_type,
    series=series,
    serial_range=new_serial_range,
    document_status=o_document_status,
    located_at=located_at,
)
new_bin.full_clean()
new_bin.save()
# The record of the transaction was saved in the function above.
This is how we would do it with raw SQL, using little of the Django ORM beyond the .raw() helper:
with old_documents as (
    delete from manager_documents
    where document_type = %(document_type)s
      and series = %(series)s
      and serial_range @> %(new_range)s::int4range
    returning document_type, document_status, created_at, series,
        multirange(serial_range) - multirange(%(new_range)s::int4range) as kept_ranges,
        parent_transaction_id, located_at_id
)
insert into manager_documents
    (document_type, document_status, created_at, series, serial_range,
     parent_transaction_id, located_at_id)
select document_type, document_status, created_at, series, unnest(kept_ranges),
       parent_transaction_id, located_at_id
from old_documents;
That last statement would replace all the dancing around copying the existing object.
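From Django it could be executed roughly like this (a sketch: SPLICE_SQL is a stand-in name for the statement above, and a plain cursor is used because the final INSERT returns no model rows, which .raw() would expect):

from django.db import connection

# Sketch only: SPLICE_SQL holds the CTE statement above; psycopg can adapt a
# Range value for the %(new_range)s placeholder.
with connection.cursor() as cursor:
    cursor.execute(
        SPLICE_SQL,
        {
            "document_type": document_record_to_splice.document_type,
            "series": document_record_to_splice.series,
            "new_range": new_serial_range,
        },
    )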
Now my concerns are:
- Is the SQL unavoidable if I don't want that logical complexity in Python?
- Are there any concerns with race conditions under either solution? (Technically, any code calling the splicing function should be wrapped in transaction.atomic(); see the sketch below.)
- Am I missing a standard Django ORM API that would avoid writing raw SQL?
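For reference, a call site is expected to look roughly like this (a sketch; splice_documents is a stand-in name for the splice logic above):

from django.db import transaction

with transaction.atomic():
    # splice_documents is a stand-in name for the splice logic shown above.
    spliced = splice_documents(document_record_to_splice, new_serial_range)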
1 Answer
complex DELETE
I confess I don't quite understand the motivation behind that SQL statement. It seems to be about taking a set difference to produce kept_ranges and then un-nesting it?
I predict that when you recruit additional engineers to this project, they will be reluctant to maintain that statement, as it looks "dangerous": it looks like it would be easy to accidentally discard important rows in the course of testing it.
Consider adding an is_old column to the base relation, or perhaps to a VIEW that ties a two-column table to the base relation. We update it to show which rows are "old", INSERT corresponding rows, and DELETE the original rows. Notice that an interactive tester can perform just the first step and verify that it seems to be going as expected.
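A minimal sketch of that two-step flow, assuming an is_old = models.BooleanField(default=False) is added to Documents (all names here are illustrative, not the OP's code):

from django.db import transaction
from psycopg.types.range import Range

def splice_with_flag(document_type, series, new_serial_range):
    with transaction.atomic():
        # Step 1: only flag the rows about to be replaced; an interactive
        # tester can stop here and inspect the selection before anything
        # is destroyed.
        Documents.objects.filter(
            document_type=document_type,
            series=series,
            serial_range__contains=new_serial_range,
        ).update(is_old=True)
        # Step 2: insert the remainder rows, then delete the flagged
        # originals. As in the OP's code, this presumes the spliced range
        # doesn't touch the bin's own bounds.
        for old in Documents.objects.filter(is_old=True):
            for piece in (
                Range(old.serial_range.lower, new_serial_range.lower),
                Range(new_serial_range.upper, old.serial_range.upper),
            ):
                Documents.objects.create(
                    document_type=old.document_type,
                    series=old.series,
                    serial_range=piece,
                    located_at=old.located_at,
                )
        Documents.objects.filter(is_old=True).delete()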
automated tests
The OP would be more easily understood if it were accompanied by a small test suite. Tests have documentation value.
Including the output of SHOW CREATE TABLE ... would also make it easier to understand the OP code.
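For example, a first test could look something like this (hypothetical names throughout; splice_documents stands in for the OP's splice logic, and '[)' range bounds are assumed):

from django.test import TestCase
from psycopg.types.range import Range

class SpliceTests(TestCase):
    def setUp(self):
        self.warehouse = Warehouse.objects.create(name="main")  # fields assumed

    def test_splice_pops_out_the_middle_of_a_range(self):
        Documents.objects.create(
            document_type=1,
            series=1,
            serial_range=Range(1, 1001),  # serials 1..1000 with '[)' bounds
            located_at=self.warehouse,
        )
        splice_documents(Documents.objects.get(), Range(401, 601))
        bounds = sorted(
            (d.serial_range.lower, d.serial_range.upper)
            for d in Documents.objects.all()
        )
        # Two remainders plus the spliced-out range: contiguous, no overlaps.
        self.assertEqual(bounds, [(1, 401), (401, 601), (601, 1001)])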
diagnostic error message
This is terrible:
raise Exception
Maybe it "never" happens? We need a better description of what went south, so a maintenance engineer has some hope of diagnosing the root cause and fixing the bug.
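Something along these lines already helps (a sketch; the exception class name is made up):

class SpliceRangeError(Exception):
    """A requested range cannot be spliced out of a single record."""

overlapping = Documents.objects.filter(serial_range__overlaps=new_serial_range)
if overlapping.count() > 1:
    raise SpliceRangeError(
        f"range {new_serial_range} overlaps {overlapping.count()} records; "
        f"expected exactly one, so it probably straddles two adjacent bins"
    )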
informative identifier names
for n_serial_range in ... :
I think that means for new_serial_range? Saving the two characters "ew" hardly seems worth it. If you want a shorter identifier, elide the word _serial instead.
simplicity
I don't know how big your ranges are, nor what memory constraints you work within. But consider finding the set difference with something like
range2 = range(lo, hi)
range3 = set(range1) - set(range2)
It's horribly inefficient, but there are no special cases. Possibly you'll want a helper which inspects the set(), verifies "no gaps!", and turns it back into a range().
Or write a proper SerialRange class that uses arithmetic to accomplish the same thing. I think your code avoids MultiRanges?
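A sketch of that helper, which refuses to re-pack a set that has gaps (illustrative only, not the OP's code):

def set_to_range(serials: set[int]) -> range:
    """Turn a set of serial numbers back into one contiguous range."""
    lo, hi = min(serials), max(serials)
    if len(serials) != hi - lo + 1:
        raise ValueError(f"serial numbers between {lo} and {hi} have gaps")
    return range(lo, hi + 1)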
- I will go backwards commenting: IntegerRangeField is a special range type of postgres (int4range), which would probably be familiar if you were implementing a scheduling application. It's not a Python range, because it has bounds. I did mention how we could manage 4 million records in a single operation; that means range(1, 4000000). Last time I did that, my system froze for a while trying to do it for a form, which is why the "compact" option was preferred. – Braiam, Jan 7, 2025 at 21:51
- There are automated tests, but the help pages gave no indication that they were being looked for. I tried to describe the system as best as I could. The purpose of the SQL query is to do the whole range operation and the deletion of the preexisting record so it can be recorded. I could pass a PK to identify the record, but I was trying my best to keep everything explicit. Something like target_record = Documents.objects.filter(...) followed by Documents.objects.raw(..., params={"id": target_record.pk, "new_range": new_range}). – Braiam, Jan 7, 2025 at 21:58
- There's an obvious drawback, which is that the SQL must have new columns spelled out if the model ever changes (which I believe it should not, knock on wood), but unless postgres changes something it shouldn't need to be touched. Also, the function is in a hot path, because every transaction has the potential of hitting it. – Braiam, Jan 7, 2025 at 22:00
- Note that you will need a def ... around something containing a return statement. You don't have to verbosely copy model attributes to variables unless you have a very weird custom delete: all instance fields (except for pk? not sure) remain available after calling .delete().