
My project is using Django's test database framework (w/ and w/o Selenium) and it's been working well for several years, with a few read-only tests on the production database (mostly integrity tests that our DB can't enforce) but with the vast majority being R/W on the test database.

We have one model/table on the production database that provides important metadata for the site which is getting too big to code into the fixtures, and which we would like to see current values for in our tests. I would love our setUp() code to be able to do something like:

def setUp(self):
    with self.activate_production_database():
        # list() forces evaluation while the production DB is active
        metadata_info = list(MetadataTable.objects.values_list(
            'title', flat=True))
    # back to test_database
    MetadataTable.objects.bulk_create(
        [MetadataTable(title=t) for t in metadata_info])

I'm wondering what, if anything, exists that is like the with self.activate_production_database() line?

asked Jan 1, 2024 at 22:26
  • Why not create a fixture? Commented Jan 1, 2024 at 22:30
  • @willeM_VanOnsem -- it would be good to avoid needing to run dumpdata after every change to the db particularly since that part of the setup happens on a part of our testing pipeline that doesn't have access to the same FS as the code running the test. Commented Jan 1, 2024 at 22:40
  • Or maybe some kind of factory for test data creation (e.g. factoryboy.readthedocs.io/en/stable/orms.html)? Connecting the test environment to the production database sounds a bit dangerous to me too. Commented Oct 2, 2024 at 12:36
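A minimal sketch of what that factory_boy suggestion could look like, assuming a MetadataTable model with a title field (all names here are illustrative, not from the question):

# tests/factories.py - untested sketch using factory_boy
import factory
from factory.django import DjangoModelFactory

from myapp.models import MetadataTable  # hypothetical import path


class MetadataTableFactory(DjangoModelFactory):
    class Meta:
        model = MetadataTable

    # generate distinct titles instead of copying them from production
    title = factory.Sequence(lambda n: f"metadata-{n}")

# in a test: MetadataTableFactory.create_batch(10) fills the test database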

2 Answers


Whatever you do, do NOT allow cross-reads of the production database from other (non-prod) systems. It might be fine for this one use case, but the usage will bleed out into other things as people realise it exists, and you will end up with problems.

Some possible solutions:

  • Have a replicated read-only database containing only the table(s) you want to read, and use this.
  • This data feels like part of the system rather than data, so take it out of the table and include it in the repo as e.g. JSON or YAML. Then load it into the table automatically each time you deploy (see the sketch after this list) - this means your other environments will stay up to date as long as you have the latest repo.
  • Put this common data into a SQLite database instead and have Django read two databases. Include the SQLite db file in the repo.
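As a minimal sketch of the second option, assuming a metadata.json committed to the repo and a MetadataTable model (all names here are illustrative, not from the question):

# myapp/management/commands/load_metadata.py - untested sketch
import json
from pathlib import Path

from django.core.management.base import BaseCommand

from myapp.models import MetadataTable  # hypothetical import path


class Command(BaseCommand):
    help = "Recreate the metadata table from the repo's metadata.json"

    def handle(self, *args, **options):
        records = json.loads(Path("metadata.json").read_text())
        # the deploy simply recreates the table from the repo file
        MetadataTable.objects.all().delete()
        MetadataTable.objects.bulk_create(
            MetadataTable(**record) for record in records)

Running python manage.py load_metadata as part of each deploy then keeps every environment in sync with the repo.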

I think what you're doing right now is as I mentioned above - you've put system variables that are the same in all environments into the database, when they need to be part of the system. So the solution is to find a sensible way to include them in the system and have them as part of the repo, even if this means the deployment process recreates the existing table each time.

answered Jan 2, 2024 at 8:46

4 Comments

Your second and third solutions involve this data being part of, and committed to, the code repo. It is not something determined by the programmers of the project, but it determines the validity of the program code. If values are entered into that table that cannot pass our tests (that is, cannot be properly rendered), then the fault is with the programming, not our editors (the table(s) is/are far, far more complex than just title and other metadata) -- either we have not structured our validators properly or we need to write code to make possible what the editors want. :-)
I of course don't know your exact use case, but any value that needs to be exactly the same in every environment could basically be a constant in the codebase. You say yourself it's only in the DB because it's a large amount of data. Managing this in the DB in production and trying to replicate it across everything is not the right approach.
Thanks for your comment. I am, in this case, not asking for criticism of the approach; I'm asking whether a solution that meets the question's specs is possible.
And I gave some suggestions and some advice. In this case, the requirement is bad because the current solution is bad - this is your chance to improve it. If you don't like the advice or suggestions though, don't use them. Entirely up to you and no skin off my nose.

Setting aside the fact that I agree with michjnich that this does not seem to be a good idea: you could try to set up specific test settings with two databases configured like this:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'test_db',
        # other settings...
    },
    'production': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'production_db',
        # other settings...
    },
}

In your setUp method you could access the production database with .using():

# read from the 'production' alias via the ORM
qs = YourModel.objects.using('production').values_list('title', flat=True)
# queries without .using() go to the test ('default') database

https://docs.djangoproject.com/en/5.1/ref/models/querysets/#using

I have not used this during tests so far, but it may work.
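A minimal sketch of how the question's setUp could look with this approach (the model import path is illustrative). One caveat: a test case has to declare every alias it queries via its databases attribute, and Django's test runner normally creates a separate test database for each configured alias, so keeping 'production' pointed at the live database may need extra TEST settings for your Django version:

# tests.py - untested sketch, assuming the settings above
from django.test import TestCase

from myapp.models import MetadataTable  # hypothetical import path


class MetadataTests(TestCase):
    # declare every alias the tests touch
    databases = {'default', 'production'}

    def setUp(self):
        # list() forces evaluation while reading from the production alias
        titles = list(MetadataTable.objects.using('production')
                      .values_list('title', flat=True))
        # recreate the rows on the default (test) database
        MetadataTable.objects.bulk_create(
            MetadataTable(title=t) for t in titles)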

answered Oct 2, 2024 at 13:04

1 Comment

This does not in itself work, but it's pointing to new areas and parts of the codebase that are proving fruitful. More than worth the bounty. Thank you. (Again, I'm part of a group moving into a project where the current extensive test suite all takes place on a copy of the production db; moving to only a small part of the testing using the production db would be a major security win.)
