Imagine the following scenario:
Our team is working on a mobile project in biometrics. The team delivers a client-facing SDK. Our work relies on another internal team that delivers its algorithms in the form of a black-box library. The SDK has access to the Internet, but the biometric processing happens on the phone.
That makes our project a wrapper for the library, where we provide a platform-specific, user-friendly API for our clients. We are also responsible for the quality of the product - making sure it works as expected in the client's environment.
Testing of the product can be partially automated by sending the library a video file that imitates the phone camera feed. Tests like this are very flaky, because in biometrics there are many variables that decide whether a capture succeeds: lighting conditions, the device camera, the background scene, and the size and shape of a face or fingers all have an impact.
As the algorithms are refined, new versions constantly break our test suite, making it impossible to tell whether the algorithm is broken or the test video file simply wasn't good enough.
Manual tests can help validate false negatives, but they are time-consuming and don't cover much of the surface if they are performed by the same engineers.
Constraints
we cannot influence how the other team operates, so we can't require more testing on their side or blame them for production issues unless we have caught and recorded a problematic scenario
SDK size matters very much to our clients, so we can't ship both the old and the new algorithm in a single SDK release (the algorithm code and data make up the majority of the SDK's size)
there is no mechanism for canary releases; once the SDK is deployed, it goes to all clients
What would be a good strategy for integration testing in this project to validate the system's behavior and minimize the risk of a bad version of the algorithms hitting production?
bdsl (Oct 10, 2024): Is a question like this, where "you" refers to a different person for each reader, suitable for the Stack Overflow format? It doesn't make sense to vote on answers if each person is answering a question about themselves. Maybe reword the question to ask something more general than what each individual would do, where two people can answer the same question and agree or disagree on the answer?
candied_orange (Oct 10, 2024): @bdsl It's a bit unusual, but I don't see it causing the problem you're concerned about. Seemed more of a stylistic choice.
Greg Burghardt (Oct 10, 2024): I think this is a good question that explains enough of the situation to garner good answers. +1
Doc Brown (Oct 11, 2024): I took the liberty of changing the POV from "you (= the reader)" to "we (= the OP's team)". Please double check that I got your intentions right.
bdsl (Oct 11, 2024): @DocBrown Yes, maybe I didn't read well enough and missed the first word, "imagine". I think your edit helps, though. I didn't downvote. I'd rather leave the comment there; I think it helps to explain the edit history, etc.
2 Answers
What I see here is that you're taking on two completely different jobs.
Our team delivers a client-facing SDK.
This is one job. This job doesn't care about the problems of the black-box library. You can't fix it. You just need it to work. So for your own tests, the ones that show that your own stuff works, mock the hell out of it. Prove that your own stuff works before worrying about whether the other team's stuff works.
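A minimal sketch of that separation, assuming the SDK hides the vendor library behind an interface it owns; all names here (FaceCaptureEngine, CaptureResult, FakeEngine) are hypothetical and not the real library's API:

```kotlin
// Hypothetical wrapper interface owned by the SDK; the real black-box
// library is adapted to it in production code.
interface FaceCaptureEngine {
    fun process(frame: ByteArray): CaptureResult
}

sealed class CaptureResult {
    data class Accepted(val score: Double) : CaptureResult()
    data class Rejected(val reason: String) : CaptureResult()
}

// Deterministic fake used only in the SDK's own unit tests.
class FakeEngine(private val result: CaptureResult) : FaceCaptureEngine {
    override fun process(frame: ByteArray): CaptureResult = result
}

fun main() {
    // The SDK's own behavior (retries, error reporting, UX states) can be
    // exercised against the fake, independent of the real algorithms.
    val engine: FaceCaptureEngine = FakeEngine(CaptureResult.Rejected("low light"))
    val result = engine.process(ByteArray(0))
    check(result is CaptureResult.Rejected)
    println("SDK-side handling verified against a fake engine: $result")
}
```

With a seam like this in place, the SDK's own logic can be tested deterministically, and a library regression cannot break those tests.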
We are also responsible for the quality of the product - making sure it works as expected in the client's environment.
This is the integration job, where you have to show that everything, including this black box, works together. Here you can't mock it out. But if you did the other job correctly, then when this fails it shouldn't be hard to show where the failure is coming from.
As the algorithms are refined, new versions constantly break our test suite, making it impossible to tell whether the algorithm is broken or the test video file simply wasn't good enough.
This is not your job. This is the other team's job. Make them do it. If the new algorithms require new videos, make them make them.
we cannot influence how the other team operates, so we can't require more testing on their side or blame them for production issues unless we have caught and recorded a problematic scenario
Make it easy to record problematic scenarios. Make it easy to prove where the problem isn't. This is work. You don't get it for free. So plan for it.
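One way to plan for it is an "evidence recorder" in the SDK's internal builds: whenever a capture fails, persist the frames, the library version, and its verdict so the scenario can be replayed and handed to the algorithm team. The sketch below is an illustrative assumption about how that could look, not an existing API:

```kotlin
import java.io.File
import java.time.Instant

// Hypothetical failure bundle: enough context to replay a bad capture later.
data class FailureEvidence(
    val libraryVersion: String,
    val verdict: String,
    val frames: List<ByteArray>,
    val timestamp: Instant = Instant.now()
)

class EvidenceRecorder(private val outDir: File) {
    // Writes one directory per failure: metadata plus the raw frames.
    fun record(evidence: FailureEvidence): File {
        val bundle = File(outDir, "capture-${evidence.timestamp.toEpochMilli()}")
        bundle.mkdirs()
        File(bundle, "meta.txt").writeText(
            "library=${evidence.libraryVersion}\nverdict=${evidence.verdict}\n"
        )
        evidence.frames.forEachIndexed { i, frame ->
            File(bundle, "frame-%04d.raw".format(i)).writeBytes(frame)
        }
        return bundle
    }
}

fun main() {
    val recorder = EvidenceRecorder(File(System.getProperty("java.io.tmpdir"), "capture-evidence"))
    val bundle = recorder.record(
        FailureEvidence(
            libraryVersion = "2.4.0",             // illustrative version string
            verdict = "REJECTED: face not found", // whatever the library reported
            frames = listOf(ByteArray(16))        // captured camera frames
        )
    )
    println("Evidence written to $bundle")
}
```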
SDK size matters very much to clients, so we can't ship both the old and the new algorithm in a single SDK release (the algorithm code and data make up the majority of the SDK's size)
Who cares? Give yourself a way to install every version you've ever made. You don't have to deploy them all. Just create the capability, so you never lose the ability to test a video against both past and current versions. Now you can show which abilities you're gaining and losing.
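For example, a small harness could run the same video corpus against every archived library version and tabulate the verdicts. The runner lambda below stands in for whatever mechanism (separate process, adapter artifact, device-farm job) actually drives a given version; it is an assumption for illustration:

```kotlin
// One verdict per (video, library version) pair.
data class Verdict(val video: String, val libraryVersion: String, val accepted: Boolean)

// Runs every archived library version against the same corpus. The runner
// lambda is a placeholder for whatever actually invokes a given version.
fun runMatrix(
    videos: List<String>,
    versions: List<String>,
    runner: (video: String, version: String) -> Boolean
): List<Verdict> =
    versions.flatMap { version ->
        videos.map { clip -> Verdict(clip, version, runner(clip, version)) }
    }

fun main() {
    // Fake runner for illustration: pretend 2.4.0 regressed on lowlight.mp4.
    val verdicts = runMatrix(
        videos = listOf("ideal.mp4", "lowlight.mp4"),
        versions = listOf("2.3.0", "2.4.0")
    ) { clip, version -> !(clip == "lowlight.mp4" && version == "2.4.0") }

    // Per-video report shows what was gained or lost between versions.
    verdicts.groupBy { it.video }.forEach { (clip, results) ->
        println("$clip: " + results.joinToString { "${it.libraryVersion}=${if (it.accepted) "PASS" else "FAIL"}" })
    }
}
```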
What would be a good strategy for integration testing in this project to validate the system's behavior and minimize the risk of a bad version of the algorithms hitting production?
Test, test, test, and test.
Don't accept work from the black-box team without them showing you which videos they claim will work. If old videos that used to pass now fail, make them state that they are OK with that.
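That acceptance rule can be automated as a simple gate: compare the candidate library's results against the recorded baseline and refuse the drop unless every newly failing video is on an explicit waiver list signed off by the algorithm team. A sketch under those assumptions (the video names and data are illustrative):

```kotlin
// Returns the videos that passed with the previous library version, fail with
// the candidate version, and were NOT explicitly waived by the algorithm team.
fun regressions(
    baseline: Map<String, Boolean>,   // video -> passed with previous version
    current: Map<String, Boolean>,    // video -> passed with candidate version
    waived: Set<String>               // regressions the algorithm team accepted
): List<String> =
    baseline.filter { (video, passedBefore) ->
        passedBefore && current[video] == false && video !in waived
    }.keys.toList()

fun main() {
    val unexplained = regressions(
        baseline = mapOf("ideal.mp4" to true, "lowlight.mp4" to true),
        current = mapOf("ideal.mp4" to true, "lowlight.mp4" to false),
        waived = setOf("lowlight.mp4")   // the algorithm team signed this one off
    )
    // Without the waiver this check would fail and the drop would be refused.
    check(unexplained.isEmpty()) { "Unexplained regressions: $unexplained" }
    println("Candidate library accepted; every regression has a sign-off.")
}
```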
You can't control the other team. But if you're responsible for the whole thing, then you decide when it's ready. Make it clear which tests you want to see passing before you make that call. If they don't respond, just don't use their new stuff.
Politically, it'd be much easier if you weren't wearing both these hats. Maybe look for a way to fix that.
Greg Burghardt (Oct 10, 2024): @TomaszBąk: Regarding SDK size, you can still build in the ability to do A/B testing without deploying multiple SDKs to the same device. You could have a "developer" mode for the app so you can deploy the old and new SDKs side by side for testing purposes. You just don't want both SDKs deployed in the real production product downloaded by real end users. That way you can introduce additional logging and give the app to the other team. Have them read the logs and debug.
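A rough sketch of what that developer-only A/B switch could look like, assuming the wrapper-interface idea from the answer above; both engines would be bundled only in an internal build, never in the production SDK, and every name here is hypothetical:

```kotlin
// Hypothetical developer-only switch between two bundled engine builds.
interface CaptureEngine {
    fun process(frame: ByteArray): Boolean
}

class OldEngine : CaptureEngine {
    override fun process(frame: ByteArray): Boolean = true   // stand-in for the previous library
}

class NewEngine : CaptureEngine {
    override fun process(frame: ByteArray): Boolean = true   // stand-in for the candidate library
}

object EngineSelector {
    // In a real app this flag would come from BuildConfig or a hidden debug menu.
    var useNewEngine: Boolean = false

    fun engine(): CaptureEngine = if (useNewEngine) NewEngine() else OldEngine()
}

fun main() {
    EngineSelector.useNewEngine = true
    println("Selected engine: ${EngineSelector.engine()::class.simpleName}")
}
```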
The purpose of integration testing is to verify that things still work when you bring individually tested components together.
When doing integration testing, you basically want to know whether the test harness used by the algorithm team matches, closely enough, the way the SDK wrapper uses the algorithm library, and conversely, whether the test doubles used by the SDK team match the interface expectations of the algorithm library closely enough.
For integration testing, you don't need a wide variety of video feed quality. Depending on the range of results the algorithm library can give, it might even be sufficient to have two videos: one made under ideal conditions that must always result in a positive outcome, and one that must always result in a negative outcome.
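As a sketch, such a check could be two JUnit 5 tests wired to the real library through the SDK wrapper. The feedVideoThroughSdk helper and the fixture paths below are placeholders for however the real suite pipes a file into the wrapped library:

```kotlin
import org.junit.jupiter.api.Assertions.assertFalse
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test

// Two canonical clips: one that must always be accepted, one that must always
// be rejected. Paths and the helper are illustrative placeholders.
class WiringIntegrationTest {

    // Placeholder: in the real suite this would push the video's frames through
    // the SDK wrapper into the actual algorithm library and return its verdict.
    private fun feedVideoThroughSdk(path: String): Boolean =
        path.contains("ideal")

    @Test
    fun `ideal capture is accepted`() {
        assertTrue(feedVideoThroughSdk("fixtures/ideal-conditions.mp4"))
    }

    @Test
    fun `empty scene is rejected`() {
        assertFalse(feedVideoThroughSdk("fixtures/empty-scene.mp4"))
    }
}
```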
That said, it might be that you are actually being asked to do system testing: verifying that the system as a whole works correctly.
In that case, my approach would be to seek closer collaboration with the algorithm team, to get a better understanding of which tests might fail due to an update, whether those failures are acceptable, and what actions to take to get the tests passing again.