Readable unit test - lists of complex objects

Question 1

Goal: Writing more readable tests.

I have a couple of functions, written in Scala with Spark, that merge and convert two Datasets. Each Dataset record has a lot of fields. For testing, I create three Datasets: new records, existing records, and the expected result.

The problem is, tests are long and hard to read. An example:

test("Merging Movies") {
  val newMovies: Dataset[ATMMovie] = Seq(
    ATMMovie(
      id = 123L,
      utc_insert_timestamp = Some(1524522274),
      movie_title = Some("New movie from ATM"),
      censor_rating_id = Some(0),
      release_year = Some(2018),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      internal_pos_movie_code = Some("P1"),
      internal_pos_movie_id = Some("ID1"),
      temporary = 0,
      utc_last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 0
    ),
    ATMMovie(
      id = 456L,
      utc_insert_timestamp = Some(34567522274L),
      movie_title = Some("Title updated"),
      censor_rating_id = Some(0),
      release_year = Some(2016),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      internal_pos_movie_code = Some("Movie2"),
      internal_pos_movie_id = Some("MovieID2"),
      temporary = 0,
      utc_last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 3
    )
  ).toDS

  val existingMovies: Dataset[ODSMovie] = Seq(
    ODSMovie(
      movie_row_id = 2L,
      movie_source_id = Some("234"),
      movie_entity_id = 7777L,
      utc_insert_timestamp = Some(1524522000),
      movie_title = Some("Old ODS Movie"),
      censor_rating_id = Some(0),
      release_year = Some(2017),
      release_date = Some(1524522987),
      primary_language_id = Some(1),
      distributor_id = Some(5),
      internal_pos_movie_id = Some("Movie 1"),
      temporary = 0,
      utc_Last_modified_timestamp = Some(1524522666),
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 1
    ),
    ODSMovie(
      movie_row_id = 764L,
      movie_entity_id = 658L,
      utc_insert_timestamp = Some(94567522333L),
      movie_title = Some("Old title"),
      censor_rating_id = Some(0),
      release_year = Some(2016),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      movie_source_id = Some("Movie2"),
      internal_pos_movie_id = Some("MovieID2-old"),
      temporary = 0,
      utc_Last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 0
    )
  ).toDS

  val expectedODSMovies: Dataset[ODSMovie] = Seq(
    ODSMovie(
      movie_row_id = 765L,
      movie_source_id = Some("P1"),
      movie_entity_id = 123L,
      utc_insert_timestamp = Some(1524522274),
      movie_title = Some("New movie from ATM"),
      censor_rating_id = Some(0),
      release_year = Some(2018),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      internal_pos_movie_id = Some("ID1"),
      temporary = 0,
      utc_Last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 0
    ),
    ODSMovie(
      movie_row_id = 764L,
      movie_entity_id = 456L,
      utc_insert_timestamp = Some(34567522274L),
      movie_title = Some("Title updated"),
      censor_rating_id = Some(0),
      release_year = Some(2016),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      movie_source_id = Some("Movie2"),
      internal_pos_movie_id = Some("MovieID2"),
      temporary = 0,
      utc_Last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 3
    ),
    ODSMovie( // Movie we had.
      movie_row_id = 2L,
      movie_source_id = Some("234"),
      movie_entity_id = 7777L,
      utc_insert_timestamp = Some(1524522000),
      movie_title = Some("Old ODS Movie"),
      censor_rating_id = Some(0),
      release_year = Some(2017),
      release_date = Some(1524522987),
      primary_language_id = Some(1),
      distributor_id = Some(5),
      internal_pos_movie_id = Some("Movie 1"),
      temporary = 0,
      utc_Last_modified_timestamp = Some(1524522666),
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 1
    )
  ).toDS

  // The call to the merge function and the comparison with expectedODSMovies are not shown here.
}

As you can see, each test is very hard to read and follow. I'm looking for a better way to write these tests.

Update: I don't care about the values of most of the fields. I only want to test the merging logic.

Question 2

I don't know Scala syntax, but have you considered moving the data to JSON and parsing it from the code instead of having everything in the code?

Question 3

Without seeing the code that performs this "merging", I find it hard to tell what's going on here. Without understanding what you are testing, it's hard to suggest improvements for the tests.

Question 4

My team and I typically find having a "defaults" file in our API test package very useful when writing data-oriented tests. If these domain objects need to be used in the future, we can rely on the fact that they're deterministic and centralized.
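A "defaults" file of this kind could be sketched in Scala roughly as follows. This is only an illustration under assumptions: the ATMMovie here is a reduced, hypothetical subset of the fields in the question, and MovieDefaults and DefaultsExample are made-up names. It relies on the copy method that every Scala case class provides.

```scala
// Reduced, hypothetical subset of the ATMMovie fields from the question.
final case class ATMMovie(
  id: Long,
  utc_insert_timestamp: Option[Long],
  movie_title: Option[String],
  release_year: Option[Int],
  import_attempts: Int
)

// Central place for deterministic default records (the "defaults" file).
object MovieDefaults {
  val atmMovie: ATMMovie = ATMMovie(
    id = 1L,
    utc_insert_timestamp = Some(1524522274L),
    movie_title = Some("Default title"),
    release_year = Some(2018),
    import_attempts = 0
  )
}

object DefaultsExample {
  // Each test overrides only the fields that matter to it.
  val newMovies: Seq[ATMMovie] = Seq(
    MovieDefaults.atmMovie.copy(id = 123L),
    MovieDefaults.atmMovie.copy(
      id = 456L,
      movie_title = Some("Title updated"),
      import_attempts = 3
    )
  )
}
```

In a Spark test, such a Seq would presumably still be turned into a Dataset with .toDS, as in the question.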

Question 5

You say you don't care about the value of most of the fields. I don't understand that, since when testing the merging of data sets, that's about the only interesting thing: whether each field is mapped and merged correctly.

Question 6

Yes, we want to check that movie_title in the input record is the same as movie_title in the expected record. However, we don't care whether it's Avengers: End game or Auei3894.

Aidin · Answer 1 · 2018-10-29 00:37:01Z

Our best solution to this was to use a generator-like class or function.

val newMovies = DataFrameBuilder().add(1).add(2, movie_id=71)

It generates a value for each field from the number we provide. A numeric field takes the number itself, and a string field takes the name of the field plus the number (e.g. MOVIE_TITLE 1). We can also pass custom values for individual fields.
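A generator along these lines might look like the sketch below. It is not the actual DataFrameBuilder, whose code is not shown: ATMMovieBuilder, its add and build methods, and the reduced ATMMovie case class are all hypothetical, and it builds a plain Seq instead of a DataFrame to stay self-contained. Custom values are passed as a function applied to the generated record, since Scala has no keyword arguments for arbitrary fields.

```scala
// Reduced, hypothetical subset of the ATMMovie fields from the question.
final case class ATMMovie(
  id: Long,
  movie_title: Option[String],
  release_year: Option[Int],
  internal_pos_movie_code: Option[String]
)

// Hypothetical builder: add(n) derives every field from n, and an
// optional function overrides selected fields afterwards.
final case class ATMMovieBuilder(rows: Vector[ATMMovie] = Vector.empty) {
  def add(
    n: Int,
    customize: ATMMovie => ATMMovie = (m: ATMMovie) => m
  ): ATMMovieBuilder = {
    val generated = ATMMovie(
      id = n.toLong,                          // numeric field: the number itself
      movie_title = Some(s"MOVIE_TITLE $n"),  // string field: field name plus number
      release_year = Some(n),
      internal_pos_movie_code = Some(s"INTERNAL_POS_MOVIE_CODE $n")
    )
    copy(rows = rows :+ customize(generated))
  }

  def build: Seq[ATMMovie] = rows
}

object BuilderExample {
  // Two generated records; the second one gets a custom id.
  val newMovies: Seq[ATMMovie] =
    ATMMovieBuilder().add(1).add(2, _.copy(id = 71L)).build
}
```

With this shape, a test names only the record numbers and the fields it deliberately overrides, so the values that matter stand out.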

Another solution was to use a function that takes only the fields we care about and fills the rest with constant values. For example, it takes id and utc_insert_timestamp, and fills in everything else.
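This second approach could be sketched as a small factory function. The names MovieFixtures and atmMovie, the reduced field list, and the chosen constants are assumptions for illustration, not the code we actually used.

```scala
// Reduced, hypothetical subset of the ATMMovie fields from the question.
final case class ATMMovie(
  id: Long,
  utc_insert_timestamp: Option[Long],
  movie_title: Option[String],
  censor_rating_id: Option[Int],
  temporary: Int,
  import_attempts: Int
)

object MovieFixtures {
  // Takes only the fields the test cares about; the rest are fixed constants.
  def atmMovie(id: Long, utcInsertTimestamp: Option[Long]): ATMMovie =
    ATMMovie(
      id = id,
      utc_insert_timestamp = utcInsertTimestamp,
      movie_title = Some("Any title"),
      censor_rating_id = Some(0),
      temporary = 0,
      import_attempts = 0
    )
}

object FixtureExample {
  val newMovies: Seq[ATMMovie] = Seq(
    MovieFixtures.atmMovie(123L, Some(1524522274L)),
    MovieFixtures.atmMovie(456L, None)
  )
}
```

Compared with the generator, this keeps every record's filler values identical, which is enough when the merge logic depends only on the few fields passed in.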
