\$\begingroup\$

Goal: Writing more readable tests.

I have a couple of functions, written in Scala with Spark, that merge and convert two Datasets. Each record in these Datasets has a lot of fields. For testing, I create three Datasets: new records, existing records, and the expected result.

The problem is, tests are long and hard to read. An example:

test("Merging Movies") {
  val newMovies: Dataset[ATMMovie] = Seq(
    ATMMovie(
      id = 123L,
      utc_insert_timestamp = Some(1524522274),
      movie_title = Some("New movie from ATM"),
      censor_rating_id = Some(0),
      release_year = Some(2018),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      internal_pos_movie_code = Some("P1"),
      internal_pos_movie_id = Some("ID1"),
      temporary = 0,
      utc_last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 0
    ),
    ATMMovie(
      id = 456L,
      utc_insert_timestamp = Some(34567522274L),
      movie_title = Some("Title updated"),
      censor_rating_id = Some(0),
      release_year = Some(2016),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      internal_pos_movie_code = Some("Movie2"),
      internal_pos_movie_id = Some("MovieID2"),
      temporary = 0,
      utc_last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 3
    )
  ).toDS

  val existingMovies: Dataset[ODSMovie] = Seq(
    ODSMovie(
      movie_row_id = 2L,
      movie_source_id = Some("234"),
      movie_entity_id = 7777L,
      utc_insert_timestamp = Some(1524522000),
      movie_title = Some("Old ODS Movie"),
      censor_rating_id = Some(0),
      release_year = Some(2017),
      release_date = Some(1524522987),
      primary_language_id = Some(1),
      distributor_id = Some(5),
      internal_pos_movie_id = Some("Movie 1"),
      temporary = 0,
      utc_Last_modified_timestamp = Some(1524522666),
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 1
    ),
    ODSMovie(
      movie_row_id = 764L,
      movie_entity_id = 658L,
      utc_insert_timestamp = Some(94567522333L),
      movie_title = Some("Old title"),
      censor_rating_id = Some(0),
      release_year = Some(2016),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      movie_source_id = Some("Movie2"),
      internal_pos_movie_id = Some("MovieID2-old"),
      temporary = 0,
      utc_Last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 0
    )
  ).toDS

  val expectedODSMovies: Dataset[ODSMovie] = Seq(
    ODSMovie(
      movie_row_id = 765L,
      movie_source_id = Some("P1"),
      movie_entity_id = 123L,
      utc_insert_timestamp = Some(1524522274),
      movie_title = Some("New movie from ATM"),
      censor_rating_id = Some(0),
      release_year = Some(2018),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      internal_pos_movie_id = Some("ID1"),
      temporary = 0,
      utc_Last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 0
    ),
    ODSMovie(
      movie_row_id = 764L,
      movie_entity_id = 456L,
      utc_insert_timestamp = Some(34567522274L),
      movie_title = Some("Title updated"),
      censor_rating_id = Some(0),
      release_year = Some(2016),
      release_date = Some(1524522000),
      primary_language_id = Some(0),
      distributor_id = Some(0),
      movie_source_id = Some("Movie2"),
      internal_pos_movie_id = Some("MovieID2"),
      temporary = 0,
      utc_Last_modified_timestamp = None,
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 3
    ),
    ODSMovie( // Movie we had.
      movie_row_id = 2L,
      movie_source_id = Some("234"),
      movie_entity_id = 7777L,
      utc_insert_timestamp = Some(1524522000),
      movie_title = Some("Old ODS Movie"),
      censor_rating_id = Some(0),
      release_year = Some(2017),
      release_date = Some(1524522987),
      primary_language_id = Some(1),
      distributor_id = Some(5),
      internal_pos_movie_id = Some("Movie 1"),
      temporary = 0,
      utc_Last_modified_timestamp = Some(1524522666),
      force_update = 0,
      utc_last_import_attempt_timestamp = None,
      import_attempts = 1
    )
  ).toDS

As you can see, each test is very hard to read and follow. I'm looking for a better way to write these tests.

Update: I don't care about the values of most of the fields. I only want to test the merging logic.

asked May 2, 2018 at 21:47
\$\endgroup\$
  • \$\begingroup\$ I don't know Scala syntax, but have you considered moving the data to JSON and parsing it from the code instead of keeping everything in the code? \$\endgroup\$ Commented May 3, 2018 at 6:31
  • \$\begingroup\$ Without seeing the code that performs this "merging", I find it hard to tell what's going on here. Without understanding what you are testing, it's hard to suggest improvements for the tests. \$\endgroup\$ Commented May 3, 2018 at 12:25
  • \$\begingroup\$ My team and I typically find that having a "defaults" file in our API test package is very useful when writing data-oriented tests. If these domain objects need to be used in the future, we can rely on the fact that they're deterministic and centralized. \$\endgroup\$ Commented May 11, 2018 at 11:42
  • \$\begingroup\$ You say you don't care about the value of most of the fields. I don't understand that, since when testing the merging of data sets, that's about the only interesting thing: whether each field is mapped and merged correctly. \$\endgroup\$ Commented Apr 27, 2019 at 3:30
  • \$\begingroup\$ Yes, we want to check if movie_title in input record is the same as movie_title in the expected record. However, we don't care if it's Avengers: End game or Auei3894. \$\endgroup\$ Commented Apr 28, 2019 at 4:43

1 Answer

\$\begingroup\$

Our best solution to this was to use a generator-like class or function.

val newMovies = DataFrameBuilder().add(1).add(2, movie_id=71)

It generates a value for each field from the number we provide. For a numeric field, the value is the number itself. For a string field, the value is the field name plus the number (e.g. MOVIE_TITLE 1). We can also pass custom values for individual fields.
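The generator idea could be sketched roughly as follows. This is only an illustration under assumptions: `Movie` is a hypothetical, heavily reduced stand-in for the real record types, and `MovieBuilder` is a made-up name (the `DataFrameBuilder` mentioned above is not shown here, so its real signature is unknown). Overrides are passed as a `copy` function rather than as named arguments.

```scala
// Hypothetical, reduced record type; the real ATMMovie has many more fields.
final case class Movie(
    id: Long,
    movie_title: Option[String],
    release_year: Option[Int],
    import_attempts: Int
)

// Hypothetical builder: collects rows whose field values derive from a seed number.
final case class MovieBuilder(rows: Vector[Movie] = Vector.empty) {

  // Numeric fields take the seed itself; string fields take "FIELD_NAME seed".
  private def generate(seed: Int): Movie =
    Movie(
      id = seed.toLong,
      movie_title = Some(s"MOVIE_TITLE $seed"),
      release_year = Some(seed),
      import_attempts = seed
    )

  // `customize` overrides only the fields a given test cares about.
  def add(seed: Int, customize: Movie => Movie = m => m): MovieBuilder =
    copy(rows = rows :+ customize(generate(seed)))

  def build: Seq[Movie] = rows
}

object MovieBuilderDemo {
  def main(args: Array[String]): Unit = {
    val newMovies = MovieBuilder()
      .add(1)
      .add(2, _.copy(id = 71L)) // custom value for a single field
      .build

    newMovies.foreach(println)
  }
}
```

In a Spark test, the resulting `Seq` could then be converted with `.toDS`, as in the question, so each test states only the seeds and the few overridden fields.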

Another solution was to use a function that takes only the fields we care about and fills the rest with constant values. For example, it takes id and utc_insert_timestamp, and fills everything else.
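A minimal sketch of this second approach, again under assumptions: `MovieRecord` is a hypothetical, reduced record type, `movieWith` is an invented helper name, and the constants are arbitrary placeholders.

```scala
// Hypothetical, reduced record type standing in for the real one.
final case class MovieRecord(
    id: Long,
    utc_insert_timestamp: Option[Long],
    movie_title: Option[String],
    release_year: Option[Int],
    temporary: Int,
    import_attempts: Int
)

object MovieFixtures {
  // Takes only the fields the test cares about; everything else is a fixed constant.
  def movieWith(id: Long, utc_insert_timestamp: Option[Long] = None): MovieRecord =
    MovieRecord(
      id = id,
      utc_insert_timestamp = utc_insert_timestamp,
      movie_title = Some("Any title"), // arbitrary placeholder value
      release_year = Some(2018),       // arbitrary placeholder value
      temporary = 0,
      import_attempts = 0
    )
}

object MovieFixturesDemo {
  def main(args: Array[String]): Unit = {
    import MovieFixtures.movieWith

    val fresh = movieWith(id = 123L, utc_insert_timestamp = Some(1524522274L))
    // A one-off difference can still be expressed with the case class copy method.
    val updated = movieWith(id = 456L).copy(movie_title = Some("Title updated"))

    println(fresh)
    println(updated)
  }
}
```

Keeping the constants in one place means a test that differs in a single field only has to mention that field.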

answered Oct 29, 2018 at 0:37
\$\endgroup\$
