Commit 35417e9

More updates.

1 parent ce4e404, commit 35417e9

`talk.md`

Lines changed: 129 additions & 22 deletions

Original file line number	Diff line number	Diff line change
`@@ -21,6 +21,7 @@ revealOptions:`
`21`	`21`	`- 📍 Principal Data Scientist, DSAI, Moderna`
`22`	`22`	`- 🎓 ScD, MIT Biological Engineering.`
`23`	`23`	`- 🧬 Inverse protein, mRNA, and molecule design.`
	`24`	`+- 🎉 Accelerated and enriched analysis of data.`
`24`	`25`
`25`	`26`	`---`
`26`	`27`
`@@ -43,6 +44,13 @@ If you write automated tests for your work, then:`
`43`	`44`
`44`	`45`	`---`
`45`	`46`
	`47`	`+## ⭕️ Outline`
	`48`	`+`
	`49`	`+- Testing in Software`
	`50`	`+- Testing in Data Science`
	`51`	`+`
	`52`	`+---`
	`53`	`+`
`46`	`54`	`## 💻 Testing in Software`
`47`	`55`
`48`	`56`	`- 🤔 Why do testing?`
`@@ -59,6 +67,10 @@ Tests help falsify the hypothesis that our code _works_.`
`59`	`67`
`60`	`68`	`----`
`61`	`69`
	`70`	`+Without testing, we will have untested assumptions about whether our code works.`
	`71`	`+`
	`72`	`+----`
	`73`	`+`
`62`	`74`	`### 🧪 What does a test look like?`
`63`	`75`
`64`	`76`	`----`
`@@ -137,7 +149,9 @@ mamba env update -f environment.yml`
`137`	`149`	With `pytest` installed, use it to run your tests:
`138`	`150`
`139`	`151`	```bash
`140`		`-pytest`
	`152`	`+cd /path/to/my_project`
	`153`	`+conda activate my_project`
	`154`	`+pytest .`
`141`	`155`	```
`142`	`156`
`143`	`157`	`---`
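For `pytest .` to find anything, test files need names matching `test_*.py` or `*_test.py`, and test functions need names starting with `test`; those are pytest's default discovery rules. A minimal sketch of such a file follows. The file name `test_stats.py` and the `mean` function are hypothetical, not part of the talk's project.

```python
# test_stats.py (hypothetical file name; pytest collects test_*.py files by default)


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_small_list():
    # pytest reports this test as failed if the assertion does not hold.
    assert mean([1, 2, 3]) == 2


def test_mean_rejects_empty_input():
    # Check the error path by catching the exception by hand.
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Running `pytest .` from the project root would collect and run both tests. The error-path test catches the exception manually so the file has no imports; `with pytest.raises(ValueError):` is the more idiomatic pytest spelling.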
`@@ -212,6 +226,8 @@ We update the test to establish new expectations.`
`212`	`226`	`1. ✅ Guarantees against breaking changes.`
`213`	`227`	`2. 🤔 Example-based documentation for your code.`
`214`	`228`
	`229`	`+> Testing is a contract between yourself (now) and yourself (in the future).`
	`230`	`+`
`215`	`231`	`---`
`216`	`232`
`217`	`233`	`### 👆 What kind of tests exist?`
`@@ -220,25 +236,55 @@ We update the test to establish new expectations.`
`220`	`236`
`221`	`237`	`#### 1️⃣ Unit Test`
`222`	`238`
`223`		`-A test that checks that an individual function works correctly.`
	`239`	+```python
	`240`	`+def func1(data):`
	`241`	`+    ...`
	`242`	`+    return stuff`
	`243`	`+`
	`244`	`+def test_func1(data):`
	`245`	`+    stuff = func1(data)`
	`246`	`+    assert stuff == ...`
	`247`	+```
`224`	`248`
`225`		`-_Strive to write this type of test!_`
	`249`	`+_A test that checks that an individual function works correctly. Strive to write this type of test!_`
`226`	`250`
`227`	`251`	`----`
`228`	`252`
`229`	`253`	`#### 2️⃣ Execution Test`
`230`	`254`
`231`		`-A test that only checks that a function executes without erroring.`
	`255`	+```python
	`256`	`+def func1(data):`
	`257`	`+    ...`
	`258`	`+    return stuff`
`232`	`259`
`233`		`-_Use only in a pinch._`
	`260`	`+def test_func1(data):`
	`261`	`+    func1(data)`
	`262`	+```
	`263`	`+`
	`264`	`+_A test that only checks that a function executes without erroring. Use only in a pinch._`
`234`	`265`
`235`	`266`	`----`
`236`	`267`
`237`	`268`	`#### 3️⃣ Integration Test`
`238`	`269`
`239`		`-A test that checks that multiple functions work correctly together.`
	`270`	+```python
	`271`	`+def func1(data):`
	`272`	`+    ...`
	`273`	`+    return stuff`
	`274`	`+`
	`275`	`+def func2(data):`
	`276`	`+    ...`
	`277`	`+    return stuff`
	`278`	`+`
	`279`	`+def pipeline(data):`
	`280`	`+    return func2(func1(data))`
	`281`	`+`
	`282`	`+def test_pipeline(data):`
	`283`	`+    output = pipeline(data)`
	`284`	`+    assert output == ...`
	`285`	+```
`240`	`286`
`241`		`-_Used to check that a system is working properly._`
	`287`	`+_Checks that a system is working properly. Use sparingly if these tests take a long time to run!_`
`242`	`288`
`243`	`289`	`---`
`244`	`290`
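The three placeholder snippets above can be made concrete. In the sketch below, `normalize`, `cumulative`, `pipeline`, and `make_data` are hypothetical stand-ins for `func1`, `func2`, `pipeline`, and the `data` argument. In pytest, a test parameter such as `data` is normally supplied by a fixture with the same name; a plain helper is used here so the file needs no imports.

```python
# Hypothetical functions standing in for func1, func2, and pipeline.


def normalize(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def cumulative(values):
    """Running sum of values."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out


def pipeline(values):
    return cumulative(normalize(values))


def make_data():
    # In pytest this would usually be a fixture named `data`.
    return [1.0, 1.0, 2.0]


def test_normalize():
    # Unit test: checks the output of a single function.
    assert normalize(make_data()) == [0.25, 0.25, 0.5]


def test_cumulative_runs():
    # Execution test: only checks that no error is raised.
    cumulative(make_data())


def test_pipeline():
    # Integration test: checks that the functions work together.
    assert pipeline(make_data()) == [0.25, 0.5, 1.0]
```

The inputs are chosen so every intermediate value is exactly representable in floating point, which keeps the `==` comparisons safe.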
`@@ -273,6 +319,10 @@ Testing your DS code will be good for you!`
`273`	`319`
`274`	`320`	`## 😎 Testing in Data Science`
`275`	`321`
	`322`	`+- Machine Learning Model Code`
	`323`	`+- Data`
	`324`	`+- Pipelines`
	`325`	`+`
`276`	`326`	`----`
`277`	`327`
`278`	`328`	`### 🧠 Testing Machine Learning Model Code`
@@ -305,9 +355,29 @@ of the shape that `model` accepts.
`305`	`355`
`306`	`356`	`#### 🤔 What can we test here?`
`307`	`357`
`308`		`-1. Our model accepts the correct inputs and outputs.`
`309`		`-2. Our model and datamodules work together.`
`310`		`-3. Our model does not fail in training loop.`
	`358`	+1. ___Unit test:___ `dm` produces correctly-shaped outputs when executed.
	`359`	+2. ___Unit test:___ Given random inputs, `model` produces correctly-shaped outputs.
	`360`	+3. ___Integration test:___ Given `dm` outputs, `model` produces correctly-shaped outputs.
	`361`	+4. ___Execution test:___ `model` does not fail in the training loop with `trainer` and `dm`.
	`362`	`+`
	`363`	`+----`
	`364`	`+`
	`365`	`+#### 🟩 DataModule output shapes`
	`366`	`+`
	`367`	+```python
	`368`	`+def test_datamodule_shapes():`
	`369`	`+    # Arrange`
	`370`	`+    batch_size = 3`
	`371`	`+    input_dims = 4`
	`372`	`+    dm = DataModule(batch_size=batch_size)`
	`373`	`+`
	`374`	`+    # Act`
	`375`	`+    x, y = next(iter(dm.train_dataloader()))`
	`376`	`+`
	`377`	`+    # Assert`
	`378`	`+    assert x.shape == (batch_size, input_dims)`
	`379`	`+    assert y.shape == (batch_size, 1)`
	`380`	+```
`311`	`381`
`312`	`382`	`----`
`313`	`383`
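A runnable version of the DataModule shape test needs something to test. The sketch below assumes a hypothetical in-memory `DataModule` built on NumPy, whose `train_dataloader` method (named to match the later slides) yields `(x, y)` mini-batches. The talk's actual `DataModule` is not shown in this diff.

```python
import numpy as np


class DataModule:
    """Hypothetical stand-in: serves random regression batches from memory."""

    def __init__(self, batch_size=8, input_dims=4, num_samples=30, seed=0):
        rng = np.random.default_rng(seed)
        self.batch_size = batch_size
        self.x = rng.normal(size=(num_samples, input_dims))
        self.y = rng.normal(size=(num_samples, 1))

    def train_dataloader(self):
        # Yield successive (x, y) mini-batches; the last one may be smaller.
        for start in range(0, len(self.x), self.batch_size):
            stop = start + self.batch_size
            yield self.x[start:stop], self.y[start:stop]


def test_datamodule_shapes():
    # Arrange
    batch_size = 3
    input_dims = 4
    dm = DataModule(batch_size=batch_size, input_dims=input_dims)

    # Act
    x, y = next(iter(dm.train_dataloader()))

    # Assert
    assert x.shape == (batch_size, input_dims)
    assert y.shape == (batch_size, 1)
```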
@@ -317,12 +387,17 @@ of the shape that `model` accepts.
`317`	`387`	`from jax import random, vmap, numpy as np`
`318`	`388`
`319`	`389`	`def test_model_shapes():`
	`390`	`+    # Arrange`
`320`	`391`	`    key = random.PRNGKey(55)`
`321`		`-    num_samples = 7`
`322`		`-    num_input_dims = 211`
`323`		`-    inputs = random.normal(shape=(num_samples, num_input_dims))`
`324`		`-    model = Model(num_input_dims=num_input_dims)`
	`392`	`+    batch_size = 3`
	`393`	`+    input_dims = 4`
	`394`	`+    inputs = random.normal(key, shape=(batch_size, input_dims))`
	`395`	`+    model = Model(input_dims=input_dims)`
	`396`	`+`
	`397`	`+    # Act`
`325`	`398`	`    outputs = vmap(model)(inputs)`
	`399`	`+`
	`400`	`+    # Assert`
`326`	`401`	`    assert outputs.shape == (batch_size, 1)`
`327`	`402`	```
`328`	`403`
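The model shape test can be exercised the same way. This sketch swaps JAX for NumPy so it runs without JAX installed: `Model` is a hypothetical single-sample linear map, and `batch_apply` loops over the leading axis as a stand-in for `vmap(model)(inputs)`. With JAX, the inputs would come from `random.normal(key, shape=...)`, which takes the PRNG key as its first argument.

```python
import numpy as np


class Model:
    """Hypothetical single-sample linear model: (input_dims,) -> (1,)."""

    def __init__(self, input_dims, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(input_dims, 1))
        self.b = np.zeros(1)

    def __call__(self, x):
        # x has shape (input_dims,), so x @ w has shape (1,).
        return x @ self.w + self.b


def batch_apply(model, inputs):
    # Loop-based stand-in for jax.vmap(model)(inputs): map over axis 0.
    return np.stack([model(x) for x in inputs])


def test_model_shapes():
    # Arrange
    rng = np.random.default_rng(55)
    batch_size = 3
    input_dims = 4
    inputs = rng.normal(size=(batch_size, input_dims))
    model = Model(input_dims=input_dims)

    # Act
    outputs = batch_apply(model, inputs)

    # Assert
    assert outputs.shape == (batch_size, 1)
```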
`@@ -332,11 +407,16 @@ def test_model_shapes():`
`332`	`407`
`333`	`408`	```python
`334`	`409`	`def test_model_datamodule_compatibility():`
	`410`	`+    # Arrange`
`335`	`411`	`    dm = DataModule()`
`336`	`412`	`    model = Model()`
`337`	`413`	`    x, y = next(iter(dm.train_dataloader()))`
	`414`	`+`
	`415`	`+    # Act`
`338`	`416`	`    pred = vmap(model)(x)`
`339`		`-    assert x.shape == y.shape`
	`417`	`+`
	`418`	`+    # Assert`
	`419`	`+    assert pred.shape == y.shape`
`340`	`420`	```
`341`	`421`
`342`	`422`	`----`
`@@ -345,18 +425,21 @@ def test_model_datamodule_compatibility():`
`345`	`425`
`346`	`426`	```python
`347`	`427`	`def test_model():`
	`428`	`+    # Arrange`
`348`	`429`	`    model = Model()`
`349`	`430`	`    dm = DataModule()`
`350`	`431`	`    trainer = default_trainer(epochs=2)`
	`432`	`+`
	`433`	`+    # Act`
`351`	`434`	`    trainer.fit(model, dm)`
`352`	`435`	```
`353`	`436`
`354`		`-Ensure that model can be trained for at least 2 epochs.`
`355`		`-`
`356`	`437`	`---`
`357`	`438`
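An execution test only asserts that training runs end to end. The sketch below assumes hypothetical `DataModule`, `Model`, `Trainer`, and `default_trainer` definitions (a NumPy linear regression trained by mini-batch gradient descent on mean squared error). It is not the talk's training stack, just enough to make `test_model` runnable.

```python
import numpy as np


class DataModule:
    """Hypothetical in-memory data: y is a noisy linear function of x."""

    def __init__(self, batch_size=10, input_dims=4, num_samples=50, seed=0):
        rng = np.random.default_rng(seed)
        self.batch_size = batch_size
        self.x = rng.normal(size=(num_samples, input_dims))
        true_w = rng.normal(size=(input_dims, 1))
        self.y = self.x @ true_w + 0.1 * rng.normal(size=(num_samples, 1))

    def train_dataloader(self):
        # Yield successive (x, y) mini-batches.
        for start in range(0, len(self.x), self.batch_size):
            stop = start + self.batch_size
            yield self.x[start:stop], self.y[start:stop]


class Model:
    """Hypothetical linear model with a single weight matrix."""

    def __init__(self, input_dims=4):
        self.w = np.zeros((input_dims, 1))

    def __call__(self, x):
        return x @ self.w


class Trainer:
    """Hypothetical minimal trainer: mini-batch gradient descent on MSE."""

    def __init__(self, epochs, lr=0.05):
        self.epochs = epochs
        self.lr = lr
        self.losses = []

    def fit(self, model, dm):
        for _ in range(self.epochs):
            for x, y in dm.train_dataloader():
                err = model(x) - y
                self.losses.append(float(np.mean(err**2)))
                # Gradient of mean((x @ w - y) ** 2) with respect to w.
                grad = 2 * x.T @ err / len(x)
                model.w -= self.lr * grad


def default_trainer(epochs):
    return Trainer(epochs=epochs)


def test_model():
    # Arrange
    model = Model()
    dm = DataModule()
    trainer = default_trainer(epochs=2)

    # Act: the test passes as long as fit() raises no error.
    trainer.fit(model, dm)
```

Recording `trainer.losses` is an addition of this sketch; the test itself makes no claim about the loss values, which is what separates an execution test from a unit test.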
`358`	`439`	`### 📀 Testing Data`
`359`	`440`
	`441`	`+_a.k.a. Data Validation_`
	`442`	`+`
`360`	`443`	`----`
`361`	`444`
`362`	`445`	`#### 👆 What data guarantees do we need?`
`@@ -396,7 +479,7 @@ df_schema = pa.DataFrameSchema(`
`396`	`479`
`397`	`480`	```python
`398`	`481`	`def func(df):`
`399`		`-    df_schema.validate(df)`
	`482`	`+    df = df_schema.validate(df)`
`400`	`483`	`    # The rest of the logic`
`401`	`484`	`    ...`
`402`	`485`	```
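In the talk, `df_schema` is a pandera `DataFrameSchema`, and its `validate` method returns the validated DataFrame, which is why the revised `func` reassigns `df`. To show the same fail-fast pattern without pandera, the sketch below uses a hypothetical `validate_df` helper written with plain pandas; the column names and checks are illustrative only.

```python
import pandas as pd

# Hypothetical schema: required column -> check applied to that column.
SCHEMA = {
    "sample_id": lambda s: s.notna().all(),
    "concentration": lambda s: (s >= 0).all(),
}


def validate_df(df):
    """Raise ValueError if df breaks SCHEMA; otherwise return df unchanged."""
    missing = set(SCHEMA) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for column, check in SCHEMA.items():
        if not check(df[column]):
            raise ValueError(f"column {column!r} failed its check")
    return df


def func(df):
    df = validate_df(df)  # fail fast, before any of the logic below runs
    # The rest of the logic (here: a trivial summary).
    return df["concentration"].mean()
```

Validating at the top of `func` means bad data fails loudly at the boundary, instead of surfacing later as a confusing error deep inside the logic.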
`@@ -415,11 +498,13 @@ Code is much more readable.`
`415`	`498`
`416`	`499`	```python
`417`	`500`	`def pipeline(data):`
	`501`	`+    data = df_schema.validate(data)`
`418`	`502`	`    d1 = func1(data)`
`419`	`503`	`    d2 = func2(d1)`
`420`	`504`	`    d3 = func3(d1)`
`421`	`505`	`    d4 = func4(d2, d3)`
`422`		`-    return outfunc(d4)`
	`506`	`+    output = outfunc(d4)`
	`507`	`+    return output_schema.validate(output)`
`423`	`508`	```
`424`	`509`
`425`	`510`	`----`
`@@ -440,6 +525,28 @@ def test_func4(data):`
`440`	`525`	`...`
`441`	`526`	```
`442`	`527`
	`528`	`+----`
	`529`	`+`
	`530`	`+#### 🤝 The whole pipeline can be integration tested`
	`531`	`+`
	`532`	+```python
	`533`	`+def test_pipeline():`
	`534`	`+    # Arrange`
	`535`	`+    data = pd.DataFrame(...)`
	`536`	`+`
	`537`	`+    # Act`
	`538`	`+    output = pipeline(data)`
	`539`	`+`
	`540`	`+    # Assert`
	`541`	`+    assert output == ...`
	`542`	+```
	`543`	`+`
	`544`	`+_We assume your pipeline is quick to run._`
	`545`	`+`
	`546`	`+---`
	`547`	`+`
	`548`	`+### 🕓 One more thing`
	`549`	`+`
`443`	`550`	`---`
`444`	`551`
`445`	`552`	`### 💰 Mock-up Realistic Fake Data`
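Putting the last few slides together gives a pipeline that validates its input and output, plus an integration test driven by seeded fake data. Everything here is a hypothetical sketch in plain pandas and NumPy: `check_input` and `check_output` stand in for `df_schema.validate` and `output_schema.validate`, and `make_fake_data` is an assumed generator, not the talk's fake-data approach.

```python
import numpy as np
import pandas as pd


def check_input(df):
    # Hypothetical input contract: required columns present, no missing values.
    required = {"x", "group"}
    if not required <= set(df.columns):
        raise ValueError(f"missing input columns: {sorted(required - set(df.columns))}")
    if df[sorted(required)].isna().any().any():
        raise ValueError("input contains missing values")
    return df


def check_output(df):
    # Hypothetical output contract: one row per group, non-negative spread.
    if not df.index.is_unique:
        raise ValueError("duplicate groups in output")
    if (df["x_std"] < 0).any():
        raise ValueError("negative standard deviation")
    return df


def pipeline(data):
    data = check_input(data)
    summary = data.groupby("group")["x"].agg(x_mean="mean", x_std="std")
    return check_output(summary)


def make_fake_data(n_per_group=20, seed=0):
    # Seeded fake data shaped like the (hypothetical) real input.
    rng = np.random.default_rng(seed)
    groups = np.repeat(["a", "b", "c"], n_per_group)
    x = rng.normal(loc=10.0, scale=2.0, size=groups.size)
    return pd.DataFrame({"x": x, "group": groups})


def test_pipeline():
    # Arrange
    data = make_fake_data()

    # Act
    output = pipeline(data)

    # Assert
    assert list(output.index) == ["a", "b", "c"]
    assert list(output.columns) == ["x_mean", "x_std"]
```

Seeding the generator keeps the fake data identical across runs, so a failing integration test points at a code change rather than at random input.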
`@@ -509,9 +616,9 @@ _Do unto others what you would have others do unto you._`
`509`	`616`
`510`	`617`	`## 😎 Summary`
`511`	`618`
`512`		`-1. ✅ Write tests for your code.`
`513`		`-2. ✅ Write tests for your data.`
`514`		`-3. ✅ Write tests for your models.`
	`619`	`+1. ✅ Write tests for your __code__.`
	`620`	`+2. ✅ Write tests for your __data__.`
	`621`	`+3. ✅ Write tests for your __models__.`
`515`	`622`
`516`	`623`	`---`
`517`	`624`
