Commit 35417e9

More updates.
1 parent ce4e404 commit 35417e9

1 file changed

+129
-22
lines changed

‎talk.md

Lines changed: 129 additions & 22 deletions
@@ -21,6 +21,7 @@ revealOptions:
 - 📍 Principal Data Scientist, DSAI, Moderna
 - 🎓 ScD, MIT Biological Engineering.
 - 🧬 Inverse protein, mRNA, and molecule design.
+- 🎉 Accelerated and enriched analysis of data.
 
 ---
 
@@ -43,6 +44,13 @@ If you write automated tests for your work, then:
 
 ---
 
+## ⭕️ Outline
+
+- Testing in Software
+- Testing in Data Science
+
+---
+
 ## 💻 Testing in Software
 
 - 🤔 Why do testing?
@@ -59,6 +67,10 @@ Tests help falsify the hypothesis that our code _works_.
 
 ----
 
+Without tests, the claim that our code works remains an untested assumption.
+
+----
+
 ### 🧪 What does a test look like?
 
 ----
@@ -137,7 +149,9 @@ mamba env update -f environment.yml
 With `pytest` installed, use it to run your tests:
 
 ```bash
-pytest
+cd /path/to/my_project
+conda activate my_project
+pytest .
 ```
 
 ---
@@ -212,6 +226,8 @@ We update the test to establish new expectations.
 1. ✅ Guarantees against breaking changes.
 2. 🤔 Example-based documentation for your code.
 
+> Testing is a contract between yourself (now) and yourself (in the future).
+
 ---
 
 ### 👆 What kind of tests exist?
@@ -220,25 +236,55 @@ We update the test to establish new expectations.
 
 #### 1️⃣ Unit Test
 
-A test that checks that an individual function works correctly.
+```python
+def func1(data):
+    ...
+    return stuff
+
+def test_func1(data):
+    stuff = func1(data)
+    assert stuff == ...
+```
 
-_Strive to write this type of test!_
+_A test that checks that an individual function works correctly. Strive to write this type of test!_
 
 ----
 
 #### 2️⃣ Execution Test
 
-A test that only checks that a function executes without erroring.
+```python
+def func1(data):
+    ...
+    return stuff
 
-_Use only in a pinch._
+def test_func1(data):
+    func1(data)
+```
+
+_A test that only checks that a function executes without erroring. Use only in a pinch._
 
 ----
 
 #### 3️⃣ Integration Test
 
-A test that checks that multiple functions work correctly together.
+```python
+def func1(data):
+    ...
+    return stuff
+
+def func2(data):
+    ...
+    return stuff
+
+def pipeline(data):
+    return func2(func1(data))
+
+def test_pipeline(data):
+    output = pipeline(data)
+    assert output == ...
+```
 
-_Used to check that a system is working properly._
+_Checks that a system is working properly. Use this sparingly if the tests take a long time to run!_
 
 ---
 
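The three test types introduced in this hunk use placeholder bodies (`...`). As a minimal runnable sketch, the same three patterns can be written against small concrete functions. The names `normalize`, `top_k`, and `pipeline` are hypothetical and chosen only for illustration; they are not from the talk.

```python
def normalize(values):
    """Scale a list of numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def top_k(values, k):
    """Return the k largest values in descending order."""
    return sorted(values, reverse=True)[:k]


def pipeline(values, k=2):
    """Normalize the values, then keep the k largest."""
    return top_k(normalize(values), k)


def test_normalize_unit():
    # Unit test: one function, checked against a known answer.
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]


def test_normalize_executes():
    # Execution test: only checks that the call does not raise.
    normalize([3, 5, 8])


def test_pipeline_integration():
    # Integration test: several functions checked together.
    assert pipeline([1, 1, 2], k=2) == [0.5, 0.25]
```

Saved in a file whose name starts with `test_`, these functions are collected by `pytest` without further configuration. Only the unit and integration tests would catch a wrong result; the execution test passes as long as nothing raises.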
@@ -273,6 +319,10 @@ Testing your DS code will be good for you!
 
 ## 😎Testing in Data Science
 
+- Machine Learning Model Code
+- Data
+- Pipelines
+
 ----
 
 ### 🧠 Testing Machine Learning Model Code
@@ -305,9 +355,29 @@ of the shape that `model` accepts.
 
 #### 🤔 What can we test here?
 
-1. Our model accepts the correct inputs and outputs.
-2. Our model and datamodules work together.
-3. Our model does not fail in training loop.
+1. ___Unit test:___ `dm` produces correctly-shaped outputs when executed.
+2. ___Unit test:___ Given random inputs, `model` produces correctly-shaped outputs.
+3. ___Integration test:___ Given `dm` outputs, `model` produces correctly-shaped outputs.
+4. ___Execution test:___ `model` does not fail in the training loop with `trainer` and `dm`.
+
+----
+
+#### 🟩 DataModule output shapes
+
+```python
+def test_datamodule_shapes():
+    # Arrange
+    batch_size = 3
+    input_dims = 4
+    dm = DataModule(batch_size=batch_size)
+
+    # Act
+    x, y = next(iter(dm.train_dataloader()))
+
+    # Assert
+    assert x.shape == (batch_size, input_dims)
+    assert y.shape == (batch_size, 1)
+```
 
 ----
 
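The `DataModule` class itself is not part of this diff, so the shape test above cannot be run on its own. The sketch below is a self-contained version of the same Arrange/Act/Assert pattern. It assumes a hypothetical `ToyDataModule` that yields batches as nested Python lists and a hypothetical `shape` helper in place of the array `.shape` attribute.

```python
import random


class ToyDataModule:
    """Hypothetical stand-in for the talk's DataModule; yields (x, y) batches."""

    def __init__(self, batch_size, input_dims, num_batches=2, seed=0):
        self.batch_size = batch_size
        self.input_dims = input_dims
        self.num_batches = num_batches
        self.rng = random.Random(seed)

    def train_dataloader(self):
        # Each batch is a pair of rectangular nested lists: features and targets.
        for _ in range(self.num_batches):
            x = [
                [self.rng.gauss(0.0, 1.0) for _ in range(self.input_dims)]
                for _ in range(self.batch_size)
            ]
            y = [[self.rng.gauss(0.0, 1.0)] for _ in range(self.batch_size)]
            yield x, y


def shape(batch):
    """Return (rows, columns) of a rectangular nested list."""
    return (len(batch), len(batch[0]))


def test_datamodule_shapes():
    # Arrange
    batch_size, input_dims = 3, 4
    dm = ToyDataModule(batch_size=batch_size, input_dims=input_dims)

    # Act
    x, y = next(iter(dm.train_dataloader()))

    # Assert
    assert shape(x) == (batch_size, input_dims)
    assert shape(y) == (batch_size, 1)
```

Keeping `batch_size` and `input_dims` small keeps the test fast. The assertions check only the contract between the data loader and whatever consumes its batches, not the values themselves.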
@@ -317,12 +387,17 @@ of the shape that `model` accepts.
 from jax import random, vmap, numpy as np
 
 def test_model_shapes():
+    # Arrange
     key = random.PRNGKey(55)
-    num_samples = 7
-    num_input_dims = 211
-    inputs = random.normal(shape=(num_samples, num_input_dims))
-    model = Model(num_input_dims=num_input_dims)
+    num_samples = 3
+    input_dims = 4
+    inputs = random.normal(key, shape=(num_samples, input_dims))
+    model = Model(input_dims=input_dims)
+
+    # Act
     outputs = vmap(model)(inputs)
+
+    # Assert
     assert outputs.shape == (num_samples, 1)
 ```
 
@@ -332,11 +407,16 @@ def test_model_shapes():
 
 ```python
 def test_model_datamodule_compatibility():
+    # Arrange
     dm = DataModule()
     model = Model()
     x, y = next(iter(dm.train_dataloader()))
+
+    # Act
     pred = vmap(model)(x)
-    assert x.shape == y.shape
+
+    # Assert
+    assert pred.shape == y.shape
 ```
 
 ----
@@ -345,18 +425,21 @@ def test_model_datamodule_compatibility():
 
 ```python
 def test_model():
+    # Arrange
     model = Model()
     dm = DataModule()
     trainer = default_trainer(epochs=2)
+
+    # Act
     trainer.fit(model, dm)
 ```
 
-Ensure that model can be trained for at least 2 epochs.
-
 ---
 
 ### 📀 Testing Data
 
+_a.k.a. Data Validation_
+
 ----
 
 #### 👆 What data guarantees do we need?
@@ -396,7 +479,7 @@ df_schema = pa.DataFrameSchema(
 
 ```python
 def func(df):
-    df_schema.validate(df)
+    df = df_schema.validate(df)
     # The rest of the logic
     ...
 ```
@@ -415,11 +498,13 @@ Code is much more readable.
 
 ```python
 def pipeline(data):
+    data = df_schema.validate(data)
     d1 = func1(data)
     d2 = func2(d1)
     d3 = func3(d1)
     d4 = func4(d2, d3)
-    return outfunc(d4)
+    output = outfunc(d4)
+    return output_schema.validate(output)
 ```
 
 ----
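This hunk validates the data on the way into `pipeline` and again on the way out. The sketch below shows the same boundary-validation idea using only the standard library. It does not use pandera: `validate`, `add_ratio`, and the column names are hypothetical stand-ins for `df_schema.validate`, the intermediate functions, and the real schema.

```python
def validate(rows, required_columns):
    """Raise if any row lacks a required column; otherwise return the rows unchanged."""
    for i, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) is None:
                raise ValueError(f"row {i}: missing value for column '{column}'")
    return rows


def add_ratio(rows):
    """Add a derived 'ratio' column to every row."""
    return [{**row, "ratio": row["hits"] / row["trials"]} for row in rows]


def pipeline(rows):
    # Input contract: fail early, before any computation happens.
    rows = validate(rows, required_columns=("hits", "trials"))
    output = add_ratio(rows)
    # Output contract: downstream code can rely on the derived column.
    return validate(output, required_columns=("hits", "trials", "ratio"))
```

Because `validate` returns its input, the call can be assigned or returned directly, which mirrors the `data = df_schema.validate(data)` pattern in the diff. A bad input such as `pipeline([{"hits": 1}])` raises `ValueError` at the first line of the pipeline rather than somewhere deep inside it.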
@@ -440,6 +525,28 @@ def test_func4(data):
     ...
 ```
 
+----
+
+#### 🤝 The whole pipeline can be integration tested
+
+```python
+def test_pipeline():
+    # Arrange
+    data = pd.DataFrame(...)
+
+    # Act
+    output = pipeline(data)
+
+    # Assert
+    assert output == ...
+```
+
+_We assume your pipeline is quick to run._
+
+---
+
+### 🕓 One more thing
+
 ---
 
 ### 💰 Mock-up Realistic Fake Data
@@ -509,9 +616,9 @@ _Do unto others what you would have others do unto you._
 
 ## 😎 Summary
 
-1. ✅ Write tests for your **code**.
-2. ✅ Write tests for your **data**.
-3. ✅ Write tests for your **models**.
+1. ✅ Write tests for your __code__.
+2. ✅ Write tests for your __data__.
+3. ✅ Write tests for your __models__.
 
 ---
 