Commit 5d3ccf5

Create ML pipeline stages
1 parent 6c5ba05 commit 5d3ccf5

4 files changed, 75 additions and 0 deletions

‎.gitignore‎

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 .venv/
+/model.pkl

‎data/.gitignore‎

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 /data.xml
 /prepared
+/features

‎dvc.lock‎

Lines changed: 44 additions & 0 deletions
@@ -21,3 +21,47 @@ stages:
       md5: 153aad06d376b6595932470e459ef42a.dir
       size: 8437363
       nfiles: 2
+  featurize:
+    cmd: python src/featurization.py data/prepared data/features
+    deps:
+    - path: data/prepared
+      hash: md5
+      md5: 153aad06d376b6595932470e459ef42a.dir
+      size: 8437363
+      nfiles: 2
+    - path: src/featurization.py
+      hash: md5
+      md5: e22789fc9581cad11ef7a6fa3aa3f17b
+      size: 4158
+    params:
+      params.yaml:
+        featurize.max_features: 100
+        featurize.ngrams: 1
+    outs:
+    - path: data/features
+      hash: md5
+      md5: f8f5cbc3188008a7542d02d63054d9d2.dir
+      size: 1556290
+      nfiles: 2
+  train:
+    cmd: python src/train.py data/features model.pkl
+    deps:
+    - path: data/features
+      hash: md5
+      md5: f8f5cbc3188008a7542d02d63054d9d2.dir
+      size: 1556290
+      nfiles: 2
+    - path: src/train.py
+      hash: md5
+      md5: 324001573ed724e5ae092226fcf9ca30
+      size: 1666
+    params:
+      params.yaml:
+        train.min_split: 0.01
+        train.n_est: 50
+        train.seed: 20170428
+    outs:
+    - path: model.pkl
+      hash: md5
+      md5: cfa72ff6e2575c44f78f423cada5b783
+      size: 1855075
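
The dvc.lock entries record an md5 and a size for every dependency and output. As a rough illustration of what such a record allows, the sketch below compares a single-file entry against the file on disk. It is not DVC's own code: the helper names `file_md5` and `check_entry` are hypothetical, and it assumes that the md5 of a single-file entry is the plain MD5 of the file's bytes. Directory entries (md5 ending in `.dir`) are skipped, because their hash covers a directory listing that this sketch does not reproduce.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_entry(entry, root="."):
    """Compare one lock entry (dict with path, md5, size) to the file on disk.

    Returns "ok", "missing", "changed", or "skipped" (directory entries).
    Hypothetical helper for illustration; not part of DVC.
    """
    if entry["md5"].endswith(".dir"):
        return "skipped"  # directory hashes are out of scope for this sketch
    target = Path(root) / entry["path"]
    if not target.is_file():
        return "missing"
    if target.stat().st_size != entry["size"]:
        return "changed"  # cheap size check before hashing
    return "ok" if file_md5(target) == entry["md5"] else "changed"


# Values copied from the model.pkl output of the train stage in dvc.lock.
model_entry = {
    "path": "model.pkl",
    "md5": "cfa72ff6e2575c44f78f423cada5b783",
    "size": 1855075,
}

if __name__ == "__main__":
    print(check_entry(model_entry))
```

Run from the repository root, this reports whether the local model.pkl still matches the recorded output. In practice `dvc status` is the supported way to ask that question.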

‎dvc.yaml‎

Lines changed: 29 additions & 0 deletions
@@ -3,6 +3,14 @@ artifacts:
     path: data/data.xml
     type: dataset
     desc: Initial XML StackOverflow dataset (raw data)
+  text-classification:
+    path: model.pkl
+    desc: Detect whether the given stackoverflow question should have R language tag
+    type: model
+    labels:
+    - nlp
+    - classification
+    - stackoverflow
 stages:
   prepare:
     cmd: python src/prepare.py data/data.xml
@@ -14,3 +22,24 @@ stages:
     - prepare.split
     outs:
     - data/prepared
+  featurize:
+    cmd: python src/featurization.py data/prepared data/features
+    deps:
+    - data/prepared
+    - src/featurization.py
+    params:
+    - featurize.max_features
+    - featurize.ngrams
+    outs:
+    - data/features
+  train:
+    cmd: python src/train.py data/features model.pkl
+    deps:
+    - data/features
+    - src/train.py
+    params:
+    - train.min_split
+    - train.n_est
+    - train.seed
+    outs:
+    - model.pkl
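
The stages in dvc.yaml chain together through their deps and outs: featurize consumes data/prepared, which prepare produces, and train consumes data/features, which featurize produces. The sketch below shows how an execution order can be derived from that information alone. It is an illustration, not DVC's implementation. The `STAGES` dict is transcribed by hand from the diff, the function name `stage_order` is hypothetical, and the deps of prepare are left empty because they fall outside the hunks shown.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Transcribed from dvc.yaml in this commit. The deps of "prepare" are not
# visible in the diff hunks, so they are deliberately left empty here.
STAGES = {
    "prepare": {"deps": [], "outs": ["data/prepared"]},
    "featurize": {
        "deps": ["data/prepared", "src/featurization.py"],
        "outs": ["data/features"],
    },
    "train": {
        "deps": ["data/features", "src/train.py"],
        "outs": ["model.pkl"],
    },
}


def stage_order(stages):
    """Order stages so that each one runs after the stages producing its deps."""
    # Map every output path to the stage that produces it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # A stage depends on another stage when one of its deps is that stage's output.
    # Deps such as source files have no producer and add no edge.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(stage_order(STAGES))  # ['prepare', 'featurize', 'train']
```

With the real project checked out, `dvc dag` prints the graph that DVC itself derives from dvc.yaml.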

0 commit comments
