Submit source code and write-up (including program output) through Sakai.

## Background

A bunch of your friends really like wine, specifically Portuguese wine. One
night you all stay up debating which physicochemical properties of wine (like
pH or acidity) make a good wine. Being the sleuth you are, you find out that
there happens to be [a study and dataset][wine] looking at just this!

You happen to be learning Python and bash scripting, and figure this is a good
opportunity to gather some evidence about what contributes to good wine.

[wine]: http://archive.ics.uci.edu/ml/datasets/Wine+Quality

## Problem

The study you found looked at both red and white wine, and you want to find out
what makes a good red wine and what makes a good white wine. You plan to
conduct a very simple analysis.

## Instructions

Create a bash script that automates your entire data acquisition and analysis,
so that the analysis can be reproduced faithfully. Part of the analysis will be
written in Python, and the bash script should run those Python scripts as well.

**Download Data**

Use `wget` or `curl` to [download the data][data].

| Wine Type | File Name               |
|-----------|-------------------------|
| Red       | `winequality-red.csv`   |
| White     | `winequality-white.csv` |

Download these data into a directory named `download`.

**Hint**: Use `mkdir -p` to create a directory if it doesn't exist yet.

[data]: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

**Convert Data**

You prefer working with comma-separated value (CSV) files. Unfortunately, you
find out that the data comes as semicolon-separated files.

Use `sed` to convert these semicolon-separated files into comma-separated
files.

Save the converted data into the directory `data`.

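Whatever `sed` expression you use, it can help to confirm that the conversion changed only the delimiters. The sketch below uses a hypothetical helper, `conversion_matches`, built on Python's standard `csv` module: it parses the original file with `;` as the delimiter and the converted file with the default comma, then checks that both produce the same rows. It assumes no field contains a literal comma, which should hold for numeric data like this.

```python
import csv


def conversion_matches(semicolon_path, comma_path):
    """Return True if both files parse to exactly the same rows.

    Hypothetical helper: the original file is parsed with delimiter=";",
    the converted file with the default comma delimiter.
    """
    with open(semicolon_path, newline="") as original, \
            open(comma_path, newline="") as converted:
        original_rows = list(csv.reader(original, delimiter=";"))
        converted_rows = list(csv.reader(converted))
    return original_rows == converted_rows
```

For example, `conversion_matches("download/winequality-red.csv", "data/winequality-red.csv")` should return `True` after a correct conversion, assuming the converted file keeps the original file name.
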
**Subset Data**

For your analysis, you only want to check a few physicochemical variables. The
data has 12 variables in total, but you're only interested in:

- Citric acid
- Chlorides
- pH
- Alcohol
- Quality (your outcome)

You also want to separate the good wine from the poor-quality wine. Create four
datasets, using a quality score of 5 as the cutoff for good wine:

| File Name             | Quality Threshold | Description             |
|-----------------------|-------------------|-------------------------|
| `red_wine_poor.csv`   | <= 5              | Poor quality red wine   |
| `red_wine_good.csv`   | > 5               | Good quality red wine   |
| `white_wine_poor.csv` | <= 5              | Poor quality white wine |
| `white_wine_good.csv` | > 5               | Good quality white wine |

Put these four files into the `data` directory.

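The subsetting step does not name a specific tool; a bash tool such as `awk` would work inside `analysis.sh`, but one option, sketched here in Python to match the rest of the analysis, is a hypothetical `subset_by_quality` helper. It assumes the converted file has a header row using the UCI column names (`citric acid`, `chlorides`, `pH`, `alcohol`, `quality`) and applies the cutoff from the table: quality <= 5 is poor, quality > 5 is good.

```python
import csv

# Columns to keep (assumed header names from the UCI wine files)
KEEP_COLUMNS = ["citric acid", "chlorides", "pH", "alcohol", "quality"]
QUALITY_CUTOFF = 5  # quality <= 5 is "poor", quality > 5 is "good"


def subset_by_quality(in_path, poor_path, good_path):
    """Split one comma-separated wine file into poor and good subsets.

    Only the columns in KEEP_COLUMNS are written. Returns the number of
    rows written to the poor and good files, in that order.
    """
    counts = {"poor": 0, "good": 0}
    with open(in_path, newline="") as src, \
            open(poor_path, "w", newline="") as poor, \
            open(good_path, "w", newline="") as good:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops the columns we are not interested in
        writers = {
            "poor": csv.DictWriter(poor, fieldnames=KEEP_COLUMNS,
                                   extrasaction="ignore"),
            "good": csv.DictWriter(good, fieldnames=KEEP_COLUMNS,
                                   extrasaction="ignore"),
        }
        for writer in writers.values():
            writer.writeheader()
        for row in reader:
            label = "poor" if int(row["quality"]) <= QUALITY_CUTOFF else "good"
            writers[label].writerow(row)
            counts[label] += 1
    return counts["poor"], counts["good"]
```

Calling it once per wine type, e.g. `subset_by_quality("data/winequality-red.csv", "data/red_wine_poor.csv", "data/red_wine_good.csv")`, would produce two of the four files.
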
**Compare Low and High Quality**

Let's use Python to help us figure out what makes wine good or not.

Create a Python function that reads in data from a given path and calculates
the average value of a given variable.

```python
# Example use: average chlorides of the good-quality red wine subset
avg_chloride_results = calculate_avg_value("data/red_wine_good.csv", "chlorides")
```

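A minimal sketch of such a function, using the standard `csv` module. It takes a file path, since the function is meant to read the file itself, and it assumes the header row names the columns exactly as they appear in the data (e.g. `chlorides`, `citric acid`).

```python
import csv


def calculate_avg_value(path, variable):
    """Return the mean of one numeric column in a comma-separated file.

    Assumes the file has a header row naming its columns.
    """
    total = 0.0
    count = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[variable])
            count += 1
    if count == 0:
        # Avoid a ZeroDivisionError on a file with a header but no data
        raise ValueError(f"no rows found in {path}")
    return total / count
```

Raising `ValueError` on an empty file is a design choice for this sketch; returning `None` would also be reasonable.
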
You want to be lazy and automate as much as possible, so create a Python
function that takes a list of file names and returns a dictionary.

The dictionary will have four keys, each equal to the name of the file it comes
from without the `.csv` extension; e.g., the key for `white_wine_good.csv` will
be `white_wine_good`. The value for each key will be another dictionary that
maps each of the four variables we're interested in to its average value:

- Citric acid
- Chlorides
- pH
- Alcohol

```python
wine_paths = ["white_wine_good.csv", ...]
avg_values = find_average_wines(wine_paths)
```

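One possible shape for `find_average_wines`, as a sketch. It includes a compact version of `calculate_avg_value` so the block runs on its own, assumes the UCI column names for the four variables, and assumes every file has at least one data row.

```python
import csv
import os

# The four variables of interest (assumed UCI column names)
VARIABLES = ["citric acid", "chlorides", "pH", "alcohol"]


def calculate_avg_value(path, variable):
    """Return the mean of one numeric column (assumes at least one row)."""
    with open(path, newline="") as f:
        values = [float(row[variable]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


def find_average_wines(wine_paths):
    """Map each file's base name (without .csv) to its variable averages."""
    results = {}
    for path in wine_paths:
        # "data/white_wine_good.csv" -> "white_wine_good"
        key = os.path.splitext(os.path.basename(path))[0]
        results[key] = {var: calculate_avg_value(path, var) for var in VARIABLES}
    return results
```

Stripping the directory with `os.path.basename` means the same keys come out whether you pass `white_wine_good.csv` or `data/white_wine_good.csv`.
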
**Save Results**

Write a Python function that saves your dictionary of results to four separate
files, one per dataset, in JavaScript Object Notation (JSON) format.

Use Python's built-in `json` module. Here's a hint on how to use it:

```python
import json

your_dictionary = {"some_data": "data"}

# "with" closes the file automatically, even if an error occurs
with open("dest_file.json", "w") as f:
    json.dump(your_dictionary, f)
```

Save the four result files into a directory named `results`.

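A sketch of such a function, assuming the dictionary has the shape returned by `find_average_wines` (file name key mapped to a dictionary of averages). The hypothetical `save_results` writes each top-level key to its own `<key>.json` file and creates the directory first, much like `mkdir -p`.

```python
import json
import os


def save_results(avg_values, results_dir="results"):
    """Write each entry of avg_values to results_dir/<key>.json.

    Returns the list of paths written.
    """
    # exist_ok=True behaves like `mkdir -p`: no error if it already exists
    os.makedirs(results_dir, exist_ok=True)
    written = []
    for name, averages in avg_values.items():
        path = os.path.join(results_dir, name + ".json")
        with open(path, "w") as f:
            json.dump(averages, f, indent=2)
        written.append(path)
    return written
```

The `indent=2` argument only makes the files easier to read; omitting it produces the same data on a single line.
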
**Challenge**

To automate everything as much as possible, create a Makefile that builds the
entire analysis:

```shell
# Run the entire analysis
make all

# Remove all downloaded and intermediate files from data/, download/, results/
make clean
```

## Homework File Structure

To keep things organized, please use the following structure for your data
analysis:

```
.
|-- analyze_wine.py
|-- analysis.sh
|-- data/
|-- results/
`-- download/
```

## Deliverables

- A single bash script to automate your analysis
- A Python script to calculate the average citric acid, chlorides, pH, and
  alcohol values of good and poor quality red and white wine.
