1
- # Homework
1
+ # BMI 565/665 Bioinformatics Programming and Scripting
2
2
3
3
Submit source code and write-up (including program output) through Sakai.
4
4
5
+
5
6
## Background
6
7
7
8
A bunch of your friends really like wine, specifically Portuguese wine. One
@@ -52,14 +53,12 @@ You like to deal with comma-separated files (CSVs). Unfortunately, you find out
52
53
that the data comes in a "semi-colon" separated file.
53
54
54
55
Use `sed` to convert these "semi-colon" separated files into comma-separated
55
- files.
56
-
57
- Save these converted data into the directory `data`.
56
+ files, and save these converted data into the directory `data`.
58
57
59
58
60
59
**Subset Data**
61
60
62
- For your analysis you only want a couple physicochemical variables to check.
61
+ For your analysis, you only want a couple physicochemical variables to check.
63
62
There are a total of 12 variables, but you're only interested in:
64
63
65
64
- Citric acid
@@ -79,28 +78,28 @@ for good wine.
79
78
| `white_wine_poor.csv` | <= 5 | Poor quality white wine |
80
79
| `white_wine_good.csv` | > 5 | Good quality white wine |
81
80
81
+ **Hint**: `awk` can be used to quickly subset the data and create the 4 files.
82
+
82
83
Put these four files into the `data` directory.
83
84
84
85
85
86
**Compare Low and High Quality**
86
87
87
- Let's use Python to help us figure out what makes wine good or not.
88
-
89
- Create a Python function to read in data from a given path and calculate the
90
- average value of a given variable name.
88
+ Now use Python code to determine what makes wine good or not. Create a Python
89
+ function to read in data from a given path and calculate the average value of a
90
+ given variable name.
91
91
92
92
```python
93
93
# Example
94
94
avg_chloride_results = calculate_avg_value(data, "chlorides")
95
95
```
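One way such a function could look is sketched below. It is an illustration, not the expected solution: it assumes Python 3, assumes the first argument is a path to one of the converted CSV files, and assumes the file has a header row whose column names match the variable name passed in (for example `chlorides`).

```python
import csv


def calculate_avg_value(path, variable):
    """Return the mean of the column named `variable` in the CSV file at `path`.

    Assumes the file has a header row and that every value in the
    requested column can be parsed as a float.
    """
    total = 0.0
    count = 0
    with open(path, newline="") as handle:
        # DictReader maps each row to {column header: cell value}.
        for row in csv.DictReader(handle):
            total += float(row[variable])
            count += 1
    if count == 0:
        raise ValueError("no data rows found in " + path)
    return total / count
```

Reading the file row by row keeps memory use constant, which is convenient but not required for files of this size.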
96
96
97
- You want to be lazy and automate as much as possible. So let's create a Python
98
- function that takes in an array of the file names and returns a dictionary.
97
+ You want to automate this as much as possible. So create a Python function
98
+ that takes in a list of the file names and returns a dictionary.
99
99
100
- The dictionary will have four keys equal to just the file names they come from
101
- e.g. the key of `white_wine_good.csv` will be `white_wine_good`. The values of
102
- each key will be another dictionary with each key being the average value of
103
- one of the four variables we're interested in:
100
+ The dictionary will have four keys equal to the file names without their extension (e.g. the key of `white_wine_good.csv` will be `white_wine_good`). The value of
101
+ each filename key will be another dictionary, keyed by variable name, holding the average of
102
+ each variable:
104
103
105
104
- Citric acid
106
105
- Chlorides
@@ -116,37 +115,16 @@ avg_values = find_average_wines(wine_paths)
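The nested dictionary described above could be built along the following lines. This is a hedged sketch under several assumptions: Python 3, inner dictionaries keyed by variable name, and column headers spelled `citric acid`, `chlorides`, `pH`, and `alcohol` (the exact header spelling is not given in this section). The helper `calculate_avg_value` is redefined here only to keep the example self-contained.

```python
import csv
import os

# Assumed column headers for the four variables of interest.
VARIABLES = ("citric acid", "chlorides", "pH", "alcohol")


def calculate_avg_value(path, variable):
    """Mean of one column; raises ZeroDivisionError if the file has no data rows."""
    with open(path, newline="") as handle:
        values = [float(row[variable]) for row in csv.DictReader(handle)]
    return sum(values) / len(values)


def find_average_wines(wine_paths, variables=VARIABLES):
    """Map each file name (without directory or extension) to its column averages."""
    averages = {}
    for path in wine_paths:
        # "data/white_wine_good.csv" -> "white_wine_good"
        key = os.path.splitext(os.path.basename(path))[0]
        averages[key] = {
            name: calculate_avg_value(path, name) for name in variables
        }
    return averages
```

Deriving the key with `os.path.basename` and `os.path.splitext` means the function gives the same keys whether it is passed bare file names or paths inside `data/`.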
116
115
117
116
**Save Results**
118
117
119
-
120
- Write a Python function to save your dictionary of results to four separate
121
- files. Save your dictionaries as JavaScript Object Notation (JSON) files.
122
-
123
- Use the built-in `json` Python package. Here's a hint on using it.
124
-
125
- ```python
126
- # Example on using the json package
127
- import json
128
-
129
- your_dictionary = {"some_date": "date"}
130
- f = open('destFile.txt', 'w+')
131
- f.write(json.dumps(your_dictionary))
132
- f.close()
133
- ```
134
-
135
- Save your four results into a directory `results`.
118
+ Use the `cPickle` Python module to save the resulting dictionary to a file in a
119
+ directory called `results` (Note: you'll have to create this directory
120
+ beforehand).
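A minimal sketch of the saving step follows. Note that `cPickle` exists only in Python 2; in Python 3 the same functionality lives in the standard `pickle` module, so the import below tries both. The function names `save_results` and `load_results` are hypothetical, and the sketch assumes the target directory already exists, as the note above requires.

```python
try:
    import cPickle as pickle  # Python 2, the module named in the assignment
except ImportError:
    import pickle  # Python 3, where cPickle was merged into pickle


def save_results(results, path):
    """Serialize the `results` dictionary to `path` (directory must already exist)."""
    # Pickle data is binary, so the file is opened in "wb" mode.
    with open(path, "wb") as handle:
        pickle.dump(results, handle)


def load_results(path):
    """Read back a dictionary previously written by save_results."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Loading the file back with `load_results` is a quick way to confirm that the saved dictionary matches what was computed.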
136
121
137
122
138
- **Challenge**
123
+ **Wrap-Up Workflow**
139
124
140
- You want to automate everything as much as possible, so you want to create a
141
- Makefile to make everything. There are two Make rules: `all` and `clean`.
142
-
143
- ```shell
144
- # Run the entire analysis
145
- make all
146
-
147
- # Remove all downloaded and created files from data/, download/, results/
148
- make clean
149
- ```
125
+ Now, to automate the entire workflow, create bash scripts that will
126
+ automatically download and subset the data, then run the analysis (calculating
127
+ average values) and save the results.
150
128
151
129
152
130
## Homework File Structure
@@ -156,8 +134,8 @@ analysis.
156
134
157
135
```
158
136
.
159
- |-- analyze_wine.py
160
- |-- analysis.sh
137
+ |-- LastName_hw2.sh
138
+ |-- LastName_analyze_wine.py
161
139
|-- data/
162
140
|-- results/
163
141
`-- download/
@@ -169,3 +147,5 @@ analysis.
169
147
- A single bash script to automate your analysis
170
148
- A Python script to calculate the average citric acid, chlorides, pH, and
171
149
alcohol values of good and poor quality red and white wine.
150
+ - A brief write-up describing the workflow that was implemented and results
151
+ produced (`LastName_hw2.doc`)