Processing and Formatting a csv

Question 1

I have a series of data in a csv file as follows.

 Hour L Dr Tag L 0 1 2 3 4 5 6 7 8 9 10
 0 L5 XI PS 4R 6 3 6 6 5 6 1 9 11 2 
 0 L5 XI PS 4R 5 8 10 7 7 8 3 9 5 8 
 1 L0 St v2T 4R 1 0 0 0 0 0 0 0 0 6
 1 L2 TI sst 4R 8 8 8 8 8 8 8 8 8 8

The first row denotes the column header. The data on the right of the column with header L are to be numbered sequentially from 0 - 59 out of which only data till 9 are shown.

As you can see the data is sorted based on the hour column i.e hour 0 followed by hour1

I want to change this to add the hour rows for hours "1 and up" as columns at the end of hour 0 rows. It should search for tag field of row 0 and update the 60 values as new columns at the end. The column header should be updated to denote hour.minute (e.g 0.0,0.1 ..... , 1.0,1.1.....)

In case you encouter a new tag that is not present for hour 0 it should be added and only the 60 values for that hour should be updated. All other values should be set to 0'

I am attemptin to do the above in python. As a very first step, I am trying to detect when there is a change in hour, once I do that I am planning to write code to read all records with hour n and merge the minute values to the right hand side based on the Tag.Is my approach correct ? Or can someone suggest a better approach ?

import csv
import os
import sys
from glob import glob
hour = 0
p_hour = -1
c_hour = -1
rownum = 0
row_header = []
file_list = []
def format_minute_field(row_header,hour):
 hdr_len = len(row_header)
 for i in range((hdr_len-60),hdr_len):
 row_header[i] = '{}:{}'.format(hour,row_header[i]) 
 return row_header
if __name__ == '__main__':
 fd = open('test.csv','rt')
 rownum = 0
 reader = csv.reader(fd)
 for row in reader:
 if rownum == 0:
 row_header = row
 row_header = format_minute_field(row_header,0)
 print('row_header {}'.format(row_header))
 rownum +=1
 else:
 if rownum == 1:
 previous_row = row
 if (row[0] != previous_row[0]) and (rownum > 1):
 print('hour changed from {} to {}'.format(previous_row[0],row[0]))
 previous_row = row
 rownum +=1

The expected output will be as follows. If a particular tag is present in hour 0 list as well as hour 1 list the values should be recorded as below. If a new tag is encountered while processing hour 1 recoreds it should be appended to the list.

 Hour L Dr Tag L 0 0:1 0:2 0:3 0:4 0:5 0:6 0:7 0:8 0:9 0:10 ..........0:59 1:0 1:1 1:2 1:3

Question 2

Can you provide the output that you expect for the example data?

Question 3

@Colin - Please see update to the question above.

Question 4

Is that just the expected header? Are you trying to collapse the data into a single row below that?

Question 5

Just because I thought it'd be fun, I produced some code. However, your test data doesn't really match what you explained, especially the two first data line have the same Tag, and it's not clear what to do then. Nevertheless, here is the code you can hack to your needs. Hope it helps.

The idea is to use DictReader and DictWriter to be able to manage cells with their column names, and not caring about the order they are read/written until it is really necessary. Then, I use a data dictionnary that helps me merge the rows together based on an arbitrary key, which is defined by the value of some specific cells, using the fact that tuples can be used as keys in dicts.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
# This will store the lines by tag, in order to join them
data = {}
# This will tell us how to order the columns, will be extended later
columns = ['L', 'Dr', 'Tag']
# Controls extension of the columns
maxhour = 0
# Controls the order the keys are found in the original CSV. Maybe not necessary
keys = []
with open("in.csv", 'r', newline='') as fin:
 reader = csv.DictReader(fin)
 for row in reader:
 hour = int(row['Hour'])
 # Form a unique key to match lines. Adjust to your needs
 key = (row['L'], row['Dr'], row['Tag'])
 if key not in data:
 # This is a future row, a dict with column as key, cell as value
 data[key] = {'L': row['L'], 'Dr': row['Dr'], 'Tag': row['Tag']}
 # Remember the order we've seen the keys
 keys.append(key)
 # Now, add data to the row for each minutes
 # 1 to 59
 for minute in range(1,60):
 # Copy data from column 'minute' to column 'Hour:minute'
 src_colname = str(minute)
 dest_colname = row['Hour'] + ':' + src_colname
 data[key][dest_colname] = row[src_colname]
 # There seems to be a special treatment for minute 0, at column "L 0"
 if hour == 0:
 data[key]['L 0'] = row['L 0']
 else:
 data[key][row['Hour'] + ':0'] = row['L 0']
 # Plan to generate enough columns when writing resulting file
 maxhour = max(maxhour, hour)
with open("out.csv", 'w', newline='') as fout:
 # Okay, now everything was merged into data
 # We need to tell DictWriter how to order columns
 # Treat special first column
 columns.append('L 0')
 # Then add the rest
 for hour in range(0, maxhour+1):
 # Do not include "0:0"
 for minute in range(0 if hour > 0 else 1,60):
 columns.append('{:d}:{:d}'.format(hour, minute))
 # Let's write that
 writer = csv.DictWriter(fout, columns, restval = "0")
 writer.writeheader()
 for key in keys: # or for key in data.keys(): if you don't mind the order
 writer.writerow(data[key])

The test data in.csv:

Hour,L,Dr,Tag,L 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
0,L5,XI,PS,4R,6,3,6,6,5,6,1,9,11,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L0,St,v2T,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L2,TI,sst,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L5,XI,PS,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

And output:

L,Dr,Tag,L 0,0:1,0:2,0:3,0:4,0:5,0:6,0:7,0:8,0:9,0:10,0:11,0:12,0:13,0:14,0:15,0:16,0:17,0:18,0:19,0:20,0:21,0:22,0:23,0:24,0:25,0:26,0:27,0:28,0:29,0:30,0:31,0:32,0:33,0:34,0:35,0:36,0:37,0:38,0:39,0:40,0:41,0:42,0:43,0:44,0:45,0:46,0:47,0:48,0:49,0:50,0:51,0:52,0:53,0:54,0:55,0:56,0:57,0:58,0:59,1:0,1:1,1:2,1:3,1:4,1:5,1:6,1:7,1:8,1:9,1:10,1:11,1:12,1:13,1:14,1:15,1:16,1:17,1:18,1:19,1:20,1:21,1:22,1:23,1:24,1:25,1:26,1:27,1:28,1:29,1:30,1:31,1:32,1:33,1:34,1:35,1:36,1:37,1:38,1:39,1:40,1:41,1:42,1:43,1:44,1:45,1:46,1:47,1:48,1:49,1:50,1:51,1:52,1:53,1:54,1:55,1:56,1:57,1:58,1:59
L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L0,St,v2T,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L2,TI,sst,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Question 6

What order does dictReader display key-value pairs. It doesn't seem to be in any particular order.If I did a print row in the code the order of key value paris seems to be quite random.

Question 7

Sure. With DictReader, rows are dictionnaries. Dictionnaries are hash tables, which are unordered types. You don't access the cells by using their column number anymore, but their column name. This is also the reason you provide the columns argument (a list, i.e. an ordered type) to DictWriter, so it knows in which order to write the columns. And finally, this is the reason I used the keys list to keep the order the tags were first seen.

Question 8

thanks. Can you take a look at stackoverflow.com/questions/33681911/…

Cilyan 8,6011 gold badge32 silver badges39 bronze badges · Accepted Answer · 2015-11-12 03:41:33Z

Just because I thought it'd be fun, I produced some code. However, your test data doesn't really match what you explained, especially the two first data line have the same Tag, and it's not clear what to do then. Nevertheless, here is the code you can hack to your needs. Hope it helps.

The idea is to use DictReader and DictWriter to be able to manage cells with their column names, and not caring about the order they are read/written until it is really necessary. Then, I use a data dictionnary that helps me merge the rows together based on an arbitrary key, which is defined by the value of some specific cells, using the fact that tuples can be used as keys in dicts.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
# This will store the lines by tag, in order to join them
data = {}
# This will tell us how to order the columns, will be extended later
columns = ['L', 'Dr', 'Tag']
# Controls extension of the columns
maxhour = 0
# Controls the order the keys are found in the original CSV. Maybe not necessary
keys = []
with open("in.csv", 'r', newline='') as fin:
 reader = csv.DictReader(fin)
 for row in reader:
 hour = int(row['Hour'])
 # Form a unique key to match lines. Adjust to your needs
 key = (row['L'], row['Dr'], row['Tag'])
 if key not in data:
 # This is a future row, a dict with column as key, cell as value
 data[key] = {'L': row['L'], 'Dr': row['Dr'], 'Tag': row['Tag']}
 # Remember the order we've seen the keys
 keys.append(key)
 # Now, add data to the row for each minutes
 # 1 to 59
 for minute in range(1,60):
 # Copy data from column 'minute' to column 'Hour:minute'
 src_colname = str(minute)
 dest_colname = row['Hour'] + ':' + src_colname
 data[key][dest_colname] = row[src_colname]
 # There seems to be a special treatment for minute 0, at column "L 0"
 if hour == 0:
 data[key]['L 0'] = row['L 0']
 else:
 data[key][row['Hour'] + ':0'] = row['L 0']
 # Plan to generate enough columns when writing resulting file
 maxhour = max(maxhour, hour)
with open("out.csv", 'w', newline='') as fout:
 # Okay, now everything was merged into data
 # We need to tell DictWriter how to order columns
 # Treat special first column
 columns.append('L 0')
 # Then add the rest
 for hour in range(0, maxhour+1):
 # Do not include "0:0"
 for minute in range(0 if hour > 0 else 1,60):
 columns.append('{:d}:{:d}'.format(hour, minute))
 # Let's write that
 writer = csv.DictWriter(fout, columns, restval = "0")
 writer.writeheader()
 for key in keys: # or for key in data.keys(): if you don't mind the order
 writer.writerow(data[key])

The test data in.csv:

Hour,L,Dr,Tag,L 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
0,L5,XI,PS,4R,6,3,6,6,5,6,1,9,11,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L0,St,v2T,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L2,TI,sst,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L5,XI,PS,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

And output:

L,Dr,Tag,L 0,0:1,0:2,0:3,0:4,0:5,0:6,0:7,0:8,0:9,0:10,0:11,0:12,0:13,0:14,0:15,0:16,0:17,0:18,0:19,0:20,0:21,0:22,0:23,0:24,0:25,0:26,0:27,0:28,0:29,0:30,0:31,0:32,0:33,0:34,0:35,0:36,0:37,0:38,0:39,0:40,0:41,0:42,0:43,0:44,0:45,0:46,0:47,0:48,0:49,0:50,0:51,0:52,0:53,0:54,0:55,0:56,0:57,0:58,0:59,1:0,1:1,1:2,1:3,1:4,1:5,1:6,1:7,1:8,1:9,1:10,1:11,1:12,1:13,1:14,1:15,1:16,1:17,1:18,1:19,1:20,1:21,1:22,1:23,1:24,1:25,1:26,1:27,1:28,1:29,1:30,1:31,1:32,1:33,1:34,1:35,1:36,1:37,1:38,1:39,1:40,1:41,1:42,1:43,1:44,1:45,1:46,1:47,1:48,1:49,1:50,1:51,1:52,1:53,1:54,1:55,1:56,1:57,1:58,1:59
L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L0,St,v2T,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L2,TI,sst,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

What order does dictReader display key-value pairs. It doesn't seem to be in any particular order.If I did a print row in the code the order of key value paris seems to be quite random.
Sure. With DictReader, rows are dictionnaries. Dictionnaries are hash tables, which are unordered types. You don't access the cells by using their column number anymore, but their column name. This is also the reason you provide the columns argument (a list, i.e. an ordered type) to DictWriter, so it knows in which order to write the columns. And finally, this is the reason I used the keys list to keep the order the tags were first seen.
thanks. Can you take a look at stackoverflow.com/questions/33681911/…

CollectivesTM on Stack Overflow

Processing and Formatting a csv

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related