I have a series of data in a csv file as follows.
Hour L Dr Tag L 0 1 2 3 4 5 6 7 8 9 10
0 L5 XI PS 4R 6 3 6 6 5 6 1 9 11 2
0 L5 XI PS 4R 5 8 10 7 7 8 3 9 5 8
1 L0 St v2T 4R 1 0 0 0 0 0 0 0 0 6
1 L2 TI sst 4R 8 8 8 8 8 8 8 8 8 8
The first row denotes the column header. The data on the right of the column with header L are to be numbered sequentially from 0 - 59 out of which only data till 9 are shown.
As you can see the data is sorted based on the hour column i.e hour 0 followed by hour1
I want to change this to add the hour rows for hours "1 and up" as columns at the end of hour 0 rows. It should search for tag field of row 0 and update the 60 values as new columns at the end. The column header should be updated to denote hour.minute (e.g 0.0,0.1 ..... , 1.0,1.1.....)
In case you encouter a new tag that is not present for hour 0 it should be added and only the 60 values for that hour should be updated. All other values should be set to 0'
I am attemptin to do the above in python. As a very first step, I am trying to detect when there is a change in hour, once I do that I am planning to write code to read all records with hour n and merge the minute values to the right hand side based on the Tag.Is my approach correct ? Or can someone suggest a better approach ?
import csv
import os
import sys
from glob import glob
hour = 0
p_hour = -1
c_hour = -1
rownum = 0
row_header = []
file_list = []
def format_minute_field(row_header,hour):
hdr_len = len(row_header)
for i in range((hdr_len-60),hdr_len):
row_header[i] = '{}:{}'.format(hour,row_header[i])
return row_header
if __name__ == '__main__':
fd = open('test.csv','rt')
rownum = 0
reader = csv.reader(fd)
for row in reader:
if rownum == 0:
row_header = row
row_header = format_minute_field(row_header,0)
print('row_header {}'.format(row_header))
rownum +=1
else:
if rownum == 1:
previous_row = row
if (row[0] != previous_row[0]) and (rownum > 1):
print('hour changed from {} to {}'.format(previous_row[0],row[0]))
previous_row = row
rownum +=1
The expected output will be as follows. If a particular tag is present in hour 0 list as well as hour 1 list the values should be recorded as below. If a new tag is encountered while processing hour 1 recoreds it should be appended to the list.
Hour L Dr Tag L 0 0:1 0:2 0:3 0:4 0:5 0:6 0:7 0:8 0:9 0:10 ..........0:59 1:0 1:1 1:2 1:3
-
1Can you provide the output that you expect for the example data?Colin– Colin2015年11月12日 01:50:15 +00:00Commented Nov 12, 2015 at 1:50
-
@Colin - Please see update to the question above.liv2hak– liv2hak2015年11月12日 02:19:21 +00:00Commented Nov 12, 2015 at 2:19
-
Is that just the expected header? Are you trying to collapse the data into a single row below that?Colin– Colin2015年11月12日 11:51:07 +00:00Commented Nov 12, 2015 at 11:51
1 Answer 1
Just because I thought it'd be fun, I produced some code. However, your test data doesn't really match what you explained, especially the two first data line have the same Tag, and it's not clear what to do then. Nevertheless, here is the code you can hack to your needs. Hope it helps.
The idea is to use DictReader and DictWriter to be able to manage cells with their column names, and not caring about the order they are read/written until it is really necessary. Then, I use a data dictionnary that helps me merge the rows together based on an arbitrary key, which is defined by the value of some specific cells, using the fact that tuples can be used as keys in dicts.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
# This will store the lines by tag, in order to join them
data = {}
# This will tell us how to order the columns, will be extended later
columns = ['L', 'Dr', 'Tag']
# Controls extension of the columns
maxhour = 0
# Controls the order the keys are found in the original CSV. Maybe not necessary
keys = []
with open("in.csv", 'r', newline='') as fin:
reader = csv.DictReader(fin)
for row in reader:
hour = int(row['Hour'])
# Form a unique key to match lines. Adjust to your needs
key = (row['L'], row['Dr'], row['Tag'])
if key not in data:
# This is a future row, a dict with column as key, cell as value
data[key] = {'L': row['L'], 'Dr': row['Dr'], 'Tag': row['Tag']}
# Remember the order we've seen the keys
keys.append(key)
# Now, add data to the row for each minutes
# 1 to 59
for minute in range(1,60):
# Copy data from column 'minute' to column 'Hour:minute'
src_colname = str(minute)
dest_colname = row['Hour'] + ':' + src_colname
data[key][dest_colname] = row[src_colname]
# There seems to be a special treatment for minute 0, at column "L 0"
if hour == 0:
data[key]['L 0'] = row['L 0']
else:
data[key][row['Hour'] + ':0'] = row['L 0']
# Plan to generate enough columns when writing resulting file
maxhour = max(maxhour, hour)
with open("out.csv", 'w', newline='') as fout:
# Okay, now everything was merged into data
# We need to tell DictWriter how to order columns
# Treat special first column
columns.append('L 0')
# Then add the rest
for hour in range(0, maxhour+1):
# Do not include "0:0"
for minute in range(0 if hour > 0 else 1,60):
columns.append('{:d}:{:d}'.format(hour, minute))
# Let's write that
writer = csv.DictWriter(fout, columns, restval = "0")
writer.writeheader()
for key in keys: # or for key in data.keys(): if you don't mind the order
writer.writerow(data[key])
The test data in.csv:
Hour,L,Dr,Tag,L 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
0,L5,XI,PS,4R,6,3,6,6,5,6,1,9,11,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L0,St,v2T,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L2,TI,sst,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L5,XI,PS,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
And output:
L,Dr,Tag,L 0,0:1,0:2,0:3,0:4,0:5,0:6,0:7,0:8,0:9,0:10,0:11,0:12,0:13,0:14,0:15,0:16,0:17,0:18,0:19,0:20,0:21,0:22,0:23,0:24,0:25,0:26,0:27,0:28,0:29,0:30,0:31,0:32,0:33,0:34,0:35,0:36,0:37,0:38,0:39,0:40,0:41,0:42,0:43,0:44,0:45,0:46,0:47,0:48,0:49,0:50,0:51,0:52,0:53,0:54,0:55,0:56,0:57,0:58,0:59,1:0,1:1,1:2,1:3,1:4,1:5,1:6,1:7,1:8,1:9,1:10,1:11,1:12,1:13,1:14,1:15,1:16,1:17,1:18,1:19,1:20,1:21,1:22,1:23,1:24,1:25,1:26,1:27,1:28,1:29,1:30,1:31,1:32,1:33,1:34,1:35,1:36,1:37,1:38,1:39,1:40,1:41,1:42,1:43,1:44,1:45,1:46,1:47,1:48,1:49,1:50,1:51,1:52,1:53,1:54,1:55,1:56,1:57,1:58,1:59
L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L0,St,v2T,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L2,TI,sst,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3 Comments
keys list to keep the order the tags were first seen.