Looking for direction

Wed May 13 21:12:53 EDT 2015

On 05/13/2015 08:45 PM, 20/20 Lab wrote:
You accidentally replied to me, rather than the mailing list. Please 
use reply-list, or if your mailer can't handle that, do a Reply-All, and 
remove the parts you don't want.
 >
 > On 05/13/2015 05:07 PM, Dave Angel wrote:
 >> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
 >>> I'm a beginner to python. Reading here and there. Written a couple of
 >>> short and simple programs to make life easier around the office.
 >>>
 >> Welcome to Python, and to this mailing list.
 >>
 >>> That being said, I'm not even sure what I need to ask for. I've never
 >>> worked with external data before.
 >>>
 >>> I have a LARGE csv file that I need to process. 110+ columns, 72k
 >>> rows.
 >>
 >> That's not very large at all.
 >>
 > In the grand scheme, I guess not. However I'm currently doing this
 > whole process using office. So it can be a bit daunting.
I'm not familiar with the "office" operating system.
 >>> I managed to write enough to reduce it to a few hundred rows, and
 >>> the five columns I'm interested in.
 >>
 >>>
 >>> Now is where I have my problem:
 >>>
 >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
 >>> [72976, "YYY", "Item", "Qty", "Noise"],
 >>> [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
 >>>
 >>
 >> It'd probably be useful to identify names for your columns, even if
 >> it's just in a comment. Guessing from the paragraph below, I figure
 >> the first two columns are "account" & "staff"
 >
 > The columns that I pull are Account, Staff, Item Sold, Quantity Sold,
 > and notes about the sale (notes aren't particularly needed, but the
 > higher-ups would like them in the report).
 >>
 >>> Basically, I need to check for rows with duplicate accounts (row[0]) and
 >>> staff (row[1]), and if so, remove that row, and add its Qty to the
 >>> original row.
 >>
 >> And which column is that supposed to be? Shouldn't there be a number
 >> there, rather than a string?
 >>
 >>> I really don't have a clue how to go about this. The
 >>> number of rows changes based on which run it is, so I couldn't even get
 >>> away with using hundreds of compare loops.
 >>>
 >>> If someone could point me to some documentation on the functions I would
 >>> need, or a tutorial, it would be a great help.
 >>>
 >>
 >> Is the order significant? Do you have to preserve the order that the
 >> accounts appear? I'll assume not.
 >>
 >> Have you studied dictionaries? Seems to me the way to handle the
 >> problem is to read in a row, create a dictionary with key of (account,
 >> staff), and data of the rest of the line.
 >>
 >> Each time you read a row, you check if the key is already in the
 >> dictionary. If not, add it. If it's already there, merge the data as
 >> you say.
 >>
 >> Then when you're done, turn the dict back into a list of lists.
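A minimal sketch of those steps, assuming the rows are already in a list, that each row is [account, staff, item, qty, notes], and that qty has been converted to a number. The name merge_duplicates and the sample quantities are made up for illustration; keeping the item and notes from the first row seen is just one possible choice.

```python
def merge_duplicates(rows):
    """Combine rows that share the same (account, staff) pair.

    Assumes each row is [account, staff, item, qty, notes] and that
    qty is numeric. The first row seen for a key supplies the item
    and notes; later rows only contribute their quantity.
    """
    merged = {}
    for account, staff, item, qty, notes in rows:
        key = (account, staff)          # a tuple works fine as a dict key
        if key not in merged:
            merged[key] = [account, staff, item, qty, notes]
        else:
            merged[key][3] += qty       # add the duplicate's quantity
    # Turn the dict back into a list of lists.
    return list(merged.values())


# Hypothetical sample data with numeric quantities.
rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 1, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]
result = merge_duplicates(rows)
```

Since order doesn't matter here, the result can be sorted or written straight back out with the csv module.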
 >>
 > The order is irrelevant. No, I've not really studied dictionaries, but
 > a few people have mentioned it. I'll have to read up on them and, more
 > importantly, their applications. Seems that they are more versatile
 > than I thought.
 >
 > Thank you.
You have to realize that a tuple can be used as a key, in your case a 
tuple of Account and Staff.
You'll have to decide how you're going to merge the ItemSold, 
QuantitySold, and notes.
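
As one possible merge policy (an assumption on my part, not the only reasonable one): sum the quantities, keep each distinct item name once, and keep each distinct note once. The names merge_into, totals and report are hypothetical, and the sample quantities are invented so the sketch runs.

```python
def merge_into(entry, item, qty, notes):
    """Fold one row's data into the entry stored for its (account, staff) key."""
    entry["qty"] += qty                       # quantities are summed
    if item not in entry["items"]:
        entry["items"].append(item)           # keep each distinct item once
    if notes and notes not in entry["notes"]:
        entry["notes"].append(notes)          # keep each distinct note once


# Hypothetical rows: [account, staff, item, qty, notes], qty already numeric.
rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 1, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]

totals = {}
for account, staff, item, qty, notes in rows:
    key = (account, staff)                    # tuple of Account and Staff
    if key not in totals:
        totals[key] = {"items": [], "qty": 0, "notes": []}
    merge_into(totals[key], item, qty, notes)

# Flatten back into report rows, joining the collected items and notes.
report = [
    [account, staff, "; ".join(entry["items"]), entry["qty"], "; ".join(entry["notes"])]
    for (account, staff), entry in totals.items()
]
```

Collecting the item names rather than discarding them also makes typos like "ItemTypo" visible in the report, which may or may not be what the higher-ups want.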
-- 
DaveA