5
\$\begingroup\$

I made this little tool for something I was trying to accomplish at work. It works as intended but a personal goal of mine is to improve my code design so it's more simple, readable, and contributable.

Here is the repo but a snippet of relevant code is at bottom of the question.

The main thing I'm seeking help on is the passing of arguments through the whole stack. In the below code abstraction (or the repo, if you checked that out), you'll notice that function definitions and calling the functions involve passing a ton of arguments repeatedly. I'd prefer to make that cleaner if possible. I've looked into *args and **kwargs but not sure that's necessarily a best practice here. I'm also thinking about using Dataclasses or NamedTuples, perhaps. BTW, I'm leaning toward NamedTuples because they can be iterated on. I guess Dataclasses can't so that might be too inflexible for me? I.e., if I created a NamedTuple called args out of the args from parse_args(), I would want to pass it to functions as *args, right? Only doable with a NamedTuple?

Update: Added full code below:

 
import time
import argparse
import random
import string
import datetime
letters = string.ascii_lowercase
def _gen_string(num):
 result_str = ''.join(random.choice(letters) for i in range(num))
 return(result_str)
def _gen_int(num):
 lower_bound = 10 ** (num - 1)
 upper_bound = 10 ** num - 1
 return(random.randint(lower_bound,upper_bound))
def gen_tag(tag_key_size, tag_value_size, key=''):
 # return string Tag key-value pair
 if key:
 held_key = 'tag_' + key
 held_key = held_key[:-4]
 val = _gen_string(tag_value_size)
 pair = f",{held_key}={val}"
 else:
 key = 'tag_' + _gen_string((tag_key_size-4))
 val = _gen_string(tag_value_size)
 pair = f",{key}={val}"
 return(pair)
def gen_str_field(field_key_size, field_value_size, key=''):
 # return string Tag key-value pair
 if key:
 held_key = 'str_' + key
 held_key = held_key[:-4]
 val = _gen_string(field_key_size)
 pair = f',{held_key}="{val}"'
 else:
 key = 'str_' + _gen_string(field_key_size-4)
 val = _gen_string(field_value_size)
 pair = f',{key}="{val}"'
 return(pair)
def gen_int_field(field_key_size, int_value_size, key=''):
 # return int Field key-value pair
 if key:
 held_key = 'int_' + key
 held_key = held_key[:-4]
 val = _gen_int(int_value_size)
 pair = f",{held_key}={val}"
 else:
 key = 'int_' + _gen_string(field_key_size-4)
 val = _gen_int(int_value_size)
 pair = f",{key}={val}i"
 return(pair)
def gen_float_field(field_key_size, float_value_size, key=''):
 # return float Field key-value pair
 if key:
 held_key = 'fl_' + key
 held_key = held_key[:-3]
 val = _gen_int(float_value_size)
 pair = f",{held_key}={val}"
 else:
 key = 'fl_' + _gen_string(field_key_size-3)
 val = round(random.uniform(10,99), float_value_size-2)
 pair = f",{key}={val}"
 return(pair)
def gen_ts(precision = 's'):
 now = datetime.datetime.now()
 ts = now.timestamp()
 if precision == ('s' or 'S'):
 ts = round(ts)
 return(ts*1000000000)
 elif precision == ('ms' or 'MS'):
 ts = round(ts*1000)
 return(ts*1000000)
 elif precision == ('us' or 'US'):
 ts = round(ts*1000000)
 return(ts*1000)
 elif precision == ('ns' or 'NS'):
 ts = round(ts*1000000000)
 return(ts)
 else:
 print("Warn: gen_ts() only takes `s`, `ms`, `us`, or `ns` as inputs")
 return ts
# Return set of Tag key-value pairs with trailing space
def gen_tagset(num_tags, tag_key_size, tag_value_size, tag_keys):
 if tag_keys:
 tagset = ''.join(gen_tag(tag_key_size,tag_value_size, key=i) for i in tag_keys) + ' ' 
 else:
 tagset = ''.join(gen_tag(tag_key_size,tag_value_size) for i in range(num_tags)) + ' '
 
 return(tagset[1:]) # remove leading comma
# Following 3 functions are helper functions for `gen_fieldset()`
def _gen_int_fieldset(int_fields, field_key_size, int_value_size, int_field_keys):
 if int_field_keys:
 int_fieldset = ''.join(gen_int_field(field_key_size,int_value_size, key=i) for i in int_field_keys) 
 else:
 int_fieldset = ''.join(gen_int_field(field_key_size,int_value_size) for i in range(int_fields))
 return(int_fieldset[1:])
def _gen_float_fieldset(float_fields, field_key_size, float_value_size, float_field_keys):
 if float_field_keys:
 float_fieldset = ''.join(gen_float_field(field_key_size,float_value_size, key=i) for i in float_field_keys) 
 else:
 float_fieldset = ''.join(gen_float_field(field_key_size,float_value_size) for i in range(float_fields))
 
 return(float_fieldset[1:])
def _gen_str_fieldset(str_fields, field_key_size, str_value_size, str_field_keys):
 if str_field_keys:
 str_fieldset = ''.join(gen_str_field(field_key_size,str_value_size, key=i) for i in str_field_keys)
 else:
 str_fieldset = ''.join(gen_str_field(field_key_size,str_value_size) for i in range(str_fields))
 return(str_fieldset[1:])
# Use fieldset helper functions to generate a full fieldset
def gen_fieldset(int_fields, 
 float_fields, 
 str_fields, 
 field_key_size,
 int_value_size,
 float_value_size,
 str_value_size,
 int_field_keys,
 float_field_keys,
 str_field_keys):
 int_fieldset = _gen_int_fieldset(int_fields, field_key_size, int_value_size, int_field_keys)
 float_fieldset = _gen_float_fieldset(float_fields, field_key_size, float_value_size, float_field_keys)
 str_fieldset = _gen_str_fieldset(str_fields, field_key_size, str_value_size, str_field_keys)
 fieldset = ''
 if int_fieldset:
 fieldset += f"{int_fieldset},"
 if float_fieldset:
 fieldset += f"{float_fieldset},"
 if str_fieldset:
 fieldset += f"{str_fieldset},"
 return(fieldset[:-1]+' ') # remove leading and trailing comma
def gen_line(measurement,
 num_tags,
 int_fields,
 float_fields,
 str_fields,
 tag_key_size,
 tag_value_size,
 field_key_size,
 int_value_size,
 float_value_size,
 str_value_size,
 precision,
 tag_keys,
 int_field_keys,
 float_field_keys,
 str_field_keys):
 tagset = gen_tagset(num_tags, tag_key_size, tag_value_size, tag_keys)
 fieldset = gen_fieldset(int_fields, float_fields, str_fields, field_key_size, int_value_size, float_value_size, str_value_size, int_field_keys, float_field_keys, str_field_keys)
 timestamp = gen_ts(precision)
 line = f"{measurement},{tagset}{fieldset}{timestamp}"
 return(line)
def gen_batch(measurement, 
 batch_size,
 num_tags,
 int_fields,
 float_fields,
 str_fields,
 tag_key_size,
 tag_value_size,
 field_key_size,
 int_value_size,
 float_value_size,
 str_value_size,
 precision,
 keep_keys_batch,
 keep_keys_session,
 tag_keys,
 int_field_keys,
 float_field_keys,
 str_field_keys):
 # Generate keys at batch level if `hold_keys` is True. 
 # This will keep Tag keys constant per line, reducing Series creation at database level.
 if keep_keys_session:
 tag_keys = tag_keys
 int_field_keys = int_field_keys
 float_field_keys = float_field_keys
 str_field_keys = str_field_keys
 elif keep_keys_batch:
 tag_keys = [_gen_string(tag_key_size) for i in range(num_tags)]
 int_field_keys = [_gen_string(tag_key_size) for i in range(int_fields)]
 float_field_keys = [_gen_string(tag_key_size) for i in range(float_fields)]
 str_field_keys = [_gen_string(tag_key_size) for i in range(str_fields)]
 else:
 tag_keys = []
 int_field_keys = []
 float_field_keys = []
 str_field_keys = []
 
 batch = []
 for i in range(batch_size):
 line = gen_line(measurement, 
 num_tags,
 int_fields,
 float_fields,
 str_fields,
 tag_key_size,
 tag_value_size,
 field_key_size,
 int_value_size,
 float_value_size,
 str_value_size,
 precision,
 tag_keys,
 int_field_keys,
 float_field_keys,
 str_field_keys)
 batch.append(line)
 return(batch)
if __name__ == "__main__":
 parser = argparse.ArgumentParser(description="Generate a batch of Line Protocol points of a specified shape")
 parser.add_argument('measurement', type=str, default='cpu')
 parser.add_argument('--batch_size', type=int, default=10)
 parser.add_argument('--num_tags', type=int, default=3)
 parser.add_argument('--int_fields', type=int, default=2)
 parser.add_argument('--float_fields', type=int, default=1)
 parser.add_argument('--str_fields', type=int, default=1)
 parser.add_argument('--tag_key_size', type=int, default=8)
 parser.add_argument('--tag_value_size', type=int, default=10) 
 parser.add_argument('--field_key_size', type=int, default=8)
 parser.add_argument('--int_value_size', type=int, default=4)
 parser.add_argument('--float_value_size', type=int, default=4)
 parser.add_argument('--str_value_size', type=int, default=8)
 parser.add_argument('--precision', type=str, choices = ['s','S','ms','MS','us','US','ns','NS'], default='s')
 parser.add_argument('--keep_keys_batch', action='store_true')
 parser.add_argument('--keep_keys_session', action='store_true', help="")
 args = parser.parse_args()
 if args.keep_keys_session:
 tag_keys = [_gen_string(args.tag_key_size) for i in range(args.num_tags)]
 int_field_keys = [_gen_string(args.tag_key_size) for i in range(args.int_fields)]
 float_field_keys = [_gen_string(args.tag_key_size) for i in range(args.float_fields)]
 str_field_keys = [_gen_string(args.tag_key_size) for i in range(args.str_fields)]
 args.keep_keys_batch = True
 else:
 tag_keys = []
 int_field_keys = []
 float_field_keys = []
 str_field_keys = []
 batch = gen_batch(args.measurement, 
 args.batch_size,
 args.num_tags,
 args.int_fields,
 args.float_fields,
 args.str_fields,
 args.tag_key_size,
 args.tag_value_size,
 args.field_key_size,
 args.int_value_size,
 args.float_value_size,
 args.str_value_size,
 args.precision,
 args.keep_keys_batch,
 args.keep_keys_session,
 tag_keys,
 int_field_keys,
 float_field_keys,
 str_field_keys)
 for line in batch:
 print(line)

The entire set of arguments is generated from flags at the command line so I generate a Namespace of these args. I imagine there's a much cleaner way to pass all these arguments through the stack of this application?

Thanks in advance to anyone willing to help out here!

asked Nov 11, 2020 at 22:30
\$\endgroup\$
5
  • \$\begingroup\$ @Reinderien Ok -- I added the full code. It's laid out differently than this in the repo but I moved all the code into a single place for this. \$\endgroup\$ Commented Nov 11, 2020 at 23:00
  • \$\begingroup\$ Yep will fix. Was deleting calls the module that I flattened out of this. \$\endgroup\$ Commented Nov 11, 2020 at 23:03
  • 1
    \$\begingroup\$ @Reinderien errors should be fixed. Sorry about that. turning that into one file led to some hiccups. \$\endgroup\$ Commented Nov 11, 2020 at 23:12
  • \$\begingroup\$ No worries; it's basically a good question now 😊 \$\endgroup\$ Commented Nov 11, 2020 at 23:20
  • 1
    \$\begingroup\$ refactoring.guru/smells/long-parameter-list \$\endgroup\$ Commented Nov 11, 2020 at 23:54

2 Answers 2

5
\$\begingroup\$

I'm not sure why you think that kwargs isn't a good idea. It would simplify the code quite a bit, and it is a pretty common way of doing this. You could do a data class, but it's trivial to copy the argparse args into a dictionary using the vars() function.

I didn't redo all of your code, but you can get the idea from this snippet:

import argparse
def gen_batch(**kwargs):
 # Generate keys at batch level if `hold_keys` is True. 
 # This will keep Tag keys constant per line, reducing Series creation at database level.
 # Simplified...
 # if keep_keys_session:
 # tag_keys = tag_keys
 # int_field_keys = int_field_keys
 # float_field_keys = float_field_keys
 # str_field_keys = str_field_keys
 if not keep_keys_session and keep_keys_batch:
 tag_key_size = kwargs.get('tag_key_size')
 kwargs['tag_keys'] = [_gen_string(tag_key_size) for i in range(kwargs.get('num_tags'))]
 kwargs['int_field_keys'] = [_gen_string(tag_key_size) for i in range(kwargs.get('int_fields'))]
 kwargs['float_field_keys'] = [_gen_string(tag_key_size) for i in range(kwargs.get('float_fields'))]
 kwargs['str_field_keys'] = [_gen_string(tag_key_size) for i in range(kwargs.get('str_fields'))]
 elif not keep_keys_session:
 kwargs['tag_keys'] = []
 kwargs['int_field_keys'] = []
 kwargs['float_field_keys'] = []
 kwargs['str_field_keys'] = []
 
 batch = []
 for i in range(batch_size):
 line = gen_line(**kwargs)
 batch.append(line)
 return(batch) 
if __name__ == "__main__":
 parser = argparse.ArgumentParser(description="Generate a batch of Line Protocol points of a specified shape")
 parser.add_argument('measurement', type=str, default='cpu')
 parser.add_argument('--batch_size', type=int, default=10)
 parser.add_argument('--num_tags', type=int, default=3)
 parser.add_argument('--int_fields', type=int, default=2)
 parser.add_argument('--float_fields', type=int, default=1)
 parser.add_argument('--str_fields', type=int, default=1)
 parser.add_argument('--tag_key_size', type=int, default=8)
 parser.add_argument('--tag_value_size', type=int, default=10) 
 parser.add_argument('--field_key_size', type=int, default=8)
 parser.add_argument('--int_value_size', type=int, default=4)
 parser.add_argument('--float_value_size', type=int, default=4)
 parser.add_argument('--str_value_size', type=int, default=8)
 parser.add_argument('--precision', type=str, choices = ['s','S','ms','MS','us','US','ns','NS'], default='s')
 parser.add_argument('--keep_keys_batch', action='store_true')
 parser.add_argument('--keep_keys_session', action='store_true', help="")
 args = parser.parse_args()
 
 # make a dictionary out of the parser args
 batch_args = vars(args)
 # add the other items to the dictionary
 if args.keep_keys_session:
 batch_args['tag_keys'] = [_gen_string(args.tag_key_size) for i in range(args.num_tags)]
 batch_args['int_field_keys'] = [_gen_string(args.tag_key_size) for i in range(args.int_fields)]
 batch_args['float_field_keys'] = [_gen_string(args.tag_key_size) for i in range(args.float_fields)]
 batch_args['str_field_keys'] = [_gen_string(args.tag_key_size) for i in range(args.str_fields)]
 batch_args['keep_keys_batch'] = True
 else:
 batch_args['tag_keys'] = []
 batch_args['int_field_keys'] = []
 batch_args['float_field_keys'] = []
 batch_args['str_field_keys'] = []
 batch = gen_batch(**batch_args)
 for line in batch:
 print(line)

You can do the same type of transformation to get_line() and get_fieldset(). Beyond that, it might be cleaner to just extract the couple of args and pass them directly to the rest of the gen_xxx() functions rather than propagating kwargs everywhere.

answered Nov 13, 2020 at 1:29
\$\endgroup\$
1
  • \$\begingroup\$ I ended up doing just this! \$\endgroup\$ Commented Nov 13, 2020 at 23:12
1
\$\begingroup\$

Updated to include Tomerikoo's suggestion to just use the args object instead of creating another object.

In thinking about this some more, it's nearly identical to use a class rather than a dictionary to store the arguments, but it makes the code a little easier to read IMHO. For example, args.str_fields is more intuitive than kwargs.get('str_fields'). And the easiest class to use (thanks @Tomerikoo ) is the object returned from parse_args(). So if it's not too late, here is an updated version using an object to store the arguments rather than a dictionary. Also, I used the a = x if b else y syntax to condense some of the assignment statements.

import argparse
def gen_batch(bargs):
 # Generate keys at batch level if `hold_keys` is True. 
 # This will keep Tag keys constant per line, reducing Series creation at database level.
 # Simplified...
 if not keep_keys_session:
 # these local variables are just to shorten the assignment lines
 tag_key_size = _gen_string(bargs.tag_key_size)
 keep = bargs.keep_keys_batch
 bargs.tag_keys = [tag_key_size for i in range(bargs.num_tags)] if keep else []
 bargs.int_field_keys = [tag_key_size for i in range(bargs.int_fields)] if keep else []
 bargs.float_field_keys = [tag_key_size for i in range(bargs.float_fields)] if keep else []
 bargs.str_field_keys = [tag_key_size for i in range(bargs.str_fields)] if keep else []
 
 batch = []
 for i in range(batch_size):
 line = gen_line(bargs)
 batch.append(line)
 return(batch) 
if __name__ == "__main__":
 parser = argparse.ArgumentParser(description="Generate a batch of Line Protocol points of a specified shape")
 parser.add_argument('measurement', type=str, default='cpu')
 parser.add_argument('--batch_size', type=int, default=10)
 parser.add_argument('--num_tags', type=int, default=3)
 parser.add_argument('--int_fields', type=int, default=2)
 parser.add_argument('--float_fields', type=int, default=1)
 parser.add_argument('--str_fields', type=int, default=1)
 parser.add_argument('--tag_key_size', type=int, default=8)
 parser.add_argument('--tag_value_size', type=int, default=10) 
 parser.add_argument('--field_key_size', type=int, default=8)
 parser.add_argument('--int_value_size', type=int, default=4)
 parser.add_argument('--float_value_size', type=int, default=4)
 parser.add_argument('--str_value_size', type=int, default=8)
 parser.add_argument('--precision', type=str, choices = ['s','S','ms','MS','us','US','ns','NS'], default='s')
 parser.add_argument('--keep_keys_batch', action='store_true')
 parser.add_argument('--keep_keys_session', action='store_true', help="")
 args = parser.parse_args()
 
 # add the other items
 tks = _gen_string(args.tag_key_size)
 args.tag_keys = [tks for i in range(args.num_tags)] if args.keep_keys_session else []
 args.int_field_keys = [tks for i in range(args.int_fields)] if args.keep_keys_session else []
 args.float_field_keys = [tks for i in range(args.float_fields)] if args.keep_keys_session else []
 args.str_field_keys = [tks for i in range(args.str_fields)] if args.keep_keys_session else []
 args.keep_keys_batch = True if args.keep_keys_session else False
 batch = gen_batch(args)
 for line in batch:
 print(line)
answered Nov 14, 2020 at 17:41
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Why not just use args? It is already a simple container object holding some attributes, and all you do is copy its attributes to a new object... Just use args instead of bargs... \$\endgroup\$ Commented Nov 14, 2020 at 17:45
  • 2
    \$\begingroup\$ You are correct, there's no reason unless you need to add more customization to the container object, which was not a requirement from the OP. I guess that I was staring at it too long to see the obvious. I'll edit my answer to implement your suggestion. \$\endgroup\$ Commented Nov 14, 2020 at 21:15
  • 2
    \$\begingroup\$ I see you already edited what I was going to reply on your previous comment that you can add any attributes you want to the existing args object :) \$\endgroup\$ Commented Nov 14, 2020 at 22:51
  • 1
    \$\begingroup\$ I did :-). Mentally, I always isolate the args object from the rest of my code, usually pushing them into local variables, but I'll keep this model in mind moving forward where it makes sense. \$\endgroup\$ Commented Nov 14, 2020 at 23:42

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.