I am trying to build a dataframe containing columns which are conditional. My code:
from faker import Faker
import pandas as pd
import random

fake = Faker()

def create_rows_faker(num=1):
    # One dict per row; each key becomes a dataframe column.
    output = [{"name": fake.name(),
               "address": fake.address(),
               "email": fake.email()} for x in range(num)]
    return output
Calling it like this produces a dataframe with name, address, and email columns:
df = pd.DataFrame(create_rows_faker(3))
df
How can I change the definition of output so that the name column is included only when a variable says so, e.g. if name_column == '1' (and left out otherwise), and similarly for address and email?
2 Answers
Use a standard for loop instead of overcomplicating the comprehension.
def create_rows_faker(num=1, name_col=True, address_col=True, email_col=False):
    output = []
    for x in range(num):
        out = {}
        if name_col:
            out["name"] = fake.name()
        if address_col:
            out["address"] = fake.address()
        if email_col:
            out["email"] = fake.email()
        output.append(out)
    return output
answered May 4, 2022 at 12:45
matszwecja
1 Comment
user309575
I tried this initially but I didn't like the amount of ifs inside the loop.
Here is an option using a dictionary of functions and a list of the keys to use:
def create_rows_faker(num=1, use=('name', 'address', 'email')):
    options = {"name": fake.name,
               "address": fake.address,
               "email": fake.email}
    use = set(use)
    options = {k: f for k, f in options.items() if k in use}
    output = [{k: f() for k, f in options.items()} for x in range(num)]
    return output

pd.DataFrame(create_rows_faker(3, use=['name']))
output:
                 name
0  Tracy Alexander MD
1        Mark Winters
2        Lori Edwards
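One thing to be aware of with the filtering above: a misspelled key in `use` is silently dropped, and the columns come out in the order of `options` rather than the order of `use`. Below is a minimal sketch of a variant that raises on unknown keys and follows the caller's order. The `make_*` functions, `GENERATORS`, and `create_rows` are hypothetical stand-ins so the snippet runs without Faker; with Faker you would presumably map the keys to `fake.name`, `fake.address`, and `fake.email` as in the answer.

```python
import itertools

# Hypothetical stand-ins for fake.name / fake.address / fake.email,
# used only so this example runs without Faker installed.
_counter = itertools.count(1)

def make_name():
    return f"Name {next(_counter)}"

def make_address():
    return f"{next(_counter)} Example Street"

def make_email():
    return f"user{next(_counter)}@example.com"

GENERATORS = {"name": make_name, "address": make_address, "email": make_email}

def create_rows(num=1, use=("name", "address", "email")):
    # Fail loudly on a misspelled column instead of silently dropping it.
    unknown = [k for k in use if k not in GENERATORS]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    # Build each row in the order the caller listed the columns.
    return [{k: GENERATORS[k]() for k in use} for _ in range(num)]

rows = create_rows(2, use=["email", "name"])
```

The resulting list of dicts can be passed to `pd.DataFrame` the same way as before.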
answered May 4, 2022 at 12:49
mozway
4 Comments
matszwecja
List as a default parameter is a shortcut to serious bugs later on.
mozway
@matszwecja like what? The list is not mutated here
mozway
I made it a tuple, but this doesn't change a thing (I agree it would if I were mutating use in the function)
matszwecja
I just think it's something that should be avoided wherever possible. You never know how the code might change in the future and such things might be a nightmare to debug later on.
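To make the concern in these comments concrete, here is a small sketch of the mutable-default pitfall and the usual way around it. The function names are made up for illustration and do not come from either answer.

```python
def add_column_buggy(col, cols=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `cols` appends to the same object.
    cols.append(col)
    return cols

def add_column_safe(col, cols=None):
    # Use None as a sentinel and build a fresh list on each call.
    if cols is None:
        cols = []
    cols.append(col)
    return cols

first = add_column_buggy("name")
second = add_column_buggy("email")   # same list object as `first`

safe_first = add_column_safe("name")
safe_second = add_column_safe("email")
```

A tuple default, as used in the answer, sidesteps the problem as well, since a tuple cannot be mutated in place.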
"name": fake.name(), from your current code and instead adding if name_column == '1': for x in range(num): output[x]["name"] = fake.name()?