I have a dictionary that stores a single value for each key, say:
import pandas as pd
dd = {'Alice': 40,
      'Bob': 50,
      'Charlie': 35}
Now I want to convert this dictionary to a pd.DataFrame with two columns: the first holding the keys of the dictionary and the second the values, with the columns named (say) "Name" and "Age". I expected a function call like this to work:
pd.DataFrame(dd, columns=['Name', 'Age'])
which does not give the desired output, since the result has 0 rows.
Currently I have two "solutions":
# Rename the index and reset it:
pd.DataFrame.from_dict(dd, orient='index', columns=['Age']).rename_axis('Name').reset_index()
# Or build the frame from the list of (key, value) items:
pd.DataFrame(list(dd.items()), columns=['Name', 'Age'])
# Both result in the desired output:
      Name  Age
0    Alice   40
1      Bob   50
2  Charlie   35
However, both seem a bit hacky to me, and therefore potentially inefficient and error-prone. Is there a more Pythonic way to achieve this?
1 Answer
The advantage of your call to from_dict is that the method name makes the conversion a little more obvious (though the index manipulation that follows obscures it again). Don't call rename_axis(); instead pass a names parameter to reset_index().
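The two spellings can be compared side by side. This is a minimal sketch that assumes pandas 1.5 or later, where reset_index() accepts names; the variable names are only illustrative.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# Keys become the index; the single value column is named 'Age'.
by_index = pd.DataFrame.from_dict(dd, orient='index', columns=['Age'])

# Spelling from the question: name the index, then move it into a column.
with_rename = by_index.rename_axis('Name').reset_index()

# Suggested spelling: name the new column while resetting the index.
with_names = by_index.reset_index(names='Name')

print(with_names)
```

Both frames should compare equal, so the second form saves one method call without changing the result.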
Your call to dd.items() is probably the best approach in terms of simplicity; just drop the call to list.
I show two other options: the third makes it even more obvious what's going on by passing the keys and values in as separate sequences, and the fourth is a repaired variant of your "I expect to have a function call like" attempt.
import typing

import pandas as pd


def method_a(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Keys become the index, then the index is moved into the first column.
    return pd.DataFrame.from_dict(
        data=dd, orient='index', columns=columns[1:],
    ).reset_index(names=columns[0])


def method_b(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Each (key, value) pair becomes one row.
    return pd.DataFrame(data=dd.items(), columns=columns)


def method_c(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Keys and values are passed in as two separate columns.
    kcol, vcol = columns
    return pd.DataFrame({kcol: dd.keys(), vcol: dd.values()})


def method_d(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # One row labelled with the value column name, transposed to one row per key.
    df = pd.DataFrame(dd, index=columns[1:])
    return df.T.reset_index(names=columns[0])


def test() -> None:
    dd = {'Alice': 40,
          'Bob': 50,
          'Charlie': 35}
    ref = method_a(dd=dd, columns=('Name', 'Age'))
    for method in (method_b, method_c, method_d):
        result = method(dd=dd, columns=('Name', 'Age'))
        assert ref.equals(result)


if __name__ == '__main__':
    test()
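As a sketch of why the call from the question comes back empty: when the data is a dict, the constructor treats the dict keys as column labels, and columns then selects among those labels instead of renaming anything. None of 'Alice', 'Bob' or 'Charlie' is called 'Name' or 'Age', so nothing is selected. The snippet below only illustrates that behaviour and the repair used in method_d, assuming pandas 1.5 or later for reset_index(names=...); the variable names are hypothetical.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# columns= picks from the dict keys; no key matches, so there are no rows.
empty = pd.DataFrame(dd, columns=['Name', 'Age'])
print(empty.shape)

# Passing index= instead gives a single row labelled 'Age',
# with one column per dictionary key.
wide = pd.DataFrame(dd, index=['Age'])

# Transposing and resetting the index yields the Name/Age layout.
tall = wide.T.reset_index(names='Name')
print(tall)
```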
Use pd.DataFrame(dd.items(), columns=['Name', 'Age']) to get the needed result in your case.