Pythonic way to cast a dictionary into a pd.DataFrame with two columns?

Question 1

I have a dictionary where for each key, a single value is stored. Say

import pandas as pd
dd = {'Alice': 40,
 'Bob': 50,
 'Charlie': 35}

Now, I want to cast this dictionary to a pd.DataFrame with two columns. The first column should contain the keys of the dictionary and the second column the values, and the columns should be named (say, "Name" and "Age"). I expect to have a function call like:

 pd.DataFrame(dd, columns=['Name', 'Age'])

which does not give the desired output, since the result has 0 rows.
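A minimal sketch of why the result is empty, assuming the usual pandas behaviour for dict input: `columns` selects among the dict's keys instead of naming new columns. The variable names `empty` and `picked` are only for illustration.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# With dict input, `columns` picks which keys to keep as columns.
# Neither 'Name' nor 'Age' is a key of dd, so no data is selected.
empty = pd.DataFrame(dd, columns=['Name', 'Age'])
print(empty.shape)  # (0, 2)

# Asking for keys that do exist selects them. The values are scalars,
# so an explicit index is needed to say how many rows to build.
picked = pd.DataFrame(dd, index=[0], columns=['Alice', 'Bob'])
print(picked)
```

So the call treats each name as a column label, which is the transpose of the layout wanted here.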

Currently I have two "solutions":

# Solution 1: build from the dict by index, then rename the index and reset it:
pd.DataFrame.from_dict(dd, orient='index', columns=['Age']).rename_axis('Name').reset_index()
# Solution 2: build from the (key, value) pairs:
pd.DataFrame(list(dd.items()), columns=['Name', 'Age'])
# Both result in the desired output:
#       Name  Age
# 0    Alice   40
# 1      Bob   50
# 2  Charlie   35

However, both appear a bit hacky and thus inefficient and error-prone to me. Is there a more pythonic way to achieve this?

Question 2

There's nothing wrong or hacky about using pd.DataFrame(dd.items(), columns=['Name', 'Age']) to get the result you need in your case.

Question 3

@RomanPerekhrest, I didn't realize that list() can be removed. Without it, this seems fine to me. Do you want to post it as an answer so I can accept it?

Question 4

Honestly, it's too simple to be a significant answer.

Question 5

The advantage of your call to from_dict is that the method name makes the conversion a little more obvious (though the rest of the index manipulation makes it less so). Don't call rename_axis(); instead pass a names parameter to reset_index().
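As a rough illustration of that suggestion, the sketch below compares the two spellings on the question's data. The variable names `by_index`, `two_step` and `one_step` are made up for this example, and the `names` parameter of reset_index() assumes pandas 1.5 or newer.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# Keys become the index, values go into a single 'Age' column.
by_index = pd.DataFrame.from_dict(dd, orient='index', columns=['Age'])

# Two steps: name the index, then move it into a column.
two_step = by_index.rename_axis('Name').reset_index()

# One step: name the new column while resetting the index.
one_step = by_index.reset_index(names='Name')

print(two_step.equals(one_step))
```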

Your call to dd.items() is probably the best approach in terms of simplicity; just drop the call to list.
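A small check, as a sketch rather than a benchmark, that the items view can be passed directly and gives the same frame as the list() version. The names `with_list` and `without_list` are only for illustration.

```python
import pandas as pd

dd = {'Alice': 40, 'Bob': 50, 'Charlie': 35}

# The original form, materialising the pairs into a list first.
with_list = pd.DataFrame(list(dd.items()), columns=['Name', 'Age'])

# The items view itself is iterable, so it can be handed over as-is.
without_list = pd.DataFrame(dd.items(), columns=['Name', 'Age'])

print(with_list.equals(without_list))
print(without_list.values.tolist())
```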

I show two other options: the third makes it even more obvious what's going on by sending in separate key and value series, and the fourth is a repaired variant of your "I expect to have a function call like" attempt.

import typing

import pandas as pd


def method_a(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Keys become the index; reset_index(names=...) (pandas 1.5+) turns it into the first column.
    return pd.DataFrame.from_dict(
        data=dd, orient='index', columns=columns[1:],
    ).reset_index(names=columns[0])


def method_b(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Each (key, value) pair becomes one row.
    return pd.DataFrame(data=dd.items(), columns=columns)


def method_c(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Keys and values are passed as separate, explicitly named columns.
    kcol, vcol = columns
    return pd.DataFrame({kcol: dd.keys(), vcol: dd.values()})


def method_d(dd: dict[str, typing.Any], columns: typing.Sequence[str]) -> pd.DataFrame:
    # Keys become the columns of a one-row frame, which is then transposed.
    df = pd.DataFrame(dd, index=columns[1:])
    return df.T.reset_index(names=columns[0])


def test() -> None:
    dd = {'Alice': 40,
          'Bob': 50,
          'Charlie': 35}
    ref = method_a(dd=dd, columns=('Name', 'Age'))
    for method in (method_b, method_c, method_d):
        result = method(dd=dd, columns=('Name', 'Age'))
        assert ref.equals(result)


if __name__ == '__main__':
    test()

Reinderien · Answer 1 · 2024-12-22 21:48:35Z
