
I am working in Jupyter Notebook with pandas, and I noticed something strange.

In one cell, I did this:

import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3]})
df2 = df1

Then in another cell, I changed df2:

df2.loc[0,"A"] = 100

But when I check df1, it's also updated, even though I never touched it directly!

print(df1)

Output:

     A
0  100
1    2
2    3

I expected df1 to stay unchanged. Why is this happening? Do Jupyter cells share variables differently, or is this how pandas works with assignment?

  • I tried using df2 = df1.copy(), and that seems to fix it.
  • I expected df1 and df2 to be two independent DataFrames since I created them separately.
  • I just want to understand why the change happens and the right way to avoid it.
Emi OB
asked Oct 9 at 5:34
    Your expectation is wrong: df2 = df1 doesn't create a new object. It just binds a second variable name to the same object. That's it. And you already know the fix: create a copy if you need a copy. This has nothing to do with Jupyter cells. Commented Oct 9 at 5:41

1 Answer


df2 = df1 only copies the reference, not the data: it binds a second name to the same DataFrame object.
So both df1 and df2 point to the same DataFrame in memory, and a change made through one name is visible through the other.

To confirm they are the same object, compare their ids (in CPython, id() returns the object's memory address):

print("df1:\n", df1)
print("df2:\n", df2)
print("id(df1):", id(df1))
print("id(df2):", id(df2))

Output:

df1:
      A
0  100
1    2
2    3
df2:
      A
0  100
1    2
2    3
id(df1): 90949904
id(df2): 90949904

They will be the same.

If you want df2 to be a separate copy, use:

df2 = df1.copy()

Then the ids will be different, and modifying df2 won’t affect df1.
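
As a quick side-by-side check, here is a minimal sketch that contrasts plain assignment with .copy(). The names alias and clone are only illustrative and are not from the question. It uses Python's `is` operator, which tests object identity directly, in place of comparing id() values by eye.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})

alias = df1         # plain assignment: a second name for the same object
clone = df1.copy()  # a new DataFrame with its own data (deep=True is the default)

print(alias is df1)  # True: same object
print(clone is df1)  # False: separate object

# Modifying the copy leaves the original untouched.
clone.loc[0, "A"] = 100
print(df1.loc[0, "A"])  # 1

# Modifying through the alias changes the one shared object.
alias.loc[0, "A"] = 100
print(df1.loc[0, "A"])  # 100
```

This is ordinary Python name binding, not anything specific to pandas or Jupyter: the same thing happens with lists and dicts.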

answered Oct 9 at 5:52