I am working in Jupyter Notebook with pandas, and I noticed something strange.
In one cell, I did this:
import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3]})
df2 = df1
Then in another cell, I changed df2:
df2.loc[0,"A"] = 100
But when I check df1, it's also updated, even though I never touched it directly!
print(df1)
Output:
     A
0  100
1    2
2    3
I expected df1 to stay unchanged. Why is this happening? Do Jupyter cells share variables differently, or is this how pandas works with assignment?
- I tried using df2 = df1.copy(), and that seems to fix it.
- I expected df1 and df2 to be two independent DataFrames since I created them separately.
- I just want to understand why the change happens and the right way to avoid it.
1 Answer
df2 = df1 only copies the reference, not the data.
So both df1 and df2 point to the same DataFrame in memory — changing one changes the other.
To check, compare their identities (in CPython, id() returns the object's memory address):
print("df1:\n", df1)
print("df2:\n", df2)
print("id(df1):", id(df1))
print("id(df2):", id(df2))
df1:
      A
0  100
1    2
2    3
df2:
      A
0  100
1    2
2    3
id(df1): 90949904
id(df2): 90949904
They will be the same.
If you want df2 to be a separate copy, use:
df2 = df1.copy()
Then the ids will be different, and modifying df2 won’t affect df1.
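As a minimal sketch of the difference, reusing the DataFrame from the question (the names alias and independent are made up here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})

alias = df1                # plain assignment: a second name for the same object
independent = df1.copy()   # copy(): a new DataFrame with its own data

# Changing the copy leaves df1 alone.
independent.loc[0, "A"] = 100
print(alias is df1)        # True
print(independent is df1)  # False
print(df1.loc[0, "A"])     # 1

# Changing through the alias changes df1, because it is the same object.
alias.loc[0, "A"] = 100
print(df1.loc[0, "A"])     # 100
```

The `is` operator does the same comparison as checking that the two id() values match.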
Comments
df2 = df1 doesn't create a new object. It just binds a second variable name to the same object. That's it. And you already know the fix: create a copy if you need a copy. This has nothing to do with Jupyter cells.
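To illustrate that this is ordinary Python assignment rather than anything specific to pandas or Jupyter, here is a small sketch using a plain list (the variable names a, b, and c are only for illustration):

```python
# Plain Python, no pandas or Jupyter involved.
a = [1, 2, 3]
b = a            # b is another name for the same list object
b[0] = 100
print(a)         # [100, 2, 3]
print(a is b)    # True

c = a.copy()     # a new list object
c[0] = -1
print(a)         # [100, 2, 3]  (unchanged by the edit to c)
```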