Summary
I'm trying to save data from CSV files to a SQLite database using SQLAlchemy in a Jupyter Notebook. When trying to flush or merge the session data for the junction tables, I receive an IntegrityError saying there is a duplicated row. I don't believe this is true: I checked the data source and found no duplicates, and I also checked each row before adding it and found none. So I am confused about why this error is happening.
Problem
First, the CSVs are imported into DataFrames like:
recipe_df = pd.read_csv("csv/recipes.csv").replace({np.nan: None})
material_df = pd.read_csv("csv/materials.csv").replace({np.nan: None})
crystal_df = pd.read_csv("csv/crystals.csv").replace({np.nan: None})
and the ORM table classes look like:
class Recipe(Base):
    __tablename__ = "recipe"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)
    # ...
    # Relationships
    materials = relationship("Material", secondary="recipe_to_material", back_populates="recipes")
    crystals = relationship("Crystal", secondary="recipe_to_crystal", back_populates="recipes")

class Material(Base):
    __tablename__ = "material"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)
    icon = Column(String)
    is_recipe = Column(Boolean)
    # Relationships
    recipes = relationship("Recipe", secondary="recipe_to_material", back_populates="materials")

class Crystal(Base):
    __tablename__ = "crystal"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)
    icon = Column(String)
    # Relationships
    recipes = relationship("Recipe", secondary="recipe_to_crystal", back_populates="crystals")

class RecipeToMaterial(Base):
    __tablename__ = "recipe_to_material"
    recipe_id = Column(Integer, ForeignKey("recipe.id"), primary_key=True)
    material_id = Column(Integer, ForeignKey("material.id"), primary_key=True)
    amount = Column(Integer)

class RecipeToCrystal(Base):
    __tablename__ = "recipe_to_crystal"
    recipe_id = Column(Integer, ForeignKey("recipe.id"), primary_key=True)
    crystal_id = Column(Integer, ForeignKey("crystal.id"), primary_key=True)
    amount = Column(Integer)

class MaterialHasRecipe(Base):
    __tablename__ = "material_has_recipe"
    material_id = Column(Integer, ForeignKey("material.id"), primary_key=True)
    recipe_id = Column(Integer, ForeignKey("recipe.id"), primary_key=True)
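For reference, Base, engine, and session are created with the usual SQLAlchemy boilerplate; the exact details (database URL, file name) are illustrative here, roughly:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()   # the ORM classes above subclass this

engine = create_engine("sqlite:///recipes.db")  # file name is illustrative
Base.metadata.create_all(engine)  # creates all tables registered on Base
session = Session(engine)
```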
There is no trouble exporting to the recipe, material, and crystal tables; the trouble arises when trying to do the same for the junction tables. Essentially, recipe_df contains both the data for the recipe table and the names of the materials and crystals each recipe requires. The junction-table rows should be built by pairing the id of the recipe with the id of the material/crystal whose name matches each column of that recipe, like:
def add_recipe_crystals():
    recipe_ids: list[int] = []
    crystal_ids: list[int] = []
    amounts: list[int] = []
    # Get all recipe and crystal ids
    for _, row in recipe_df.iterrows():
        recipe_id = session.query(Recipe.id).filter(Recipe.name == row["name"]).first()[0]
        for i in range(1, 3):
            crystal_name = row[f"crys_{i}_name"]
            crystal_id = session.query(Crystal.id).filter(Crystal.name == crystal_name).first()
            # if crystal_i doesn't exist, just skip this one
            if crystal_id is None:
                continue
            crystal_id = crystal_id[0]
            recipe_ids.append(recipe_id)
            crystal_ids.append(crystal_id)
            amounts.append(int(row[f"crys_{i}_count"]))
    recipe_to_crystal_df = pd.DataFrame({
        "recipe_id": recipe_ids,
        "crystal_id": crystal_ids,
        "amount": amounts,
    })
    # Create RecipeToCrystal instances
    for _, row in recipe_to_crystal_df.iterrows():
        rec_2_crys = RecipeToCrystal(
            recipe_id = row["recipe_id"],
            crystal_id = row["crystal_id"],
            amount = row["amount"]
        )
        session.add(rec_2_crys)
This appears to work perfectly fine until it comes time to flush or merge the session. When that happens, I get an error like:
IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: recipe_to_crystal.recipe_id, recipe_to_crystal.crystal_id
[SQL: INSERT INTO recipe_to_crystal (recipe_id, crystal_id, amount) VALUES (?, ?, ?)]
Attempted Fixes
Thinking this might be an autoflush problem, I wrapped the functions that add to the junction tables in a function add_junction_tables() and wrapped that in a session.no_autoflush context manager like:
with session.no_autoflush:
    add_junction_tables()
session.commit()
I also checked recipe_df for rows where the two crystals had the same name, like:
count = 0
for i, row in recipe_df.iterrows():
    crys_1 = row["crys_1_name"]
    crys_2 = row["crys_2_name"]
    if crys_2 is None:
        continue
    if crys_1 in crys_2:
        count += 1
        print(f"In recipe {row['name']}, the crystals {crys_1} and {crys_2} are the same")
if count == 0:
    print("No duplicates found")
This printed "No duplicates found".
I also made a checking function like:
def recipe_crystal_exists(rec_id: int, crys_id: int) -> bool:
    existing_entry = session.query(RecipeToCrystal).filter(
        RecipeToCrystal.recipe_id == rec_id,
        RecipeToCrystal.crystal_id == crys_id
    ).first()
    return existing_entry is not None
and added a check in add_recipe_crystals() before adding the ids to their respective lists like:
if recipe_crystal_exists(recipe_id, crystal_id):
    raise ValueError(f"Recipe {recipe_id} and crystal {crystal_id} already exists in RecipeToCrystals")
This error was never raised.
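One subtlety worth noting here (sketched below with a trimmed, self-contained copy of the mapping and an in-memory database): with autoflush disabled, session.query only sees rows already flushed to the database, so a duplicate that is still pending in the session would be invisible to recipe_crystal_exists. Pending objects can be inspected separately via session.new:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class RecipeToCrystal(Base):  # trimmed copy of the mapping above
    __tablename__ = "recipe_to_crystal"
    recipe_id = Column(Integer, primary_key=True)
    crystal_id = Column(Integer, primary_key=True)
    amount = Column(Integer)

engine = create_engine("sqlite://")  # in-memory, for illustration only
Base.metadata.create_all(engine)
session = Session(engine)

def pending_recipe_crystal_exists(rec_id, crys_id):
    # session.new holds objects that were add()-ed but not yet flushed
    return any(
        isinstance(obj, RecipeToCrystal)
        and obj.recipe_id == rec_id and obj.crystal_id == crys_id
        for obj in session.new
    )

with session.no_autoflush:
    session.add(RecipeToCrystal(recipe_id=1, crystal_id=2, amount=5))
    # a plain query finds nothing yet, but the pending check does
    print(pending_recipe_crystal_exists(1, 2))  # True
```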
I also swapped out session.add() for session.merge(), expunged the session, and tried deleting the database altogether and starting over.
Conclusion
The error still persists, however, and I am left stumped. The weirdest thing is that add_recipe_materials(), which is functionally identical and runs before add_recipe_crystals(), does not have this problem; otherwise, I would expect the error to be raised for it first.
What is causing this error, and/or is there any way to see which rows specifically are the duplicates, to better diagnose the problem? Any and all help would be appreciated.
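One way to see the duplicate rows directly (not tried in the post above; sketched here with mock data standing in for the frame built in add_recipe_crystals()) is pandas' duplicated, applied to the two key columns before inserting:

```python
import pandas as pd

# mock stand-in for the recipe_to_crystal_df built in add_recipe_crystals()
recipe_to_crystal_df = pd.DataFrame({
    "recipe_id":  [1, 1, 2, 1],
    "crystal_id": [10, 11, 10, 10],
    "amount":     [2, 1, 3, 2],
})

# keep=False flags every member of a duplicated key group, not just the repeats
dupes = recipe_to_crystal_df[
    recipe_to_crystal_df.duplicated(subset=["recipe_id", "crystal_id"], keep=False)
]
print(dupes)  # rows 0 and 3 both carry the key (1, 10)
```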
Comment
You check whether crys_1_name and crys_2_name are the same within "a row". Instead, you should check whether there are duplicates "among" the rows: if there are two rows with the same name and the same crys_1_name and/or crys_2_name, you would see the duplicate error exactly as reported.