I have a database with a table of games (some columns omitted):
gameid
hometeam
awayteam
`date`
Sometimes a game is added again with the same home and away teams but a date that differs by just 1 day (the game date was moved), and the old game isn't deleted. I'd also like to be able to search over a variable number of days: in football, facing the same opponent twice within +/- 5 days would be an error, but it is possible in basketball, baseball, and many other sports.
It's easy to search for multiple values of the same home/away/date, but the date difference makes this harder.
Also, the home and away team may be swapped, with the date also being the same or slightly different.
Example data:
gameid  hometeam  awayteam  date
5       777       999       2014-10-23
6       999       777       2014-10-23
7       777       999       2014-10-24
8       777       999       2014-10-25
All of these are duplicates. It doesn't matter which one is the original; the query should just tell me that there are 4 games scheduled for this matchup when there should (probably) be 1.
This is what I use to find duplicated games for the same home/away/date:
SELECT COUNT(*) as num,hometeam as teamid,`date` FROM `game` WHERE sportid=1 AND `deleted_at` IS NULL AND `date` BETWEEN '2014-07-01' AND '2015-06-30' GROUP BY `date`,hometeam HAVING `num`>1
UNION
SELECT COUNT(*) as num,awayteam as teamid,`date` FROM `game` WHERE sportid=1 AND `deleted_at` IS NULL AND `date` BETWEEN '2014-07-01' AND '2015-06-30' GROUP BY `date`,awayteam HAVING `num`>1 ORDER BY `num` DESC;
2 Answers
Not sure why the question popped up now, but if you are still interested in an answer, something like:
select g1.*
from games g1
where exists (
    select 1
    from games g2
    where g1.gameid <> g2.gameid
      and least(g1.hometeam,g1.awayteam)
        = least(g2.hometeam,g2.awayteam)
      and greatest(g1.hometeam,g1.awayteam)
        = greatest(g2.hometeam,g2.awayteam)
      and abs(datediff(g1.d, g2.d)) < 2
);
should give you what you need
Comment: Since date is a reserved word I used d as the column name. (Lennart - Slava Ukraini, Jan 25, 2015 at 15:38)
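The query above hardcodes a 1-day window (`< 2`), while the question asks for a variable number of days. Here is a minimal sketch of the same EXISTS idea with the window passed in as a parameter. To keep it runnable anywhere it uses Python's built-in sqlite3 module instead of MySQL, so SQLite's `min(a, b)`, `max(a, b)` and `julianday()` stand in for `LEAST`, `GREATEST` and `DATEDIFF`. The function name `find_near_duplicates` and game 9 are made up for illustration; the table and column names follow the query above.

```python
import sqlite3


def find_near_duplicates(conn, days):
    """Return ids of games that have another game between the same two
    teams (home/away in either order) at most `days` days apart."""
    sql = """
        SELECT g1.gameid
        FROM games g1
        WHERE EXISTS (
            SELECT 1
            FROM games g2
            WHERE g1.gameid <> g2.gameid
              -- normalise the pair so that home/away order is ignored
              AND min(g1.hometeam, g1.awayteam) = min(g2.hometeam, g2.awayteam)
              AND max(g1.hometeam, g1.awayteam) = max(g2.hometeam, g2.awayteam)
              -- date distance in days, compared against the parameter
              AND abs(julianday(g1.d) - julianday(g2.d)) <= ?
        )
        ORDER BY g1.gameid
    """
    return [row[0] for row in conn.execute(sql, (days,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (gameid INTEGER PRIMARY KEY, "
    "hometeam INTEGER, awayteam INTEGER, d TEXT)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?)",
    [
        (5, 777, 999, "2014-10-23"),
        (6, 999, 777, "2014-10-23"),
        (7, 777, 999, "2014-10-24"),
        (8, 777, 999, "2014-10-25"),
        (9, 777, 999, "2014-11-20"),  # hypothetical rematch weeks later
    ],
)

print(find_near_duplicates(conn, 1))   # [5, 6, 7, 8]
print(find_near_duplicates(conn, 30))  # [5, 6, 7, 8, 9]
```

In MySQL the equivalent change would presumably be to replace `< 2` with `<= N` for whatever window suits the sport.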
I can see that this question has been here a few days and no one has taken a stab at it yet. I'm not familiar with MySQL, so I can't give you a sample that will work, but here is an idea for you.
Add another column to your table with a hash of the two team IDs. You will need to take care that the teams always go into the hash in the same order, say ascending by ID, but that would allow you to uniquely identify a combination of teams regardless of which one is at home.
Perhaps using MD5? I came up with something like the query below, which would work for MS-SQL.
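SELECT gameid
     , hometeam
     , awayteam
     , [date]
       -- Hash "lowest-highest" so home/away order does not matter.
       -- The CASTs assume integer team IDs; without them + would add the numbers.
     , teamhash = HASHBYTES('MD5',
           CAST(CASE WHEN hometeam < awayteam
                     THEN hometeam
                     ELSE awayteam
                END AS varchar(20))
           + '-'
           + CAST(CASE WHEN hometeam > awayteam
                       THEN hometeam
                       ELSE awayteam
                  END AS varchar(20)))
FROM gamedata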
Then you can query against that looking for patterns. To make it perform better, you could add a table that contained a list of teams and all possible matches with their hashes.
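As a rough illustration of "looking for patterns" on such a key, here is a sketch in plain Python rather than SQL. It swaps the MD5 hash for the sorted pair `(lowest id, highest id)`, which identifies the matchup just as well for this purpose, then splits each matchup into clusters wherever two consecutive dates are more than `days` apart. The function name `duplicate_clusters`, the tuple layout and game 9 are assumptions made up for the example.

```python
from datetime import date
from itertools import groupby


def duplicate_clusters(games, days):
    """Group games by unordered team pair, then split each group wherever
    consecutive dates are more than `days` apart. Only clusters with more
    than one game are returned, as (pair, [gameids])."""

    def pair_key(game):
        _gameid, home, away, _d = game
        return (min(home, away), max(home, away))

    clusters = []
    # Sort by pair, then date, then id so that groupby sees each pair once
    ordered = sorted(games, key=lambda g: (pair_key(g), g[3], g[0]))
    for key, group in groupby(ordered, key=pair_key):
        current = []
        for game in group:
            # A gap larger than the window closes the current cluster
            if current and (game[3] - current[-1][3]).days > days:
                if len(current) > 1:
                    clusters.append((key, [g[0] for g in current]))
                current = []
            current.append(game)
        if len(current) > 1:
            clusters.append((key, [g[0] for g in current]))
    return clusters


games = [
    (5, 777, 999, date(2014, 10, 23)),
    (6, 999, 777, date(2014, 10, 23)),
    (7, 777, 999, date(2014, 10, 24)),
    (8, 777, 999, date(2014, 10, 25)),
    (9, 777, 999, date(2014, 11, 20)),  # hypothetical rematch weeks later
]

for pair, ids in duplicate_clusters(games, days=1):
    print(pair, len(ids), ids)  # (777, 999) 4 [5, 6, 7, 8]
```

Note that clusters are chained: each game only has to be within `days` of the previous one, so the first and last game of a cluster can be further apart than the window. For the question's case (4 games scheduled where there should probably be 1) that is likely what you want.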
Hope that helps.