1

I have a database with a table of games (some columns omitted):

gameid
hometeam
awayteam
`date`

Sometimes a game is added again with the same home and away teams but a date that differs by just one day (the game was moved), and the old game isn't deleted. I'd also like to be able to search over a variable number of days: in football, playing the same opponent twice within +/- 5 days would be wrong, but it is possible in basketball, baseball, and many other sports.

It's easy to find multiple rows with the same home/away/date, but allowing for a date difference makes this harder.

Also, the home and away team may be swapped, with the date also being the same or slightly different.

Example data:

gameid  hometeam  awayteam  date
5       777       999       2014-10-23
6       999       777       2014-10-23
7       777       999       2014-10-24
8       777       999       2014-10-25

All of these are duplicates. It doesn't matter which one is the original; the query should just tell me that there are 4 games scheduled for this matchup where there should (probably) be 1.

This is what I use to find duplicated games for the same home/away/date:

SELECT COUNT(*) AS num, hometeam AS teamid, `date`
FROM `game`
WHERE sportid = 1
  AND `deleted_at` IS NULL  -- backticks, not quotes: 'deleted_at' is a string literal and is never NULL
  AND `date` BETWEEN '2014-07-01' AND '2015-06-30'
GROUP BY `date`, hometeam
HAVING `num` > 1
UNION
SELECT COUNT(*) AS num, awayteam AS teamid, `date`
FROM `game`
WHERE sportid = 1
  AND `deleted_at` IS NULL
  AND `date` BETWEEN '2014-07-01' AND '2015-06-30'
GROUP BY `date`, awayteam
HAVING `num` > 1
ORDER BY `num` DESC;
asked Oct 23, 2014 at 16:48

2 Answers

1

Not sure why the question popped up now, but if you are still interested in an answer, something like:

select g1.* 
from games g1 
where exists ( 
 select 1 
 from games g2 
 where g1.gameid <> g2.gameid 
 and least(g1.hometeam,g1.awayteam) 
 = least(g2.hometeam,g2.awayteam) 
 and greatest(g1.hometeam,g1.awayteam) 
 = greatest(g2.hometeam,g2.awayteam) 
 and abs(datediff(g1.d, g2.d)) < 2
);

should give you what you need.
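
As a rough, runnable sketch of the same idea with the day window made variable (which the question asks for), here is the query run against the question's example data. This is only an illustration under some assumptions: it uses Python's built-in sqlite3 instead of MySQL, so SQLite's two-argument `min`/`max` stand in for `least`/`greatest` and `julianday` stands in for `datediff`. The function name `find_near_duplicates`, the parameter `max_days`, and game 9 are made up for the example.

```python
import sqlite3

def find_near_duplicates(conn, max_days):
    """Return ids of games that have another game between the same two teams
    (in either home/away order) at most max_days days away."""
    sql = """
        SELECT g1.gameid
        FROM game AS g1
        WHERE EXISTS (
            SELECT 1
            FROM game AS g2
            WHERE g1.gameid <> g2.gameid
              -- order-independent comparison of the team pair
              AND min(g1.hometeam, g1.awayteam) = min(g2.hometeam, g2.awayteam)
              AND max(g1.hometeam, g1.awayteam) = max(g2.hometeam, g2.awayteam)
              -- SQLite substitute for abs(datediff(g1.d, g2.d))
              AND abs(julianday(g1.d) - julianday(g2.d)) <= ?
        )
        ORDER BY g1.gameid
    """
    return [row[0] for row in conn.execute(sql, (max_days,))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (gameid INTEGER PRIMARY KEY,"
    " hometeam INTEGER, awayteam INTEGER, d TEXT)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?, ?)",
    [
        (5, 777, 999, "2014-10-23"),
        (6, 999, 777, "2014-10-23"),
        (7, 777, 999, "2014-10-24"),
        (8, 777, 999, "2014-10-25"),
        (9, 777, 999, "2014-11-20"),  # hypothetical rematch, not a duplicate
    ],
)

print(find_near_duplicates(conn, 1))  # [5, 6, 7, 8]
```

With `max_days` set to 1 this matches the `< 2` condition above; a larger value (say 5 for football) widens the window, at the cost of flagging legitimate rematches in sports where teams meet again within a few days.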

answered Jan 25, 2015 at 15:36
1
  • Since date is a reserved word I used d as the column name. Commented Jan 25, 2015 at 15:38
0

I can see that this question has been here a few days and no one has taken a stab at it yet. I'm not familiar with MySQL, so I can't give you a sample that will work there, but here is an idea for you.

Add another column to your table with a hash of the two team IDs. You will need to take care that the teams always go into the hash in the same order, say ascending by ID; that way the hash uniquely identifies a combination of teams regardless of which one is home or away.

Perhaps using MD5? I came up with something like the query below, which would work for MS-SQL.

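SELECT gameid
, hometeam
, awayteam
, [date]
  -- lower team ID first, then the higher one, so home/away order does not matter;
  -- cast to varchar and add a separator so the IDs are concatenated, not summed
, teamhash = HASHBYTES('MD5', CAST(CASE WHEN hometeam < awayteam
 THEN hometeam
 ELSE awayteam
 END AS varchar(20))
 + '-'
 + CAST(CASE WHEN hometeam > awayteam
 THEN hometeam
 ELSE awayteam
 END AS varchar(20)))
FROM gamedata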

Then you can query against that looking for patterns. To make it perform better, you could add a table that contained a list of teams and all possible matches with their hashes.
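
To make "looking for patterns" a bit more concrete, here is a minimal sketch of one such query. It is an assumption-laden illustration, not a tested MySQL or MS-SQL solution: it runs on SQLite through Python's sqlite3, it uses the ordered pair (lower ID, higher ID) directly as the grouping key instead of an MD5 hash, and the names `pair_counts`, `team_lo`, `team_hi`, and game 10 are invented for the example. The date column is called `d` here.

```python
import sqlite3

def pair_counts(conn, start, end):
    """Count games per unordered team pair inside a date range and return
    only the pairs that appear more than once."""
    sql = """
        SELECT min(hometeam, awayteam) AS team_lo,   -- scalar two-argument min
               max(hometeam, awayteam) AS team_hi,   -- scalar two-argument max
               COUNT(*)                AS num,
               MIN(d)                  AS first_date,
               MAX(d)                  AS last_date
        FROM game
        WHERE d BETWEEN ? AND ?
        GROUP BY team_lo, team_hi
        HAVING COUNT(*) > 1
        ORDER BY num DESC
    """
    return conn.execute(sql, (start, end)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (gameid INTEGER PRIMARY KEY,"
    " hometeam INTEGER, awayteam INTEGER, d TEXT)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?, ?)",
    [
        (5, 777, 999, "2014-10-23"),
        (6, 999, 777, "2014-10-23"),
        (7, 777, 999, "2014-10-24"),
        (8, 777, 999, "2014-10-25"),
        (10, 111, 222, "2014-10-23"),  # hypothetical unrelated game
    ],
)

for row in pair_counts(conn, "2014-10-01", "2014-10-31"):
    print(row)  # (777, 999, 4, '2014-10-23', '2014-10-25')
```

Note that this counts every game between the two teams in the chosen range, so the range itself acts as the window; it reports that 4 games exist for the pair but does not check the gap between individual games.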

Hope that helps.

Lennart - Slava Ukraini
answered Oct 28, 2014 at 19:38
