I can't quite figure this out and am hoping someone can help. I tried something like this, but it doesn't seem to work:
SELECT *
FROM testtable
GROUP BY SUBSTRING_INDEX(urlTest,'/',3)
HAVING COUNT(*)>1
Here is a simplified version of the table I am using, with sample data:
CREATE TABLE `testtable` (
`field1` int(11) NOT NULL,
`field2` datetime NOT NULL,
`urlTest` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `testtable` (`field1`, `field2`, `urlTest`) VALUES
(2, '2010-01-01', 'http://test1.com/somethingelse.php?id=1'),
(5, '2010-01-01', 'http://test1.com/'),
(6, '2012-02-02', 'http://test1.com/newscript/something'),
(7, '2013-02-02', 'http://test2.com/newscript/something'),
(8, '2014-02-02', 'http://test3.com/newscript/something'),
(9, '2015-02-02', 'http://test3.com/');
ALTER TABLE `testtable`
ADD PRIMARY KEY (`field1`);
The output I want is every row whose "base" URL and date both match those of another row, so that I can inspect them manually.
In other words, although "test1.com/somethingelse.php?id=1" and "test1.com" are distinct URL entries, their "base" URL (i.e., "test1.com") is the same, and because their dates (2010-01-01) are also identical, they count as duplicates. However, test1.com/newscript/something would not count as a duplicate: although its base URL is the same, its date (2012-02-02) differs from the other test1.com URLs.
So the output I'd like to get is:
2, 2010-01-01, 'http://test1.com/somethingelse.php?id=1'
5, 2010-01-01, 'http://test1.com/'
(Only test1.com has rows that share both the same base URL and the same timestamp. The test3.com rows share a base URL, but their timestamps differ, so they are considered unique.)
How would I accomplish this?
Thanks!
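1 Answer
Selecting all columns together with GROUP BY makes no sense: for every column that is not part of the GROUP BY, the server returns one arbitrary value from the rows in the group (or rejects the query outright when ONLY_FULL_GROUP_BY is enabled). Instead, find the duplicated (domain, date) pairs in a subquery and join back to the table to get the full rows: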
SELECT testtable.*
FROM testtable
JOIN ( SELECT SUBSTRING_INDEX(urlTest,'/',3) AS domain,
              field2
       FROM testtable
       GROUP BY SUBSTRING_INDEX(urlTest,'/',3), field2
       HAVING COUNT(*) > 1 ) domains
  ON SUBSTRING_INDEX(testtable.urlTest,'/',3) = domains.domain
 AND testtable.field2 = domains.field2
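As a hedged aside, the same idea can also be sketched with a window function, which avoids the self-join. This assumes MySQL 8.0 or later (window functions are not available in earlier versions); the alias `counted` and the column `cnt` are illustrative names that do not appear in the question. The first statement only checks what SUBSTRING_INDEX extracts from one of the sample URLs.

```sql
-- Sanity check: the first three '/'-delimited pieces form the scheme plus host.
SELECT SUBSTRING_INDEX('http://test1.com/somethingelse.php?id=1', '/', 3) AS domain;
-- → 'http://test1.com'

-- Window-function variant (assumes MySQL 8.0 or later).
-- `counted` and `cnt` are illustrative names, not part of the original schema.
SELECT field1, field2, urlTest
FROM (
    SELECT t.*,
           -- Number of rows sharing this row's base URL and date.
           COUNT(*) OVER (
               PARTITION BY SUBSTRING_INDEX(t.urlTest, '/', 3), t.field2
           ) AS cnt
    FROM testtable AS t
) AS counted
WHERE cnt > 1
ORDER BY field1;
```

With the sample data from the question, this should return the rows with field1 = 2 and field1 = 5, the same rows as the join-based query.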
-
wow! i'm impressed! that is very cool, thanks very much for your help! I've been trying to figure it out for quite some time now! thanks very much! :D – user6262902, Sep 27, 2019 at 17:54
-
follow-up question (now that this works great!) and I can see the 'duplicate' urls... something I didn't anticipate is "some" base urls require a 'longer' substring_index, so what would be the best way of doing that? – user6262902, Sep 27, 2019 at 18:01
-
i.e., say I have "test1.com", "test2.com" and "test3.com", and I want to 'individually' add specific domain bases, but specify how many slashes there should be for each? i.e., – user6262902, Sep 27, 2019 at 18:02
-
shoot - sorry, I keep typing and it keeps ending the comment. anyways, kind of like: test1.com/random, test1.com/otheroutput (so test1.com is "3" slashes); test2.com/base/newrandom, test2.com/base/otheritem (so "4" slashes, because up until "/base/" everything is identical); then say: test3.com/base/newrandom/variablending001, test3.com/base/newrandom/variablending002, test3.com/base/newrandom/variablending003 (so '3' variable endings in this case)? so "5" slashes... or... it just occurred to me... – user6262902, Sep 27, 2019 at 18:04
-
@user6262902 The new task is only loosely related to this one. I'd recommend creating a new question, with new DDL + data scripts and the desired output. – Akina, Sep 27, 2019 at 18:04