-1

I have a log table. It logs visitor data (time of visit, browser, window size ...).

I want to create reports. I use a query like this (MySQL):

SELECT
COUNT(*) as raw_views ...
FROM logs
WHERE timestamp >= CURDATE()
GROUP BY DATE(timestamp)

The problem is DATE(timestamp). I am using a function on this field, so the index won't work. But the table is huge. Index are very important. But how else can I create hourly, daily, yearly, monthly reports?

I tried to find out how others are doing it. For example Piwik (an open source Google Analytics like script). But I don't really get it.

asked Jul 25, 2014 at 8:41
13
  • GROUP BY can only be used if you have an aggregate, of which you have none. Commented Jul 25, 2014 at 8:49
  • Sorry, I edited the code. Commented Jul 25, 2014 at 8:51
  • 2
    See dba.stackexchange.com/questions/239/… Commented Jul 25, 2014 at 8:56
  • I can't store aggregated values like that. There are MANY reason in my scenario but for example: I want daily, hourly, yearly reports. Commented Jul 25, 2014 at 8:58
  • MySQL doesn't support indexes on expressions or functions Commented Jul 25, 2014 at 9:04

4 Answers 4

1

Looking at the original query

SELECT
COUNT(*) as raw_views ...
FROM logs
WHERE timestamp >= CURDATE()
GROUP BY DATE(timestamp)

It appears that you are looking for counts for just today. In that instance, why use GROUP BY at all ? You should run

SELECT
COUNT(*) as raw_views ...
FROM logs
WHERE timestamp >= CURDATE();

OK, let's get a little more granular. You said you have reports per user_id

Perhaps the query could be adjusted per user

SELECT
COUNT(*) as raw_views
FROM logs
WHERE userid = <some number>
AND timestamp >= CURDATE();

In that case, you can index like this:

ALTER TABLE logs ADD INDEX userid_timestamp_index (user_id,timestamp);

If you are getting all the user raw count in one shot, you would query like this

SELECT
 user_id,COUNT(*) as raw_views
FROM logs
WHERE timestamp >= CURDATE()
GROUP BY user_id
;

In that case, you can index like this:

ALTER TABLE logs ADD INDEX timestamp_userid_index (timestamp,user_id);

This is all I can suggest with what you have given. Hope this help !!!

answered Jul 28, 2014 at 1:58
0
1

You could create an additional column, date, which stores the function date(timestamp) on insertion. This won't make the group by extremely efficient, but it can avoid the temporary table.

The second problem is the range + GROUP BY, which would make an index on (timestamp, date) useless (BTREE limitations). You can create just an index on (date) or better, on (date, timestamp) if you are using 5.6 with Index Condition Pushdown and do:

SELECT
COUNT(*) as raw_views ...
FROM logs
WHERE date >= CURDATE() and timestamp >= CURDATE()
GROUP BY date

Again, this will be better than before, but it will not make the group by radically efficient.

There are several options for that, the most direct is the one that @Remus Rusanu mentioned, calculating aggregates. If you need them "in real time", you can update them on every insert, adding +1 with a trigger, for example, but that will reduce quite significantly your insertion rate. Usually, those are cached for several usages, as @ypercube mentions, and only recalculated every some time.

Maybe you are using the wrong tool: MySQL is not the best tool for analytics/aggregate results. Think of column-based storages- there are even some propietary implementations as MySQL storage engines.

answered Jul 25, 2014 at 9:57
2
  • Some, such as InfiniDB are Open Source - Infobright is also, but less so. I think that InfiniDB is an interesting project - it's even starting to put sorely needed windowing functions into MySQL. Even better for this sort of work would be PostgreSQL IF you can change - or use a second server as a data warehouse. Commented Jul 25, 2014 at 12:21
  • Well, I actually used proprietary as in "exclusive/non common/owned" rather than non-free. Commented Jul 25, 2014 at 12:23
1

First create a Calendar table (a table of dates) as described in this answer; then extend it by adding and populating Year, MonthName, DayOfMonth, and Tomorrow columns in that table and add a unique index on it by (Year, MonthName and DayOfMonth), and another unique index on it by the base column (Date) - the Primary Key.

Now you can generate your reports by joining to the Calendar table (assume it is called Calendar) like this:

SELECT 
 Year,MonthName,
 count(*) as raw_views
from logs
join Calendar on _data <= Timestamp and TimeStamp < Tomorrow
group by Year,MonthName
answered Jul 28, 2014 at 1:15
0

This might work:

select sum(N), the_date from (
 SELECT COUNT(*) as N, timestamp, DATE(timestamp) as the_date
 FROM logs
 WHERE timestamp >= CURDATE()
 group by timestamp
) as A
GROUP BY the_date -- or: GROUP BY DATE(timestamp)

Your first problem is MySQL, for many reasons, here because function(column) shouldn't preclude the use of an index on column. The above query might engender consultation of the column at the price of two aggregations.

answered Jul 28, 2014 at 0:14

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.