I analyse web logs (nginx) and want store (organize) result in DB. After analyse and aggregate my result looks like:
- Date and Time (Truncate it by hours)
- Country
- City
- URL (Page on the website)
- Referrer (source website)
- UTM_Medium
- UTM_Campaign
- c_sessions
- c_views
- avg_session_time
- c_js_event_1
- c_js_event_2
- c_js_event_3
And so on, about 30 columns.
For that data I prepare "big table"
CREATE TABLE public.cca_data_hours
(
datetime timestamp without time zone NOT NULL,
id_geo_country integer NOT NULL,
id_geo_city integer NOT NULL,
id_web_page bigint NOT NULL,
id_web_referrer bigint NOT NULL,
id_utm_campaign bigint NOT NULL,
id_utm_medium bigint NOT NULL,
c_sessions integer,
c_views integer,
avg_sd integer,
c_jse_1 integer,
c_jse_2 integer,
c_jse_3 integer,
...
CONSTRAINT cca_data_hours_pri PRIMARY KEY (datetime, id_web_page, id_web_referrer, id_utm_campaign, id_utm_medium, id_geo_country, id_geo_city)
)
I expect 100 000 000 records.
My selections
- I want see all data by countries, cities
- I want see data per page
- I want see data per source (referrer)
- I want see data per page, source and utm campaign
- Some other combinations
My questions
Is it normal use so many columns as primary key (index)? Primary key now contains 7 columns, it may up to 10-15 columns in the future.
I think this is a simple solution how to organize this data in one object-relational DB table. Do you know better solution, practice?
-
do you realize that a pkey(a,b,c) can only be used to seek b,c iff you also search on a. shy of that it can be used only in an index scan. So adding a ton of columns to a primary key doesn't suffice the covering index problem you just slow down inserts unless your search uses all of those columns.Evan Carroll– Evan Carroll2018年01月07日 20:49:40 +00:00Commented Jan 7, 2018 at 20:49
-
1Adding to the practical remarks from Evan, I would question if it made sense to have a PK at all. What would you use it for when storing/processing logs?András Váczi– András Váczi2018年01月07日 22:48:19 +00:00Commented Jan 7, 2018 at 22:48
-
Nice question, really, do I need it at all!? May be better to add 1-2-3 indexes.Dmitry– Dmitry2018年01月07日 23:07:18 +00:00Commented Jan 7, 2018 at 23:07
1 Answer 1
So many columns in a primary key sounds strange, at least. That means to me that the data is not really adhering to a sensible uniqueness definition - this is more often than not like this in the case of logs. For this reason, I wouldn't even try to find a PK here, just define indexes based on the queries used against the table. One could call this opportunistic indexing ;)
Note that in general, in a relational models PKs are very important. Here it is more the case the data is not really relational, only stored in a RDBMS.