0

I analyse web logs (nginx) and want store (organize) result in DB. After analyse and aggregate my result looks like:

  • Date and Time (Truncate it by hours)
  • Country
  • City
  • URL (Page on the website)
  • Referrer (source website)
  • UTM_Medium
  • UTM_Campaign
  • c_sessions
  • c_views
  • avg_session_time
  • c_js_event_1
  • c_js_event_2
  • c_js_event_3

And so on, about 30 columns.

For that data I prepare "big table"

CREATE TABLE public.cca_data_hours
(
 datetime timestamp without time zone NOT NULL,
 id_geo_country integer NOT NULL,
 id_geo_city integer NOT NULL,
 id_web_page bigint NOT NULL,
 id_web_referrer bigint NOT NULL,
 id_utm_campaign bigint NOT NULL,
 id_utm_medium bigint NOT NULL,
 c_sessions integer,
 c_views integer,
 avg_sd integer,
 c_jse_1 integer,
 c_jse_2 integer,
 c_jse_3 integer,
 ...
 CONSTRAINT cca_data_hours_pri PRIMARY KEY (datetime, id_web_page, id_web_referrer, id_utm_campaign, id_utm_medium, id_geo_country, id_geo_city)
)

I expect 100 000 000 records.

My selections

  • I want see all data by countries, cities
  • I want see data per page
  • I want see data per source (referrer)
  • I want see data per page, source and utm campaign
  • Some other combinations

My questions

  • Is it normal use so many columns as primary key (index)? Primary key now contains 7 columns, it may up to 10-15 columns in the future.

  • I think this is a simple solution how to organize this data in one object-relational DB table. Do you know better solution, practice?

asked Jan 7, 2018 at 20:43
3
  • do you realize that a pkey(a,b,c) can only be used to seek b,c iff you also search on a. shy of that it can be used only in an index scan. So adding a ton of columns to a primary key doesn't suffice the covering index problem you just slow down inserts unless your search uses all of those columns. Commented Jan 7, 2018 at 20:49
  • 1
    Adding to the practical remarks from Evan, I would question if it made sense to have a PK at all. What would you use it for when storing/processing logs? Commented Jan 7, 2018 at 22:48
  • Nice question, really, do I need it at all!? May be better to add 1-2-3 indexes. Commented Jan 7, 2018 at 23:07

1 Answer 1

0

So many columns in a primary key sounds strange, at least. That means to me that the data is not really adhering to a sensible uniqueness definition - this is more often than not like this in the case of logs. For this reason, I wouldn't even try to find a PK here, just define indexes based on the queries used against the table. One could call this opportunistic indexing ;)

Note that in general, in a relational models PKs are very important. Here it is more the case the data is not really relational, only stored in a RDBMS.

answered Jan 8, 2018 at 12:40

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.