Good morning,
I'm having some issues with how my database should be laid out and would appreciate some guidance. I have several tables of rainfall data, each containing the following columns:
Date (DateTime), Value_of_rainfall (Float)
Each of these tables is for a specific location.
It has been suggested that I create another table with the following columns:
LocationID (tinyint), LocationName (char(6))
and add a new LocationID (tinyint) column to the rainfall tables.
The confusion I have is with the data already stored in the rainfall tables. It's been suggested that the data for all locations be kept in a single table. The statistical analysis I'm looking to carry out is (as far as I have envisioned so far) very location-specific and won't require querying multiple locations at once. A couple of months' worth of data for one location is nearly 3 million rows, and I'm looking to set up long-running calculations on that data. Would a data warehouse therefore be more appropriate? If so, could someone give me some pointers on how I should lay it out?
Thanks for your time.
Note: I'm using SQL Server 2008.
A table per location is not an appropriate design for any modern theoretical data model that I know of, including both relational and dimensional (data warehousing). Location-specific calculation is a performance/implementation issue that all modern DBMS and DW products can easily accommodate without having to compromise the data design like this. The question of whether to use a relational DB (OLTP) or a data warehouse (OLAP) has primarily to do with the relationship between data writing (inserts/updates/deletes, timing, etc.) and data reading (select timing, scope, etc.). – RBarryYoung, Jun 19, 2012 at 13:16
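As a minimal sketch of what that comment describes (the table, column, and index names here are assumptions for illustration, not from the post), a single table covering all locations, with an index leading on LocationID, still lets the engine seek straight to one location's rows:
create table Rainfall (
RainfallID bigint identity (1,1) not null
,LocationID tinyint not null
,ReadingDate datetime not null
,Rainfall float
)
go
-- Clustering on location then date keeps each location's readings together on disk,
-- so location-specific queries stay fast even with every location in one table.
create clustered index CIX_Rainfall_Location_Date
on Rainfall (LocationID, ReadingDate)
go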
1 Answer
It seems you want to aggregate location-based statistics over time for rainfall. A database structure like the one below would let you do that. The 'data source' could be just a filename, or some indication of where the data came from.
create table DimDataSource (
DataSourceID int identity (1,1) not null
,DataSourceDesc nvarchar (100) -- May need Unicode for file names
)
go
alter table DimDataSource
add constraint PK_DataSource
primary key clustered (DataSourceID)
go
create table DimLocation (
LocationID int identity (1,1) not null
,LocationDesc varchar (50)
)
go
alter table DimLocation
add constraint PK_Location
primary key clustered (LocationID)
go
create table DimDate (
DateID smalldatetime not null -- 'Date' is a reserved word
,MonthID int not null
,MonthDesc varchar (15)
,QuarterID int not null
,QuarterDesc varchar (15)
,YearID int not null
)
go
alter table DimDate
add constraint PK_Date
primary key clustered (DateID)
go
create table DimTime (
TimeID time not null -- 'Time' is a reserved word
,Hour int not null
)
go
alter table DimTime
add constraint PK_Time
primary key clustered (TimeID)
go
-- If the table is <50GB, don't bother with partitioning, but put a clustered
-- index on DateID or LocationID and DateID, depending on how you normally expect
-- to query the data.
create table FactRainfall (
RainfallID int identity (1,1) not null -- May need a wider type if >4B rows.
-- SSAS likes an identity column for
-- incremental loads
,DataSourceID int not null
,LocationID int not null
,DateID smalldatetime not null
,TimeID time not null
,Rainfall float
)
go
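Following the note above about clustering rather than partitioning, one option (the index name is just illustrative) is to cluster the fact table on location and date:
-- Clustering on LocationID then DateID keeps each location's readings in a
-- contiguous range, which suits long-running, single-location calculations.
create clustered index CIX_FactRainfall_Location_Date
on FactRainfall (LocationID, DateID)
go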
-- Add foreign keys as necessary
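For example (constraint names here are just illustrative), the foreign keys to the four dimensions could be declared as:
alter table FactRainfall
add constraint FK_FactRainfall_DimDataSource
foreign key (DataSourceID) references DimDataSource (DataSourceID)
,constraint FK_FactRainfall_DimLocation
foreign key (LocationID) references DimLocation (LocationID)
,constraint FK_FactRainfall_DimDate
foreign key (DateID) references DimDate (DateID)
,constraint FK_FactRainfall_DimTime
foreign key (TimeID) references DimTime (TimeID)
go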
Populate the dimensions with the appropriate list of locations, date ranges, and times of day at the right grain, plus one data source record per file. This structure will also allow you to put a cube over the top, or it can be flattened with a view, which will help people using tools like Excel or stats packages get at the data.
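As a rough sketch of that flattening view (the view name is an example, not a requirement), something like this gives Excel or a stats package a single denormalized result set to pull from:
create view vwRainfall
as
select l.LocationDesc
,f.DateID
,d.MonthDesc
,d.QuarterDesc
,d.YearID
,t.[Hour]
,s.DataSourceDesc
,f.Rainfall
from FactRainfall f
join DimLocation l on l.LocationID = f.LocationID
join DimDate d on d.DateID = f.DateID
join DimTime t on t.TimeID = f.TimeID
join DimDataSource s on s.DataSourceID = f.DataSourceID
go
Aggregations for a single location then become simple GROUP BY queries over the view.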