I'm managing a lot of sensor data and need to do charting, smoothing, stats over time on the data. Currently I import the data from flat files using VBA to do ETL into Excel.
The format of the data is:
[DataTime][SystemID][SystemParams 1][SystemParams ..n][Data 1][Data ...n]
There are 30 System Parameters and 8000 datapoints in each file. All this is fine in Excel where I can manipulate, filter and pivot off the system parameters and then chart the Data over time.
But I want to get all rows out of Excel into a database as there will be 10K files showing up every week in this format.
When I export the rows from Excel to SQL Server (using the SQL Server import/export tools I only get 200 columns of the 8000+ from my Excel row due to the column restrictions in SQL Server
What would be the correct way of structuring this data so that I can run queries on the System Parameters an retrieve the 8000 data points back into Excel for charting?
I'd like to stay in SQL Server but have read I might be better off with NoSQL. All advice gratefully accepted..
Overwhelmed
2 Answers 2
To help those who follow... I ended up packing the data into a comma separated string for insert into a varchar(max) column. Not what I wanted to do but it works and I'm on my way. Another gotcha with this method is that although varchar(max) can store 2GB the table read from an external connection in excel only returns the first 32765 bytes of the string. To get round that I needs to run a vba script to get the full string using GetChunk() method.
Best.
SQL Server has a feature called wide tables which will let you go up to 30,000 columns per table. It uses sparse columns and column sets.
Create Table dbo.Example ([Id] Int Identity(1, 1), [Value1] Int Sparse, [Value2] Int Sparse, Cs Xml Column_Set For All_Sparse_Columns)
You define all the extra data columns as Sparse (this would normally indicate most of them will default to NULL and so that these NULLs should not be stored, because when you have 8,000 columns I doubt every single one of them has data all the time, but hey maybe they do and that's okay).
When you insert data into column [Value1], it will be encoded as XML and stored in [Cs], same if you insert data into [Value2]. When you Select * you'll only see [Id] and [Cs] but you can Select [Id], [MyValue1] and any other columns in the column set specifically, which will pull the values back directly as if they were normal columns.
It has some limitations regarding modifying the column set after you're done, and translating to/from empty strings and stuff which you will need to check. But it sounds like this is the technology for you.
DateTime
row? Really?