0

I have daily yield curve data for integer maturities that should be stored in a MSSQL server.

Currency.csv

Currently, the data is stored in CSV files per currency in the format:

date, 1y, 2y, 3y, ...
08/12/2020, 0.4, 0.5, 0.6, ...

I am wondering what the best choice for a table would be.

Same Format Table

The current data could be directly translated to a table with columns:

+------------+------+------+------+------+
| date | 1y | 2y | 3y | ... |
+------------+------+------+------+------+
| 08/12/2020 | 0.4 | 0.5 | 0.6 | ... |
+------------+------+------+------+------+

Alternate Table Design

However, I could also store it like:

+------------+-----------+-----------+--------+
| date | currency | maturity | value |
+------------+-----------+-----------+--------+
| 08/12/2020 | "EUR" | 1 | 0.4 |
| 08/12/2020 | "EUR" | 2 | 0.5 |
| 08/12/2020 | "EUR" | 3 | 0.6 |
+------------+-----------+-----------+--------+

Obviously, the second approach would take more space than the first, but I would see the advantage that I am not restricting the value of the maturity in case in the future it should be changed to non-integer values or to maturities that don't have a column yet.

Which of the two table designs would be better? Are there other designs that would fit better?

John K. N.
18.9k14 gold badges56 silver badges117 bronze badges
asked Dec 8, 2020 at 19:23
2
  • As you can see, both "work". Which is better depends wholly on how the data will be used and what the constraints are. If you want to normalize the data, the second is more ideal. If you're just planning on ingesting it into R for analysis, the former may be preferable. We don't know because we lack context. Commented Dec 8, 2020 at 20:37
  • The data will mostly be read (and not updated) by applications that further process it, be it for some statistical analysis or recurring reports. Commented Dec 8, 2020 at 20:55

1 Answer 1

1

Your second choice (long) is preferable for the reasons you outline, provided you use the right keys and constraints.

Something like this is probably in the right direction:

CREATE TABLE Currency
(
 CurrencyCd CHAR(3) NOT NULL /* This is the ISO currency code */
 ,CurrencyDesc VARCHAR(50) NOT NULL
 ,CONSTRAINT PK_Currency PRIMARY KEY (CurrencyCd)
 ,CONSTRAINT AK_Currency UNIQUE (CurrencyDesc)
 ,CONSTRAINT CK_CurrencyCd_Length CHECK (LEN(CurrencyCd) = 3)
)
GO
/* This is solely to prevent invalid values being added */
CREATE TABLE MaturityAge
(
 MaturityAgeYr TINYINT NOT NULL
 ,CONSTRAINT PK_Maturity PRIMARY KEY (MaturityAgeYr)
)
GO
CREATE TABLE CurrencyYield
(
 CurrencyCd CHAR(3) NOT NULL
 ,MaturityAgeYr TINYINT NOT NULL /* Possibly consider using months if finer detail possible */
 ,AsOfDt DATE NOT NULL
 ,Yield DECIMAL(9,1) NOT NULL /* Can increase decimal precision if necessary */
 ,CONSTRAINT FK_CurrencyYield_Of_Currency FOREIGN KEY (CurrencyCd) REFERENCES Currency (CurrencyCd)
 ,CONSTRAINT FK_CurrencyYield_At_Maturity FOREIGN KEY (MaturityAgeYr) REFERENCES MaturityAge (MaturityAgeYr)
 ,CONSTRAINT PK_CurrencyYield PRIMARY KEY (CurrencyCd, AsOfDt, MaturityAgeYr)
)
GO

Here the choice of the composite clustered index form by the primary key (CurrencyCd, AsOfDt, MaturityAgeYr) ensures the data will be (more or less) stored sorted according to those columns. That means any queries of the type WHERE CurrencyCd = 'EUR' will only read the relevant pages.

This choice of clustered index + storing the data in a "long" format will make "widening" the data trivial - so if you need to view the data as you provide in the first example, it's a simple PIVOT.

Other considerations:

  • If the volume of data is very large, but most of the values do not change, you can explore only storing changed values. It takes a little more code to retrieve the data for a given date, but you can create date tables/views to simplify that. If this is the case, let me know in the comments and I'll expand my answer.
  • Over time the table will fragment (a large number of records will not be co-located according to the clustered index), so the clustered index will need to be rebuilt occasionally. There are a number of readily available tools/scripts to make this relatively simple.
answered Dec 8, 2020 at 21:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.