I'm creating a database of simulation results, and am trying to do so the right way. I've attempted to show relationships between data to make sure nothing redundant is thrown in.

My current structure is like this:

Experiment

Primary key: SimulationID

Tables:

  • measurement
  • sampling_rate
  • first_draft_flow_rate
  • second_draft_flow_rate
  • final_flow_rate

Foreign key relating these tables: cycle_count

Each table contains the primary key, foreign key, and values for the variables.

So for a given experiment, it runs for so many cycles (the number of cycles it runs for varies from simulation to simulation). We log data at each cycle of the simulation.

I've made tables called measurement, sampling_rate, etc., but am not sure what to name the columns. Should they also be called measurement, sampling_rate, etc.? Or should I just use Value as the column name?

Here's a spreadsheet with a sample log to demonstrate what I'm working with. All of this data would be filed under a single SimulationID.

spreadsheet

Also, any tips on how best to design a DB to be normal/best practices would be greatly appreciated.

Erik
asked Sep 29, 2015 at 20:08

3 Answers

This is not database administration, this is data modeling. Very different disciplines.

I'm guessing your main entity is Simulation and the tables you list describe it. You don't show the structure or content of these tables so the following is based on conjecture.

Measurement looks like it could be a list of measurement types: temperature, flow, particles per unit volume, etc. SamplingRates also looks like a list of valid rates: 1/sec, 10/sec, 100/sec, etc.

Finally, there are three tables that look like they should be one, FlowRates, which is also a lookup table.

This would mean a Simulation is the recorded results of, say, a temperature reading at a rate of 10 times per second of a 30 ml/sec flow.

Is that accurate? If so, here would be an example:

Measurements
 ID  Name
 1   Temperature
 2   Particles per ml

SamplingRates
 ID  Name  Period
 1   1     sec
 2   10    sec

FlowRates
 ID  Rate  Unit  Period
 1   10    ML    sec
 2   20    ML    sec
 3   30    ML    sec

So the example Simulation entry would show a Measurement of 1, SamplingRate of 2 and FlowRate of 3 -- along with the results of the measurement, of course, and probably a timestamp of when the simulation was performed.
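As a rough sketch of that layout, the lookup tables and one Simulation row could be set up as follows, using Python's built-in sqlite3 module so it runs anywhere. The Simulation column names (MeasurementID, SamplingRateID, FlowRateID, PerformedAt) and the timestamp value are assumptions made for illustration, and the measurement results themselves are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Measurements  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE SamplingRates (ID INTEGER PRIMARY KEY, Name INTEGER NOT NULL,
                            Period TEXT NOT NULL);
CREATE TABLE FlowRates     (ID INTEGER PRIMARY KEY, Rate INTEGER NOT NULL,
                            Unit TEXT NOT NULL, Period TEXT NOT NULL);

-- Hypothetical Simulation table: the column names are assumed, not from the post.
CREATE TABLE Simulation (
    ID             INTEGER PRIMARY KEY,
    MeasurementID  INTEGER NOT NULL REFERENCES Measurements(ID),
    SamplingRateID INTEGER NOT NULL REFERENCES SamplingRates(ID),
    FlowRateID     INTEGER NOT NULL REFERENCES FlowRates(ID),
    PerformedAt    TEXT NOT NULL
);

INSERT INTO Measurements  VALUES (1, 'Temperature'), (2, 'Particles per ml');
INSERT INTO SamplingRates VALUES (1, 1, 'sec'), (2, 10, 'sec');
INSERT INTO FlowRates     VALUES (1, 10, 'ML', 'sec'), (2, 20, 'ML', 'sec'),
                                 (3, 30, 'ML', 'sec');

-- Measurement 1, SamplingRate 2, FlowRate 3; the timestamp is a placeholder.
INSERT INTO Simulation VALUES (1, 1, 2, 3, '2015-09-30 18:54:00');
""")

# Resolve the three lookup IDs back into readable values.
row = conn.execute("""
SELECT m.Name, sr.Name, sr.Period, fr.Rate, fr.Unit, fr.Period
FROM Simulation s
JOIN Measurements  m  ON m.ID  = s.MeasurementID
JOIN SamplingRates sr ON sr.ID = s.SamplingRateID
JOIN FlowRates     fr ON fr.ID = s.FlowRateID
WHERE s.ID = 1
""").fetchone()
print(row)
```

The Simulation row stores only the three IDs, and the joins turn them back into "temperature, 10 per sec, 30 ML per sec" when the row is read.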

It would help a lot if you would give a plain language description of an experiment: "An experiment consists of any number of simulations. A simulation is made up of various readings of...made at a certain frequency based on ..." Don't think in terms of tables and columns. Pretend you're talking to a lab technician.

Update: When designing a table (and this pertains to naming the fields) it is generally a good idea to isolate the entity from all other entities -- that is, the naming should be done as context-free as possible. What that means is if you have a field that represents, say, the name or description of the entity, then by all means call those fields "Name" and "Description". No matter that you have dozens of other fields with identical names in tables scattered around the database.

A table has no context. It is the query that establishes a context.

select s.Name as NewSite, u1.Name as Owner, u2.Name as Manager
from Site s
join Users u1
 on u1.ID = s.OwnerID
join Users u2
 on u2.ID = s.MgrID
where s.Created > '2015-01-01';

Here are three tables each with the field Name -- it really doesn't matter that one table is used twice. Within each table, Name means "this is the name of the entity represented by this row."

The context established by this query is easily identified. It is looking at the owner and manager of all newly created sites and renames each Name field to suit the context. Different queries can and do use the fields in completely different contexts. As it is the query which sets the context, let the query rename the fields to whatever best suits that context.
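To see the renaming in action, here is a small runnable sketch using Python's built-in sqlite3 module. The table definitions and sample rows ('Alice', 'Bob', 'Old Lab', 'New Lab') are invented for illustration. Only the shape of the query follows the one discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Both tables use the plain, context-free column name "Name".
CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Site (
    ID      INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL,
    OwnerID INTEGER NOT NULL REFERENCES Users(ID),
    MgrID   INTEGER NOT NULL REFERENCES Users(ID),
    Created TEXT NOT NULL
);
-- Sample data, made up for this sketch.
INSERT INTO Users VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO Site  VALUES (1, 'Old Lab', 1, 1, '2014-06-01'),
                         (2, 'New Lab', 1, 2, '2015-03-15');
""")

# Users is joined twice under different aliases; the query renames each Name.
cur = conn.execute("""
SELECT s.Name AS NewSite, u1.Name AS Owner, u2.Name AS Manager
FROM Site s
JOIN Users u1 ON u1.ID = s.OwnerID
JOIN Users u2 ON u2.ID = s.MgrID
WHERE s.Created > '2015-01-01'
""")
headers = [d[0] for d in cur.description]  # column headings of the result set
rows = cur.fetchall()
print(headers)
print(rows)
```

The result headings come from the aliases in the query, not from the tables, so the three Name columns never collide.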

Don't try to force a context with a table by naming the fields User_ID or User_Name and so forth. In a query, fields should be prefixed with the table name or alias anyway, so there's never any confusion.

where User.Name = 'John Smith'

Compare with

where User.User_Name = 'John Smith'

The extra "User_" adds no useful information. Besides, you're bound to hit something like this:

where ExtremelyLongTableName.ExtremelyLongTableName_SomewhatLongFieldName = ...

I'm exhausted just typing it the one time. Besides, some DBMSs limit the length of object names. Oracle, for instance, rejected identifiers longer than 30 bytes until release 12.2 raised the limit to 128. I've hit that limit more than once in shops that use the tablename_fieldname convention. At that point you have to use abbreviations, which gets really messy.

Anyway, "best practices" is a fairly subjective concept. Opinions will vary. Choose what is most comfortable for you.

answered Sep 30, 2015 at 18:54

My preference would be to name the columns like final_flow_rate_value, etc., since value by itself can easily become very confusing if there are multiple columns named that.

Take for instance:

SELECT 
 measurement.value,
 sampling_rate.value,
 first_draft_flow_rate.value,
 second_draft_flow_rate.value,
 final_flow_rate.value
FROM 
 measurement
 INNER JOIN sampling_rate ON ...
 INNER JOIN first_draft_flow_rate ON ...
 INNER JOIN second_draft_flow_rate ON ...
 INNER JOIN final_flow_rate ON ...

Results will show column headings all saying just value.

The following returns column headers that make more sense:

SELECT 
 measurement.measurement_value,
 sampling_rate.sampling_rate_value,
 first_draft_flow_rate.first_draft_flow_rate_value,
 second_draft_flow_rate.second_draft_flow_rate_value,
 final_flow_rate.final_flow_rate_value
FROM 
 measurement
 INNER JOIN sampling_rate ON ...
 INNER JOIN first_draft_flow_rate ON ...
 INNER JOIN second_draft_flow_rate ON ...
 INNER JOIN final_flow_rate ON ...
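For comparison, descriptive headers can also be produced by keeping the short column name value and aliasing it in the query. The sketch below uses Python's built-in sqlite3 module with only two of the tables. The join keys (simulation_id, cycle_count) and the sample numbers are assumptions, since the join conditions are not spelled out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout: every table keys its rows by simulation and cycle.
CREATE TABLE measurement   (simulation_id INTEGER, cycle_count INTEGER, value REAL);
CREATE TABLE sampling_rate (simulation_id INTEGER, cycle_count INTEGER, value REAL);
INSERT INTO measurement   VALUES (1, 1, 20.5);
INSERT INTO sampling_rate VALUES (1, 1, 10.0);
""")

join = """
FROM measurement
INNER JOIN sampling_rate
    ON  sampling_rate.simulation_id = measurement.simulation_id
    AND sampling_rate.cycle_count   = measurement.cycle_count
"""

# Without aliases the headings are left to the engine (SQLite does not
# guarantee what an unaliased result column is called).
plain = conn.execute("SELECT measurement.value, sampling_rate.value" + join)
plain_headers = [d[0] for d in plain.description]
print(plain_headers)

# With AS, the headings are exactly the aliases chosen in the query.
aliased = conn.execute(
    "SELECT measurement.value AS measurement_value, "
    "sampling_rate.value AS sampling_rate_value" + join
)
aliased_headers = [d[0] for d in aliased.description]
row = aliased.fetchone()
print(aliased_headers, row)
```

Either approach yields readable headers. The difference is whether the long name lives in the table definition or in each query that needs it.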

As a DBA charged with debugging other-people's-code, I especially hate it when every surrogate key in a database is named ID in the referenced table, and xxx_ID in the referencing table. For instance:

CREATE TABLE dbo.SomeTable
(
 ID INT NOT NULL
 CONSTRAINT PK_SomeTable
 PRIMARY KEY CLUSTERED
 , ...
);
CREATE TABLE dbo.SomeOtherTable
(
 ID INT NOT NULL
 CONSTRAINT PK_SomeOtherTable
 PRIMARY KEY CLUSTERED
 , SomeTable_ID INT NOT NULL
 CONSTRAINT FK_SomeOtherTable_SomeTable
 FOREIGN KEY REFERENCES dbo.SomeTable(ID)
 , ...
);

This pattern leads to hard-to-debug mistakes such as:

SELECT SomeTableID = st.ID
 , SomeOtherTableID = sot.ID
FROM dbo.SomeTable st
 INNER JOIN dbo.SomeOtherTable sot ON st.ID = sot.ID;

Which should in fact be:

SELECT SomeTableID = st.ID
 , SomeOtherTableID = sot.ID
FROM dbo.SomeTable st
 INNER JOIN dbo.SomeOtherTable sot ON st.ID = sot.SomeTable_ID;

If the tables are defined as:

CREATE TABLE dbo.SomeTable
(
 SomeTable_ID INT NOT NULL
 CONSTRAINT PK_SomeTable
 PRIMARY KEY CLUSTERED
 , ...
);
CREATE TABLE dbo.SomeOtherTable
(
 SomeOtherTable_ID INT NOT NULL
 CONSTRAINT PK_SomeOtherTable
 PRIMARY KEY CLUSTERED
 , SomeTable_ID INT NOT NULL
 CONSTRAINT FK_SomeOtherTable_SomeTable
 FOREIGN KEY REFERENCES dbo.SomeTable(SomeTable_ID)
 , ...
);

If the key columns carry the same name in every table, this bug will almost never occur. The wrong version is easy to spot, and it is unlikely to be written in the first place:

SELECT st.SomeTable_ID
 , sot.SomeOtherTable_ID
FROM dbo.SomeTable st
 INNER JOIN dbo.SomeOtherTable sot ON st.SomeTable_ID = sot.SomeOtherTable_ID;

The correct version, which is immensely more readable, is:

SELECT st.SomeTable_ID
 , sot.SomeOtherTable_ID
FROM dbo.SomeTable st
 INNER JOIN dbo.SomeOtherTable sot ON st.SomeTable_ID = sot.SomeTable_ID;

Columns that contain the same content across tables should be named precisely the same in every table they are defined in, if only to spare future maintainers this class of bug.
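One side effect of identical key names is that engines supporting the standard JOIN ... USING syntax can express the join without repeating the column on both sides. The sketch below shows this with Python's built-in sqlite3 module. The sample rows are invented, and SQL Server, which the dbo. prefix in the examples suggests, does not support USING, so treat this as an illustration of the naming benefit, not a drop-in for that platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SomeTable (
    SomeTable_ID INTEGER PRIMARY KEY
);
CREATE TABLE SomeOtherTable (
    SomeOtherTable_ID INTEGER PRIMARY KEY,
    SomeTable_ID      INTEGER NOT NULL REFERENCES SomeTable(SomeTable_ID)
);
-- Sample rows, made up for this sketch.
INSERT INTO SomeTable      VALUES (1), (2);
INSERT INTO SomeOtherTable VALUES (10, 2), (11, 2), (12, 1);
""")

# USING only works because the key has the same name in both tables, so
# there is no way to pair SomeTable_ID with SomeOtherTable_ID by accident.
rows = conn.execute("""
SELECT st.SomeTable_ID, sot.SomeOtherTable_ID
FROM SomeTable st
INNER JOIN SomeOtherTable sot USING (SomeTable_ID)
ORDER BY sot.SomeOtherTable_ID
""").fetchall()
print(rows)
```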

answered Sep 29, 2015 at 20:32
  • Excellent examples of why [ID] is a horrible field name. Commented Aug 30, 2016 at 19:47

Always try to think ahead: use table and field names that are self-documenting (self-explanatory) whenever possible. Don't just string a bunch of letters together and assume the person who comes after you will have any idea what they're looking at. Naming things in this manner also makes it much easier to see what results your query is returning.

answered Sep 29, 2015 at 20:36