I started working on an existing SQL Server Database System in which most of the fields are stored as text. Except some IDs all fields are varchar (phone numbers, zip codes, dates, addresses, monetary values etc.).
This is not how I learned to build a database. When asked collegues said that it is easier this way. Is it bad practice to keep all fields in varchar? How could I argue that it should be changed?
6 Answers 6
Size is one consideration. An int
can hold up to -2,147,483,648 in four bytes. A char
will need 11 bytes to hold the same value.
There are built-in functions to manipulate the various data types. DATEADD()
and DATEDIFF()
are two examples. This will not be possible with date-stored-as-text. Constantly CAST
ing back and forth will not make for efficient processing, or legible code.
Automatic validation is another foregone benefit with the all-text approach. You may think that a column contains dates but there will be nothing to stop someone entering the value '2014-13-97'.
Sorting is unlikely to give the intended result with columns which are "really" numbers. For example, if a column contained integers '1' through '100', and the query sorted by this column, one would expect the result to be
1, 2 .. 10, 11 .. 20, 21 .. 100
However, the actual result is more likely to be
1, 10, 100, 11, 12, 2, 20, 21 ...
There may be similar concerns for dates, depending on the chosen character representation. Again type casting can cure this with the costs mentioned previously.
Occasionally you will find strings which contain digits only. Examples are national identity numbers, bank account numbers, phone numbers and such like. The common characteristic of such values is that it does not make sense to perform calculations on them. Other things being equal it may be OK to store these as char()
columns, especially if they have optional embedded alpha characters, but the above considerations still apply.
Your colleague is correct that it is easier to simply not think about it and just store everything as a varchar. But this comes at a large cost in terms of space requirements, performance, flexibility in querying data, and most importantly, lack of data integrity. This is not just a one-time cost; it is paid repeatedly over the lifecycle of the application.
I suggest one follow the best practice and store data using the data type most appropriate for the domain of data stored, as you learned.
There are several situations in which it's better to represent numbers using some kind of numeric data-type. It's a little more efficient, but that's just the beginning. You get support for built-in arithmetic using SQL operators without performing type conversions at run time. Not only do type conversions slow things down, but they can result in numerous run time errors, and catching these errors requires added programming effort.
In general, it's more worthwhile to filter out bad data on the way in that to discover the error at retrieve time.
It's often worthwhile to distinguish between a number, such as an annual salary, and a numeric character string, such as a telephone number. Storing numeric character strings in VARCHAR containers is often good practice.
Zip codes should be stored as text, as some start with 0 (screwing up formatting/sorting) and there is no reason to do math on them. Also, if you want to store global postal codes, they often contain letters.
Phone numbers are a maybe for text, especially if there might be extensions. Or you want to store numbers like 1-800-GOT-JUNK. There's no reason to do mathematical operations on phone numbers, so text is probably fine.
Storing dates and money amounts as text is dumb and allows for invalid values and prevents operations like addition and subtraction.
ONE good reason for a table with all text fields is for CSV import, but this should be temp tables only.
There are different data types in sql to store the data. if the all data are same then varchar is good to store.
but in the future, you need the operation on the data, you can't because of limited functionality.
better to go with the data type of data.
I have spent most of my long career with pick/multi-value databases for programming complex business systems. They are very simple and I have never seen a business which is too large or complex for this type of database. A record is string of tab delimited text, no matter what the data type. In the programming language the text is converted into either a Date, Time, Number or left as text. The query language also recognizes these 4 basic types of data. When I create a relational database I use only varchar, int, date and possibly time or datetime.
Explore related questions
See similar questions with these tags.
Date/DateTime
which will be used in times you need sorting, and handling other functions. Arithmetic calculation in sql query would be a mess, if any row has unexpected datatype. Performance issue in future.2014-02-29
into avarchar
column. You can also rely on a column of type integer to never have a value offoobar
in it.