Date and Time Data Types
 |
|
| You are reading part 2 from "Physical data storage in SQL Server 2005 and 2008," excerpted from A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008, by Eric Johnson and Joshua Jones, copyright 2008, printed with permission from Addison-Wesley Professional. |
|
|
 |
 |
When you need to store a date or time value, SQL Server provides you
with six data types. Knowing which type to use is important, because each
date and time data type provides a slightly different level of accuracy, and
that can make a huge difference when you're calculating exact times, as
well as durations. Let's look at each in turn.
Datetime and Smalldatetime
The datetime and smalldatetime data types can store date and time data in
a variety of formats; the difference is the range of values that each can
store. Datetime can hold values from January 1, 1753, through December
31, 9999, and can be accurate to 3.33 milliseconds. In contrast, smalldatetime
can store dates only from January 01, 1900, through June 6, 2079, and
is accurate only to 1 minute. For storage, datetime requires 8 bytes, and
smalldatetime needs only 4 bytes.
Date and Time
New in SQL Server 2008 are data types that split out the date portion and
the time portion of a traditional date and time data type. Literally, as the
names imply, these two data types account for either the date portion
(month, day, and year), or the time portion (hours, minutes, seconds, and
nanoseconds). Thus, if needed, you can store only one portion or the other
in a column.
The range of valid values for the date data type are the same as for the
datetime data type, meaning that date can hold values from January 1,
1753, through December 31, 9999. From a storage standpoint, date requires
only 3 bytes of space, with a character length of 10.
The time data type holds values 00:00:00.0000000 through
23:59:59.9999999 and can hold from 8 characters (hh:mm:ss) to 16 characters
(hh:mm:ss:nnnnnnn), where n represents fractional seconds. For example,
13:45:25.5 literally means that it is 1:45:25 and one-half second
p.m. You can specify the scale of the time data type from 0 to 7 to designate
how many digits you can use for fractional seconds. At its maximum,
the time data type requires 5 bytes of storage.
Datetime2
Another new data type in SQL Server 2008 is the datetime2 data type. This
is very similar to the original datetime data type, except that datetime2 incorporates
the precision and scale options of the time data type. You can specify the scale from 0 to 7, depending on how you want to divide and
store the seconds. Storage for this data type is fixed at 8 bytes, assuming a
precision of 7.
Datetimeoffset
The final SQL Server 2008 date and time data type addition is datetimeoffset.
This is a standard date and time data type, similar to datetime2 (because
it can store the precision). Additionally, datetimeoffset can store a
plus or minus 14-hour offset. It is useful in applications where you want to
store a date and a time along with a relative offset, such as when you're
working with multiple time zones. The storage requirement for datetimeoffset
is 10 bytes.
String Data Types
When it comes to storing string or character data, the choice and variations
are complex. Whether you need to store a single letter or the entire text of
War and Peace, SQL Server has a string data type for you. Fortunately,
once you understand the difference between the available string data
types, choosing the correct one is straightforward.
Char and Varchar
Char and varchar are probably the most used of the string data types. Each
stores standard, non-Unicode text data. The differences between the two
lie mostly in the storage of the data. In each case, you must specify a length
when defining a column as char or varchar. The length sets the limit on the
number of characters the column can hold.
Here's the kicker: The char data type always requires the same number
of bytes for storage as you have specified for the length. If you have a
char(20), it will always require 20 bytes of storage, even if you store only a
5-character word in the column. With a varchar, the storage is always the
actual number of characters you have stored plus 2 bytes. So a varchar(20)
with a 5-character word will take up 7 bytes, with the extra 2 bytes holding
a size reference for SQL Server. Each type can have a length of as many as
8,000 characters.
When do you use one over the other? The rule of thumb is to use char
when all the data will be close to the same length, and use varchar when
the data will vary a great deal. Following this rule should make for optimum
storage.
Another tip is to avoid using varchar for short columns. We have seen
databases use varchar(2) columns, and the result is wasted space. Let's assume
you have 100 rows in your table and the table contains a varchar(2)
column. Assuming all the columns are NULL, you still need to store the
2 bytes of overhead, so without storing any data you have already taken up
as much space as you would using char(2).
One other special function of varchar is the max length option. When
you specify max as the length, your varchar column can store as much as
2^31–1 bytes of data, which is about 2 trillion bytes, or approximately 2GB
of string data. If you don't think that's a lot, open your favorite text editor
and start typing until you reach a 2GB file. Go on, we'll wait. It's a lot of information
to cram into a single column. Varchar(max) was added to SQL
Server in the 2005 release and was meant to replace the text data type from
previous versions of SQL Server.
Nchar and Nvarchar
The nchar and nvarchar data types work in much the same way as the char
and varchar data types, except that the n versions store Unicode data.
Unicode is most often used when you need to store non-English language
strings that require special characters such as the Greek letter beta (_).
Because Unicode data is a bit more complex, it requires 2 bytes for each
character, and thus an nchar requires double the length in bytes for storage,
and nvarchar requires double the actual number of characters plus the
obligatory 2 bytes of overhead.
From our earlier discussion, recall that SQL Server stores tables in
8,060-byte pages. Well, a single column cannot span a page, so some simple
math tells us that when using these Unicode data types, you will reach
8,000 bytes when you have a length of 4,000. In fact, that is the limit for
the nchar and nvarchar data types. Again, you can specify nvarchar(max),
which in SQL Server 2005 replaced the old ntext data type.
Binary and Varbinary
Binary and varbinary function in exactly the same way as char and varchar.
The only difference is that these data types hold binary information such
as files or images. As before, varbinary(max) replaces the old image data
type. In addition, SQL Server 2008 allows you to specify the filestream attribute
of a varbinary(max) column, which switches the storage of the
BLOB. Instead of being stored as a separate file on the file system, it is
stored in SQL Server pages on disk.
Text, Ntext, and Image
As mentioned earlier, the text, ntext, and image data types have been
replaced with the max length functionality of varchar, nvarchar, and
varbinary, respectively. However, if you are running on an older version or
upgrading to SQL Server 2005 or SQL Server 2008, you may still need
these data types. The text data type holds about 2GB of string data, and
ntext holds about 1GB of Unicode string data. Image is a variable-length
binary field and can hold any binary data, up to about 2GB. When using
these data types, you must use certain functions to write, update, and read
to the columns; you cannot just do a simple update. Keep in mind that
these three data types have been replaced, and Microsoft will likely remove
them from future releases of SQL Server.
Other Data Types
In addition to the standard numeric and string data types, SQL Server
2008 provides several other useful data types. These additional types allow
you to store XML data, globally unique identifiers (GUIDs), hierarchical identities, and spatial data types. There is also a new file storage data type
that we'll talk about shortly.
Sql_variant
A column defined as sql_variant can store most any data that can be stored
in the other SQL Server data types. The only data you cannot put into a
sql_variant are text, ntext, image, xml, timestamp, or the max length data
types. Using sql_variant you can store various data types in the same column
of a table. As you will read in Chapter 4, this is not the best practice
from a modeling standpoint. That said, there are some good uses for
sql_variant, such as building a staging table when you're loading less-thanperfect
data from other sources. The storage requirement for a sql_variant
depends on the type of data you put in the column.
Timestamp
This data type has a somewhat misleading name. In fact timestamp does
not store any sort of time or date information. Instead, timestamp is a binary
number that is automatically incremented each time an insert or update
happens to a table containing the timestamp column. The counter for
the timestamp column is stored for the entire database, and each table is
allowed to have only a single timestamp column. In this way, you can tell
in what order various operations have happened in your database, or you
can implement row versioning.
We once used timestamp to archive a large database. Each night we
would run a job to grab all the rows from all the tables where the timestamp
was greater than the last row copied the night before. Timestamps
require 8 bytes of storage, and remember, 8 bytes can add up fast if you
add timestamps to all your tables.
Uniqueidentifier
The uniqueidentifier data type is probably one of the most interesting data
types available, and it is the topic of much debate. Basically, a uniqueidentifier
column holds a GUID—a string of 32 random characters in blocks
separated by hyphens. For example, the following is a valid GUID:
45E8F437-670D-4409-93CB-F9424A40D6EE
Why would you use a uniqueidentifier column? First, when you generate
a GUID, it will be a completely unique value and no other GUID in
the world will share the same string. This means that you can use GUIDs
as PKs on your tables if you will be moving data between databases. This
technique prevents duplicate PKs when you actually copy data.
When you're using uniqueidentifier columns, keep in mind a couple of
things. First, they are pretty big, requiring 16 bytes of storage. Second, unlike
timestamps or identity columns (see the section on primary keys later
in this chapter), a uniqueidentifier does not automatically have a new
GUID assigned when data is inserted. You must use the NEWID function
to generate a new GUID when you insert data. You can also make the default
value for the column NEWID(). In this way, you need not specify
anything for the uniqueidentifier column; SQL Server will insert the
GUID for you.
Xml
The xml data type is a bit outside the scope of this book, but we'll say a few
words about it. Using the xml data type, SQL Server can hold Extensible
Markup Language (XML) data in a column. Additionally, you can bind an
XML schema to the column to constrain the XML data being stored. Like
the max data types, the xml data type is limited to 2GB of storage.
Table
A table data type can store the result set of T-SQL statements for processing
later. The data is stored in a similar fashion to the way an entire table
is stored. It is important to note that the table data type cannot be used on columns; it can be used only in variables in T-SQL code. Programming in
SQL Server is beyond the scope of this book, but the table data type plays
an important role in user-defined functions, which we discuss shortly.
Table variables behave in the same way as base tables. They contain
columns and can have check constraints, unique constraints, and primary
keys. As with base tables, a table variable can be used in SELECT, INSERT,
UPDATE, and DELETE statements. Like other local variables,
table variables exist in the scope of the calling function and are cleaned up
when the calling module finishes executing. To use table variables, you declare
them like any other variable and provide a standard table definition
to the declaration.
Hierarchyid
The hierarchyid data type is a system-provided data type that allows you to
store hierarchical data, such as organizational data, project tasks, or file system–
style data in a relational database table. Whenever you have selfreferencing
data in a tiered format, hierarchyid allows you to store and
query the data more efficiently. The actual data in a hierarchyid is represented
as a series of slashes and numerical designations. This is a specialized
data type and is used only in very specific instances.
Spatial Data Types
SQL Server 2008 also introduces the spatial data types for relational storage.
The first of the two new data types is geometry, which allows you to
store planar data about physical locations (distances, vectors, etc.). The
other data type, geography, allows you to store round earth data such as latitude
and longitude coordinates. Although this is oversimplifying, these
data types allow you to store information that can help you determine the
distance between locations and ways to navigate between them.
User-Defined Data Types
In addition to the data types we have described, SQL Server allows you to
create user-defined data types. With user-defined data types, you can
create standard columns for use in your tables. When defining userdefined
data types, you still must use the standard data types that we have
described here as a base. A user-defined data type is really a fixed definition
of a data type, complete with length, precision, or scale as applicable.
For example, if you need to store phone numbers in various tables in
your database, you can create a phone number data type. If you create the phone number data type as a varchar(25), then every column that you define
as a phone number will be exactly the same, a varchar(25). As you recall
from the discussion of domains in Chapter 2, user-defined data types
are the physical implementation of domains in SQL Server. We highly recommend
using user-defined data types for consistency, both during the initial
development and later during possible additions to your data model.
Check out the authors' website where they co-host a podcast show for IT professionals.