Home > SQL Server Tips > Data Warehousing and Business Intelligence > Table partitioning with SQL Server 2005
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

Table partitioning with SQL Server 2005


By Baya Pavliashvili
01.15.2008
Rating: -3.91- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


Performance tuning is the ultimate responsibility of every valuable SQL Server administrators. Although ensuring data security and availability are our topmost goals, if the applications don't perform to the users' expectations, DBAs often get the credit, or blame, for poor design and implementation. SQL Server 2005 offers numerous improvements in terms of database performance, not the least of which is table partitioning. If you haven't examined the table partitioning feature, do yourself a favor and take time to read this tip.

The concept of partitioning tables is not new; if you've worked as a SQL Server DBA for a while, you've probably had to archive data in a heavy transactional table once the historical data is no longer accessed frequently. For example, if you have a time reporting application, your reports probably query records from the year 1995 very seldom; most budgetary projections will be based on records from more recent years.

With earlier releases of SQL Server, you could create multiple tables. Each table would consist of the same columns and store data from different years. Then you could create a view which queried each of these tables, if accessing historical data was necessary. Having multiple tables is handy because scanning through small tables will be quicker than querying a huge table. But this benefit is only available if you know ahead of time which time period you wish to retrieve to get the data. You would also have to create a new table and move data every so often as data became obsolete.

SQL Server 7 and SQL Server 2000 support the concept of distributed partitioned views (sometimes referred to as materialized views). Distributed partitioned views consist of tables with the same schema on multiple servers, and you have to add linked server definitions to each server and create a view on one of the servers to union the data returned from each server. The idea is that the database engine would exploit the processing power of multiple servers to satisfy queries.

Distributed partitioned views (DPVs) have many limitations, which you can read about in SQL Server online documentation. Although DPVs provide performance benefits in some cases, they aren't particularly widespread. Arguably they don't offer a silver bullet for scaling enterprise-level applications. Furthermore, implementing DPVs is a laborious process, requiring much work on behalf of a DBA.

SQL Server 2005 supports table partitioning, whereby all partitions reside on a single server. Each table partition is associated with a single file that is part of a filegroup. The same file/filegroup can hold multiple partitioned tables.

With this architecture, the database engine determines which partition to poll at query execution time, without scanning the entire table. If data rows requested by the query reside in multiple partitions, SQL Server uses multiple processors to query more than one partition in parallel. You can partition indexes along with the tables on which they are created. Seeking or scanning through a smaller index is considerably faster than scanning the entire table or scanning the clustered index on a large table. So table partitioning can yield significant performance benefits for queries executed against large tables.

Let's look at a quick example of how table partitioning works. I won't go into the details of partitioning syntax in this tip, but you can look it up in SQL Server's online documentation. The example below is based on the data warehouse containing time reporting system's records. I will create seven filegroups in addition to the default filegroup, and each filegroup will contain a single file that will store a subset of data as defined by the partition function:

To test performance benefits of table partitioning, I added 15 million rows to the partitioned table, as well as an identical table that was not partitioned. The INSERT statement running against the partitioned table was considerably faster. Even on my laptop, which has less than a Gigabyte of RAM, the INSERT statement against the partitioned table was three times faster than against a table created on a single filegroup. Of course query execution times will vary depending on hardware resources at hand, so you can see a different level of improvement in your environment.

I further examined the execution plan of a simple SELECT statement returning all rows from partitioned and un-partitioned tables, given a specific date range in the WHERE clause. The same query had different execution plans for the two tables. The query against the partitioned table shows nested loops and a clustered index scan. Essentially, SQL Server treats the two partitions as separate tables and joins them together using a nested loop. The same query executed against the table without partitions uses a clustered index scan to return the same row set. Performance gains are even more significant if you have multiple tables created using the same partitioning scheme, and your queries join such tables.

You can examine the number of rows in each table partition by using the following query:

SELECT $PARTITION.TimeEntryDateRangePFN(time_entry_date) AS Partition,
COUNT(*) AS [COUNT] FROM fact_time_entry
GROUP BY $PARTITION.TimeEntryDateRangePFN(time_entry_date)
ORDER BY Partition

Table partitioning is a great feature for transactional as well as data warehousing environments. One of the main complaints from data warehouse users is that populating the fact table sometimes takes too long. While loading data into the fact table, the performance of user queries (or cube processing queries) will be significantly slower and might not succeed at all. Therefore, loading a large number of rows into a fact table might require downtime. With table partitioning this doesn't have to be the case any longer – you can populate the fact table in a blink of an eye -- literally. To demonstrate how this works, I'll create another table using the same partition function and partition scheme as before, this time called fact_time_entry2. I will seed the identity column at 50 million so the fact_time_entry2 table won't contain any keys that are also present in fact_time_entry:

Next I will populate this table with data for the year 2007. And let's suppose that fact_time_entry table has data for years before 2007. Once fact_time_entry2 table is populated, I can execute the following statement:

ALTER TABLE fact_time_entry2
SWITCH PARTITION 8 TO fact_time_entry PARTITION 8

This statement moves partition number 8, which happens to have data for 2007 from fact_time_entry2 into fact_time_entry and takes 3 milliseconds on my laptop. Within 3 milliseconds, my fact table has 5 million additional records! Yes, I had to populate the intermediate table prior to swapping partitions, but my users never need to worry about that – the fact table is available for querying with zero downtime! Behind the scenes, no data movement takes place – only the metadata for two tables is changed.

I could use a similar query to remove the data no longer needed in the fact table. For example, let's suppose we decided that we no longer care about time entry records from 2004. The following statement moves such records to the work table we created:

ALTER TABLE fact_time_entry
SWITCH PARTITION 2 TO fact_time_entry2 PARTITION 2

Again this statement completes in a matter of milliseconds. At this point I can drop fact_time_entry2 table or move it to another server. My fact table will no longer have any records from 2004. The partition must already exist in the destination table and it must be empty. You cannot switch partitions to a replicated table. The source and destination tables must be partitioned the same way, and the data to be transferred must reside in the same filegroup for both tables. Even with these limitations, the ability to switch partitions and populate fact tables with zero downtime is bound to make SQL Server data warehouse implementations more successful than ever before.


ABOUT THE AUTHOR:   
Baya Pavliashvili is a DBA manager overseeing database operations that support more than one million users. Pavliashvili's primary areas of expertise are performance tuning, replication and data warehousing.
Copyright 2007 TechTarget

More on SearchSQLServer.com

  • Business intelligence features in SQL Server 2005
  • Get techniques to boost data warehouse performance in this tip
  • Segregating data for optimum parallelization

    Rate this Tip
    To rate tips, you must be a member of SearchSQLServer.com.
    Register now to start rating these tips. Log in if you are already a member.




    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


    RELATED CONTENT
    SQL Server data warehousing/business intelligence
    Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
    Troubleshoot Web service issues in SQL Server 2005 Reporting Services
    Ordering the results of a SQL query
    SSIS error message due to installation problem on SQL Server 2005
    Using MDX and UDM in a SQL Server Analysis Services environment
    Configuring SQL Server with a changed computer name
    Change data capture in SQL Server 2008 improves BI reporting accuracy
    Manage traces in SQL Server 2005 Analysis Services with XMLA commands
    Tutorial: SQL Server 2005 Analysis Services
    Top 10 SQL Server Integration Services (SSIS) and DTS tips

    SQL Server 2005 (Yukon)
    SQL Server consolidation: Why it's an optimization technique
    Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
    Using SQL Server datetime functions GETDATE, DATENAME and DATEPART
    SSIS error message due to installation problem on SQL Server 2005
    SQL Server data conversions from date/time values to character types
    Basics for working with DATETIME and SMALLDATETIME in SQL Server 2005
    How to configure Database Mail in SQL Server 2005 to send mail
    How to use rank function in SQL Server 2005
    Manage traces in SQL Server 2005 Analysis Services with XMLA commands
    Tutorial: SQL Server 2005 Analysis Services
    SQL Server 2005 (Yukon) Research

    SQL Server upgrades and patches
    SQL Server consolidation: Why it's an optimization technique
    Should you upgrade to SQL Server 2005 or SQL Server 2008?
    Monitor database mirroring and replication after a SQL Server upgrade
    Upgrade live applications to SQL Server 2005 for high availability
    SQL Server high availability when upgrading to SQL Server 2005
    How to restore SQL Server database to transition server during upgrade
    Top 10 SQL Server Integration Services (SSIS) and DTS tips
    Upgrade Active/Active cluster to SQL Server 2005 and Windows 2003
    FAQ: SQL Server databases how-to
    Upgrading to SQL Server 2008 advantages and hardware requirements

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    data aggregation  (SearchSQLServer.com)
    data preprocessing  (SearchSQLServer.com)
    data warehouse  (SearchSQLServer.com)
    FileMaker  (SearchSQLServer.com)
    GIS  (SearchSQLServer.com)
    MOLAP  (SearchSQLServer.com)
    pivot table  (SearchSQLServer.com)
    Quiz: SQL Server 2000  (SearchSQLServer.com)
    SQL  (SearchSQLServer.com)
    T-SQL  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary

    DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.

  • HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




    All Rights Reserved, Copyright 2005 - 2008, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts