Contents at a Glance

About the Author
About the Technical Reviewers
Acknowledgments
Introduction

Part 1: Tables and Indexes
Chapter 1: Data Storage Internals
Chapter 2: Tables and Indexes: Internal Structure and Access Methods
Chapter 3: Statistics
Chapter 4: Special Indexing and Storage Features
Chapter 5: Index Fragmentation
Chapter 6: Designing and Tuning the Indexes

Part 2: Other Things That Matter
Chapter 7: Constraints
Chapter 8: Triggers
Chapter 9: Views
Chapter 10: User-Defined Functions
Chapter 11: XML
Chapter 12: Temporary Tables
Chapter 13: CLR
Chapter 14: CLR Types
Chapter 15: Data Partitioning
Chapter 16: System Design Considerations
Introduction

Several people asked me the same question while I worked on this book: "Why have you decided to write yet another book on SQL Server internals? There are plenty of books on this subject out there, including an excellent one by Kalen Delaney et al., the latest version being Microsoft SQL Server 2012 Internals, Developer Reference series (Microsoft Press, 2013)."

To be absolutely honest, I asked myself the same question while I toyed with the idea of writing this book. In the end, I defined two goals:
I wanted to write a book that explains how SQL Server works while keeping the content as practical as possible.
I wanted the book to be useful to both database administrators and developers.
There is a joke in the SQL Server community: "How do you distinguish between junior- and senior-level database professionals? Just ask them any question about SQL Server. The junior-level person gives you a straight answer. The senior-level person, on the other hand, always answers, 'It depends.'"

As strange as it sounds, that is correct. SQL Server is a very complex product with a large number of components that depend on each other. You can rarely give a straight yes or no answer to any question. Every decision comes with its own set of strengths and weaknesses and leads to consequences that affect other parts of the system. This book talks about what "it depends" on.

My first goal is to give you enough information about how SQL Server works and to show you various examples of how specific database designs and code patterns affect SQL Server behavior. I tried to avoid generic suggestions based on best practices. Even though those suggestions are great and work in a large number of cases, there are always exceptions. I hope that, after you read this book, you will be able to recognize those exceptions and make decisions that benefit your particular systems.

My second goal is based on the strong belief that the line between database administration and development is very thin. It is impossible to be a successful database developer without knowledge of SQL Server internals. Similarly, it is impossible to be a successful database administrator without the ability to design an efficient database schema and to write good T-SQL code. That knowledge also helps developers and administrators better understand and collaborate with each other, which is especially important nowadays in the age of agile development and multi-terabyte databases.

I have worn both hats in my life. I started my career in IT as an application developer, slowly moving to backend and database development over the years.
At some point, I found that it was impossible to write good T-SQL code unless I understood how SQL Server executes it. That discovery forced me to learn SQL Server internals, and it led to a new life in which I design, develop, and tune various database solutions. I do not write client applications anymore; however, I understand perfectly the challenges that application developers face when they deal with SQL Server. I have "been there and done that."

I still remember how hard it was to find good learning materials. There were plenty of good books; however, all of them had a clear separation in their content. They expected the reader to be either a developer or a database administrator, never both. I tried to avoid that separation in this book. Obviously, some of the chapters are more DBA-oriented, while others lean more towards developers. Nevertheless, I hope that anyone who works with SQL Server will find the content useful.

That said, do not consider this book a SQL Server tutorial. I expect you to have previous experience working with relational databases, preferably with SQL Server. You need to know RDBMS concepts, be familiar with the different types of database objects, and be able to understand SQL code if you want to get the most out of this book.
Finally, I would like to thank you for choosing this book and for your trust in me. I hope that you will enjoy reading it as much as I enjoyed writing it.
How This Book Is Structured

The book is logically separated into eight parts. Even though these parts are relatively independent of each other, I would encourage you to start with Part 1, "Tables and Indexes," anyway. This part explains how SQL Server stores and works with data, which is the key to understanding SQL Server internals. The other parts of the book rely on this understanding. The parts of the book are as follows:

Part 1: Tables and Indexes covers how SQL Server works with data. It explains the internal structure of database tables, discusses how and when SQL Server uses indexes, and provides basic guidelines on how to design and maintain them.

Part 2: Other Things That Matter provides an overview of different T-SQL objects and outlines their strengths and weaknesses, along with use cases for when they should or should not be used. Finally, this part discusses data partitioning and provides general design considerations for systems that use SQL Server as a database backend.

Part 3: Locking, Blocking, and Concurrency talks about the SQL Server concurrency model. It explains the root causes of various blocking issues in SQL Server and shows you how to troubleshoot and address them in your systems. Finally, this part provides a set of guidelines on how to design transaction strategies in a way that improves concurrency.

Part 4: Query Life Cycle discusses the optimization and execution of queries in SQL Server. Moreover, it explains how SQL Server caches execution plans, and it demonstrates several plan-caching-related issues commonly encountered in systems.

Part 5: Practical Troubleshooting provides an overview of the SQL Server execution model and explains how you can quickly diagnose systems and pinpoint the root causes of problems.
Part 6: Inside the Transaction Log explains how SQL Server works with the transaction log, and it gives you a set of guidelines on how to design backup and high availability strategies.

Part 7: In-Memory OLTP Engine (Hekaton) talks about the new in-memory OLTP engine introduced in SQL Server 2014. It explains how Hekaton works internally and how you can work with memory-optimized data in your systems.

Part 8: Columnstore Indexes provides an overview of columnstore indexes, which can dramatically improve the performance of data warehouse solutions. It covers nonclustered columnstore indexes, introduced in SQL Server 2012, along with clustered columnstore indexes, introduced in SQL Server 2014.

As you may have already noticed, this book covers multiple SQL Server versions, including the recently released SQL Server 2014. I have noted version-specific features whenever necessary; however, most of the content is applicable to any SQL Server version, starting with SQL Server 2005. It is also worth noting that most of the figures and examples in this book were created in the Enterprise Edition of SQL Server 2012, with parallelism disabled at the server level in order to simplify the resulting execution plans. In some cases, you may get slightly different results when you run the scripts in your environment using different versions of SQL Server.
Downloading the Code

You can download the code used in this book from the Source Code section of the Apress web site (www.apress.com) or from the Publications section of my blog (http://aboutsqlserver.com). The source code consists of SQL Server Management Studio solutions, which include a set of projects (one per chapter). Moreover, it includes several .NET C# projects, which provide the client application code used in the examples in Chapters 12, 13, 14, and 16.
Contacting the Author

You can visit my blog at http://aboutsqlserver.com or email me at firstname.lastname@example.org.
Tables and Indexes
Data Storage Internals

A SQL Server database is a collection of objects that allow you to store and manipulate data. In theory, SQL Server supports 32,767 databases per instance, although a typical installation has only a handful of databases. Obviously, the number of databases SQL Server can handle depends on the load and hardware. It is not unusual to see servers hosting dozens or even hundreds of small databases. In this chapter, we will discuss the internal structure of databases and cover how SQL Server stores data.
Database Files and Filegroups

Every database consists of one or more transaction log files and one or more data files. A transaction log stores information about database transactions and all of the data modifications made in each session. Every time the data is modified, SQL Server stores enough information in the transaction log to undo (roll back) or redo (replay) the action.
■■Note We will talk about the transaction log in greater detail in Part 6 of this book, "Inside the Transaction Log."

Every database has one primary data file, which by convention has an .mdf extension. In addition, every database can also have secondary database files. Those files, by convention, have .ndf extensions. All database files are grouped into filegroups. A filegroup is a logical unit that simplifies database administration. It permits the logical separation of database objects and physical database files. When you create database objects (tables, for example), you specify what filegroup they should be placed in without worrying about the underlying data files' configuration.

Listing 1-1 shows a script that creates a database named OrderEntryDb. This database consists of three filegroups. The primary filegroup has one data file stored on the M: drive. The second filegroup, Entities, has one data file stored on the N: drive. The last filegroup, Orders, has two data files stored on the O: and P: drives. Finally, there is a transaction log file stored on the L: drive.

Listing 1-1. Creating a database

create database [OrderEntryDb] on
primary
(name = N'OrderEntryDb', filename = N'm:\OEDb.mdf'),
filegroup [Entities]
(name = N'OrderEntry_Entities_F1', filename = N'n:\OEEntities_F1.ndf'),
filegroup [Orders]
(name = N'OrderEntry_Orders_F1', filename = N'o:\OEOrders_F1.ndf'),
(name = N'OrderEntry_Orders_F2', filename = N'p:\OEOrders_F2.ndf')
log on
(name = N'OrderEntryDb_log', filename = N'l:\OrderEntryDb.ldf');
You can see the physical layout of the database and data files in Figure 1-1. There are five disks with four data files and one transaction log file. The dashed rectangles represent the filegroups.
Figure 1-1. Physical layout of the database and data files

The ability to put multiple data files inside a filegroup lets us spread the load across different storage drives, which could help to improve the I/O performance of the system. Transaction log throughput, on the other hand, does not benefit from multiple files. SQL Server works with the transaction log sequentially, and only one log file is accessed at any given time.
■■Note We will talk about the transaction log internal structure and the best practices associated with it in Chapter 29, "Transaction Log Internals."

Let's create a few tables, as shown in Listing 1-2. The Customers and Articles tables are placed into the Entities filegroup. The Orders table resides in the Orders filegroup.

Listing 1-2. Creating tables

create table dbo.Customers ( /* Table Columns */ ) on [Entities];
create table dbo.Articles ( /* Table Columns */ ) on [Entities];
create table dbo.Orders ( /* Table Columns */ ) on [Orders];
Figure 1-2 shows physical layout of the tables in the database and disks.
Figure 1-2. Physical layout of the tables

The separation between logical objects in filegroups and physical database files allows us to fine-tune the database file layout to get the most out of the storage subsystem without worrying that it breaks the system. For example, independent software vendors (ISVs), who deploy their products to different customers, can adjust the number of database files during the deployment stage based on the underlying I/O configuration and the expected amount of data. These changes are transparent to the developers, who place database objects into filegroups rather than into database files.

It is generally recommended to avoid using the PRIMARY filegroup for anything but system objects. Creating a separate filegroup, or set of filegroups, for user objects simplifies database administration and disaster recovery, especially in the case of large databases. We will discuss this in greater detail in Chapter 30, "Designing a Backup Strategy."

You can specify initial file sizes and auto-growth parameters at the time that you create the database or add new files to an existing database. SQL Server uses a proportional fill algorithm when choosing which data file it should write data to. It writes an amount of data proportional to the free space available in the files: the more free space a file has, the more writes it handles.
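For example, adding one more file to an existing filegroup requires no changes to table definitions. The following is a minimal sketch, assuming the OrderEntryDb database from Listing 1-1; the Q: drive and file name are hypothetical:

```sql
-- Add a third data file to the Orders filegroup. Tables placed on
-- [Orders] automatically spread writes to it via proportional fill.
alter database OrderEntryDb
add file
(
    name = N'OrderEntry_Orders_F3',
    filename = N'q:\OEOrders_F3.ndf'
)
to filegroup [Orders];
```

No table or application changes are required; the new file simply becomes part of the same logical unit.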
■■Tip It is recommended that all files in a single filegroup have the same initial size and auto-growth parameters, with the growth increment defined in megabytes rather than as a percentage. This helps the proportional fill algorithm balance write activities evenly across the data files.

Every time SQL Server grows a file, it fills the newly allocated space with zeros. This process blocks all sessions that are writing to the corresponding file or, in the case of transaction log growth, generating transaction log records. SQL Server always zeros out the transaction log, and this behavior cannot be changed. However, you can control whether data files are zeroed out by enabling or disabling Instant File Initialization. Enabling Instant File Initialization helps speed up data file growth and reduces the time required to create or restore the database.
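The tip above can be sketched in T-SQL. This is an illustration only: the first logical file name comes from Listing 1-1, while the second file name follows the same naming pattern by assumption:

```sql
-- Give both files in the Orders filegroup the same size and a fixed
-- growth increment in megabytes (rather than a percentage).
alter database OrderEntryDb
modify file (name = N'OrderEntry_Orders_F1', size = 1024MB, filegrowth = 256MB);

alter database OrderEntryDb
modify file (name = N'OrderEntry_Orders_F2', size = 1024MB, filegrowth = 256MB);
```

Keeping the files identically sized lets the proportional fill algorithm distribute writes evenly between them.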
Chapter 1 ■ Data Storage Internals
■■Note There is a small security risk associated with Instant File Initialization. When this option is enabled, an unallocated part of a data file can contain information from previously deleted OS files. Database administrators are able to examine such data.

You can enable Instant File Initialization by granting the SA_MANAGE_VOLUME_NAME permission, also known as Perform Volume Maintenance Tasks, to the SQL Server startup account. This can be done under the Local Security Policy management application (secpol.msc), as shown in Figure 1-3. You need to open the properties for the "Perform volume maintenance tasks" permission and add the SQL Server startup account to the list of users there.
Figure 1-3. Enabling Instant File Initialization in secpol.msc
■■Tip SQL Server checks to see if Instant File Initialization is enabled on startup. You need to restart the SQL Server service after you grant the corresponding permission to the SQL Server startup account.

In order to check if Instant File Initialization is enabled, you can use the code shown in Listing 1-3. This code sets two trace flags that force SQL Server to put additional information into the error log, creates a small database, and reads the content of the error log file.

Listing 1-3. Checking to see if Instant File Initialization is enabled

dbcc traceon(3004,3605,-1)
go

create database Dummy
go

exec sp_readerrorlog
go

drop database Dummy
go

dbcc traceoff(3004,3605,-1)
go
If Instant File Initialization is not enabled, the SQL Server error log indicates that SQL Server is zeroing out both the .mdf data file and the .ldf log file, as shown in Figure 1-4. When Instant File Initialization is enabled, the log shows zeroing out of the .ldf log file only.

Figure 1-4. Checking if Instant File Initialization is enabled: SQL Server error log

Another important database option that controls database file sizes is Auto Shrink. When this option is enabled, SQL Server shrinks the database files every 30 minutes, reducing their size and releasing the space to the operating system. This operation is very resource intensive and rarely useful, as the database files simply grow again when new data comes into the system. Moreover, it greatly increases index fragmentation in the database. Auto Shrink should never be enabled, and Microsoft will remove this option in a future version of SQL Server.
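As a practical step, you can turn the option off and verify the setting through the sys.databases catalog view. A minimal sketch, assuming the OrderEntryDb database from Listing 1-1:

```sql
-- Disable Auto Shrink and confirm the change.
alter database OrderEntryDb set auto_shrink off;

select name, is_auto_shrink_on
from sys.databases
where name = 'OrderEntryDb';
```

The is_auto_shrink_on column returns 0 when the option is disabled.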
■■Note We will talk about index fragmentation in greater detail in Chapter 5, “Index Fragmentation.”
Data Pages and Data Rows

The space in the database is divided into logical 8KB pages. These pages are numbered continuously, starting with zero, and they can be referenced by specifying a file ID and a page number. The page numbering is always continuous, such that when SQL Server grows a database file, new pages are numbered starting from the highest page number in the file plus one. Similarly, when SQL Server shrinks a file, it removes the highest-numbered pages from the file. Figure 1-5 shows the structure of a data page.
Figure 1-5. The data page structure

A 96-byte page header contains various pieces of information about the page, such as the object to which the page belongs, the number of rows and amount of free space available on the page, links to the previous and next pages if the page is in an index page chain, and so on.

Following the page header is the area where the actual data is stored. This is followed by free space. Finally, there is a slot array, which is a block of 2-byte entries indicating the offsets at which the corresponding data rows begin on the page.

The slot array indicates the logical order of the data rows on the page. If the data on a page needs to be sorted in the order of the index key, SQL Server does not physically sort the data rows on the page; rather, it populates the slot array based on the index sort order. Slot 0 (rightmost in Figure 1-5) stores the offset for the data row with the lowest key value on the page; slot 1, the second-lowest key value; and so forth.
■■Note We will discuss indexes in greater detail in Chapter 2, "Tables and Indexes: Internal Structure and Access Methods."

SQL Server offers a rich set of system data types that can be logically separated into two groups: fixed length and variable length. Fixed-length data types, such as int, datetime, char, and others, always use the same amount of storage space regardless of their value, even when it is NULL. For example, an int column always uses 4 bytes, and an nchar(10) column always uses 20 bytes to store information.

In contrast, variable-length data types, such as varchar, varbinary, and a few others, use as much storage space as is required to store the data, plus two extra bytes. For example, an nvarchar(4000) column would use only 12 bytes to store a five-character string and, in most cases, 2 bytes to store a NULL value. We will discuss the case where variable-length columns do not use storage space for NULL values later in this chapter.

Let's look at the structure of a data row, as shown in Figure 1-6.
Figure 1-6. Data row structure

The first 2 bytes of the row, called Status Bits A and Status Bits B, are bitmaps that contain information about the row, such as the row type; whether the row has been logically deleted (ghosted); and whether the row has NULL values, variable-length columns, and a versioning tag.

The next two bytes in the row are used to store the length of the fixed-length portion of the data. They are followed by the fixed-length data itself.

After the fixed-length data portion, there is a null bitmap, which includes two different data elements. The first, a 2-byte element, is the number of columns in the row. It is followed by the null bitmap array. This array uses one bit for each column of the table, regardless of whether the column is nullable or not.

A null bitmap is always present in data rows of heap tables and in clustered index leaf rows, even when the table does not have nullable columns. However, the null bitmap is not present in non-leaf index rows, nor in leaf-level rows of nonclustered indexes, when there are no nullable columns in the index.
■■Note We will talk about indexes in greater detail in Chapter 2, "Tables and Indexes: Internal Structure and Access Methods."

Following the null bitmap, there is the variable-length data portion of the row. It starts with a two-byte count of the variable-length columns in the row, followed by a column offset array. SQL Server stores a two-byte offset value for each variable-length column in the row, even when the value is NULL. The array is followed by the actual variable-length portion of the data. Finally, there is an optional 14-byte versioning tag at the end of the row. This tag is used during operations that require row versioning, such as an online index rebuild, optimistic isolation levels, triggers, and a few others.
■■Note We will discuss Index Maintenance in Chapter 5; Triggers in Chapter 8; and Optimistic Isolation Levels in Chapter 21.
Let's create a table, populate it with some data, and look at the actual row data. The code is shown in Listing 1-4. The replicate function repeats the character provided as the first parameter 255 times.

Listing 1-4. The data row format: Table creation

create table dbo.DataRows
(
    ID int not null,
    Col1 varchar(255) null,
    Col2 varchar(255) null,
    Col3 varchar(255) null
);

insert into dbo.DataRows(ID, Col1, Col3)
values (1,replicate('a',255),replicate('c',255));

insert into dbo.DataRows(ID, Col2)
values (2,replicate('b',255));

dbcc ind
(
    'SQLServerInternals' /*Database Name*/
    ,'dbo.DataRows' /*Table Name*/
    ,-1 /*Display information for all pages of all indexes*/
);
The undocumented but well-known DBCC IND command returns information about a table's page allocations. You can see the output of this command in Figure 1-7.

Figure 1-7. DBCC IND output

There are two pages that belong to the table. The first one, with PageType=10, is a special type of page called an IAM (Index Allocation Map) page. This page tracks the pages that belong to a particular object. Do not focus on that now, however, as we will cover allocation map pages later in the chapter.
■■Note SQL Server 2012 introduces another undocumented data management function (DMF), sys.dm_db_database_page_allocations, which can be used as a replacement for the DBCC IND command. The output of this DMF provides more information when compared to DBCC IND, and it can be joined with other system DMVs and/or catalog views.

The page with PageType=1 is the actual data page that contains the data rows. The PageFID and PagePID columns show the actual file and page numbers for the page. You can use another undocumented command, DBCC PAGE, to examine its contents, as shown in Listing 1-5. Substitute the PageFID and PagePID values from the DBCC IND output for the placeholders.

Listing 1-5. The data row format: DBCC PAGE call

-- Redirecting DBCC PAGE output to console
dbcc traceon(3604);

dbcc page
(
    'SQLServerInternals' /*Database Name*/
    ,<PageFID> /*File ID from the DBCC IND output*/
    ,<PagePID> /*Page ID from the DBCC IND output*/
    ,3 /*Output mode: 3 - display page header and row details*/
);
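The sys.dm_db_database_page_allocations DMF mentioned in the Note above can be called as shown in the following sketch. Keep in mind that the DMF is undocumented, so its signature and output may change between versions:

```sql
-- SQL Server 2012+ alternative to DBCC IND (undocumented).
select allocated_page_file_id, allocated_page_page_id, page_type_desc
from sys.dm_db_database_page_allocations
(
    db_id() /*Database ID*/
    ,object_id(N'dbo.DataRows') /*Object ID*/
    ,null /*Index ID: null for all indexes*/
    ,null /*Partition ID: null for all partitions*/
    ,'DETAILED' /*Mode*/
);
```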
Listing 1-6 shows the output of DBCC PAGE that corresponds to the first data row. SQL Server stores the data in byte-swapped order. For example, a two-byte value of 0001 would be stored as 0100.

Listing 1-6. DBCC PAGE output for the first row

Slot 0 Offset 0x60 Length 39
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 39 Memory Dump @0x000000000EABA060
Let's look at the data row in more detail, as shown in Figure 1-8.
Figure 1-8. First data row

As you see, the row starts with the two status bytes, followed by the two-byte value 0800. This is the byte-swapped value 0008, which is the offset of the number-of-columns attribute in the row. This offset tells SQL Server where the fixed-length data portion of the row ends.
The next four bytes are used to store the fixed-length data, which is the ID column in our case. After that, there is a two-byte value showing that the data row has four columns, followed by a one-byte NULL bitmap. With just four columns, one byte in the bitmap is enough. It stores the value 04, which is 00000100 in binary format. This indicates that the third column in the row contains a NULL value.

The next two bytes store the number of variable-length columns in the row: 3 (0300 in byte-swapped order). They are followed by an offset array, in which every two bytes store the offset at which a variable-length column's data ends. As you see, even though Col2 is NULL, it still uses a slot in the offset array. Finally, there is the actual data from the variable-length columns.

Now let's look at the second data row. Listing 1-7 shows the DBCC PAGE output, and Figure 1-9 shows the row data.

Listing 1-7. DBCC PAGE output for the second row

Slot 1 Offset 0x87 Length 27
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 27 Memory Dump @0x000000000EABA087
Figure 1-9. Second data row

The NULL bitmap in the second row represents the binary value 00001010, which shows that Col1 and Col3 are NULL. Even though the table has three variable-length columns, the number of variable-length columns in the row indicates that there are just two columns/slots in the offset array. SQL Server does not maintain information about trailing NULL variable-length columns in the row.
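You can cross-check the record sizes reported by DBCC PAGE (39 and 27 bytes in our example) with the documented sys.dm_db_index_physical_stats DMF. A minimal sketch:

```sql
-- DETAILED mode reports actual record sizes for the dbo.DataRows heap.
select index_id, min_record_size_in_bytes, max_record_size_in_bytes
from sys.dm_db_index_physical_stats
(
    db_id(), object_id(N'dbo.DataRows'), null, null, 'DETAILED'
);
```

With the two rows inserted by Listing 1-4, the minimum and maximum record sizes should correspond to the 27- and 39-byte records we just examined.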
■■Tip You can reduce the size of a data row by defining the variable-length columns that usually store NULL values as the last columns in the CREATE TABLE statement. This is the only case in which the order of columns in the CREATE TABLE statement matters.

The fixed-length data and internal attributes must fit into the 8,060 bytes available on a single data page. SQL Server does not let you create a table when this is not the case. For example, the code in Listing 1-8 produces an error.

Listing 1-8. Creating a table with a data row size that exceeds 8060 bytes

create table dbo.BadTable
(
    Col1 char(4000),
    Col2 char(4060)
);

Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'BadTable' failed because the minimum row size would be 8067, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.
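The arithmetic behind the error is straightforward: 4,000 + 4,060 bytes of fixed-length data plus 7 bytes of internal overhead (two status bytes, a two-byte fixed-length data length, a two-byte column count, and a one-byte null bitmap) gives 8,067 bytes. A hypothetical counter-example that shrinks the second column by those 7 excess bytes is created without errors:

```sql
-- 4,000 + 4,053 bytes of fixed-length data plus 7 bytes of internal
-- overhead equals exactly 8,060 bytes, so this table is allowed.
create table dbo.GoodTable
(
    Col1 char(4000),
    Col2 char(4053)
);
```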
Large Objects Storage

Even though the fixed-length data and the internal attributes of a row must fit into a single page, SQL Server can store the variable-length data on different data pages. There are two different ways to store the data, depending on the data type and length.
Row-Overflow Storage

SQL Server stores variable-length column data that does not exceed 8,000 bytes on special pages called row-overflow pages. Let's create a table and populate it with the data shown in Listing 1-9.

Listing 1-9. ROW_OVERFLOW data: Creating a table

create table dbo.RowOverflow
(
    ID int not null,
    Col1 varchar(8000) null,
    Col2 varchar(8000) null
);

insert into dbo.RowOverflow(ID, Col1, Col2)
values (1,replicate('a',8000),replicate('b',8000));
As you see, SQL Server creates the table and inserts the data row without any errors, even though the data row size exceeds 8,060 bytes. Let's look at the table page allocation using the DBCC IND command. The results are shown in Figure 1-10.
Figure 1-10. ROW_OVERFLOW data: DBCC IND results

Now you can see two different sets of IAM and data pages. The data page with PageType=3 represents the page that stores ROW_OVERFLOW data.

Let's look at data page 214647, which is the in-row data page that stores the main row data. The partial output of the DBCC PAGE command for page (1:214647) is shown in Listing 1-10.

Listing 1-10. ROW_OVERFLOW data: DBCC PAGE results for IN_ROW data

Slot 0 Offset 0x60 Length 8041
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS Record Size = 8041 Memory Dump @0x000000000FB7A060
As you see, SQL Server stores the Col1 data in-row. The Col2 data, however, has been replaced with a 24-byte value. The first 16 bytes are used to store off-row storage metadata, such as the type, the length of the data, and a few other attributes. The last 8 bytes are the actual pointer to the row on the row-overflow page: the file, page, and slot number. Figure 1-11 shows this in detail. Remember that all information is stored in byte-swapped order.
Figure 1-11. ROW_OVERFLOW data: Row-overflow page pointer structure

As you see, the slot number is 0, the file number is 1, and the page number is the hexadecimal value 0x00034675, which is 214645 in decimal. The page number matches the DBCC IND results shown in Figure 1-10. The partial output of the DBCC PAGE command for the page (1:214645) is shown in Listing 1-11.
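You can verify the byte-swap decoding directly in T-SQL. The pointer stores the page number bytes in swapped order; reversing them gives the big-endian value 0x00034675, which converts to the decimal page number:

```sql
-- Reversed pointer bytes interpreted as an integer page number.
select convert(int, 0x00034675) as PageNumber; -- returns 214645
```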
LOB Storage

For text, ntext, and image columns, SQL Server stores the data off-row by default, using another kind of page called LOB data pages.
■■Note You can control this behavior to a degree by using the "text in row" table option. For example, exec sp_tableoption 'dbo.MyTable', 'text in row', 200 forces SQL Server to store LOB data that is less than or equal to 200 bytes in-row. LOB data greater than 200 bytes would be stored in LOB pages.

The logical LOB data structure is shown in Figure 1-12.
Figure 1-12. LOB data: Logical structure

As with ROW_OVERFLOW data, there is a pointer to another piece of information, called the LOB root structure, which contains a set of pointers to other data pages/rows. When LOB data is less than 32 KB and can fit into five data pages, the LOB root structure contains pointers to the actual chunks of LOB data. Otherwise, the LOB tree starts to include additional intermediate levels of pointers, similar to the index B-Tree, which we will discuss in Chapter 2, "Tables and Indexes: Internal Structure and Access Methods."
Chapter 1 ■ Data Storage Internals
Let's create the table and insert one row of data, as shown in Listing 1-12. We need to cast the first argument of the replicate function to varchar(max). Otherwise, the result of the replicate function would be limited to 8,000 bytes.

Listing 1-12. LOB data: Table creation

create table dbo.TextData
(
    ID int not null,
    Col1 text null
);
insert into dbo.TextData(ID, Col1) values (1, replicate(convert(varchar(max),'a'),16000));
The page allocation for the table is shown in Figure 1-13.
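The DBCC IND output in Figure 1-13 can be produced with a command like the sketch below; the database name TestDB is an assumption for illustration.

```sql
-- List every page allocated to dbo.TextData; -1 covers all indexes.
-- In the PageType column: 1 = data page, 3 = LOB mixed page,
-- 4 = LOB tree page, 10 = IAM page.
DBCC IND('TestDB', 'dbo.TextData', -1);
```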
Figure 1-13. LOB data: DBCC IND result

As you see, the table has one data page for in-row data and three data pages for LOB data. I am not going to examine the structure of the data row for the in-row allocation; it is similar to the ROW_OVERFLOW allocation. However, the LOB allocation stores less metadata information in the pointer and uses 16 bytes rather than the 24 bytes required by the ROW_OVERFLOW pointer. The result of the DBCC PAGE command for the page that stores the LOB root structure is shown in Listing 1-13.

Listing 1-13. LOB data: DBCC PAGE results for the LOB page with the LOB root structure

Blob row at: Page (1:3046835) Slot 0 Length: 84 Type: 5 (LARGE_ROOT_YUKON)
As you see, there are two pointers to the other pages with LOB data blocks, which are similar to the blob data shown in Listing 1-11. The format in which SQL Server stores the data from the (MAX) columns, such as varchar(max), nvarchar(max), and varbinary(max), depends on the actual data size. SQL Server stores it in-row when possible. When in-row allocation is impossible and the data size is less than or equal to 8,000 bytes, it is stored as row-overflow data. Data that exceeds 8,000 bytes is stored as LOB data.
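One way to see which storage format SQL Server chose is to look at the allocation units of the table. A minimal sketch using the sys.dm_db_index_physical_stats function follows; DETAILED mode reads all of the pages, so run it against test tables only.

```sql
-- Page counts per allocation unit for dbo.TextData.
-- alloc_unit_type_desc shows IN_ROW_DATA, ROW_OVERFLOW_DATA, or LOB_DATA.
select index_id, alloc_unit_type_desc, page_count
from sys.dm_db_index_physical_stats
(
    db_id(), object_id('dbo.TextData'), null, null, 'DETAILED'
);
```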
■■Note text, ntext, and image data types are deprecated, and they will be removed in future versions of SQL Server. Use varchar(max), nvarchar(max), and varbinary(max) columns instead.
It is also worth mentioning that SQL Server always stores rows that fit into a single page using in-row allocations. When a page does not have enough free space to accommodate a row, SQL Server allocates a new page and places the row there rather than placing it on the half-full page and moving some of the data to row-overflow pages.
SELECT * and I/O

There are plenty of reasons why selecting all columns from a table with the select * operator is not a good idea. It increases network traffic by transmitting columns that the client application does not need. It also makes query performance tuning more complicated, and it introduces side effects when the table schema changes. It is recommended that you avoid such a pattern and explicitly specify the list of columns needed by the client application.

This is especially important with row-overflow and LOB storage, when one row can have data stored in multiple data pages. SQL Server needs to read all of those pages, which can significantly decrease the performance of queries.

As an example, let's assume that we have a table dbo.Employees with one column storing employee pictures. Listing 1-14 creates the table and populates it with some data.

Listing 1-14. Select * and I/O: Table creation

create table dbo.Employees
(
    EmployeeId int not null,
    Name varchar(128) not null,
    Picture varbinary(max) null
);
;WITH N1(C) AS (SELECT 0 UNION ALL SELECT 0) -- 2 rows
,N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2) -- 4 rows
,N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2) -- 16 rows
,N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2) -- 256 rows
,N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N2 AS T2) -- 1,024 rows
,IDs(ID) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N5)
insert into dbo.Employees(EmployeeId, Name, Picture)
    select ID, 'Employee ' + convert(varchar(5),ID),
        convert(varbinary(max),replicate(convert(varchar(max),'a'),120000))
    from Ids;
The table has 1,024 rows, each storing 120,000 bytes of binary data. Let's assume that we have code in the client application that needs the EmployeeId and Name to populate a drop-down box. If a developer is not careful, he or she can write a select statement using the select * pattern, even though the picture is not needed for this particular use-case.

Let's compare the performance of two selects: one selecting all data columns and another that selects only EmployeeId and Name. The code to do this is shown in Listing 1-15. The execution time and number of reads on my computer are shown in Table 1-1.

Listing 1-15. Select * and I/O: Performance comparison

set statistics io on
set statistics time on
select * from dbo.Employees;

select EmployeeId, Name from dbo.Employees;
set statistics io off
set statistics time off
Table 1-1. Select *: Number of reads and execution time of the queries
select EmployeeId, Name from dbo.Employees
select * from dbo.Employees
Number of reads
As you see, the first select, which reads the LOB data and transmits it to the client, is a few orders of magnitude slower than the second select.

One case where this becomes extremely important is with client applications that use Object-Relational Mapping (ORM) frameworks. Developers tend to reuse the same entity objects in different parts of an application. As a result, an application may load all attributes/columns even though it does not need all of them in many cases.

It is better to define different entities with a minimum set of required attributes on an individual use-case basis. In our example, it would work best to create separate entities/classes, such as EmployeeList and EmployeeProperties. An EmployeeList entity would have two attributes: EmployeeId and Name. EmployeeProperties would include a Picture attribute in addition to the two mentioned. This approach can significantly improve the performance of systems.
Extents and Allocation Map Pages

SQL Server logically groups eight pages into 64KB units called extents. There are two types of extents available: mixed extents store data that belongs to different objects, while uniform extents store data for a single object. When a new object is created, SQL Server stores the first eight object pages in mixed extents. After that, all subsequent space allocation for that object is done with uniform extents.

SQL Server uses a special kind of page, called allocation maps, to track extent and page usage in a file. There are several different types of allocation map pages in SQL Server.

Global Allocation Map (GAM) pages track whether extents have been allocated by any objects. The data is represented as bitmaps, where each bit indicates the allocation status of an extent. Zero bits indicate that the corresponding extents are in use. Bits with a value of one indicate that the corresponding extents are free. Every GAM page covers about 64,000 extents, or almost 4GB of data. This means that every database file has one GAM page for about 4GB of file size.

Shared Global Allocation Map (SGAM) pages track information about mixed extents. Similar to GAM pages, an SGAM page is a bitmap with one bit per extent. The bit has a value of one if the corresponding extent is a mixed extent and has at least one free page available. Otherwise, the bit is set to zero. Like a GAM page, an SGAM page tracks about 64,000 extents, or almost 4GB of data.

SQL Server can determine the allocation status of an extent by looking at the corresponding bits in the GAM and SGAM pages. Table 1-2 shows the possible combinations of the bits.

Table 1-2. Allocation status of the extents
Free, not in use: GAM bit = 1, SGAM bit = 0
Mixed extent with at least one free page available: GAM bit = 0, SGAM bit = 1
Uniform extent, or full mixed extent: GAM bit = 0, SGAM bit = 0
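You can look at these bitmaps directly with DBCC PAGE: in every database file, page 2 is the first GAM page and page 3 is the first SGAM page. A minimal sketch follows; the database name TestDB is an assumption, and the output shows the bitmap as ranges of ALLOCATED/NOT ALLOCATED extents.

```sql
-- Route DBCC output to the client session
DBCC TRACEON(3604);

-- Dump the first GAM and SGAM pages of file 1 with print option 3.
-- 'TestDB' is a hypothetical database name; use a test database.
DBCC PAGE('TestDB', 1, 2, 3); -- GAM page
DBCC PAGE('TestDB', 1, 3, 3); -- SGAM page
```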