People take it as a general rule, to use variable-length data types as the most appropriate when they expect fluctuations of the data for a column.
Using variable-length columns can sometimes improve the performance as it allows more rows to fit in a data page. In the last post, I have discussed the table size estimation with a sample customerinfo table, where I used the numbers 8096 and 8192, which are related to data page. A data page of SQL Server is of 8KB (8192 bytes), of which 8096 are available to store data. The rest is used for the internal purpose of keeping track of structural information of the page and object to which the page belongs.
One I/O operation brings back the entire page, that means fitting 160 rows on a page is efficient than 80 rows per page. To summarize, the more rows you can fit per page, the better the I/O and cache-hit efficiency is.
Example:
Fig: A, CustomerInfo_Fixed table with fixed-length columns
CustomerInfo_Fixed | |
ID | Smallint |
Name | Char(50) |
Address1 | Char(50) |
Address2 | Char(50) |
City | Char(50) |
State | Char(2) |
Postal Code | Char(10) |
Phone | Char(20) |
Fax | Char(20) |
Email | Char(30) |
Website | Char(100) |
In the above fixed-length structured table (Fig: A), every row uses 384 bytes (Add up the bytes of data types of all columns) for data regardless of the number of characters actually inserted in the row. SQL Server also needs an additional 10 bytes of overhead for every row in this table. So, each row needs 394 bytes for storage.
Rows Per Page Calculation: Page Size/Size of each row = 8096/394 = 20 rows
Fig: B, CustomerInfo_Variable table with variable-length columns
CustomerInfo_Variable | |
ID | Smallint |
Name | Varchar(50) |
Address1 | Varchar(50) |
Address2 | Varchar(50) |
City | Varchar(50) |
State | Char(2) |
Postal Code | Varchar(10) |
Phone | Varchar(20) |
Fax | Varchar(20) |
Email | Varchar(30) |
Website | Varchar(100) |
Now, the variable-length table (Fig: B), lets assume that for all the variable-length (varchar) columns the average entry is actually only about half the maximum. Instead of a row of length of 394 bytes, the average length is about 224 bytes.
The calculation of 224 bytes is as below:
1. Smallint + Char(2) = 4 bytes
1. Smallint + Char(2) = 4 bytes
2. Half of Maximum Varchar columns = 380/2 = 190 bytes
3. 2-byte overhead for each of 9 Varchar columns = 18 bytes
4. 2-bytes for any row that has one or more variable length columns= 2 bytes
5. Every row SQL Server overhead (as discussed above in fixed-length case) = 10 bytes
Finally, Total Bytes = 4+190+18+2+10 = 224 bytes
Rows Per Page Calculation: Page Size/Size of each row = 8096/224 = 36 rows
Conclusion: The difference in (rows per page) RPP of fixed-length (20) to RPP of variable-length (36), clearly illustrates which one is better to go with, CHAR? OR VARCHAR?. The table using variable-length columns will consume about half as many pages in storage, a single I/O retrieves almost twice as many rows.
ASSUMING FOR 1000000 RECORDS
ReplyDeleteIF
"CHAR"
SIZE OF THE ROW = 394 BYTES
AS GIVEN NO OF ROWS = 20
FOR 1000000 RECORDS
= 1000000/20
=50,000( # OF PAGES)
NOW MULTIPLYING # OF PAGES WITH 8192
=50,000 * 8192
=409600000 bytes
= 391 MB
IF
"VARCHAR"
SIZE OF THE ROW = 224 BYTES
AS GIVEN NO OF ROWS = 36
FOR 1000000 RECORDS
=1000000/36
=27777.777 ~ 27778(# OF PAGES)
NOW MULTYLIYING # OF PAGES WITH 8192
=27778 *8192
=227557376 bytes
= 217MB
SO 391 - 217 = 174 MB IS SAVED ON THE DISK. :-)
Good One Anil!!
ReplyDeleteAwesome dudes...
ReplyDeleteKeep Posting........