Wednesday, January 5, 2011

Variable-Length vs. Fixed-Length

If there is any difficult step in designing a table, then one would say it is deciding whether to use a variable-length or a fixed-length data type.

People take it as a general rule, to use variable-length data types as the most appropriate when they expect fluctuations of the data for a column.

Using variable-length columns can sometimes improve the performance as it allows more rows to fit in a data page. In the last post, I have discussed the table size estimation with a sample customerinfo table, where I used the numbers 8096 and 8192, which are related to data page. A data page of SQL Server is of 8KB (8192 bytes), of which 8096 are available to store data. The rest is used for the internal purpose of keeping track of structural information of the page and object to which the page belongs.

One I/O operation brings back the entire page, that means fitting 160 rows on a page is efficient than 80 rows per page. To summarize, the more rows you can fit per page, the better the I/O and cache-hit efficiency is.

Example:

Fig: A, CustomerInfo_Fixed table with fixed-length columns

CustomerInfo_Fixed
ID
Smallint
Name
Char(50)
Address1
Char(50)
Address2
Char(50)
City
Char(50)
State
Char(2)
Postal Code
Char(10)
Phone
Char(20)
Fax
Char(20)
Email
Char(30)
Website
Char(100)

In the above fixed-length structured table (Fig: A), every row uses 384 bytes (Add up the bytes of data types of all columns) for data regardless of the number of characters actually inserted in the row. SQL Server also needs an additional 10 bytes of overhead for every row in this table. So, each row needs 394 bytes for storage.

Rows Per Page Calculation: Page Size/Size of each row8096/394 = 20 rows

Fig: B, CustomerInfo_Variable table with variable-length columns

CustomerInfo_Variable
ID
Smallint
Name
Varchar(50)
Address1
Varchar(50)
Address2
Varchar(50)
City
Varchar(50)
State
Char(2)
Postal Code
Varchar(10)
Phone
Varchar(20)
Fax
Varchar(20)
Email
Varchar(30)
Website
Varchar(100)

Now, the variable-length table (Fig: B), lets assume that for all the variable-length (varchar) columns the average entry is actually only about half the maximum. Instead of a row of length of 394 bytes, the average length is about 224 bytes. 
The calculation of 224 bytes is as below: 
1. Smallint + Char(2) = 4 bytes
2. Half of Maximum Varchar columns = 380/2 = 190 bytes
3. 2-byte overhead for each of 9 Varchar columns = 18 bytes
4. 2-bytes for any row that has one or more variable length columns= 2 bytes
5. Every row SQL Server overhead (as discussed above in fixed-length case) = 10 bytes
Finally, Total Bytes = 4+190+18+2+10 = 224 bytes

Rows Per Page Calculation: Page Size/Size of each row = 8096/224 = 36 rows


Conclusion: The difference in (rows per page) RPP of fixed-length (20) to RPP of variable-length (36), clearly illustrates which one is better to go with, CHAR? OR VARCHAR?. The table using variable-length columns will consume about half as many pages in storage, a single I/O retrieves almost twice as many rows.

3 comments:

  1. ASSUMING FOR 1000000 RECORDS

    IF

    "CHAR"

    SIZE OF THE ROW = 394 BYTES

    AS GIVEN NO OF ROWS = 20

    FOR 1000000 RECORDS

    = 1000000/20

    =50,000( # OF PAGES)

    NOW MULTIPLYING # OF PAGES WITH 8192

    =50,000 * 8192

    =409600000 bytes

    = 391 MB



    IF

    "VARCHAR"

    SIZE OF THE ROW = 224 BYTES

    AS GIVEN NO OF ROWS = 36

    FOR 1000000 RECORDS

    =1000000/36

    =27777.777 ~ 27778(# OF PAGES)

    NOW MULTYLIYING # OF PAGES WITH 8192

    =27778 *8192

    =227557376 bytes

    = 217MB


    SO 391 - 217 = 174 MB IS SAVED ON THE DISK. :-)

    ReplyDelete