Friday, June 1, 2012

UPSERT Statement in T-SQL

The MERGE statement is also referred as the UPSERT statement. It allows us to merge multiple rows from one table into another table. Illustrating MERGE Statement with an example using step by step process and with multiple executions. We shall see SCD Type 1 Implementation here.

Note: If you are looking for MERGE using SCD Type 2 Implemenation, click here.

          MERGE is about comparing a source and destination table and depending upon the changes, new records get inserted and existing get updated.

Basic Syntax:
MERGE [INTO] Target_Table
   USING Source_Table
   ON JOIN_Condition
[WHEN MATCHED [AND Search_Conditions]
   THEN DML_Statement]
[WHEN NOT MATCHED
   THEN DML_Statement]

Implementation:

Source Table: DEPT

CREATE TABLE [dbo].[DEPT](
            [DEPTNO] [numeric](2, 0) NOT NULL,
            [DNAME] [varchar](25) NULL,
            [LOC] [varchar](25) NULL
)

Target Table: DEPT_MERGE

CREATE TABLE [dbo].[DEPT_MERGE](
            [DEPTNO] [numeric](2, 0) NOT NULL,
            [DNAME] [varchar](25) NULL,
            [LOC] [varchar](25) NULL,
            [START_DATE] [datetime] NULL,
            [END_DATE] [datetime] NULL
)

INSERT data into DEPT:

INSERT [dbo].[DEPT] ([DEPTNO], [DNAME], [LOC]) VALUES (10, N'ACCOUNTING', N'NEW YORK')
INSERT [dbo].[DEPT] ([DEPTNO], [DNAME], [LOC]) VALUES (20, N'RESEARCH', N'MIAMI')
INSERT [dbo].[DEPT] ([DEPTNO], [DNAME], [LOC]) VALUES (30, N'SALES', N'CHICAGO')
INSERT [dbo].[DEPT] ([DEPTNO], [DNAME], [LOC]) VALUES (40, N'OPERATIONS', N'BOSTON')


Screen Shot:



MERGE Statement Code:
/************************************* MERGE STATEMENT ***********************************/
MERGE DEPT_MERGE AS D
            USING DEPT AS S
            ON D.DEPTNO = S.DEPTNO
WHEN MATCHED AND (S.DNAME <> D.DNAME OR S.LOC <> D.LOC)
THEN
            UPDATE SET
            D.DNAME = S.DNAME,
            D.LOC = S.LOC,
            D.END_DATE = GETDATE()
WHEN NOT MATCHED
THEN
            INSERT (DEPTNO, DNAME, LOC, [START_DATE])
            VALUES (S.DEPTNO, S.DNAME, S.LOC, GETDATE());

Run-1 Execution: 4 Rows affected.





















Run-1 Results: All rows from source into the destination along with start dates.























Further Observation:
To observe how it behaves for the second execution, lets make some changes in the source data. A new record is inserted and an existing record is updated in the source table as below.


Run-2 Execution:  2 rows affected. (1 new row inserted and 1 row updated).






















Run-2 Results: Observe the data changes.



























Summary: This is identical to Slowly Changing Dimension of SSIS. If you want to handle the data changes in T-SQL then this is most efficient way. Hope this helps. Happy T-SQLing. :)


No comments:

Post a Comment