Both the Table-Valued Function (TVF) and Inline Table-Valued Function (ITVF) are user defined functions that return table data type.
TVF and ITVF can be used to achieve the functionality of parameterized views. Which means a view does not support parameters in the search conditions specified in the WHERE clause. But user-defined functions do support parameters in the search conditions specified in the WHERE clause.
Inline Table-Valued Function:
To define TVF and ITVF you must follow their own set of rules. Below is a brief description about how an ITVF can be defined.
To define TVF and ITVF you must follow their own set of rules. Below is a brief description about how an ITVF can be defined.
1. The RETURNS clause contains only the keyword “table”. There is no need to define the format of a return variable, because it is set by the format of the result set of the SELECT statement in the RETURN clause.
2. There is no function body delimited by BEGIN and END.
3. The RETURN clause contains a single SELECT statement in parentheses.
4. It accepts only constants or @local_variable arguments.
It is always preferable to use ITVF over TVF whenever feasible to get the best possible performance which we will see later. Following are some critical disadvantages we observed while using TVF.
1. Cardinality estimate for TVF will be always wrong, as it produces a dynamic table at execution time thus no statistics.
2. TVF requires intermediate results which will go into tempdb.
3. Confusing execution plan and cost estimation.
4. Consumes more CPU because lack of statistics.
5. Causes more performance issues when used with LEFT JOIN.
Performance Comparison:
Here, we will be reviewing performance differences among TVF, ITVF and SELECT statement.
1. Create a new database.
2. Create a table and populate it with 100,000 records.
3. Create clustered and covering indexes.
4. Create a TVF which is “fn_myTVF” and create an ITVF which is “fn_myInlineTVF”.
5. Write an equivalent query for the above.
In our test we will clear Buffer cache in each run and will collect STATISTICS IO and TIME output.
Let’s get started:
1. Create a database “TestDB”
CREATE DATABASE TestDB
GO
USE TestDB
GO
2. Create a table “tblLarge” and populate it with 100,000 records.
IF OBJECT_ID('tblLarge') IS NOT NULL
DROP TABLE tblLarge
GO
CREATE TABLE tblLarge
(
xID INT IDENTITY(1, 1) ,
sName1 VARCHAR(100) ,
sName2 VARCHAR(1000) ,
sName3 VARCHAR(400) ,
sIdentifier CHAR(10) ,
dDOB DATETIME ,
nWage NUMERIC(20, 2) ,
sLicense VARCHAR(25)
)
GO
-- insert 100,000 dummy records
SET NOCOUNT ON
INSERT INTO tblLarge
( sName1 ,
sName2 ,
sName3 ,
sIdentifier ,
dDOB ,
nWage ,
sLicense
)
VALUES ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*50) , -- sName1
LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*60) , -- sName2
LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*70) , -- sName2
LEFT(CAST(NEWID() AS VARCHAR(36)), 2) , -- sIdentifier
DATEADD(dd, -RAND() * 20000, GETDATE()) , -- dDOB
( RAND() * 1000 ) , -- nWage
SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) -- sLicense
)
GO 100000
3. Create indexes:
-- create a clustered index
ALTER TABLE dbo.tblLarge ADD CONSTRAINT PK_tblLarge PRIMARY KEY CLUSTERED (xID)
GO
CREATE NONCLUSTERED INDEX IX_tblLarge_sIdentifier ON dbo.tblLarge
(sIdentifier)
INCLUDE (sName1, sName2, sName3)
WITH( STATISTICS_NORECOMPUTE = OFF)
GO
4. Create the Table-Valued Function (TVF) “fn_myTVF”:
CREATE FUNCTION dbo.fn_myTVF
(
@sIdentifier VARCHAR(10)
)
RETURNS @myTbl TABLE
(
xID INT NOT NULL ,
sName1 VARCHAR(100) ,
sName2 VARCHAR(1000) ,
sName3 VARCHAR(400)
)
AS
BEGIN
INSERT INTO @myTbl
SELECT xID ,
sName1 ,
sName2 ,
sName3
FROM tblLarge
WHERE sIdentifier = @sIdentifier
RETURN
END
GO
5. Create the Inline Table-Valued Function (ITVF) – “fn_myInlineTVF”:
CREATE FUNCTION dbo.fn_myInlineTVF
(
@sIdentifier VARCHAR(10)
)
RETURNS TABLE
AS
RETURN
( SELECT xID ,
sName1 ,
sName2 ,
sName3
FROM tblLarge
WHERE sIdentifier = @sIdentifier
)
6. Now, run the following query a few times and examine the Actual Execution Plan (CTRL+M)
SET STATISTICS IO ON
SET STATISTICS TIME ON
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SELECT a.xID, a.sName1, a.sName2, a.sName3 FROM dbo.fn_myTVF('AA') a
DBCC DROPCLEANBUFFERS
SELECT a.xID, a.sName1, a.sName2, a.sName3 FROM dbo.fn_myInlineTVF('AA') a
DBCC DROPCLEANBUFFERS
SELECT a.xID, a.sName1, a.sName2, a.sName3 FROM tblLarge a WHERE sIdentifier = 'AA'
Three execution plans:
It looks like the first Execution Plan which is “fn_myTVF” has better efficiency and is less expensive when compared with the other two!
Let’s hover over the mouse on the first node of the plan tree (SELECT Cost 0%), we will see that the estimated number of rows is 1 (one) and for which the estimated sub-tree cost is also low “0.0032863”, though in reality the query return 366 rows in our case.
Thus we can easily conclude that the cardinality estimation is completely wrong and so is the query cost.
STATISTICS TIME and IO output:
From this comparison, we can easily notice that the TVF version is a more expensive operation than the other two.
--- TVF
Table '#267ABA7A'. Scan count 1, logical reads 5, physical reads 0,
read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 179 ms.
-- ITVF
Table 'tblLarge'. Scan count 1, logical reads 8, physical reads 3,
read-ahead reads 5, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 134 ms.
--SELECT Statement
Table 'tblLarge'. Scan count 1, logical reads 8, physical reads 3,
read-ahead reads 5, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 140 ms.
Conclusion:
If it is possible, Inline table-valued function should be used to avoid hidden performance gotchas. Alternatively, the TVF functionality may be merged into the main Stored Procedure or in the SQL batch.
Sirji, Good job.
ReplyDelete