The
Self-Join technique is commonly used to create a hierarchical tree set or
finding duplicate records. It is also used to find previous or next values from
a table by utilizing the inline table technique. Generally a self-join is a join in which a table is joined with
itself. For example, when we need details about an employee and his manager where
both employee and manager ID resides in the same table.
When we use self-join, it usually indicates that
the table is not normalized. This may not be a problem for a small table but
for a large and busy OLTP table with higher concurrency, this may lead to poor
performance and degrade query response time.
The key point is that each self-joined table will
be treated as a separate table while retrieving data from the disk, thus self-join
incurs more I/O activities and increases locking overhead as it touches the same
table twice or more. By adding appropriate indexes on JOIN and WHERE clause
columns, it will reduce I/O activities and improve performance. It will be a
good approach to avoid self-join whenever possible for heavy duty OLTP tables
and queries.
Let’s perform a simple self-join test to see how
the execution plan looks like. We can observe that as the retrieval number of
records increases, each query behaves differently, such as requiring specific index
and introducing parallel query.
Queries:
-- self-join one
select top (10) a.xid, a.sIdentifier
from tbllarge a
inner join tbllarge b on a.xid = b.xid
where a.sIdentifier ='A1'
-- self-join two
select top (100) a.xid, a.sIdentifier
from tbllarge a
inner join tbllarge b on a.xid = b.xid
where a.sIdentifier ='A1'
-- self-join three
select top (1000) a.xid, a.sIdentifier
from tbllarge a
inner join tbllarge b on a.xid = b.xid
where a.sIdentifier ='A1'
Short and Informative..Excellent Job...!!!!!!
ReplyDeleteVery interesting, haven't thought much of the self join from a performance perspective, so good insight! A nice reference here also: sql self join
ReplyDelete