Working with hierarchical data in SQL Server databases

Working with hierarchical data in SQL Server databases
Note: Information & code samples from this article are tested on SQL Server 2005 RTM (Yukon) and found to be working. Will update the article in case of any compatibility issues.

In this article, I will show you a simple example that traverses a hierarchy and displays the output in indented format. I will also provide you with links and pointers for more advanced and in-depth information on processing hierarchies.

Representing data in the form of a hierarchy or a tree is a common requirement. The most common example I can think of, is an organizational chart that contains all the employees in a given organization and each employee is linked to his/her manager by his/her manager's ID. Since managers are also employees, their details also get stored in the same employees table.

A few more examples:

- Implementation of a permissions hierarchy.

- Implementation of a product catalog for an online store.

All the above examples need data to be stored in the form of hierarchies or trees (also loosely referred to as parent child relationships). But SQL Server is a relational database, not a hierarchical database. So, we have to store the data in normalized, relational tables, but come up with programming techniques to process this data in a hierarchical manner.

Unfortunately, there is no built-in support for processing hierarchies and tree structures in SQL Server's T-SQL implementation. (Oracle's PL/SQL implementation has a CONNECT BY syntax that makes it easier to work with hierarchical data. Let's hope we will see something similar in the next version of SQL Server, Yukon).

C programmers use recursive programming techniques for traversing tree structures. The same can be implemented in T-SQL using recursive stored procedure calls. I will demonstrate this in the following example.

Consider the employee table of an organization, that stores all the employee records. Each employee is linked to his/her manager by a manger ID. This table can be represented using the following CREATE TABLE statement:


CREATE TABLE dbo.Emp
(
 EmpID  int PRIMARY KEY,
 EmpName  varchar(30),
 MgrID  int FOREIGN KEY REFERENCES Emp(EmpID)
)
GO

Notice that, EmpID is decalred as a primary key, and the MgrID column is declared as a foreign key constraint, that references the EmpID column of the same table, that is, a self referencing table. This is so, because all employees and managers are stored in the same table.

Since EmpID is declared as a primary key, by default it will be implemented as a unique clustered index. Now, let's create a non-clustered index on MgrID column, to improve the query performance:

CREATE NONCLUSTERED INDEX NC_NU_Emp_MgrID ON dbo.Emp(MgrID)
GO

Let's populate the employee table with some data:

INSERT dbo.Emp SELECT 1, 'President', NULL
INSERT dbo.Emp SELECT 2, 'Vice President', 1
INSERT dbo.Emp SELECT 3, 'CEO', 2
INSERT dbo.Emp SELECT 4, 'CTO', 2
INSERT dbo.Emp SELECT 5, 'Group Project Manager', 4
INSERT dbo.Emp SELECT 6, 'Project Manager 1', 5
INSERT dbo.Emp SELECT 7, 'Project Manager 2', 5
INSERT dbo.Emp SELECT 8, 'Team Leader 1', 6
INSERT dbo.Emp SELECT 9, 'Software Engineer 1', 8
INSERT dbo.Emp SELECT 10, 'Software Engineer 2', 8
INSERT dbo.Emp SELECT 11, 'Test Lead 1', 6
INSERT dbo.Emp SELECT 12, 'Tester 1', 11
INSERT dbo.Emp SELECT 13, 'Tester 2', 11
INSERT dbo.Emp SELECT 14, 'Team Leader 2', 7
INSERT dbo.Emp SELECT 15, 'Software Engineer 3', 14
INSERT dbo.Emp SELECT 16, 'Software Engineer 4', 14
INSERT dbo.Emp SELECT 17, 'Test Lead 2', 7
INSERT dbo.Emp SELECT 18, 'Tester 3', 17
INSERT dbo.Emp SELECT 19, 'Tester 4', 17
INSERT dbo.Emp SELECT 20, 'Tester 5', 17
GO

Notice that the MgrID of President is NULL, that means, he is the super boss and has nobody managing him. As you can see, rest of the employees are linked to their respective managers using the MgrID column.

Now, let's create a stored procedure, that traverses this employee hierarchy recursively and displays the employees in the form of an indented tree structure. The following stored procedure has an input parameter named Root, that takes the ID of the employee (equivalent to the ID of a node in a tree) and displays all the employees managed by him and his sub-ordinates.

CREATE PROC dbo.ShowHierarchy
(
 @Root int
)
AS
BEGIN
 SET NOCOUNT ON
 DECLARE @EmpID int, @EmpName varchar(30)

 SET @EmpName = (SELECT EmpName FROM dbo.Emp WHERE EmpID = @Root)
 PRINT REPLICATE('-', @@NESTLEVEL * 4) + @EmpName

 SET @EmpID = (SELECT MIN(EmpID) FROM dbo.Emp WHERE MgrID = @Root)

 WHILE @EmpID IS NOT NULL
 BEGIN
  EXEC dbo.ShowHierarchy @EmpID
  SET @EmpID = (SELECT MIN(EmpID) FROM dbo.Emp WHERE MgrID = @Root AND EmpID > @EmpID)
 END
END
GO

While creating the above stored procedure, you will receive the following warning:

Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'ShowHierarchy'. The stored procedure will still be created.

Nothing to worry about. It's a just a warning, as SQL Server is a little confused about the recursive stored procedures.

As you can see, the above stored procedure calls itself recursively. You should be aware of a limitation imposed by SQL Server though. A stored procedure can nest itself upto a maximum of 32 levels. If you exceed this limit, you will receive the following error:

Server: Msg 217, Level 16, State 1, Procedure sdss, Line 1
Maximum stored procedure nesting level exceeded (limit 32).

Now let's run this procedure by passing the IDs of different employees and look at the output:
--Complete hierarchy

EXEC dbo.ShowHierarchy 1
GO


---President
------Vice President
---------CEO
---------CTO
------------Group Project Manager
---------------Project Manager 1
------------------Team Leader 1
---------------------Software Engineer 1
---------------------Software Engineer 2
------------------Test Lead 1
---------------------Tester 1
---------------------Tester 2
---------------Project Manager 2
------------------Team Leader 2
---------------------Software Engineer 3
---------------------Software Engineer 4
------------------Test Lead 2
---------------------Tester 3
---------------------Tester 4
---------------------Tester 5



--From Group Project Manager onwards

EXEC dbo.ShowHierarchy 5
GO


---Group Project Manager
------Project Manager 1
---------Team Leader 1
------------Software Engineer 1
------------Software Engineer 2
---------Test Lead 1
------------Tester 1
------------Tester 2
------Project Manager 2
---------Team Leader 2
------------Software Engineer 3
------------Software Engineer 4
---------Test Lead 2
------------Tester 3
------------Tester 4
------------Tester 5



--From Project Manager 1 onwards

EXEC dbo.ShowHierarchy 6
GO


---Project Manager 1
------Team Leader 1
---------Software Engineer 1
---------Software Engineer 2
------Test Lead 1
---------Tester 1
---------Tester 2

The above is just a simple example of traversing hierarchies. There's more to hierarchies than just traversing, that is, adding to, deleting from and modifying hierarchies. The following links deal with hierarchies in greater detail and propose different methodologies for managing hierarchies efficiently.