> > > Closure Table – Store Hierarchical Data Seamlessly | PostgreSQL

Closure Table – Store Hierarchical Data Seamlessly | PostgreSQL

Closure table is a simple and elegant way of storing and querying hierarchical data in any RDBMS. By hierarchical data we mean a set of data that has some parent – child relationship among them. We use the word ‘tree’ instead of hierarchies commonly. As an example we may take the relationships between geographic locations like ‘Countries’, ‘States/ Province’, ‘Districts/ Cities’ etc.

Closure Table Storing a Tree of locations

I lived in a district called ‘Malappuram’ which belongs to the state, Kerala. Kerala share its border with two other states – Tamilnadu and Karnataka and all these three states form a major part of  south India. Unfortunately I left my beautiful village and moved to Chennai, the Capital city of Tamilnadu because of my job requirements!

Now, we got some sample data which are simple enough to be represented as tree of locations. To make it even clear we represent it graphically as following image. (I have added few more places in image.)

Hierarchical Data - Closure Table
Location’s tree

Now we came back to the question, how to store these details using closure table? As the first step we store each of the locations in a table with three columns:

Hierarchical Data - Closure Table
locations table

Query:

INSERT INTO locations VALUES (1, 0, 'India');
INSERT INTO locations VALUES (2, 1, 'Kerala');
INSERT INTO locations VALUES (3, 1, 'Tamilnadu');
INSERT INTO locations VALUES (4, 1, 'Karnataka');
INSERT INTO locations VALUES (5, 2, 'Malappuram');
INSERT INTO locations VALUES (6, 2, 'Calicut');
INSERT INTO locations VALUES (7, 3, 'Chennai');
INSERT INTO locations VALUES (8, 3, 'Coimbatore');
INSERT INTO locations VALUES (9, 5, 'Kohinoor');
INSERT INTO locations VALUES (10, 7, 'T.nagar');
  1. The first column ‘id’ which is the primary key of the table, uniquely identifies each location.
  2. Second column ‘parent_id’ holds the id of the immediate parent of each location.
  3. Third column stores the actual name of location.

There is no parent defined for India as per our test data. So we consider India as the root and keep it’s parent_id as zero. In contrast the ‘locations’ table store each of the locations and also specify its parent. As we can see, it store only depth one relation. But we need one more table to store relations of any depth. We need to tell the db that ‘Malappuram’ is a ‘grandchild’ of India at depth 2, using exactly one row. For this we create the second table, ‘location_hierarchy’. It is the actual Closure Table we talked about so far. Closure table stores the transitive closure of all possible parent – child relation of the ‘locations’ table.

Let’s fill out the closure table with some data:

Hierarchical Data - Closure Table
closure table – locations_hierarchy

You might feel strange since I declared every locations as its own parent. (Still it is logically valid, if you think like: a location is its own parent at a depth of zero.) But it is incredibly useful for filling out rest of the relations.

Next we are going to tell our closure table that Kerala belongs to India. For this use the following query which involves a simple self join:

INSERT INTO location_hierarchy(parent_id, child_id, depth)
SELECT p.parent_id, c.child_id, p.depth+c.depth+1
FROM location_hierarchy p, location_hierarchy c
WHERE p.child_id = 1 AND c.parent_id = 2

In the WHERE clause we specified the immediate parent and child ids in p.child_id and c.parent_id respectively. Now we got one more row which represent the relation between Kerala and India:

india-kerala - closure table

Making Malappuram a child of Kerala and grand child of India:

INSERT INTO location_hierarchy(parent_id, child_id, depth)
SELECT p.parent_id, c.child_id, p.depth+c.depth+1
FROM location_hierarchy p, location_hierarchy c
WHERE p.child_id = 2 AND c.parent_id = 5

We can see that it will add two rows to represent each of two relations:

malappurma - keralaAdding entries for Kohinoor:

INSERT INTO location_hierarchy(parent_id, child_id, depth)
SELECT p.parent_id, c.child_id, p.depth+c.depth+1
FROM location_hierarchy p, location_hierarchy c
WHERE p.child_id = 5 AND c.parent_id = 9

KOhinoor - malappuram

Querying data from closure table

Getting the list of  parents of a location: let’s see how to get the list of parents of a location, say, T.nagar,

SELECT parent_id FROM location_hierarchy
WHERE child_id = 10 and depth != 0 ORDER BY depth DESC

this simple query will give the list of ids of India, Tamilnadu and Chennai. Joining it with locations table we get the real names of parent locations:

SELECT location_hierarchy.parent_id, location_name
FROM location_hierarchy JOIN locations
ON(location_hierarchy.parent_id = locations.id)
WHERE child_id = 10 AND depth!=0 ORDER BY depth DESC

tnagar parentsGetting the list of states of India: Getting the list of states of India is also simple than it seems, by just querying the closure table for locations whose parent is India at a depth of 1:

SELECT location_hierarchy.child_id, location_name
FROM location_hierarchy JOIN locations
ON(location_hierarchy.child_id = locations.id)
WHERE location_hierarchy.parent_id = 1 and depth = 1

list of states

Removing relations from closure table

To remove a parent child relationship from closure table simply use this query:

DELETE FROM location_hierarchy WHERE mapping_id IN(
SELECT  rel.mapping_id FROM location_hierarchy p,
location_hierarchy rel, location_hierarchy c
WHERE p.parent_id = rel.parent_id and c.child_id = rel.child_id
AND p.child_id = CHILD_ID AND c.parent_id = PARENT_ID)

Remember, this will remove the relation between two nodes(places) never the nodes. To remove a node and all of its child nodes we use the following query:

DELETE FROM location_hierarchy WHERE mapping_id IN(
SELECT rel.mapping_id FROM location_hierarchy p, location_hierarchy rel, location_hierarchy c, location_hierarchy delete
WHERE p.parent_id = rel.parent_id AND c.child_id = rel.child_id
AND p.child_id  = delete.parent_id AND c.parent_id = delete.child_id
AND (delete.parent_id = ID_TO_DELETE OR delete.child_id = ID_TO_DELETE)
AND delete.depth < 2 )

Using Triggers to automate Closure Entry Formation

Most of the DBMS systems like PostgreSQL, MySQL etc support Triggers which is fired when an INSERT, UPDATE, DELETE operation is performed. Triggers can be used to automate the creation/ deletion of closure entries whenever a new item is added to the main(location) table. We will see it in detail in another post!

To learn more about postgreSQL triggers, see:

 SQL Trigger – A complete Example in PostgreSQL

Recursive Queries to Eliminate depth Column

We can see from the location_hierarchy  closure  table that, as the depth increases, the number of entries to be inserted also increases. If you really want to decrease the size of the closure table, you can make use of Recursive query in PostgreSQL etc. It requires only depth one relations. But not all DBMS systems support recursive queries. We will see it in detail later!

References:
http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html

The following two tabs change content below.

Vipin Raj

Vipin Raj is a software developer specialized in PostgreSQL Database and Data Modeling, the man behind technobytz and an IoT and Security enthusiast. Having 3+ years of experience in the IT industry, he is currently pursuing his masters in computer science and information security. He spend his free time writing blog posts with the intension of sharing his knowledge to the tech community.

10 thoughts on “Closure Table – Store Hierarchical Data Seamlessly | PostgreSQL

Leave a Reply

Your email address will not be published. Required fields are marked *