Closure table is a simple and elegant way of storing and querying hierarchical data in any RDBMS. By hierarchical data we mean a set of data that has some parent – child relationship among them. We use the word ‘tree’ instead of hierarchies commonly. As an example we may take the relationships between geographic locations like ‘Countries’, ‘States/ Province’, ‘Districts/ Cities’ etc.
Closure Table Storing a Tree of locations
I lived in a district called ‘Malappuram’ which belongs to the state, Kerala. Kerala share its border with two other states – Tamilnadu and Karnataka and all these three states form a major part of south India. Unfortunately I left my beautiful village and moved to Chennai, the Capital city of Tamilnadu because of my job requirements!
Now, we got some sample data which are simple enough to be represented as tree of locations. To make it even clear we represent it graphically as following image. (I have added few more places in image.)
Now we came back to the question, how to store these details using closure table? As the first step we store each of the locations in a table with three columns:
Query:
INSERT INTO locations VALUES (1, 0, 'India'); INSERT INTO locations VALUES (2, 1, 'Kerala'); INSERT INTO locations VALUES (3, 1, 'Tamilnadu'); INSERT INTO locations VALUES (4, 1, 'Karnataka'); INSERT INTO locations VALUES (5, 2, 'Malappuram'); INSERT INTO locations VALUES (6, 2, 'Calicut'); INSERT INTO locations VALUES (7, 3, 'Chennai'); INSERT INTO locations VALUES (8, 3, 'Coimbatore'); INSERT INTO locations VALUES (9, 5, 'Kohinoor'); INSERT INTO locations VALUES (10, 7, 'T.nagar');
- The first column ‘id’ which is the primary key of the table, uniquely identifies each location.
- Second column ‘parent_id’ holds the id of the immediate parent of each location.
- Third column stores the actual name of location.
There is no parent defined for India as per our test data. So we consider India as the root and keep it’s parent_id as zero. In contrast the ‘locations’ table store each of the locations and also specify its parent. As we can see, it store only depth one relation. But we need one more table to store relations of any depth. We need to tell the db that ‘Malappuram’ is a ‘grandchild’ of India at depth 2, using exactly one row. For this we create the second table, ‘location_hierarchy’. It is the actual Closure Table we talked about so far. Closure table stores the transitive closure of all possible parent – child relation of the ‘locations’ table.
Let’s fill out the closure table with some data:
You might feel strange since I declared every locations as its own parent. (Still it is logically valid, if you think like: a location is its own parent at a depth of zero.) But it is incredibly useful for filling out rest of the relations.
Next we are going to tell our closure table that Kerala belongs to India. For this use the following query which involves a simple self join:
INSERT INTO location_hierarchy(parent_id, child_id, depth) SELECT p.parent_id, c.child_id, p.depth+c.depth+1 FROM location_hierarchy p, location_hierarchy c WHERE p.child_id = 1 AND c.parent_id = 2
In the WHERE clause we specified the immediate parent and child ids in p.child_id and c.parent_id respectively. Now we got one more row which represent the relation between Kerala and India:
Making Malappuram a child of Kerala and grand child of India:
INSERT INTO location_hierarchy(parent_id, child_id, depth) SELECT p.parent_id, c.child_id, p.depth+c.depth+1 FROM location_hierarchy p, location_hierarchy c WHERE p.child_id = 2 AND c.parent_id = 5
We can see that it will add two rows to represent each of two relations:
Adding entries for Kohinoor:
INSERT INTO location_hierarchy(parent_id, child_id, depth) SELECT p.parent_id, c.child_id, p.depth+c.depth+1 FROM location_hierarchy p, location_hierarchy c WHERE p.child_id = 5 AND c.parent_id = 9
Querying data from closure table
Getting the list of parents of a location: let’s see how to get the list of parents of a location, say, T.nagar,
SELECT parent_id FROM location_hierarchy WHERE child_id = 10 and depth != 0 ORDER BY depth DESC
this simple query will give the list of ids of India, Tamilnadu and Chennai. Joining it with locations table we get the real names of parent locations:
SELECT location_hierarchy.parent_id, location_name FROM location_hierarchy JOIN locations ON(location_hierarchy.parent_id = locations.id) WHERE child_id = 10 AND depth!=0 ORDER BY depth DESC
Getting the list of states of India: Getting the list of states of India is also simple than it seems, by just querying the closure table for locations whose parent is India at a depth of 1:
SELECT location_hierarchy.child_id, location_name FROM location_hierarchy JOIN locations ON(location_hierarchy.child_id = locations.id) WHERE location_hierarchy.parent_id = 1 and depth = 1
Removing relations from closure table
To remove a parent child relationship from closure table simply use this query:
DELETE FROM location_hierarchy WHERE mapping_id IN( SELECT rel.mapping_id FROM location_hierarchy p, location_hierarchy rel, location_hierarchy c WHERE p.parent_id = rel.parent_id and c.child_id = rel.child_id AND p.child_id = CHILD_ID AND c.parent_id = PARENT_ID)
Remember, this will remove the relation between two nodes(places) never the nodes. To remove a node and all of its child nodes we use the following query:
DELETE FROM location_hierarchy WHERE mapping_id IN( SELECT rel.mapping_id FROM location_hierarchy p, location_hierarchy rel, location_hierarchy c, location_hierarchy delete WHERE p.parent_id = rel.parent_id AND c.child_id = rel.child_id AND p.child_id = delete.parent_id AND c.parent_id = delete.child_id AND (delete.parent_id = ID_TO_DELETE OR delete.child_id = ID_TO_DELETE) AND delete.depth < 2 )
Using Triggers to automate Closure Entry Formation
Most of the DBMS systems like PostgreSQL, MySQL etc support Triggers which is fired when an INSERT, UPDATE, DELETE operation is performed. Triggers can be used to automate the creation/ deletion of closure entries whenever a new item is added to the main(location) table. We will see it in detail in another post!
To learn more about postgreSQL triggers, see:
SQL Trigger – A complete Example in PostgreSQL
Recursive Queries to Eliminate depth Column
We can see from the location_hierarchy closure table that, as the depth increases, the number of entries to be inserted also increases. If you really want to decrease the size of the closure table, you can make use of Recursive query in PostgreSQL etc. It requires only depth one relations. But not all DBMS systems support recursive queries. We will see it in detail later!
References:
http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
12 comments
Hi Raj,
great thing – thanks for sharing 🙂
Since … wow, at last v. 8.4 PG has tablefunc extension which has great connectby function defined ( ).
This allows you to produce closure table, and even more: the path (branch) for each level. If you decide to produce branch in ltree style (), then you can run very fast questies against the tree (ltree) to find whatever you want (all sub trees, root of the leaf, etc…). Ltree column could be indexed (using GIST) to speedup the queries.
regards,
Bartek
Wow! This is a new information for me. I will surely try it out. Thank you.
Something happend with my HTML tags – sorry for producing odd text.
Regards,
Bartek
Hello There. I found your blog using msn. This is a very well written article.
I’ll make sure to bookmark it and return to read more of your useful info.
Thanks for the post. I will definitely return.
Also visit my homepage; search engine optimization, google.com,
Greetings from Russia.)
Thank you for this guides, especialy for recursive guide.
I don’t know more understandable blog about PostgreSQL basics in english.
[…] Closure Table – Store Hierarchical Data Seamlessly | PostgreSQL – […]
Very helpful – thanks for posting this!!
Hello, closure table is very nice structure for use. But now I want to display the Whole India hierarchy in a drop down menu same as window. Is it possible using this? I think I have to write multiple queries for displaying it[1st query to fetch India child then 2nd query to fetch kerela child and so on…]. Can you please suggest.
+India
+ kerela
+Mallapuram
Kohinoor
+Calicut
+karnataka
+tamilNadu
+chennai
Well, that depends on the data format used by the plugin you used for dropdown. There are two approaches to it:
1. Construct the whole tree in a stretch by querying whole table.
2. Show only the root nodes(India) then allow users to drill down.
In the second case, you will have to write separate queries.
[…] http://technobytz.com/closure_table_store_hierarchical_data.html […]
sorry delete query don`t work
your query to populate location_hierarchy doesn’t seem right, should “location” instead of “location_hierarchy” be used? But if location is used, where would you source depth?