This time we are going to have look at how RECURSIVE WITH queries works in PostgreSQL. Instead of going into detail about the implementation, let’s begin with the problem statement.
Here, we have a tree of locations which we stored in the location_hierarchy table as below:
CREATE TABLE location_hierarchy ( parent character varying NOT NULL, child character varying NOT NULL, CONSTRAINT location_hierarchy_pkey PRIMARY KEY (parent, child) ) WITH ( OIDS=FALSE );
Our aim here is to get all the children of ‘India’ along with it’s depth relative to “India”. The result should look somewhat similar to this:
A normal SQL query is not enough for forming this result. You might thing of a giant self JOIN query, but that also fails since the depth here is arbitrary and may change in future.
WITH RECURSIVE Query in PostgreSQL
So, for this we use a special query called WITH RECURSIVE. People often call it PostgreSQL hierarchical query since it is usually used to query on hierarchical data, like the one which we are discussing. The WITH RECURSIVE, actually is an extension of WITH query which is referred to as Common Table Expressions(CTE) in PostgreSQL. WITH query can be seen as forming a temporary table(s) which has a scope for a single query or as a named sub-query.
So given below is the solution for our problem with RECURSIVE query. We will examine this query further, step by step.
WITH RECURSIVE children AS ( SELECT child, 1 AS depth ---|Non FROM location_hierarchy --|Recursive WHERE parent = 'india' ---|Part UNION ALL SELECT a.child, depth+1 ---|Recursive FROM location_hierarchy a --|Part JOIN children b ON(a.parent = b.child) ---| ) SELECT * FROM children
1. The query execution begin with the non-recursive part. And the result obtained is stored in two places:
- In the recursive query result
- In a temporary table called ‘working table’
So, after executing non-recursive part, the two tables will look like this:
(Note that both of these tables are in memory and they doesn’t exist as physical tables )
2. After executing recursive part and appending it’s result with old query result using UNION LL, we will have:
Note that, in the recursive query when we refer the “children” the records from the working table is pulled out, and the join is made b/w this temporary table and our location_hierarchy table. And the result of this join is appended to the query result as shown above.
After executing the recursive part the working table is updated with the new set of result as follows:
2. In the next iteration, just like in previous step, the working table shown above serve as the “children” table and is again joined with the location_hierarchy to form the following results:
3. In the next iteration we get null result because joining ‘work table’ with location_hierarchy table doesn’t yield any records. So, no new record is added to the query result and the working table becomes empty so that the WITH part of the query ends here.
So, the end condition for a WITH RECURSIVE query is ‘getting an empty working table’. This is an important point to be noticed, because, if you write a query in a way that the working table will never become empty, you end up with a recursive query which will run forever!
4. Obviously, in the next step, the select statement, SELECT * FROM children is executed, and this time “children” refers to the Query result table and all the results are pulled out. Thus we got the desired result!
We can have more than one WITH sections in a single query, separated by comma. For instance, we can add one more section, children_of_kerala, which gives the children of kerala. In the final SELECT query, we can reference all these temporary tables along with real tables, views etc.
Even, it is not required that we should write a SELECT query over there, we can write any valid DML, like INSERT, UPDATE, DELETE, etc.
Usage of UNION ALL in recursive query is not accidental, it have a considerable performance benefit over UNION. Because, if used, UNION will eliminate all duplicate elements from the result set, that means, in each iteration postgres will take additional efforts to find and eliminate duplicates. This will surely impact the performance, so usage of UNION is not preferred unless you have a real reason to use it.
How to release all data by the level that parent before child.
Sorry, can you explain a little further ?
How to select all data by parent ,child,child of child by hierarchical query.
I am not clear about your requirement. Please provide more information.
You can add this to the end of the query :
UNION SELECT child, 0 AS depth
WHERE parent = 'india'
ORDER BY depth
The entier query will look like this :
WITH RECURSIVE children AS (
SELECT child, 1 AS depth
WHERE parent = 'india'
SELECT a.child, depth+1
FROM location_hierarchy a
JOIN children b ON(a.parent = b.child)
SELECT * FROM children
SELECT child, 0 AS depth
WHERE parent = 'india'
ORDER BY depth
Thanks – your explanation of the RECURSIVE WITH statement is the clearest I’ve found and helped me understand the concept.
I agree. Finally a clear explanation! Thank you, Vipin Raj!
I would echo Dion. A fantastic explanation of WITH RECURSIVE with a simple example set (bonus to learn a little Indian geography as well!)
Hello Vipin. Thank you for this very interesting post 🙂
I am looking for a solution to split sentences into sub-sentences using this technique. I added a question on Stackoverflow here: http://stackoverflow.com/questions/30399065/split-sentences-into-sub-sentences-using-a-recursive-postgresql-query and I wondered if you could help?
Many thank in advance!
As an interesting aside, say you want to go up the tree – given the child ‘kohinoor’, you want to trace the route through malappuram, kerala and india, here is the query
with recursive parents as (
select parent,1 as depth from location_hierarchy where child=’kohinoor’
parents as par
select * from parents;
Simply great tutorial.
Thanks it is very good and simple post that helped me to grasp the concept.
Really helped me.