What is meant by binary tree crossing

What is the most efficient / elegant way to break a flat table into a tree?

For example, suppose you have a flat table that stores an ordered tree hierarchy:

Here is a diagram of where we have. Root node 0 is fictitious.

[0] root / \. [1] Node 1 [3] Node 2 / \ \ [2] Node 1.1 [6] Node 1.2 [5] Node 2.1 /. [4] Node 1.1.1

What minimalist approach would you use to render this a properly ordered, properly indented tree in HTML (or text)?

Suppose you only have basic data structures (arrays and hashmaps), no fancy objects with parent / child references, no ORM, no framework, just your two hands. The table is presented as a result set that can be accessed at random.

Pseudocode or plain English is fine, this is a purely conceptual question.

Bonus question: Is there a fundamentally better way to store such a tree structure in an RDBMS?


To answer a commenter (Mark Bessey) question: A root node is not required as it will never show up anyway. ParentId = 0 is the convention used to express "this is the top level". The Order column defines how nodes with the same parent are sorted.

The "result set" I was talking about can be represented as a series of hash maps (to stay with that terminology). Because my example should already be there. Some answers go the extra mile and construct it first, but that's okay.

The tree can be as deep as you want. Each node can have N child nodes. I didn't exactly have a "Millions of Entries" tree in mind, however.

Don't confuse my choice of node name ('Node 1.1.1') with something you can rely on. The nodes could just as easily be referred to as "Frank" or "Bob", no name structure is implied, this was just to make them readable.

I've published my own solution so you can tear it to pieces.


Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.

I tested recursive queries in MySQL 8.0 in my Recursive Query Throwdown presentation in 2017.

Below is my original answer from 2008:

There are several ways to store tree-structured data in a relational database. What you are showing in your example uses two methods:

  • Adjacency list (the "parent" column) and
  • Path enumeration (the dotted numbers in your name column).

Another solution is called Nested sets and can also be stored in the same table. For more information on these designs, see "Trees and Hierarchies in SQL for Smarties" by Joe Celko.

I usually prefer a design called Closure table (also known as "Adjacency Relation") for storing data in a tree structure. It requires a different table, but querying trees is pretty straightforward.

I describe the closure table in my hierarchical data presentation models using SQL and PHP and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

Store all paths in the closure table in which there is a direct descent from one node to another. For each node, add a line to refer to itself. For example, use the record you showed in your question:

Now you can get a tree starting at node 1 like this:

The output (in the MySQL client) looks like this:

In other words, nodes 3 and 5 are excluded because they are part of a separate hierarchy and do not descend from node 1.

Subject: Comment from e-satis on immediate children (or immediate parents). You can add a column to the "" column to make querying an immediate child or parent (or other distance) easier.

You can then add a term to your search to query the immediate children of a particular node. These are offspring whose 1 is.

Comment from @ashraf: "How about sorting the whole tree [by name]?"

Here is a sample query to return all nodes that are descendants of node 1, link them to the FlatTable that contains other node attributes, e.g. B. and to be sorted by name.

Comment from @Nate:

A user suggested an edit today. SO moderators approved the edit, but I'm undoing it.

Editing suggested that ORDER BY should be at the top of the last query, presumably to ensure that the order matches the hierarchy. However, this does not work because "Node 1.1.1" is ordered after "Node 1.2".

If you want the order to match the hierarchy in a meaningful way, you can, but not simply by ordering by path length. For example, see my answer on the MySQL Closure Table hierarchical database - How to subtract information in the correct order.

Using nested sets (sometimes called modified pre-order tree traversal) allows you to extract the entire tree structure or any subtree within it in tree order with a single query, with a higher cost for inserts as needed to manage columns that one Describe the path in the correct order through your tree structure.

For django-mptt I used a structure like this:

id parent_id tree_id level lft rght - --------- ------- ----- --- ---- 1 null 1 0 1 14 2 1 1 1 2 7 3 2 1 2 3 4 4 2 1 2 5 6 5 1 1 1 8 13 6 5 1 2 9 10 7 5 1 2 11 12

Which describes a tree that looks like this (with representation of each element):

1 + - 2 | + - 3 | + - 4 | + - 5 + - 6 + - 7

Or as a nested quantity diagram, which makes the functionality of the and values ​​clearer:

__________________________________________________________________________ | Root 1 | | ________________________________ ________________________________ | | | Child 1.1 | | Child 1.2 | | | | ___________ ___________ | | ___________ ___________ | | | | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | | 1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14 | | ________________________________ | | ________________________________ | | | __________________________________________________________________________ |

As you can see, getting the entire subtree for a given node in the tree, you can simply select all of the rows that have and their values ​​between and values. It is also easy to get the family tree for a specific node.

The column is a little denormalized, mainly for simplicity, and the column allows you to restart the numbering and numbering for each top-level node, thus reducing the number of columns affected by inserts, moves, and deletions as are the columns and must be adjusted accordingly when these operations take place to create or close gaps. I made some development notes as I tried to dig into the queries required for each operation.

In order to actually work with this data to display a tree, I created a utility function that, for each node, would provide enough information to generate the type of display we wanted.

More information about MPTT:

It's a pretty old question, but since it has many views, I think it's worth considering an alternative and, in my opinion, very elegant solution.

To read a tree structure, you can recursive common table expressions Use (CTEs). It offers the ability to view the entire tree structure at once, get information about the level of the node, its parent node and the order within the child nodes of the parent node.

Let me show you how this would work in PostgreSQL 9.1.

  1. Create a structure

  2. Write a query

    Here are the results:

    The tree nodes are arranged according to a depth. In the final edition, we would present them in the following lines.

    For each level they are sorted according to parent_id and node_order within the parent element. Here's how to present them to the parent node in this order in the output link node.

    With such a structure, it wouldn't be difficult to make a really nice presentation in HTML.

    Recursive CTEs are in PostgreSQL, IBM DB2, MS SQL Server and Oracle available .

    If you want to learn more about recursive SQL queries, you can either check the documentation for your preferred DBMS or read my two articles on the same topic:

As of Oracle 9i, you can use CONNECT BY.

Starting with SQL Server 2005, you can use a recursive common table expression (CTE).

Both give the following results.

Name 'Node 1' 'Node 1.1' 'Node 1.1.1' 'Node 1.2' 'Node 2' 'Node 2.1'

Bill's answer is darn good, this answer adds a few things that make me want SO supported thread responses.

In any case, I wanted to support the tree structure and the Order property. I put a single property in each node called that does the same thing that was intended in the original question (keep order from left to right).

mysql> desc node; + ------------- + -------------- + ------ + ----- + ------- - + ---------------- + | Field | Enter | a zero | Key | Standard | Extra | + ------------- + -------------- + ------ + ----- + ------- - + ---------------- + | id | int (11) | NO | PRI | NULL | auto_increment | | Name | varchar (255) | YES | | NULL | | | leftSibling | int (11) | NO | | 0 | | + ------------- + -------------- + ------ + ----- + ------- - + ---------------- + 3 rows in the sentence (0.00 sec.) Mysql> desc neighborhoods; + ------------ + --------- + ------ + ----- + --------- + --- ------------- + | Field | Enter | a zero | Key | Standard | Extra | + ------------ + --------- + ------ + ----- + --------- + --- ------------- + | RelationId | int (11) | NO | PRI | NULL | auto_increment | | Parents | int (11) | NO | | NULL | | | Child | int (11) | NO | | NULL | | | pathLen | int (11) | NO | | NULL | | + ------------ + --------- + ------ + ----- + --------- + --- ------------- + 4 rows in a sentence (0.00 sec.)

More details and SQL code on my blog.

Thank you Bill. Your answer was helpful to get started!

If I had a choice, I would use objects. I would create an object for each record where each object has a collection and store them all in an assoc array (/ hashtable) where the id is the key. And flash once through the collection and add the children to the relevant children's fields.Easy.

But because you're not having fun limiting the use of a good OOP, I'd probably iterate based on:

Edit: This is similar to some of the other entries, but I think it's a little cleaner. I would like to add one thing: this is extremely SQL intensive. It is angry . If you have a choice, go the OOP route.

This was written quickly and isn't pretty or efficient (plus, there are lots of car boxes converting between and is annoying!), But it works.

It's probably against the rules as I create my own objects, but hey, I do this as a distraction from the real work :)

This also assumes that the resultSet / table is fully read into a structure before you start creating nodes. This wouldn't be the best solution if you have hundreds of thousands of lines.