Binary search tree – Implementation in C/C++

Binary search tree – Implementation in C/C++


In our previous lesson, we saw what binary
search trees are, now in this lesson we are going to implement binary search tree. We
will be writing some code for binary search tree. prerequisite for this lesson is that
you must understand the concepts of pointers and dynamic memory allocation in C/C++. If
you have already followed this series and seen our lessons on linked list, then implementation
of binary search tree or binary tree in general is not going to be very different. We will
have nodes and links here as well. Ok, so lets get started. Binary search tree or BST
as we know is a binary tree in which for each node, value of all the nodes in left subtree
is lesser or equal and value of all the nodes in right subtree is greater. We can draw BST
as a recursive structure like this. Value of all the nodes in left subtree must be lesser
or equal and value of all the nodes in right subtree must be greater and this must be true
for all nodes and not just the root node. So, in this recursive definition here, both
left and right subtrees must also be binary search trees. I have drawn a binary search
tree of integers here. Now, the question is, how can we create this non-linear logical
structure in computer’s memory. I had talked about this briefly when we had discussed binary
trees. The most popular way is – dynamically created nodes linked to each other using pointers
or references just the way we do it for linked lists. Because in a binary search tree, or
in a binary tree in general, each node can have at most 2 children, we can define node
as an object with 3 fields something like what I’m showing here. We can have a field
to store data, another to store address or reference of left child and another to store
address or reference to right child. If there is no left or right child for a node, reference
can be set as NULL. In C or C++, we can define node like this. There is a field to store
data. Here the data type is integer, but it can be anything. There is one field which
is pointer to node. Node asterisk means pointer to node. This one is to store the address
of left child and we have another one to store the address of right child. This definition
of node is very similar to definition of Node for doubly linked list. Remember in doubly
linked list also, each node had two links – one to previous node and another to next
node. but doubly linked list was a linear arrangement. This definition of node is for
a binary tree. We could also name this something like BstNode, but Node is also fine, lets
go with Node. Now, in our implementation, just like linked list, all the nodes will
be created in dynamic memory or heap section of application’s memory using malloc function
in ‘C’ or new operator in C++. We can use malloc in C++ as well. Now, as we know any
object created in dynamic memory or heap section of applications memory cannot have a name
or identifier. It has to be accessed through a pointer. malloc or new operator return us
pointer to the object created in heap. if you want to revise some of these concepts
of dynamic memory allocation, you can check the description of this video for link to
a lesson. Its really important that you understand this concept of stack and heap in applications
memory really well. Now, for a linked list, if you remember, the information that we always
keep with us is address of the head node. If we know the head node, we can access all
other nodes using links. In case of trees, the information that we always keep with us
is address of the root node. If we know the root node, we can access all other nodes in
the tree using links. To create a tree, we first need to declare a pointer to BstNode.
I’ll rather call node BstNode here, BST for binary search tree. So, to create a tree,
we first need to declare a pointer to BstNode that will always store the address of root
node. I have declared a pointer to Node here named rootPtr – Ptr for pointer. In C, you
can’t just write BstNode asterisk rootPtr. You will have to write struct space BstNode
asterisk, you will have to write struct here as well. I am gonna write C++ here, but anyway
right now I’m trying to explain the logic. we will not bother about minute details of
implementation. In this logical structure of tree that I’m showing here, each Node as
you can see has 3 fields, 3 cells. leftmost cell is to store the address of left child
and rightmost cell is to store the address of right child. Lets say root node is at address
200 in memory and I’ll assume some random addresses for all other nodes as well. Now,
I can fill in these left and right cells in each node with addresses of left and right
children. In our definition, data is first field, but in this logical structure, I am
showing data in the middle. Ok, so for each Node, I have filled in addresses for both
left and right child. Address is 0 or NULL if there is no child. Now, as we were saying,
identity of the tree is address of the root node. We need to have a pointer to Node in
which we can store the address of the root node. We must have a variable of type pointer
to node to store the address of root node. All these rectangles with 3 cells are Nodes.
They are created using malloc or new operator and live in heap section of application’s
memory. We cannot have name or identifier for them. They are always accessed through
pointers. This rootPtr, root pointer has to be a local or global variable. We will discuss
this in little more detail in some time. Quite often, we like to name this root pointer,
just root. We can do so, but we must not confuse. This is pointer to root and not the root itself.
To create a BST, as I was saying, we first need to declare this pointer. Initially, we
can set this pointer as NULL to say that the tree is empty. A tree with no node can be
called empty tree and for empty tree, root pointer should be set as NULL. We can do this
declaration and setting the root as NULL in main function in our program. Actually, lets
just write this code in a real compiler. I am writing C++ here. As you can see, in the
main function I have declared this pointer to Node which will always store the address
of root node of my tree and I am initially setting this as NULL to say that the tree
is empty. With this much of code, we have created an empty tree. But, whats the point
of having an empty tree. We should have some data in it. So, what I want to do now is I
want to write a function to be able to insert a node in the tree. I will write a function
named Insert that will take address of the root node and data to be inserted as argument
and this function will Insert a node with this data at appropriate position in the tree.
In the main function, I’ll make calls to this insert function passing it address of root
and the data to be inserted, lets say first I want to insert number 15 and then I want
to insert number 10 and then number 20. We can insert more, but lets first write the
logic for Insert function. Before I write the logic for Insert function, I want to write
a function to create a new Node in dynamic memory or heap. This function GetNewNode should
take an integer, the data to be inserted as argument, create a node in heap using new
or malloc and return back the address of this new node. I am creating a new node here using
this new operator. the operator will return me the address of the newly created node which
I am collecting in this variable of type pointer to BstNode. In C, instead of new operator,
we will have to use malloc. We can use malloc in C++as well. C++ is only a super-set of
C. malloc will also do here. Now, anything in dynamic memory or heap is always accessed
through pointer. Now, using this pointer – newNode, we can access the fields of newly created
Node. I will have to dereference this pointer using asterisk operator. i am writing asterisk
newNode and now I can access the fields. We have 3 fields in node – data and 2 pointers
to node , left and right;. i have set the data here. Instead of writing asterisk newNode
dot data, we have this alternate syntax that we could use. We could simply write newNode->data
and this will mean the same. We have used this syntax quite a lot in our lessons on
linked list. Now, for the new node, we can set the left and right child as NULL and finally
we can return the address of the new Node. Ok, coming back to the insert function. We
can have couple of cases in insertion. First of all, tree may be empty. For this first
insertion, when we are inserting this number 15, tree will be empty. if tree is empty,
we can simply create a new node and set it as root. With this statement, root equal GetNewNode,
I am setting root as address of the new node. But there is something not alright here. This
root is local variable of Insert function and its scope is only within this function.
We want this root, root in main to be modified. This guy is a local variable of main function.
There are 2 ways of doing this. We can either return the address of the new root. So, return
type of insert function will be pointer to BstNode and not void. And here, in the main
function, we will have to write statement like root equal Insert and the arguments.
So, we will have to collect the return and update our root in main function. Another
way is that, we can pass the address of this root of main to the Insert function. This
root is already a pointer to Node. So, its address can be collected in a pointer to pointer.
So, Insert function , in insert function first argument will be a pointer to pointer and
here, we can pass the address. We will say ampersand root to pass the address. We can
name this argument root or we can name this argument rootPtr. We can name this whatever.
Now, what we need to do is, we need to dereference this using asterisk operator to access the
value in root of main and we can also set the value in root of main. So, here with this
statement, we are setting the value and the return type now can be void. This pointer
to pointer thing gets a little tricky. I’ll go with the former approach. Actually, there
is another way. instead of declaring root as a local variable in main function, we can
declare root as global variable. Global variable, as we know has to be declared outside all
the functions. if root would be global variable, it would be accessible to all the functions
and we will not have to pass the address stored in it as argument. Anyway, coming back to
the logic for insertion. As we were saying, if the tree is empty, we can simply create
a new node and we can simply set it as root. At this stage, we wanted to insert 15. If
we will make call to the insert function, address of root is 0 or NULL. NULL is only
a macro for 0 and the second argument is the number to be inserted. In this call to Insert
function, we will make call to get new Node. Lets say, we got this new Node at address
200. GetNewNode function will return us address 200 which we can set as root here, but this
root is a local variable. We will return this address 200 back to the main function and
in the main function, we are actually doing this root equal Insert. So, in the main function
we are building this link. Ok, our next call in the main function was to insert number
10. At this stage, root is 200. the address in root is 200 and the value to be inserted
is 10. Now, the tree is not empty. So, what do we do. If the tree is not empty, we can
basically have 2 cases. If the data to be inserted is lesser or equal, we need to insert
it in the left subtree of root and if the data to be inserted is greater, we need to
insert it in right subtree of the root. So, we can reduce this problem in a self similar
manner, in a recursive manner. Recursion is one thing that we are going to use almost
all the time while working with trees. In this function, I’ll say that if the data to
be inserted is less than or equal to the data in root, then make a recursive call to insert
data in left subtree. The root of the left subtree will be the left child. So, in this
recursive call, we are passing address of left child and data as argument and after
the data is inserted in left subtree, the root of the left subtree can change. Insert
function will return the address of the new root of the left subtree and we need to set
it as left child of the current node. In this example tree here, right now, both left and
right subtree are empty. We are trying to insert number 10, so we have made call to
this function Insert. From main function, we have called Insert passing it address 200
and value or data 10. Now, 10 is lesser than 15, so control will come to this line and
a call will be made to Insert data in left subtree. Now, left subtree is empty. So, address
of root for left subtree is 0. Data passed, data to be inserted passed as argument is
10. Now, this first insert call will wait for this insert below to finish and return.
For this last, insert call, root is NULL. Lets say we got this node at address 150.
Now, this Insert call will return back 150 and execution of first insert call will resume
at this line and now this particular address will be set as 150. So, we will build this
link and now this insert call can finish. It can return back the current root. Actually
this return root should be there for all cases, so I am taking it out and I have it after
all these conditions. Of course, we will have one more else here. If the data is greater,
we need to go Insert in right subtree. the third call in Insert function was to Insert
number 20. Now this time, we will go to this else statement, this statement in else. Lets
say, we got this new Node at address 300. So, this guy will return 300. For this node
at 200, right child will be set as 300 and now this call to Insert can finish. The return
will be 200. Ok, at this stage what if a call is made to Insert number 25. We are at root
right now, the node with address 200. 25 is greater, so we need to go and insert in right
subtree. Right subtree is not empty this time. So, once again, for this call also, we will
come to this else, last else because 25 is greater than 20. Now, in this call we will
go tot the first if. A node will be created, lets say, we got this Node in heap at address
500. This particular call Insert(0,25) will return 500 and finish. Now, for the node at
300, right child will be set as 500. So, this link will get built. Now, this guy will return
300. The root for this subtree has not changed. And this first call to Insert will also wrap
up. It will return 200. So, we are looking good for all cases. This Insert function will
work for all cases. We could write this Insert function without using recursion. I encourage
you to do so. You will have to use some temporary pointer to Node and loops. Recursion is very
intuitive here and recursion is intuitive in pretty much everything that we do with
trees. So, its really important that we understand recursion really well. Ok, I will write one
more function now to search some data in BST. In the main function here, I have made some
more calls to Insert. Now, i want to write a function named Search that should take as
argument, address of the root node and the data to be searched and this function should
return me true if data is there in the tree, false otherwise. Once again, we will have
couple of cases. If the root is NULL, then we can return false. If the data in root is
equal to the data that we are looking for, then we can return true. Else, we can have
2 cases – either we need to go and search in the left subtree or we need to go in the
right subtree. So, once again, i am using recursion here. I am making recursive call
to search function in these two cases. If you have understood the previous recursion,
then this is very similar.Lets test this code now. what I have dine here is I have asked
the user to enter a number to be searched and then I am making call to this search function
and if this function is returning me true, I am printing “Found”, else I am printing
“NotFound”. Lets run this code and see what happens. I have moved multiple Insert statements
in one line because I am short of space here. Lets say, we want to search for number 8.
8 is found. And now lets say, we want to search for 22. 22 is not found. So, we are looking
good. I’ll stop here now. You can check the description of this video for link to all
the source code. We will do a lot more with trees in coming lessons. In our next lesson,
we will go a little deeper and try to see how things move in various sections of application’s
memory. How things move in stack and heap sections of memory when we execute these functions.
It will give you a lot of clarity. This is it for this lesson. Thanks for watching !

100 thoughts on “Binary search tree – Implementation in C/C++

  • In 18 minutes I just learned way more than I did in class since January. Thank you so much! I cant wait to watch your other videos!

  • hi,
    here I mistakenly used else if in the last condition without an if statement
    it showed that the control reached the end of non void function
    could you help me.what is the problem?

  • let us assume we have created the root node and after it we have created the left child and linked the root node .
    currently root has the address of the left child and at the last when we return root then actually it is returning address of left child which is get copied to global root beacuse of root=insert(root,10) that means we have lost the address of our root node …IS THIS TRUE…

  • In the Insert function, the GetNewNode function should be returned. As:
    if(root == NULL){
    return GetNewNode(data);
    }

  • Fantastic lesson. I just want to point out that the addressing can be simplified so much by making a class instead in C++ that will save the pain of referencing addresses and whatnot.

  • When I run the code given in github, I get the result as not found for all the numbers which we inserted. Anyone can help ?

  • Why do u put a bracket while dynamically allocating new… Such as.. Here u have written bstnode*=new bstnode() ;
    Why do we put there circular paranthesis… Everywhere else whe just write
    Bstnode*=new bstnode;

  • great teacher just one word to say 'cin>>datastructures; loop to the end of life…. implement cout<<datastructures

  • Unbelievable, I am really bad in English but I understand almost everything what you explain in this language, which is pretty strange for me. It is like Einstein said: If you cant explain it easy enough you do not understand it well enough.
    And you understand it absolutely, so your explanations get great!

  • if root implies to the very first node, how is it possible that the value of root is kept throughout the program when new value of root is returned from insert() function?

  • some people in the comments made right points…we are unable to access the nodes …because the address is lost .. can anyone let me know what to do

  • The best video I've found on bst. God, I was stuck at the "pointer-to-pointer" code, but your code with a single pointer made my life easy. Thanks a lot!!

  • I have a dobut :

    Why do we have to write root = Insert( , ) in main function , if I declared root as globally ? Asking becasue it is not working if I did not write root=Insert() even though I declared root globally ?

    Anybody can help?

  • Why do you have to collect and update the root in main function (even if i declared root globally , it is not working if i did not write root = Insert() ) ?

  • In the line `if data <= root.data: return Search(root.left, data)`, shouldn't the <= be just <?
    When root.data == data, a separate if condition handles this case right?

  • Video is great and explained every concept clearly. This video is helping ne for preparing for my exams.Thanku codeschool for making such an awesome video tutorials.

  • All the cases returning 200 how please explain? when we call to create new node root value getting changed how it returning 200 all the cases

  • The perfect one

    #include<iostream>
    using namespace std;
    //Definition of Node for Binary search tree
    struct BstNode {
    int data;
    BstNode* left;
    BstNode* right;
    };

    // Function to create a new Node in heap
    BstNode* GetNewNode(int data) {
    BstNode* newNode = new BstNode();
    newNode->data = data;
    newNode->left = newNode->right = NULL;
    return newNode;
    }

    BstNode* Insert(BstNode* root,int data) {
    if(root == NULL)
    {
    return GetNewNode(data);
    }

    else if(data <= root->data) {
    root->left = Insert(root->left,data);
    }

    else {
    root->right = Insert(root->right,data);
    }
    //return root;
    }

    bool Search(BstNode* root,int data) {
    if(root == NULL) {
    return false;
    }
    else if(root->data == data) {
    return true;
    }
    else if(data <= root->data) {
    return Search(root->left,data);
    }
    else {
    return Search(root->right,data);
    }
    }
    int main() {
    BstNode* root = NULL; // Creating an empty tree
    /*Code to test the logic*/
    root = Insert(root,15);
    cout<<root<<endl;

    Insert(root,25);
    cout<<root<<endl;

    Insert(root,8);
    cout<<root<<endl;

    Insert(root,12);
    cout<<root<<endl;

    // Ask user to enter a number.
    int number;
    cout<<"Enter number be searchedn";
    cin>>number;
    //If number is found, print "FOUND"
    if(Search(root,number) == true) cout<<"Foundn";
    else cout<<"Not Foundn";
    }

  • Great explanation… Except for the addresses, which are explained as 300 and 500 etc which is usually misunderstood by the root values … better if it was declared as "a" "b" "c" … Thank you

  • Hi. I just learned C and now I am trying to learn C++. I am wondering in C, when we declare a node using struct, instead of node * left or node * right, we add struct in front of node (struct node * left). May I ask why does he omit this struct in the video and why does it still work ? Is this a feature of C++ ? Thank you

  • hello , I got " core dumped segmentation error" what should I suppose to do, to remove my error. could you suggest me

  • We can not use malloc as a substitute of New operator C++ bcoz malloc allocates memory where as new allocates as well constructs objects which calls the constructior

  • To all who wants the iterative solution :

    #include<iostream>
    using namespace std;

    struct Node{
    int data;
    Node* left;
    Node* right;
    };

    Node *newNode(int key)
    {
    Node *node = new Node;
    node->data = key;
    node->left = node->right = nullptr;

    return node;
    }

    void insert(Node** headref, int data){

    if (*headref == NULL)
    {
    *headref = newNode(data);
    return;
    }

    Node* travptr = *headref;
    Node* parentNode = NULL;

    while(travptr != NULL){
    parentNode = travptr;
    travptr = (data > travptr->data) ? travptr->right: travptr->left;
    }

    if (data > parentNode->data )
    parentNode->right = newNode(data);

    else
    parentNode->left = newNode(data);
    }

    bool find(Node **headref, int data)
    {
    Node *travptr = *headref;

    if (data == (*headref)->data)
    return true;

    while (travptr != NULL && travptr->data != data)
    travptr = (data > travptr->data) ? travptr->right : travptr->left;

    if (travptr == NULL)
    {
    return false;
    }
    return true;
    }

    int main(){

    Node* head = NULL;
    insert(&head, 1);
    insert(&head, 2);
    insert(&head, 3);
    insert(&head, 4);
    cout << find(&head, 3) << "t" << find(&head, 10);
    return 0;
    }

  • If you make even a small mistake in dynamic memory allocation, it can cost your system's memory. This is why im so scared of dynamic allocation.

  • what if we pass the root by reference then what will happen cuz my code hasn't work properly
    every time i run the prog it show me an error say has stopped working

  • Can someone make me understand why we were re- assigning the value of root for every insertion in main function? What is the need of it? #mycodeschool

  • Perfect Video Ever On BST…Such a GlassClear Explanation…Just wanted to add here that instead of using a separate function GetNode we can manage the node creation in struct itself as below :
    struct Node
    {
    int data;
    Node* left , *rigth;
    Node(int data){
    this->data = data;
    left = right = NULL;
    }
    };
    So instead of calling method we need to do simply new Node(data);

  • super good explanation! I was like "what the heack is a pointer reference' but now i get it. the new node being dynamically created is basically a pointer to a node so when calling insert privately, that is the pointer passed in as reference. Makes more sense after thinking about it more.

  • Sad to feel that this legend is no more in this world!!!!

    He will still remain eternal on Youtube and help everyone forever!!!!

  • this guy is the best teacher ever and you can't even find his name on the channel. Thank you for your work, humble Animesh Nayan.

  • The explanation about BST was splendid but this is a master piece because you not only made the concept clear but also implementation in the actual program.. hats off

  • I think this program is wrong if we do searching root = insert (root,12)
    searching only start from the last node in which 12 is stored

  • What was that concept of double pointer and insert function return type to be void?Can anyone explain me in detail…

  • Python Implementation – https://github.com/pranaychandekar/dsa/blob/master/src/trees/binary_trees/binary_search_trees/binary_search_tree.py

Leave a Reply

Leave a Reply

Your email address will not be published. Required fields are marked *