Visualizing a Decision Tree – Machine Learning Recipes #2


100 thoughts on “Visualizing a Decision Tree – Machine Learning Recipes #2”

  • I like the tutorial, but please don't make tutorials that are related in any way to biology. I really hate biology.

  • open -a preview does not work for me, any ideas, guys? My Windows machine does not recognize the command. Do I need to install anything else?
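A possible workaround, sketched under a few assumptions: `open` is a macOS command, so it is not expected to exist on Windows. The helper names below (`viewer_command`, `open_file`) are made up for illustration, and `xdg-open` is assumed to be present on Linux desktops. Unlike `open -a preview`, this opens the file in whatever viewer the OS associates with PDFs.

```python
import os
import subprocess
import sys


def viewer_command(path, platform=sys.platform):
    """Return the command that opens `path` in the default viewer, or None on Windows."""
    if platform == "darwin":          # macOS: the `open` command used in the video
        return ["open", path]
    if platform.startswith("win"):    # Windows has no `open`; os.startfile is used instead
        return None
    return ["xdg-open", path]         # assumption: a Linux desktop with xdg-utils


def open_file(path):
    """Open `path` with whatever application the OS associates with it."""
    command = viewer_command(path)
    if command is None:
        os.startfile(path)            # this function only exists on Windows
    else:
        subprocess.run(command, check=True)
```

After the script has written the PDF, calling `open_file("iris.pdf")` should behave like double-clicking the file.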

  • I see pydot must be installed, but is anyone else getting "Module Not Callable" when running "tree.export(….)"?

  • Just watched the first Machine Learning Recipes, and halfway through this one, I was wondering if this is a good place to start learning how to code. Any recommendations?

  • For those having a bad time like me running Graphviz on Windows, the following steps worked fine for me:

    1. Download and install graphviz-2.38.msi from

    https://graphviz.gitlab.io/_pages/Download/Download_windows.html

    2. Set the Path variable

    (a) Control Panel > System and Security > System > Advanced System Settings > Environment Variables > Path > Edit

    (b) add 'C:\Program Files (x86)\Graphviz2.38\bin'
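If editing the system Path is not an option, the same effect can probably be had for a single script by extending PATH at the top of the program. This is only a sketch: the folder is assumed to be the default location used by the graphviz-2.38 installer, and `add_to_path` is a hypothetical helper, not part of any library.

```python
import os

# Assumption: default install folder of the graphviz-2.38 MSI; adjust to your machine.
GRAPHVIZ_BIN = r"C:\Program Files (x86)\Graphviz2.38\bin"


def add_to_path(directory):
    """Append `directory` to PATH for the current process if it is not already listed."""
    entries = [entry for entry in os.environ.get("PATH", "").split(os.pathsep) if entry]
    if directory not in entries:
        entries.append(directory)
        os.environ["PATH"] = os.pathsep.join(entries)


if os.name == "nt":             # the fix is Windows-specific
    add_to_path(GRAPHVIZ_BIN)   # run this before pydot / graphviz are used
```

The change only lasts for the running Python process, so it has to come before the visualization code.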

  • I wrote the same program but got the message "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
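For what it's worth, that message usually appears when a 1D array is passed where scikit-learn expects a 2D array of shape (samples, features), for example when predicting on a single flower such as `iris.data[0]`. A minimal sketch of what the two suggested reshapes do, using the first iris example as the sample:

```python
import numpy as np

sample = np.array([5.1, 3.5, 1.4, 0.2])   # one flower, four features (1D)

as_one_sample = sample.reshape(1, -1)      # 1 row, 4 columns: a single sample
as_one_feature = sample.reshape(-1, 1)     # 4 rows, 1 column: four samples of one feature

print(sample.shape)          # (4,)
print(as_one_sample.shape)   # (1, 4)
print(as_one_feature.shape)  # (4, 1)
```

For a single iris flower the first form is the one that matches the classifier, so something like `clf.predict(sample.reshape(1, -1))` should work.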

  • Hey, great vid. I am almost done; I am just getting one little error at the end:
    pydotplus.graphviz.InvocationException: GraphViz's executables not found
    I really want this PDF for a presentation. Can you help with this error?

  • This code raises the error "TypeError: 'numpy.ndarray' object is not callable".
    But it works fine when I rename "test_idx" to "test_index".
    Why is that?

  • For those with the "GraphViz's executables not found" error: I asked around and solved the problem.
    https://stackoverflow.com/questions/51376756/graphvizs-executables-not-found-python-3-and-pydotplus
    My first mistake was that I only installed the library, not the software. After installing it, modify the environment path using this method:
    https://www.mycodingzone.net/blogpost/english/pip-is-not-recognized-as-an-internal-or-external-command
    Add the path where your gvedit.exe is located (the bin folder). Hope this helps everyone 🙂
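A quick way to check whether the Graphviz software itself is visible to Python, as opposed to only the Python package, is to look for its `dot` executable on PATH. A small sketch using only the standard library (`find_graphviz_dot` is a name invented here):

```python
import shutil


def find_graphviz_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_graphviz_dot()
if dot_path is None:
    print("Graphviz 'dot' was not found on PATH; install Graphviz itself, not only the Python package.")
else:
    print("Graphviz found at", dot_path)
```

If this prints the "not found" message after installing Graphviz, the bin folder is most likely still missing from the path, or the terminal or IDE needs a restart to pick up the change.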

  • I can't get the PDF file for some reason. The code didn't throw any error, and open -a preview didn't work.
    What am I doing wrong?

  • This is not a good video for beginners; he talks too fast and doesn't explain well what he's doing. For people who already have some knowledge of ML, though, it is a good video.

  • The whole code in Python 3, which works perfectly:

    import numpy as np
    from sklearn import tree
    from sklearn.datasets import load_iris
    iris = load_iris()
    train_axis = [0, 50, 100]

    # train data
    train_target = np.delete(iris.target, train_axis)
    train_data = np.delete(iris.data, train_axis, axis=0)

    # test data
    test_target = iris.target[train_axis]
    test_data = iris.data[train_axis]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data,train_target)

    print(test_target)
    print(test_data)

    print(clf.predict(test_data))

    from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn releases
    import graphviz

    dot_data = StringIO()
    tree.export_graphviz(clf, out_file = dot_data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled = True, rounded = True,
    impurity = False)

    graph = graphviz.Source(dot_data.getvalue())
    graph.render("iris.pdf", view=True)

    _____________________________________
    Install Graphviz in Anaconda:
    -> conda install -c anaconda graphviz python-graphviz
    _____________________________________
    To print the whole iris data:
    # print(iris.feature_names)
    # print(iris.target_names)
    # print(iris.data[0])
    # print(iris.target[0])
    # for i in range(len(iris.target)):
    #     print("Example %d: label %s, feature %s" % (
    #         i,
    #         iris.target[i],
    #         iris.data[i]
    #     ))

  • Working code in Python 3.7, without the PDF:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn import tree

    iris = load_iris()
    test_idx = [0,50,100]

    #training data
    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    #testing data
    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    print (test_target)
    print (clf.predict(test_data))

    #viz code

    # test data

    print (test_data[1],test_target[1])
    print (iris.feature_names,iris.target_names)

  • Hi everyone,
    The code for Python 3 is already below. Best wishes.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn import tree

    iris = load_iris()
    test_idx = [0,50,100]

    # training data
    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    # testing data
    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(train_data, train_target)

    print(test_target)
    print(clf.predict(test_data))

    from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn releases
    import pydotplus
    dot_data = StringIO()
    tree.export_graphviz(clf,
    out_file = dot_data,
    feature_names = iris.feature_names,
    class_names = iris.target_names,
    filled = True, rounded = True,
    impurity = False)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris.pdf")

  • Hey guys, for those who are using Python 3, the code shown in the video might not work. I made some alterations at the end of the code to generate the graph and it worked.

    # Import
    from sklearn.datasets import load_iris
    import numpy as np
    from sklearn import tree
    iris = load_iris()

    # Showing the data (this part I changed too)
    print(iris.data[0])
    print(iris.target[0])
    for i in range(len(iris.target)):
        print('Example {},\t label {},\t features {}'.format(i, iris.target[i], iris.data[i]))

    # Training data
    test_idx = [0,50,100]
    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    # Testing data
    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    print(test_target)
    print(clf.predict(test_data))

    # Exporting the decision tree
    from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn releases
    import pydot

    dot_data = StringIO()

    tree.export_graphviz(clf,
    out_file=dot_data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    # I used this module (graphviz) to generate the graph
    import graphviz as gp
    graph = gp.Source(dot_data.getvalue())
    graph.render("iris", view = True)

  • Unlike all the other recent posts, I could only get the visualisation to work with a change nobody else seems to be using (might be a python 2.7 thing?). Here's what I have to make it work:

    (graph,) = pydot.graph_from_dot_data(dot_data.getvalue())

    From: https://www.programcreek.com/python/example/84621/pydot.graph_from_dot_data

  • I have homework from my college about ML, specifically a decision tree for the iris data, but I must build the decision tree model manually. Could you explain how to create the model by hand (I mean manually, on paper)?
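One way to approach this by hand, sketched here under assumptions: scikit-learn's DecisionTreeClassifier uses Gini impurity by default, so on paper you would compute the impurity of each candidate split and keep the one with the lowest weighted value (your course may use entropy instead). The split below is illustrative; it assumes a threshold that sends all 50 setosa flowers to one side. The function names are made up for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    total = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / total


# The iris data set has 50 flowers of each of three species.
root = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
# Illustrative split that sends every setosa to the left branch.
left = ["setosa"] * 50
right = ["versicolor"] * 50 + ["virginica"] * 50

print(round(gini(root), 3))                  # 0.667
print(round(weighted_gini(left, right), 3))  # 0.333
```

On paper this is the same arithmetic: 1 - 3 × (1/3)² ≈ 0.667 at the root, and (50 × 0 + 100 × 0.5) / 150 ≈ 0.333 after the split. Repeat for each branch until the leaves are pure.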

  • quick summary of video:
    ======================

    – there are several types of classifiers , each with their own pros/cons
    – one of the pros of a decision tree classifier is that it is human readable ("interpretable")

    – let's talk a little bit about terminology used to describe the data that you use to train your classifier
    – let's assume you want to train your classifier to be able to predict the sex of people based on their height, weight, and bone density
    – you have a bunch of example data, each example consists of a height, weight, bone density, and the corresponding sex (male or female)
    – we call the height, weight, and bone density features
    – we call the male/female the class or label

    – there are a bunch of free data sets out there that you can use to practice machine learning
    – a popular one is called the iris data set
    – the iris data set has a bunch of different flower types, along with their petal width, sepal width, etc
    – again, the petal width, sepal width, etc would be called features while the actual flower type would be called the label

    – scikit-learn has a lot of these common data sets built in
    – for example, you can use the iris data set by doing data_set = load_iris()
    – data_set.data[0] is a list of the features of the 0th example in the data set
    – data_set.data[1] is a list of the features of the 1st example
    – data_set.target[0] is the label for the 0th example
    – and so on…I think you get it

    – usually, you split your data set into two subsets
    – you use one of the subsets to train your classifier
    – you use the other subset to test how well your trained classifier predicts
    – a decent rule of thumb is 2/3 to train and 1/3 to test

    key thing to take away from the video:
    ================================

    Lots of different types of classifiers out there, decision tree is just one of them. One of the pros of a decision tree is that it is human readable. There are a bunch of free data sets that are easily importable in scikit-learn, use them to practice machine learning. Use about 2/3 of your data to train, and the other 1/3 to test.

    Hope that was helpful!!

    P.S. thanks so much for these videos, they are so well made!
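The 2/3 and 1/3 rule of thumb from the summary can be tried out with scikit-learn's `train_test_split`. Note that this is not what the video does (it holds out just three examples by index); it is only a sketch of the general idea, with `random_state=0` chosen arbitrarily so the shuffle is repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Hold out one third of the examples for testing; random_state fixes the shuffle.
train_data, test_data, train_target, test_target = train_test_split(
    iris.data, iris.target, test_size=1 / 3, random_state=0
)

print(train_data.shape, test_data.shape)
```

The classifier is then fitted on `train_data` and `train_target` and evaluated on the held-out part.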

  • Hello, everyone. I want to learn artificial intelligence and machine learning programming. Please suggest where I should start, and point me to resources for such a course. Thanks.

  • I get the following error:

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

    File "C:/Users/Std User/.spyder-py3/ml3.py", line 17, in <module>
    test_target=iris.target(test_idx)

    TypeError: 'numpy.ndarray' object is not callable

    How can I solve this problem?
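Judging from the traceback, line 17 uses parentheses, `iris.target(test_idx)`, where the video uses square brackets. Parentheses try to call the array like a function, which is what the error complains about. A small sketch with a stand-in array instead of the real iris labels:

```python
import numpy as np

target = np.array([0, 0, 1, 1, 2, 2])   # stand-in for iris.target
test_idx = [0, 2, 4]

try:
    target(test_idx)                     # parentheses try to *call* the array
except TypeError as error:
    print(error)                         # 'numpy.ndarray' object is not callable

test_target = target[test_idx]           # square brackets index it
print(test_target)                       # [0 1 2]
```

So in the original script, `test_target = iris.target[test_idx]` should get rid of the error.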

  • Hello everyone!
    I know I'm quite late, but if anyone could help it would be wonderful.
    My problem is at 3:53 , when I need to test the code.
    This is the error message on line 9:
    AttributeError: 'function' object has no attribute 'target'
    Please help, thanks!
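A guess at the cause, since the script itself is not shown: this error typically appears when the function is assigned without being called, as in `iris = load_iris` instead of `iris = load_iris()`. The sketch below reproduces it with a hypothetical stand-in function (`load_data`) so it runs without scikit-learn:

```python
from types import SimpleNamespace


def load_data():
    """Hypothetical stand-in for sklearn.datasets.load_iris."""
    return SimpleNamespace(target=[0, 1, 2])


iris = load_data            # no parentheses: `iris` is the function itself
try:
    iris.target
except AttributeError as error:
    print(error)            # 'function' object has no attribute 'target'

iris = load_data()          # parentheses call the function and return the data
print(iris.target)          # [0, 1, 2]
```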

  • import numpy as np
    from sklearn.datasets import load_iris
    from sklearn import tree

    iris = load_iris()
    test_idx = [0, 50, 100]

    # training data
    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    # testing data
    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    print test_target
    print clf.predict(test_data)

    # viz code
    from sklearn.externals.six import StringIO
    import pydot
    dot_data = StringIO()
    tree.export_graphviz(clf,
    out_file=dot_data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris.pdf")

  • For those who are wondering how to install Graphviz on a Mac: start by installing a package manager like MacPorts or Homebrew, and then install the package using it. Pip install for GraphViz on Mojave doesn't work correctly.

  • Traceback (most recent call last):
    File "main.py", line 40, in <module>
    graph.write_pdf('iris.pdf')
    AttributeError: 'list' object has no attribute 'write_pdf'

    Am I the only one seeing that?
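You are probably not the only one. As far as I can tell, newer pydot releases return a list of graphs from `graph_from_dot_data`, while the version used in the video returned a single graph. A small helper (the name `first_graph` is invented here) that should cope with both cases:

```python
def first_graph(result):
    """Return a single graph whether `result` is one graph or a list of graphs."""
    if isinstance(result, (list, tuple)):
        return result[0]
    return result


# Usage with the code from the video (requires pydot and Graphviz):
# graph = first_graph(pydot.graph_from_dot_data(dot_data.getvalue()))
# graph.write_pdf("iris.pdf")
```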

  • I had an issue showing the graph in Python 3 (Anaconda, PyCharm). So I googled, then tried the code below, and Alhamdulillah it works 🙂

    from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn releases
    import graphviz

    dot_data = StringIO()
    tree.export_graphviz(clf,
    out_file=dot_data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    graph = graphviz.Source(dot_data.getvalue())
    graph.render("iris.pdf", view=True)
    print(graph)
    print(graph.render("iris.pdf", view=True))

  • #version 3.7.2

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn import tree

    iris = load_iris()
    test_idx = [0,50,100]

    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    # Python 3 requires print() as a function call
    print(test_target)
    print(clf.predict(test_data))

  • Very well thought out and easy-to-follow introduction. One suggestion: for people who are not aware of the "print ()" requirement of Python 3, it would be worth pointing it out for viewers in one of the earlier videos. Luckily I was aware, but with this small addition, the whole of the tutorial is easy to follow for someone with experience of any other programming language.

  • I also have an issue with pydot and pydotplus. I read all the comments and I have installed all the packages. I started with:

    graph = pydot.graph_from_dot_data(dot_data.getvalue())

    graph.write_pdf('output.pdf')
    I get this error:
    AttributeError: 'list' object has no attribute 'write_pdf'
    So I moved to pydotplus, and I get this error:
    TypeError: add_node() received a non node class object: <pydotplus.graphviz.Node object at 0x000001F601A390F0>
    Any idea?

  • Traceback (most recent call last):
    File "C:/Users/crazy hero/AppData/Local/Programs/Python/Python37-32/imgpro/ml2.py", line 12, in <module>
    clf.fit(train_data,train_target)
    File "C:\Users\crazy hero\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\tree\tree.py", line 801, in fit
    X_idx_sorted=X_idx_sorted)
    File "C:\Users\crazy hero\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\tree\tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    File "C:\Users\crazy hero\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 552, in check_array
    "if it contains a single sample.".format(array))
    ValueError: Expected 2D array, got 1D array instead:
    array=[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
    1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
    1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
    2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
    2. 2. 2.].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
    >>>
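The array printed in that traceback holds only the values 0, 1 and 2, so it looks as if the labels ended up in `train_data` (perhaps `iris.target` was used where `iris.data` was meant). That is a guess, since the script is not shown. A quick shape check before `clf.fit` can catch this kind of mix-up; `check_training_shapes` is a hypothetical helper, not a scikit-learn function:

```python
import numpy as np


def check_training_shapes(features, labels):
    """Raise a readable error unless inputs are shaped (n_samples, n_features) and (n_samples,)."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    if features.ndim != 2:
        raise ValueError("features must be 2D, got shape %s" % (features.shape,))
    if labels.ndim != 1 or len(labels) != len(features):
        raise ValueError("labels must be 1D with one entry per row of features")
    return features.shape, labels.shape


# Stand-in arrays with the shapes expected after removing three test examples.
print(check_training_shapes(np.zeros((147, 4)), np.zeros(147)))  # ((147, 4), (147,))
```

With the video's code, `train_data` should come from `np.delete(iris.data, test_idx, axis=0)` and therefore have shape (147, 4).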

  • Just to point out that the link to https://en.wikipedia.org/wiki/Iris_flower_data_set does not work.
    Great video! Thanks a lot!

  • Some of the sound fx on this vid sound like they're from System Shock 2, quite appropriate since it's about an AI.

  • Here is the code for everyone who wants to try it in Python 3.7:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn import tree

    iris = load_iris()
    test_idx = [0,50,100]

    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    print (test_target) #prints the answer
    print (clf.predict(test_data)) #answer prediction

  • For anyone having problems with visualization because you're using python 3.x,

    graph= pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris.pdf")

    change to

    (graph, ) = pydot.graph_from_dot_data(dot_data.getvalue())
    Image(graph.create_png())

    Also, he does not copy the sklearn docs exactly, so here is his code:

    from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn releases
    import pydot
    from IPython.display import Image

    dot_data = StringIO()
    tree.export_graphviz(clf,
    out_file=dot_data, feature_names=iris.feature_names,
    class_names = iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    (graph, ) = pydot.graph_from_dot_data(dot_data.getvalue())
    # graph.write_pdf("iris.pdf")
    Image(graph.create_png())

  • For people having issues with the code that generates "iris.pdf":
    follow the instructions mentioned here: https://scikit-learn.org/stable/modules/tree.html

    conda install python-graphviz

    >>> import graphviz
    >>> dot_data = tree.export_graphviz(clf, out_file=None)
    >>> graph = graphviz.Source(dot_data)
    >>> graph.render("iris")

    OR

    >>> dot_data = tree.export_graphviz(clf, out_file=None,
    ...                                 feature_names=iris.feature_names,
    ...                                 class_names=iris.target_names,
    ...                                 filled=True, rounded=True,
    ...                                 special_characters=True)
    >>> graph = graphviz.Source(dot_data)
    >>> graph

  • Ok guys. Working code With PDF (Using python 3.7 and PyCharm IDE). Please keep in mind that multiple packages need to be installed for this code to run.

    import numpy as np
    import io
    from sklearn import tree
    from sklearn.datasets import load_iris

    iris=load_iris()
    test_idx=[0, 50, 100]

    #training data
    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    #testing data
    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    print(test_target)
    print(clf.predict(test_data))

    #viz code

    import pydot
    dot_data = io.StringIO()
    tree.export_graphviz(clf,
    out_file=dot_data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph[0].write_pdf("iris.pdf")

  • If you are using Python 3 (at 4:23), try graph.render("iris")

    here is the source code https://scikit-learn.org/stable/modules/tree.html#tree

    from sklearn.datasets import load_iris
    from sklearn import tree
    import graphviz

    iris = load_iris()

    clf = tree.DecisionTreeClassifier()
    clf.fit(iris.data, iris.target)

    dot_data = tree.export_graphviz(clf, out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    graph = graphviz.Source(dot_data)
    graph.render("iris")

  • I just started ML and found this very helpful: great start, great examples, well-organized lectures. I am a beginner in ML and loved it. Sometimes the motivation for tech gets a boost when you find a great source of learning.

  • Does somebody have the right code for the visualisation part? I get an error that Windows PowerShell doesn't recognize 'open' when I run this statement there: open -a preview iris.pdf. Strangely enough, I also can't open the PDF file manually. The code in the scikit-learn docs looks different from the code in the video, and I tried that too. So does somebody have the right code for Python 3?
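If the PDF route keeps failing, one alternative that avoids Graphviz entirely is to print the tree as text. This is not what the video does; it is a sketch that assumes a scikit-learn version recent enough to include `tree.export_text`, and it trains on the full data set with an arbitrary `random_state=0` just to keep the output repeatable.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# export_text returns the learned rules as a plain string; no Graphviz involved.
rules = tree.export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The output is an indented list of the same yes/no questions the PDF would show, one line per node.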

  • Hi! I used Jupyter Notebook from Anaconda (with Python 3), and I used the following code, and
    it worked!

    # Import
    from sklearn.datasets import load_iris
    import numpy as np
    from sklearn import tree

    iris = load_iris()

    # Showing the data (this part I changed too)
    print(iris.data[0])
    print(iris.target[0])
    for i in range(len(iris.target)):
        print('Example {},\t label {},\t features {}'.format(i, iris.target[i], iris.data[i]))

    # Training data
    test_idx = [0,50,100]
    train_target = np.delete(iris.target, test_idx)
    train_data = np.delete(iris.data, test_idx, axis=0)

    # Testing data
    test_target = iris.target[test_idx]
    test_data = iris.data[test_idx]

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_data, train_target)

    print(test_target)
    print(clf.predict(test_data))

    # Exporting the decision tree
    from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn releases
    import pydot

    dot_data = StringIO()
    tree.export_graphviz(clf,
    out_file=dot_data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    impurity=False)

    # I used this module (graphviz) to generate the graph
    import graphviz as gp
    graph = gp.Source(dot_data.getvalue())
    graph.render("iris", view = True)

    # Greetings from Chile!

  • I'm following along with him in Spyder but the iris.pdf file I'm creating doesn't have any filled colors… can anyone help explain why?
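A likely cause, though your script is not shown: the colors come from the `filled=True` argument of `export_graphviz`, so if that argument is missing or misspelled the nodes stay white. A small sketch comparing the generated DOT text with and without it (the classifier here is trained on the full data set purely for illustration):

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

plain = tree.export_graphviz(clf, out_file=None)
colored = tree.export_graphviz(clf, out_file=None, filled=True, rounded=True)

print("fillcolor" in plain)    # False
print("fillcolor" in colored)  # True
```

If your DOT output does contain `fillcolor` and the PDF is still uncolored, the problem is more likely in the viewer or the Graphviz rendering step.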
