Extracting the top feature names, in order, from a trained scikit-learn classifier

This post describes how to extract the top feature names from a supervised learning classifier in sklearn.

Note: The training data X_train and y_train are pandas DataFrames with column names.

After fitting a classifier clf, the per-feature scores can be read from one of its attributes; which attribute to use depends on the classifier.

  • For example, for logistic regression the score is the magnitude of the coefficients, which are stored in clf.coef_ (take the absolute value, e.g. np.abs(clf.coef_))
  • For DecisionTreeClassifier, it is clf.feature_importances_

np.argsort() returns the indices that sort the scores in ascending order. Reverse those indices to get descending order, then use them to index X_train.columns.
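The indexing step can be sketched on its own, without a classifier. The feature names and scores below are made-up values for illustration only, not output from any real model.

```python
import numpy as np

# Hypothetical feature names and scores, for illustration only
feature_names = np.array(["age", "income", "height", "weight"])
scores = np.array([0.10, 0.55, 0.05, 0.30])

ascending = np.argsort(scores)   # indices of the scores from smallest to largest
descending = ascending[::-1]     # reversed: largest score first

# Use the reordered indices to pick the names of the two highest-scoring features
top_two = feature_names[descending][:2].tolist()
print(top_two)  # ['income', 'weight']
```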

# For Decision Tree classifier

from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

importances = clf.feature_importances_

# printing top 5 features of fitted classifier
print(X_train.columns[np.argsort(importances)[::-1]][:5])

# printing all (feature, importance) pairs, highest importance first
print(sorted(zip(X_train.columns, importances), key=lambda x: x[1], reverse=True))
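For logistic regression, the same ranking might look like the sketch below. It assumes a binary problem, where clf.coef_ has a single row, and it builds a synthetic dataset with make_classification because the original data is not shown. The column names f0 to f5 are placeholders. Coefficient magnitudes are only comparable across features when the features are on similar scales.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training data (assumption: binary target)
X, y_train = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
X_train = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# For a binary problem coef_ has shape (1, n_features); take the only row
scores = np.abs(clf.coef_[0])

# Indices of the features from largest to smallest coefficient magnitude
order = np.argsort(scores)[::-1]
top_features = list(X_train.columns[order][:3])
print(top_features)
```

For a multiclass model clf.coef_ has one row per class, so the rows would need to be combined (for example by averaging the absolute values) before ranking.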

Horizontal bar chart with 3 encodings using matplotlib


The chart shows the gender difference in school performance across different states of India. Full project report

View python source code inside packages


Often we want to know how a function is written in an imported package. This post explains how to examine the source code of a function/class.

To know where the package is installed:


For the package pandas:

import pandas
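One common way to locate an installed package is its __file__ attribute. The sketch below uses the standard-library json package as a stand-in so that it runs without extra installs; the same attribute should be available on pandas after import pandas.

```python
import json
import os

# __file__ is the path of the file the package was loaded from
# (for a package, this is its __init__.py)
package_file = json.__file__

# The directory that contains the package's source files
package_dir = os.path.dirname(package_file)

print(package_file)
print(package_dir)
```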

To examine the source code of a given function or class, use the standard-library inspect module.

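import inspect as insp
print(insp.getsourcefile(pandas.DataFrame)) # prints the path to source file

# getsourcelines returns (list of source lines, starting line number)
lines, start_line = insp.getsourcelines(pandas.DataFrame)
print("".join(lines)) # prints the source code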
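inspect can only return source for objects defined in Python files. For built-ins and functions implemented in C it raises TypeError, and it raises OSError when the source file cannot be found. A minimal sketch of a wrapper that handles both cases is shown below; source_or_none is a hypothetical helper name, and json.dumps is used only as a convenient pure-Python example.

```python
import inspect
import json


def source_or_none(obj):
    """Return the source code of obj as a string, or None if unavailable."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: built-in or C-implemented object
        # OSError: the source file could not be retrieved
        return None


print(source_or_none(json.dumps) is not None)  # pure-Python function
print(source_or_none(len))                     # built-in, no Python source
```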

The inspect module is documented in the Python standard library reference.

Viewing the source code from IPython Notebook

Inside an IPython notebook cell, append ? to the function name to view its docstring, or ?? to view the full source code along with the docstring.

import pandas

pandas.DataFrame? # shows the docstring

pandas.DataFrame?? # shows the source code and docstring