Extracting top feature names for a trained classifier in order

Post describes how to extract top feature names from a supervised learning classifier in sklearn.

Note: The training dataset X_train and y_train are pandas dataframe with column names.

After fitting/training a classifier clf, the scoring for features can be accessed (method varies depending on the classifier used).

  • For example, for logistic regression it is the magnitude of the coefficients and can be accessed as clf.coef_
  • For DecisionTree, it is clf.feature_importances_

Sort the scores in descending order using np.argsort() and pass it as an index to the column names of X_train.columns.


# For Decision Tree classifier

from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

importances = clf.feature_importances_

# printing top 5 features of fitted classifier
print (X_train.columns[(np.argsort(importances)[::-1])][:5])

Horizontal bar chart with 3 encodings using matplotlib

output_20_0

The chart explains the gender difference in school performance based on different states of india. Full project report

View python source code inside packages

matrix_code

Often we want to know how a function is written in an imported package. This post explains how to examine the source code of a function/class.

To know where the package is installed:

[package_name].__file__

For the package pandas:

import pandas
pandas.__file__

To examine the source code of a given function or class, import the package inspect.

import inspect as insp
print insp.getsourcefile(pandas.DataFrame) # prints the path to source file

print insp.getsourcelines(pandas.DataFrame) # prints the source code

A documentation of inspect package can be found here.

Viewing the source code from IPython Notebook

Append ? to the function name inside the ipython-notebook cell to view source code and ?? for the entire source code.


import pandas

pandas.DataFrame? # shows the docstring</code>​

pandas.DataFrame?? # shows the source code and docstring