Extracting the top feature names from a trained classifier, in order of importance

This post describes how to extract the top feature names from a supervised learning classifier in sklearn.

Note: The training data X_train and y_train are assumed to be pandas objects, and X_train is a DataFrame with named columns.

After fitting a classifier clf, the per-feature scores can be read from a fitted attribute. Which attribute to use depends on the classifier.

  • For logistic regression, the score is the magnitude of the coefficients, i.e. np.abs(clf.coef_). For a binary problem clf.coef_ has shape (1, n_features), so use clf.coef_[0].
  • For DecisionTreeClassifier, it is clf.feature_importances_

Rank the scores in descending order. np.argsort() returns the indices that would sort the scores in ascending order, so reverse them with [::-1], then use the reversed indices to index X_train.columns.
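To make the indexing step concrete, here is a minimal sketch on toy data. The column names and score values are made up for illustration and do not come from any real model.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and scores, for illustration only
columns = pd.Index(["age", "income", "height", "weight"])
scores = np.array([0.10, 0.55, 0.05, 0.30])

# np.argsort gives indices in ascending order of score; [::-1] reverses them
order = np.argsort(scores)[::-1]

# Index the column names with the reversed order and keep the first two
top_2 = columns[order][:2]
print(list(top_2))  # ['income', 'weight']
```

The same pattern works with any 1-D array of scores whose length matches the number of columns.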


# For Decision Tree classifier

from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

importances = clf.feature_importances_

# Print the top 5 feature names of the fitted classifier, most important first
top_idx = np.argsort(importances)[::-1][:5]
print(X_train.columns[top_idx])
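For logistic regression, the same approach might look like the sketch below. It assumes a binary problem and uses synthetic data from make_classification as a stand-in for X_train and y_train. The feature_0 ... feature_5 column names are hypothetical. Coefficient magnitudes are only comparable across features when the features are on similar scales, so treat the ranking as a rough guide.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for X_train / y_train (assumption: not the post's data)
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=42
)
X_train = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
y_train = pd.Series(y)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Binary problem: coef_ has shape (1, n_features), so take row 0 and its magnitude
scores = np.abs(clf.coef_[0])

# Top 5 feature names, largest coefficient magnitude first
top_features = X_train.columns[np.argsort(scores)[::-1]][:5]
print(list(top_features))
```

For a multiclass model, clf.coef_ has one row per class, so the rows would need to be combined (or inspected per class) before ranking.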
