Usage of enumerate() with python list

enumerate() is a useful function to make an iterator when used with a for loop. Here we explain different ways of using
enumerate() using a python list. enumerate() acts as an iterator yielding a tuple (index,element)when applied on a list.

Continue reading “Usage of enumerate() with python list”

Extracting top feature names for a trained classifier in order for sci-kit learn

Post describes how to extract top feature names from a supervised learning classifier in sklearn.

Note: The training dataset X_train and y_train are pandas dataframe with column names.

After fitting/training a classifier clf, the scoring for features can be accessed (method varies depending on the classifier used).

  • For example, for logistic regression it is the magnitude of the coefficients and can be accessed as clf.coef_
  • For DecisionTree, it is clf.feature_importances_

Sort the scores in descending order using np.argsort() and pass it as an index to the column names of X_train.columns.


# For Decision Tree classifier

from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

importances = clf.feature_importances_

# printing top 5 features of fitted classifier
print (X_train.columns[(np.argsort(importances)[::-1])][:5])
OR
print(sorted(zip(X_train.columns,importances),key=lambda x: x[1])[::-1]