This post is updated periodically with interesting tips and tricks for the IPython Notebook.

# Tag: python

# Extracting top feature names, in order, for a trained classifier in scikit-learn

This post describes how to extract the top feature names from a supervised learning classifier in sklearn.

Note: The training datasets `X_train` and `y_train` are pandas DataFrames with column names.

After fitting a classifier `clf`, the feature scores can be accessed. The attribute to use depends on the classifier.

- For example, for logistic regression the score is the magnitude of the coefficients, which are available as `clf.coef_` (take the absolute value, since the coefficients are signed).
- For a decision tree, it is `clf.feature_importances_`.

Get the indices that sort the scores with `np.argsort()`, reverse them to get descending order, and use them to index the column names in `X_train.columns`.

    # For a decision tree classifier
    from sklearn.tree import DecisionTreeClassifier
    import numpy as np

    clf = DecisionTreeClassifier(random_state=42)
    clf.fit(X_train, y_train)
    importances = clf.feature_importances_

    # Print the top 5 features of the fitted classifier
    print(X_train.columns[np.argsort(importances)[::-1]][:5])

    # Or print every (feature, importance) pair, highest importance first
    print(sorted(zip(X_train.columns, importances), key=lambda x: x[1])[::-1])
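For logistic regression the same pattern applies to the absolute coefficients. The snippet below is a minimal sketch of that ranking step using NumPy only. The feature names and coefficient values are made up for illustration and stand in for `X_train.columns` and the `clf.coef_` of a fitted binary classifier (which has shape `(1, n_features)`).

```python
import numpy as np

# Hypothetical feature names and coefficients, standing in for
# X_train.columns and clf.coef_ of a fitted binary LogisticRegression.
feature_names = np.array(["age", "income", "height", "weight"])
coef = np.array([[0.2, -1.5, 0.7, 0.0]])  # shape (1, n_features)

# The sign only gives the direction of the effect, so rank by magnitude.
scores = np.abs(coef[0])

# argsort is ascending; reverse it to get the highest scores first.
order = np.argsort(scores)[::-1]
top_features = feature_names[order][:2]
print(top_features)
```

Coefficient magnitudes are only comparable across features when the features are on a similar scale, so this ranking is most meaningful after standardizing the inputs.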

# Sort & Join the longest k arrays

This function concatenates the k longest distinct strings in a list into a single string.

    def longest_kconsec(strarr, k):
        # k must be between 1 and the number of strings
        if 0 < k <= len(strarr):
            # Drop duplicates, then keep the k longest strings
            longest = sorted(set(strarr), key=len, reverse=True)[:k]
            return ''.join(longest)
        return ''

Executing…

    l = ['r', 'd', 'wdfe', 'ff', 'gfg', 'wfdew', 'd', 'ff']
    longest_kconsec(l, 2)

returns…

`'wfdewwdfe'`
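One thing to keep in mind is that a `set` does not preserve order, so strings of equal length may come out in a different order from run to run. Here is a rough sketch of a variant that keeps ties in input order. The name `longest_k_joined` is hypothetical and the approach is just one possible option.

```python
def longest_k_joined(strings, k):
    """Join the k longest distinct strings, keeping input order for ties."""
    if not 0 < k <= len(strings):
        return ''
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(strings))
    # sorted() is stable, so equal-length strings stay in input order.
    longest = sorted(unique, key=len, reverse=True)[:k]
    return ''.join(longest)


words = ['r', 'd', 'wdfe', 'ff', 'gfg', 'wfdew', 'd', 'ff']
result = longest_k_joined(words, 2)
print(result)
```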

# Horizontal bar chart with 3 encodings using matplotlib

The chart shows the gender difference in school performance across the states of India. Full project report

- 1st Encoding (lines): median values of performance for boys and girls
- 2nd Encoding (colored bars): difference in median values
- 3rd Encoding (circle size): count of values used to find median

# Creating a filled barchart with matplotlib

This tutorial shows how to make a filled bar chart with the fill color mapped to data. The chart was created for this project.

# View python source code inside packages

Often we want to know how a function is written in an imported package. This post explains how to examine the source code of a function/class.

To find out where a package is installed:

`[package_name].__file__`

For the package pandas:

    import pandas
    pandas.__file__

To examine the source code of a given function or class, import the `inspect` module.

    import inspect as insp
    import pandas

    print(insp.getsourcefile(pandas.DataFrame))   # path to the source file
    print(insp.getsourcelines(pandas.DataFrame))  # (list of source lines, starting line number)

Documentation for the `inspect` module can be found in the Python standard library documentation.
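If pandas is not at hand, the same calls can be tried on a standard library module. This is a small sketch using `textwrap`, which was picked only as a convenient pure-Python example; `inspect.getsource` returns the source as one string, which is often easier to print than the tuple from `getsourcelines`.

```python
import inspect
import textwrap

# Path of the file that defines the module.
path = inspect.getsourcefile(textwrap)
print(path)

# Full source of one function as a single string.
source = inspect.getsource(textwrap.dedent)
print(source[:200])

# getsourcelines returns a (lines, first_line_number) tuple.
lines, first_line = inspect.getsourcelines(textwrap.dedent)
print(f"dedent starts at line {first_line} and spans {len(lines)} lines")
```

Note that these functions only work for objects defined in Python source files; built-ins implemented in C raise a `TypeError`.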

## Viewing the source code from IPython Notebook

Append `?` to the function name inside an IPython Notebook cell to view its docstring, and `??` to view the full source code.

    import pandas

    pandas.DataFrame?   # shows the docstring
    pandas.DataFrame??  # shows the source code and docstring

# Return top “N” elements from an array

`top_n` returns a boolean mask such as `[True, False, True, False, False, ...]` that is `True` at the positions of the top `n` values. The mask can then be used as an index into an array to select those values.

    import numpy as np
    from scipy.stats import rankdata

    def top_n(list_array, n=1):
        """Return a boolean mask that is True for the n greatest values."""
        np_array = np.array(list_array)
        # Dense ranks of the flattened values: ties share a rank, smallest value = 1
        r = rankdata(np_array.flatten(), method="dense")
        # Invert the ranks so that the highest value has rank 1
        r = (r.max() + 1) - r
        # Mark every position whose rank is within the top n
        mask = r <= n
        return mask.reshape(np_array.shape)

`boolean_filter` returns the items of a list at the positions where the mask is true.

    def boolean_filter(b_list, boolean):
        """Return the values in b_list where the boolean mask is true."""
        return [item for i, item in enumerate(b_list) if boolean[i]]
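To see the mask and the filter working together without the SciPy dependency, here is a rough NumPy-only sketch. `top_n_mask` is a hypothetical helper, not part of the code above. It assumes that tied values should share a rank, as in the dense ranking used by `top_n`, and the scores are made-up values.

```python
import numpy as np

def top_n_mask(values, n=1):
    """Boolean mask that is True for the n largest distinct values."""
    arr = np.asarray(values)
    if n < 1 or arr.size == 0:
        return np.zeros(arr.shape, dtype=bool)
    # np.unique returns the distinct values sorted in ascending order.
    distinct = np.unique(arr)
    # The smallest value that still belongs to the top n distinct values.
    threshold = distinct[-min(n, distinct.size)]
    return arr >= threshold


scores = [3, 9, 5, 9, 1]  # made-up example data
mask = top_n_mask(scores, n=2)
print(mask)

# Keep only the items whose mask entry is True.
top_scores = [s for s, keep in zip(scores, mask) if keep]
print(top_scores)
```

Because ties share a rank, asking for the top 2 values here keeps three items: both 9s and the 5.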

# Serialize python object to JSON

This is a wonderful article on how to serialize a Python object into JSON.
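The article itself is not reproduced here. As a rough sketch of one common approach with the standard library, `json.dumps` accepts a `default` function that is called for any object it cannot encode on its own. The `Student` class, its fields, and the `to_serializable` helper below are hypothetical names used only for illustration.

```python
import json
from datetime import date


class Student:
    """Hypothetical example class, used only for illustration."""

    def __init__(self, name, enrolled):
        self.name = name
        self.enrolled = enrolled


def to_serializable(obj):
    """Fallback used by json.dumps for objects it cannot encode itself."""
    if isinstance(obj, date):
        return obj.isoformat()
    if hasattr(obj, "__dict__"):
        # Plain objects are encoded as a dict of their attributes.
        return vars(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


student = Student("Asha", date(2016, 7, 1))
encoded = json.dumps(student, default=to_serializable, sort_keys=True)
print(encoded)
```

The value returned by `default` is itself encoded again, which is why the nested `date` attribute is also converted.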