enumerate() is a built-in function that returns an iterator and is most useful in a for loop. Here we explain different ways of using enumerate() with a Python list.
enumerate() yields a tuple (index, element) for each item when applied to a list.
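A minimal sketch of both uses, with a made-up list (the names fruits and numbered are only for illustration):

```python
fruits = ["apple", "banana", "cherry"]

# Basic use: each iteration yields an (index, element) tuple,
# unpacked here into two loop variables.
for index, fruit in enumerate(fruits):
    print(index, fruit)

# The optional start argument changes the first index.
numbered = list(enumerate(fruits, start=1))
print(numbered)  # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
```

Wrapping the call in list() is only needed to see all the tuples at once; inside a for loop the iterator can be consumed directly.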
Extend() when acting on a list
The get() method is useful for looking up a key in a dictionary. It returns the value associated with the key if the key is present; otherwise it returns a default value that you supply (for example, -1).
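A short sketch of this behaviour, using a hypothetical stock dictionary and -1 as the default:

```python
stock = {"apples": 3, "pears": 0}

present = stock.get("apples", -1)  # key exists, so its value is returned
missing = stock.get("kiwis", -1)   # key is absent, so the default is returned
stored_zero = stock.get("pears", -1)  # a stored 0 is still a real value
no_default = stock.get("kiwis")    # with no default given, get() returns None

print(present, missing, stored_zero, no_default)  # 3 -1 0 None
```

Unlike stock["kiwis"], which raises KeyError, get() never fails on a missing key.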
Although multiple packages can plot ROC curves, caTools seems to be the most convenient.
library(caTools)

# Predict on test: p
p <- predict(model, test, type = "response")

# Create ROC curve
colAUC(p, test[["Class"]], plotROC = TRUE)
This post describes how to extract the top feature names from a supervised learning classifier in sklearn.
Note: The training datasets X_train and y_train are pandas DataFrames with column names.
After fitting/training a classifier clf, the feature scores can be accessed (the method varies depending on the classifier used).
- For example, for logistic regression it is the magnitude of the coefficients, which can be accessed as clf.coef_
- For DecisionTree, it is clf.feature_importances_
Sort the scores in descending order using np.argsort() and pass the result as an index to the column names of X_train.
# For Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
importances = clf.feature_importances_

# Print the top 5 features of the fitted classifier
print(X_train.columns[np.argsort(importances)[::-1]][:5])

# OR: sort (name, importance) pairs by importance, highest first
print(sorted(zip(X_train.columns, importances), key=lambda x: x[1])[::-1])
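For the logistic regression case, where the score is the magnitude of a coefficient, the ranking step can be sketched without fitting a model. The feature names and coefficient values below are made up for illustration, and top_features is a hypothetical helper; in practice the inputs would come from the column names of the training data and the fitted classifier's coefficients.

```python
# Hypothetical names and coefficients, standing in for the column names
# and the coefficient row of a fitted binary logistic regression.
feature_names = ["age", "income", "tenure", "visits"]
coefficients = [0.8, -2.1, 0.05, 1.3]


def top_features(names, scores, n=5):
    """Return the n (name, score) pairs with the largest absolute score."""
    ranked = sorted(zip(names, scores), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n]


print(top_features(feature_names, coefficients, n=2))
# [('income', -2.1), ('visits', 1.3)]
```

Sorting on the absolute value matters here: a large negative coefficient is as influential as a large positive one, so sorting on the raw value would push it to the bottom.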
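This function concatenates the k longest distinct strings in an array into a single string.
def longest_kconsec(strarr, k):
    # k must be positive and no larger than the number of input strings
    if 0 < k <= len(strarr):
        # Drop duplicates (keeping first-seen order), then sort by length, longest first
        longest = sorted(dict.fromkeys(strarr), key=len, reverse=True)[:k]
        return ''.join(longest)
    return ''

l = ['r', 'd', 'wdfe', 'ff', 'gfg', 'wfdew', 'd', 'ff']
longest_kconsec(l, 2)  # returns 'wfdewwdfe'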
The chart explains the gender difference in school performance across different states of India. Full project report
- 1st Encoding (lines): median values of performance for boys and girls
- 2nd Encoding (colored bars): difference in median values
- 3rd Encoding (circle size): count of values used to find median
Continue reading “Horizontal bar chart with 3 encodings using matplotlib”
- Download/clone the repository from the GitHub link (or any other link) to a folder on your machine.
- Switch to the terminal, cd into the folder containing the downloaded files, and start the web server as follows:
python -m http.server 8070
This starts a web server on port 8070.
- Open a web browser and type: localhost:8070
index.html is loaded by default. To view another page, append its file name: localhost:8070/[name].html
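The same steps can also be sketched from inside Python using the standard library's http.server module. This is only an illustration under a few assumptions: serve_directory and fetch are hypothetical helpers, the two HTML files are created in a temporary folder rather than cloned from a repository, and port 0 is used so the operating system picks a free port instead of 8070.

```python
import functools
import http.client
import http.server
import tempfile
import threading
from pathlib import Path


def serve_directory(directory, port=0):
    """Serve `directory` over HTTP from a background thread.

    port=0 asks the OS for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path="/"):
    """Return (status, body) for a GET request to the local server."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as folder:
    # Stand-ins for the downloaded files
    Path(folder, "index.html").write_text("<h1>home</h1>")
    Path(folder, "about.html").write_text("<h1>about</h1>")

    server = serve_directory(folder)
    port = server.server_address[1]

    home = fetch(port)                  # "/" falls back to index.html
    about = fetch(port, "/about.html")  # any other page by file name

    server.shutdown()
    server.server_close()

print(home)
print(about)
```

For everyday use the one-line command is simpler; the programmatic form is mainly handy in scripts and tests, where the server has to start and stop on its own.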
Was used for this project.
git has undoubtedly become the version control standard in the industry, and the skill is indispensable for collaboration across multiple teams. Even solo projects can use git to streamline development and experiment with multiple branches. This post is a small starting point for newbies.
The git documentation can be overwhelming for most newbies, with its many options and commands. In reality, most developers end up using a handful of key features.
Since the commands are always hard to remember, I have this cheatsheet (Atlassian) posted on my desk.
Another nifty little command on the Mac terminal to get the graphical repository browser:
Git Commit messages
Git commit messages are terse notes about the changes made since the last commit. Ideally, messages follow the structure given below, and it’s always good practice to follow this guide for commit messages: Git Style Guide
type: [subject]

[body]

[footer]
Check out these awesome websites which can teach you
- learngitbranching.js.org (highly recommended)
- Git-IT (an excellent learn-by-doing, cross-platform project)
Some useful links for beginners to get involved
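top_n returns a boolean mask, e.g. mask = [True, False, True, False, False ...], with “True” at the positions of the top n values. Because it uses dense ranking, tied values share a rank, so every occurrence of each of the n greatest distinct values is marked. The mask can then be used as an index into an array to keep only the values marked “True”.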
import numpy as np
from scipy.stats import rankdata

def top_n(list_array, n=1):
    """
    Returns a boolean mask with "True" for greatest "n" number of values
    """
    np_array = np.array(list_array)
    # creating a mask
    mask = np.zeros(len(np_array.flatten()), dtype=bool)
    r = rankdata(np_array, method="dense")
    # rank matrix with highest value = 1
    r = (r.max() + 1) - r
    for index, val in enumerate(r):
        if val <= n:
            mask[index] = True
    return mask.reshape(np_array.shape)
boolean_filter returns the items of a list at the positions where the boolean mask is true.
def boolean_filter(b_list, boolean):
    """
    This function returns values in b_list where the boolean is true
    """
    return [item for i, item in enumerate(b_list) if boolean[i]]
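As a point of comparison, the standard library's itertools.compress performs the same kind of selection. In this sketch the values and the mask are written by hand as stand-ins; in the workflow above the mask would come from top_n.

```python
from itertools import compress

values = [10, 40, 20, 40, 5]
# Hand-written stand-in for a mask marking the largest value (including ties)
mask = [False, True, False, True, False]

# compress() keeps each value whose matching mask entry is truthy
selected = list(compress(values, mask))
print(selected)  # [40, 40]
```

Note that compress() stops at the shorter of its two inputs, whereas indexing the mask by position raises an IndexError if the mask is shorter than the list.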