This post will be updated periodically with useful tips and tricks for IPython Notebook.
Deciding how to impute missing values in a dataset before feeding it to a classifier is often difficult. Imputing with the wrong value can significantly skew the data and produce a misleading classifier. The ideal solution is a clean dataset without any NULL values, but getting one might mean throwing out most of the data. There is no perfect workaround: classifiers are built from the information in the data, and missing information leads to a worse classifier.
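One common compromise is median imputation plus a "was missing" flag. The median is less sensitive to outliers than the mean, and the flag lets the classifier tell imputed entries apart from real observations. The sketch below is an illustration of that technique only, not a recommendation from the original post. It assumes a column is a plain Python list with `None` marking NULL values, and `impute_median` is a hypothetical helper name.

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values.

    Returns (filled_values, missing_flags); the flags record which entries
    were imputed so they can be fed to the classifier as an extra feature.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to compute a median from")
    fill = median(observed)
    filled = [fill if v is None else v for v in column]
    flags = [v is None for v in column]
    return filled, flags


filled, flags = impute_median([1.0, None, 3.0, 10.0])
print(filled)  # [1.0, 3.0, 3.0, 10.0]
print(flags)   # [False, True, False, False]
```

Note how the outlier 10.0 does not drag the fill value up, as it would with the mean.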
Although several R packages can plot an ROC curve, caTools seems to be the most convenient.
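```r
library(caTools)

# Predict class probabilities on the test set (assumes a fitted binary model, e.g. from glm)
p <- predict(model, test, type = "response")

# Compute the AUC and plot the ROC curve
colAUC(p, test[["Class"]], plotROC = TRUE)
```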
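To see what such a plot is made of, here is a small pure-Python sketch (not caTools' implementation) that sweeps a threshold over the predicted scores to get ROC points, then integrates them with the trapezoidal rule to get the AUC. It assumes labels are coded 1 for the positive class and 0 for the negative class; `roc_points` and `auc` are hypothetical names.

```python
def roc_points(scores, labels):
    """Sweep a threshold from high to low; return the ROC curve as (fpr, tpr) lists.

    scores: predicted probabilities/scores; labels: 1 = positive, 0 = negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Move all examples tied at this score across the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((fpr[j] - fpr[j - 1]) * (tpr[j] + tpr[j - 1]) / 2 for j in range(1, len(fpr)))


fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(fpr, tpr))  # 0.75
```

The 0.75 matches the pairwise view of AUC: 3 of the 4 (positive, negative) pairs rank the positive example higher.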
This post describes how to extract the names of the top features from a supervised classifier in scikit-learn.
Note: the training data X_train and y_train are pandas objects with column names (X_train must be a DataFrame so its column names are available).
After fitting a classifier clf, the feature scores can be accessed; the attribute depends on the classifier used.
- For example, for logistic regression the score is the magnitude of the coefficients, accessible as clf.coef_.
- For DecisionTree, it is clf.feature_importances_.
Sort the scores in descending order using np.argsort() and use the result to index the column names of X_train.
```python
# For a Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
importances = clf.feature_importances_

# Option 1: names of the top 5 features of the fitted classifier
print(X_train.columns[np.argsort(importances)[::-1]][:5])

# Option 2: all (feature, importance) pairs, highest importance first
print(sorted(zip(X_train.columns, importances), key=lambda x: x[1], reverse=True))
```
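The same argsort trick works for any classifier once you have a 1-D array of scores. The sketch below wraps it in a hypothetical helper, `top_k_features`, with an option to rank by absolute value, which is what you want for signed logistic regression coefficients. It assumes a binary logistic regression, where `clf.coef_` has shape `(1, n_features)`, so you would pass `clf.coef_[0]`.

```python
import numpy as np


def top_k_features(feature_names, scores, k=5, use_magnitude=False):
    """Return the k feature names with the highest scores.

    feature_names: column names, e.g. X_train.columns
    scores: 1-D per-feature scores, e.g. clf.feature_importances_,
            or clf.coef_[0] for a binary logistic regression
    use_magnitude: rank by absolute value (for signed coefficients)
    """
    scores = np.asarray(scores, dtype=float)
    if use_magnitude:
        scores = np.abs(scores)
    # argsort is ascending; reverse it so the largest scores come first
    order = np.argsort(scores)[::-1][:k]
    names = np.asarray(feature_names)
    return [str(n) for n in names[order]]


# e.g. top_k_features(X_train.columns, clf.coef_[0], k=5, use_magnitude=True)
print(top_k_features(["age", "income", "height"], [-2.0, 0.5, 1.0], k=2, use_magnitude=True))
# ['age', 'height']
```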
This function concatenates the k longest distinct strings in an array into a single string.
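```python
def longest_kconsec(strarr, k):
    # Only valid when 1 <= k <= len(strarr)
    if 0 < k <= len(strarr):
        # Remove duplicates (keeping first-seen order), then sort by length, longest first
        longest = sorted(dict.fromkeys(strarr), key=len, reverse=True)[:k]
        return ''.join(longest)
    return ''


l = ['r', 'd', 'wdfe', 'ff', 'gfg', 'wfdew', 'd', 'ff']
longest_kconsec(l, 2)  # returns 'wfdewwdfe'
```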
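When k is small relative to the list, `heapq.nlargest` avoids sorting everything; the standard library documents it as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, so ties keep the same order as the sorted version. A sketch of that variant, under the hypothetical name `longest_k_concat`:

```python
import heapq


def longest_k_concat(strarr, k):
    """Concatenate the k longest distinct strings, longest first."""
    if not 0 < k <= len(strarr):
        return ""
    unique = dict.fromkeys(strarr)  # drop duplicates, keep first-seen order
    # nlargest(k, ..., key=len) picks the k longest without a full sort
    return "".join(heapq.nlargest(k, unique, key=len))


print(longest_k_concat(['r', 'd', 'wdfe', 'ff', 'gfg', 'wfdew', 'd', 'ff'], 2))  # wfdewwdfe
```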
The chart shows the gender difference in school performance across the states of India. Full project report
- 1st encoding (lines): median performance of boys and girls
- 2nd encoding (colored bars): difference between the two medians
- 3rd encoding (circle size): number of values used to compute each median
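A rough matplotlib sketch of a chart with these three encodings, one row per state, is shown below. It is not the author's code: the two-panel layout, the function name `three_encoding_chart`, and the sample numbers in the usage line are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def three_encoding_chart(labels, boys, girls, counts):
    """Left panel: line between the boys' and girls' medians (1st encoding),
    with the girls' circle sized by count (3rd encoding).
    Right panel: horizontal bars for girls - boys (2nd encoding), colored by sign."""
    y = list(range(len(labels)))
    diff = [g - b for b, g in zip(boys, girls)]
    fig, (ax_med, ax_diff) = plt.subplots(
        1, 2, sharey=True, figsize=(9, 0.5 * len(labels) + 1.5))

    # 1st encoding: a segment joining the two medians for each state
    ax_med.hlines(y, boys, girls, colors="grey")
    ax_med.scatter(boys, y, color="tab:blue", label="boys", zorder=3)
    # 3rd encoding: circle area proportional to the number of values behind the median
    max_count = max(counts)
    sizes = [300 * c / max_count for c in counts]
    ax_med.scatter(girls, y, s=sizes, color="tab:orange", alpha=0.7,
                   label="girls", zorder=3)
    ax_med.set_yticks(y)
    ax_med.set_yticklabels(labels)
    ax_med.set_xlabel("median score")
    ax_med.legend()

    # 2nd encoding: bars for the difference in medians, colored by which group is ahead
    colors = ["tab:orange" if d >= 0 else "tab:blue" for d in diff]
    ax_diff.barh(y, diff, color=colors)
    ax_diff.axvline(0, color="black", linewidth=0.8)
    ax_diff.set_xlabel("girls - boys")

    fig.tight_layout()
    return fig, ax_med, ax_diff


# Hypothetical data, for illustration only
fig, ax_med, ax_diff = three_encoding_chart(
    ["State A", "State B", "State C"], [50, 55, 48], [53, 52, 50], [120, 80, 200])
fig.savefig("three_encodings.png")
```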
This tutorial shows how to make a filled bar chart whose fill color is mapped to the data. The chart was created for this project.
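A minimal sketch of the idea, assuming matplotlib: normalize the values to [0, 1] with `Normalize`, look each one up in a colormap, pass the resulting colors to `ax.bar`, and add a colorbar so the mapping is readable. The function name `color_mapped_bars` and the sample values are hypothetical, and this is not necessarily how the original chart was built.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize


def color_mapped_bars(labels, values, cmap=plt.cm.viridis):
    """Bar chart whose fill color encodes each bar's value."""
    norm = Normalize(vmin=min(values), vmax=max(values))  # min -> 0.0, max -> 1.0
    colors = [cmap(norm(v)) for v in values]               # RGBA color per bar
    fig, ax = plt.subplots()
    bars = ax.bar(labels, values, color=colors)
    # A ScalarMappable with the same norm/cmap lets us draw a matching colorbar
    sm = ScalarMappable(norm=norm, cmap=cmap)
    sm.set_array([])
    fig.colorbar(sm, ax=ax, label="value")
    return fig, ax, bars, colors


fig, ax, bars, colors = color_mapped_bars(["a", "b", "c"], [1, 5, 3])
fig.savefig("color_mapped_bars.png")
```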
Often we want to know how a function in an imported package is implemented. This post explains how to examine the source code of a function or class.
To find where a package is installed, check its __file__ attribute. For example, for pandas:

```python
import pandas
print(pandas.__file__)  # path to the package's __init__.py
```
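If you only want the location and not the (possibly slow) import itself, the standard library's `importlib.util.find_spec` can report where a top-level module would be loaded from. A small sketch, with `package_location` as a hypothetical helper name:

```python
import importlib.util


def package_location(name):
    """Return the file a top-level module/package would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # For a package this is its __init__.py; for C built-ins it is "built-in"
    return spec.origin


print(package_location("json"))  # .../json/__init__.py
print(package_location("sys"))   # built-in
```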
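To examine the source code of a given function or class, use the standard-library module inspect.

```python
import inspect
import pandas

print(inspect.getsourcefile(pandas.DataFrame))  # path to the file that defines DataFrame
print(inspect.getsource(pandas.DataFrame))      # full source code as a single string
lines, start = inspect.getsourcelines(pandas.DataFrame)  # list of source lines and the starting line number
```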
The inspect module is documented in the Python standard library documentation.
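inspect.getsource only works for objects written in Python. Built-ins implemented in C, such as `len`, raise a TypeError, and objects whose source file cannot be found raise an OSError. A small wrapper, named `show_source` here for illustration, makes that explicit:

```python
import inspect


def show_source(obj):
    """Return obj's source code, or a short note when none is available."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError) as exc:
        # TypeError: C built-ins (e.g. len) have no Python source.
        # OSError: the source file could not be located.
        return f"No Python source available for {obj!r}: {exc}"


print(show_source(inspect.getsource)[:60])  # starts with "def getsource(object):"
print(show_source(len))                     # No Python source available for <built-in function len>: ...
```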
Viewing the source code from IPython Notebook
Append ? to a function name inside an IPython notebook cell to view its docstring, and ?? to view its full source code.
```python
# IPython syntax; run each line in its own cell
import pandas
pandas.DataFrame?   # shows the docstring
pandas.DataFrame??  # shows the source code and docstring
```
If you use GitHub often, you might have come across GitHub Pages, a service for publishing websites. GitHub Pages is a handy way to explain or showcase small projects with a neat webpage for each repository.
- All you need is a Markdown file named index.md in your GitHub repository, and GitHub Pages will generate a webpage from it. Many more options are described in the GitHub Pages documentation.
- Three easy steps to set up a webpage for a project:
- If you are using IPython Notebook, download your notebook as a Markdown file.
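- Place the Markdown file and its resources (e.g. images) in a docs folder in your GitHub repository (create it if it doesn’t exist), and select that folder as the publishing source in the repository's Pages settings.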
- Rename the Markdown file to docs/index.md to make it the default page. Your project website is then available at “[username].github.io/[projectname]”.
- By default, the project webpage shows the repository name as its title. To change it, create or edit a _config.yml file in the docs folder and add key: value pairs such as the following. Make sure string values are passed in quotes.
```yaml
theme: jekyll-theme-cayman
title: "Title of the project goes here"
description: "Subtitle/description which comes after title goes here"
show_downloads: true  # displays a download button on the .io page
```
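The copy-and-rename steps can be scripted. The sketch below assumes the notebook was exported with `jupyter nbconvert --to markdown name.ipynb`, which writes `name.md` plus, when the notebook has images, a `name_files/` folder that the Markdown links point to. The helper name `publish_to_docs` is hypothetical, and enabling Pages for the docs folder still has to be done in the repository settings.

```python
import shutil
from pathlib import Path


def publish_to_docs(markdown_file, repo_dir):
    """Copy an exported notebook (name.md + name_files/) into repo_dir/docs as index.md."""
    markdown_file = Path(markdown_file)
    docs = Path(repo_dir) / "docs"
    docs.mkdir(parents=True, exist_ok=True)  # create docs/ if it doesn't exist

    # Resources keep their folder name because the Markdown image links refer to it
    resources = markdown_file.with_name(markdown_file.stem + "_files")
    if resources.is_dir():
        shutil.copytree(resources, docs / resources.name, dirs_exist_ok=True)  # Python 3.8+

    # The renamed copy becomes the site's default page
    target = docs / "index.md"
    shutil.copyfile(markdown_file, target)
    return target


# e.g. publish_to_docs("analysis.md", "path/to/my-repo")
```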
- If you are using