This post describes how to extract the top feature names from a supervised learning classifier in sklearn.
Note: The training datasets X_train and y_train are pandas dataframes with column names.
After fitting/training a classifier clf, the feature scores can be accessed (the method varies depending on the classifier used).
- For example, for logistic regression the score is the magnitude of the coefficients, which can be accessed as clf.coef_
- For DecisionTree, it is clf.feature_importances_
Sort the scores in descending order using np.argsort() and pass the result as an index to the column names of X_train.
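# For Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier
import numpy as np
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)  # the classifier must be fitted before reading importances
importances = clf.feature_importances_
# printing top 5 features of fitted classifier
top5 = np.argsort(importances)[::-1][:5]  # indices of the 5 largest scores
print(list(zip(X_train.columns[top5], importances[top5])))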
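For a linear model such as logistic regression, the same ranking can be sketched with NumPy alone by sorting on the absolute value of the coefficients. This is only an illustration: the helper name `top_features`, the column names and the coefficient values are made up, and they stand in for `X_train.columns` and `clf.coef_[0]` of a fitted binary classifier.

```python
import numpy as np

def top_features(names, scores, k=5):
    """Return the k (name, score) pairs with the largest absolute score."""
    names = np.asarray(names)
    scores = np.asarray(scores, dtype=float)
    # argsort is ascending, so reverse it to get the largest magnitude first
    order = np.argsort(np.abs(scores))[::-1][:k]
    return [(str(names[i]), float(scores[i])) for i in order]

# Hypothetical stand-ins for X_train.columns and clf.coef_[0]
columns = ["age", "income", "height", "weight"]
coefficients = [0.2, -1.5, 0.05, 0.9]

print(top_features(columns, coefficients, k=2))
```

Sorting on the magnitude keeps strongly negative coefficients near the top, which a plain descending sort would push to the bottom.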
This function concatenates the k longest strings of an array into a single string.
def longest_kconsec(strarr, k):
    # keep the k longest distinct strings, longest first
    longest = sorted(set(strarr), key=len, reverse=True)[:k]
    return ''.join(longest)
l = ['r','d','wdfe','ff','gfg','wfdew','d','ff']
print(longest_kconsec(l, 3))
The chart shows the gender difference in school performance across the states of India. Full project report
This tutorial shows how to make a filled bar chart with the fill color mapped to data. The chart was created for this project.
Continue reading “Creating a filled barchart with matplotlib”
Often we want to know how a function is written in an imported package. This post explains how to examine the source code of a function/class.
To know where the package is installed:
For the package pandas:
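One way to look this up, as a minimal sketch: every imported module carries a `__file__` attribute pointing at the file it was loaded from. The helper name `package_location` is hypothetical, and the standard-library `json` package is used here only so the snippet runs without extra installs; `package_location("pandas")` should behave the same way when pandas is installed.

```python
import importlib
import os

def package_location(name):
    """Return the directory a module or package was loaded from."""
    module = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(module.__file__))

print(package_location("json"))
```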
To examine the source code of a given function or class, import the package inspect.
import inspect as insp
import pandas
print(insp.getsourcefile(pandas.DataFrame))  # prints the path to the source file
print(insp.getsourcelines(pandas.DataFrame))  # prints the source lines and the starting line number
The inspect package can be found here.
Viewing the source code from IPython Notebook
Append ? to the function name inside an IPython notebook cell to view its docstring, and ?? to view the entire source code.
pandas.DataFrame? # shows the docstring
pandas.DataFrame?? # shows the source code and docstring
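Outside a notebook, roughly the same information is available from the inspect package. The sketch below uses the standard-library function `json.dumps` as a stand-in so that it runs anywhere; it is an illustration, not what IPython does internally.

```python
import inspect
import json

doc = inspect.getdoc(json.dumps)        # roughly what `json.dumps?` shows
source = inspect.getsource(json.dumps)  # roughly what `json.dumps??` shows

print(doc.splitlines()[0])     # first line of the docstring
print(source.splitlines()[0])  # first line of the function definition
```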
If you are a frequent user of GitHub, you might have come across GitHub Pages, a service to publish websites. GitHub Pages are often helpful to explain/showcase your small projects with a neat webpage for each repository.
- All you need is a markdown file named index.md in your GitHub repository, and GitHub Pages will generate a webpage from it. There are many options, as mentioned in the documentation.
- 3 easy steps to set up a webpage for a project:
- If you are using an IPython notebook, download your notebook as a markdown file.
- Place the markdown file and its resources in your GitHub repository inside the docs folder (create one if it doesn’t exist).
- Rename the markdown file to docs/index.md to make it the default page. Your project website is ready at the link “[username].github.io/[projectname]”
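The copy-and-rename steps can be sketched as a small script. The helper name `publish_to_docs` and the file names in the demo are hypothetical. The entry-file name is a parameter that defaults to index.md, one of the entry files listed in GitHub's documentation.

```python
import pathlib
import shutil
import tempfile

def publish_to_docs(markdown_file, repo_root, index_name="index.md"):
    """Copy a markdown file into <repo_root>/docs under the entry-file name."""
    docs = pathlib.Path(repo_root) / "docs"
    docs.mkdir(parents=True, exist_ok=True)  # create docs/ if it doesn't exist
    target = docs / index_name
    shutil.copyfile(markdown_file, target)
    return target

# Demo in a throwaway folder standing in for a cloned repository
repo = pathlib.Path(tempfile.mkdtemp())
notebook_md = repo / "analysis.md"
notebook_md.write_text("# My project\n")

published = publish_to_docs(notebook_md, repo)
print(published.relative_to(repo))
```

Images and other resources exported with the notebook would still need to be copied into docs alongside the markdown file.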
You can even use this to publish a Web-resume (get a resume template from w3schools).
- Download/clone the repository from the GitHub link (or any other link) to a folder on your machine.
- Switch to the terminal, cd into the folder containing the downloaded files and start the webserver as follows:
python -m http.server 8070
This will start a webserver on port 8070.
- Open a web browser and type: localhost:8070
index.html is loaded by default. To view a different page, use [name].html instead of index.html.
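The same behaviour can be reproduced from Python with the standard-library http.server module, which is what the command line above runs. The sketch below is only an illustration: the helper names `serve_directory` and `fetch`, the temporary folder and the page contents are made up, and port 0 asks the operating system for any free port instead of 8070.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

def serve_directory(directory, port=0):
    """Serve `directory` over HTTP in a background thread (port 0 = any free port)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def fetch(server, page=""):
    """Request a page from the running server and return its body as text."""
    port = server.server_address[1]
    # An empty ProxyHandler makes the request go straight to the local server
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/{page}", timeout=5) as response:
        return response.read().decode("utf-8")

# Throwaway site with two pages (names and contents are made up)
site = pathlib.Path(tempfile.mkdtemp())
(site / "index.html").write_text("<h1>home</h1>")
(site / "about.html").write_text("<h1>about</h1>")

server = serve_directory(site)
home = fetch(server)                 # no file name given: index.html is served
about = fetch(server, "about.html")  # explicit [name].html
server.shutdown()
server.server_close()
print(home, about)
```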
This was used for this project.
git has undoubtedly become the version control standard in the industry, and the skill is indispensable for collaboration across multiple teams. Even solo projects can use git to streamline development and experiment with multiple branches. This post is a small starting point for newbies.
The git documentation can be overwhelming for most newbies, with lots of options/commands. In reality, most developers end up using a handful of key features.
Since the commands are always hard to remember, I have this cheatsheet (Atlassian) posted on my desk.
Another nifty little command on the Mac terminal to get the graphical repository browser:
Git Commit messages
Git commit messages are terse notes about the changes made since the last commit. Ideally, messages follow a consistent structure, and it is always good practice to follow this guide for commit messages: Git Style Guide
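As an illustration of what such a structure can look like, the sketch below checks a message against one widely used convention: a short subject line without a trailing period, a blank line, then a wrapped body. The 50 and 72 character limits and the function name `check_commit_message` are assumptions made for this example and are not taken from the linked guide.

```python
def check_commit_message(message, subject_limit=50, body_limit=72):
    """Return a list of problems found in a commit message (empty means OK)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    if len(lines[0]) > subject_limit:
        problems.append("subject longer than %d characters" % subject_limit)
    if lines[0].endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    # Body lines start at line 3 (after the subject and the blank line)
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_limit:
            problems.append("line %d longer than %d characters" % (number, body_limit))
    return problems

good = "Add feature importance example\n\nExplain how the scores are sorted."
bad = "Fix bug.\nmore details"
print(check_commit_message(good))
print(check_commit_message(bad))
```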
Check out these awesome websites which can teach you git:
- learngitbranching.js.org (highly recommended)
- Git-IT (an excellent learn-by-doing, cross-platform project)
Some useful links for beginners to get involved
- Medium blog for first timers
If you are a user of d3 for visualization, you might already know what a bl.ock is. Apart from Mike Bostock’s bl.ocks, it is often very hard to find good examples that you can use and build on. Since most d3 examples are new creative ventures, they are hard to classify and hence hard to index. These are some websites where you can browse the bl.ocks of other users: