Running jobs from terminal in background [foreground and pausing]

This post covers running jobs (one or more programs) in the foreground and background using just a terminal. I use firefox as the example. The <username> in the screenshots below is scrambled for privacy.

Running programs from terminal

You can always run a job in the foreground by typing its command into the terminal:

firefox

But once the program starts, we lose the bash prompt. Another way to start firefox is to run it in the background, which returns the bash prompt, as shown below.

Setting up tmux with z-shell (zsh) as default

tmux is, I would say, one of the finest interfaces for working on a remote server, but it comes with a boring shell. This post will help you configure zsh as the default shell when logging into tmux.

Step 1 : Install zsh

sudo apt-get update
sudo apt-get install zsh
# Check the `zsh` version.
# A version above 5.0 is preferable.
zsh --version

Running Jupyter Notebook on a remote server

With only a command-line interface to a server, it is often hard to quickly scan through its contents. This can be circumvented by running jupyter-lab (or jupyter notebook) on the server and accessing it from a client machine. I presume you have already installed jupyter-lab (or jupyter-notebook) on the server. Jupyter-lab is the better option, as it comes with a file navigator, a spreadsheet viewer (faster than Excel; it reminds me of Sublime Text) and an image viewer. Check out this video for the latest feature updates in jupyter-lab.

Handling missing values in a Dataset before training

Deciding how to impute missing values in a dataset before feeding it to a classifier is often difficult. Imputing with a wrong value can significantly skew the data and result in a poor classifier. The ideal solution is a clean dataset without any NULL values, but getting one might mean throwing out most of the data. There are no perfect workarounds: classifiers are built from the information in the data, and a lack of it results in a poor classifier.

Extracting top feature names, in order, for a trained classifier in scikit-learn

This post describes how to extract the top feature names from a supervised learning classifier in sklearn.

Note: The training data X_train and y_train are pandas dataframes with column names.

After fitting/training a classifier clf, the scores for the features can be accessed (the attribute varies depending on the classifier used).

  • For example, for logistic regression the score is the magnitude of each coefficient; the coefficients are available as clf.coef_
  • For a decision tree, it is clf.feature_importances_

np.argsort() returns the indices that sort the scores in ascending order, so reverse its output to get descending order and use the result to index X_train.columns.


# For Decision Tree classifier

from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

importances = clf.feature_importances_

# printing the top 5 features of the fitted classifier
print(X_train.columns[np.argsort(importances)[::-1]][:5])

# or, as (feature name, importance) pairs sorted by importance
print(sorted(zip(X_train.columns, importances), key=lambda x: x[1], reverse=True)[:5])
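For the logistic regression case mentioned above, the same ranking can be sketched as follows. This is only an illustration: the feature names and coefficient values are made up, and they stand in for X_train.columns and clf.coef_[0] of a fitted binary classifier.

```python
import numpy as np

# Hypothetical stand-ins for X_train.columns and clf.coef_[0]
# (for a binary LogisticRegression, clf.coef_ has shape (1, n_features)).
feature_names = np.array(["age", "income", "height", "weight"])
coefficients = np.array([0.2, -1.5, 0.05, 0.9])

# The sign only gives the direction of the effect, so rank by magnitude.
scores = np.abs(coefficients)

# argsort is ascending; reverse it to get the largest scores first.
order = np.argsort(scores)[::-1]

top_features = feature_names[order][:2]
print(top_features.tolist())
```

Note that coefficient magnitudes are only comparable across features when the features are on a similar scale.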

Horizontal bar chart with 3 encodings using matplotlib

The chart shows the gender difference in school performance across the states of India. Full project report

View python source code inside packages

Often we want to know how a function is written in an imported package. This post explains how to examine the source code of a function/class.

To find where a package is installed:

[package_name].__file__

For the package pandas:

import pandas
pandas.__file__

To examine the source code of a given function or class, use the inspect module.

import inspect as insp
print(insp.getsourcefile(pandas.DataFrame))  # prints the path to the source file

print(insp.getsource(pandas.DataFrame))  # prints the source code

Documentation of inspect package can be found here.
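The same calls work on any pure-Python function, not just pandas. Here is a small sketch using json.dumps from the standard library, chosen only because it is available everywhere:

```python
import inspect
import json

# Where is the package installed?
print(json.__file__)

# Path of the file that defines the function.
source_path = inspect.getsourcefile(json.dumps)
print(source_path)

# The source code as one string.
source = inspect.getsource(json.dumps)
print(source[:200])  # only the first 200 characters, to keep the output short

# The source as a list of lines, plus the line number where it starts in the file.
lines, first_line = inspect.getsourcelines(json.dumps)
print(first_line, len(lines))
```

Functions implemented in C (many builtins, for example) have no Python source, so inspect raises a TypeError for them.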

Viewing the source code from IPython Notebook

Append ? to the function name inside the ipython-notebook cell to view code description and ?? for the entire source code.


import pandas

pandas.DataFrame? # shows the docstring

pandas.DataFrame?? # shows the source code and docstring

Changing the title of github pages

If you are a frequent user of GitHub, you might have come across GitHub Pages, a service for publishing websites. GitHub Pages is often helpful for explaining or showcasing your small projects, with a neat webpage for each repository.

  • All you need is to include a markdown file named index.md in your GitHub repository, and GitHub Pages will generate a webpage from it. There are many options, as mentioned in the documentation.
  • 3 easy steps to set up a webpage for a project.
    • If you are using an ipython notebook, download your notebook as a markdown file.
    • Place “markdown file+resources” in your GitHub repository inside the docs folder (create one if it doesn’t exist).
    • Rename the markdown file to docs/index.md to make it the default page. Your project website is ready at the link “[username].github.io/[projectname]”
    • By default, the project webpage shows the repository name as its title. To change this, create or edit the _config.yml file and add the content below as key: value pairs. Make sure each text value is passed as a string in quotes.
      theme: jekyll-theme-cayman
      title: "Title of the project goes here"
      description: "Subtitle/description which comes after title goes here"
      show_downloads: true # displays download button on the .io page
      

You can even use this to publish a Web-resume (get a resume template from w3schools).

Starting a Web Server on Mac for D3.js

Running a webpage locally requires starting a web server, especially if the page uses JavaScript to load other files.

  1. Download or clone the repository from its GitHub link (or any other link) to a folder on your machine.
  2. Switch to a terminal, cd into the folder containing the downloaded files and start the web server as follows
     python -m http.server 8070

    This will start a webserver on the port 8070.

  3. Open a web browser and go to: http://localhost:8070/index.html
  4. index.html is loaded by default. To view another page, replace index.html with [name].html.

This setup was used for this project.
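If you would rather start the server from a Python script than from the command line, the same standard-library module can be used directly. The sketch below is only an illustration: start_server is a name I made up, and the demo serves a temporary folder on a free port (port 0) so that it does not clash with anything already running on 8070.

```python
import http.client
import tempfile
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def start_server(directory=".", port=8070):
    """Serve `directory` over HTTP on this machine, in a background thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Demo: serve a throwaway folder containing one page and request it once.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "index.html").write_text("<h1>hello</h1>")

    server = start_server(directory=folder, port=0)  # 0 = pick any free port
    port = server.server_address[1]

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/index.html")
    response = conn.getresponse()
    status, body = response.status, response.read().decode()
    conn.close()

    # Stop the server and release the port.
    server.shutdown()
    server.server_close()

print(status, body)
```

For a real project, call start_server with the project folder and port 8070, and keep the script running while you browse.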

 

Learning GIT version control

git has undoubtedly become the version control standard in the industry, and the skill is indispensable for collaboration across multiple teams. Even solo projects can use git to streamline development and experiment with multiple branches. This post is a small starting point for newbies.

The git documentation, with its many options and commands, can be overwhelming for most newbies. In reality, most developers end up using a handful of key features.

Since the commands are always hard to remember, I keep this cheatsheet (Atlassian) posted at my desk.

Another nifty little command for the Mac terminal opens a graphical repository browser:

$ gitk

Git Commit messages

Git commit messages are terse notes about the changes made since the last commit. Ideally, messages follow the structure given below. It is always good practice to follow this guide for commit messages: GIT Style Guide

type: [subject]

[body]

[footer]

Check out these awesome websites which can teach you git graphically.

Resources

  1. learngitbranching.js.org (highly recommended)
  2. Git-IT (an excellent learn-by-doing, cross-platform project)
  3. https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud
  4. https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging

Some useful links for beginners to get involved

  1. http://www.firsttimersonly.com/
  2. https://github.com/search?utf8=%E2%9C%93&q=label%3Afirst-timers-only+is%3Aopen&type=Issues&ref=searchresults
  3. Medium blog for first timers

Finding d3 bl.ocks

If you are a user of d3 for visualization, you might already know what a bl.ock is. Apart from Mike Bostock’s bl.ocks, it is often very hard to find good examples that you can use and build on. Since most d3 examples are new creative ventures, they are hard to classify and hence hard to index. These are some websites where you can navigate the bl.ocks of other users:

References:

  1. https://medium.com/@enjalot/searching-for-examples-2c0f75709c1a

Return top “N” elements from an array

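top_n returns a boolean mask, e.g. [True, False, True, False, False ...], with “True” at the positions of the top n values. Tied values share a rank, so all of them are included. Passing the mask to an array as an index returns the values where it is “True”.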

import numpy as np
from scipy.stats import rankdata

def top_n(list_array, n=1):
  """
  Returns a boolean mask with "True" for the greatest "n" distinct values
  """
  np_array = np.array(list_array)
  # dense ranks over the flattened array (smallest value has rank 1)
  r = rankdata(np_array.flatten(), method="dense")
  # invert the ranks so that the highest value has rank 1
  r = (r.max() + 1) - r
  # creating the mask: True wherever the rank is within the top n
  mask = r <= n
  return mask.reshape(np_array.shape)

boolean_filter returns a list of the items in b_list at the positions where the boolean mask is true.

def boolean_filter(b_list, boolean):
  """
  This function returns values in b_list where the boolean is true
  """
  return [item for i, item in enumerate(b_list) if boolean[i]]
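To see how the mask and the filter fit together, and how ties behave, here is a small sketch for flat lists that needs neither numpy nor scipy. top_n_mask is a made-up name for this illustration, not part of the code above; it follows the same "top n distinct values" logic.

```python
def top_n_mask(values, n=1):
  """Return a list of booleans, True where a value is among the n largest distinct values."""
  if n <= 0 or not values:
    return [False] * len(values)
  # distinct values, largest first
  distinct = sorted(set(values), reverse=True)
  # smallest value that still counts as "top n"
  cutoff = distinct[min(n, len(distinct)) - 1]
  return [v >= cutoff for v in values]


def boolean_filter(b_list, boolean):
  """Return the values in b_list where the boolean mask is true."""
  return [item for i, item in enumerate(b_list) if boolean[i]]


scores = [3, 1, 4, 1, 5, 4]
mask = top_n_mask(scores, n=2)
top_values = boolean_filter(scores, mask)

print(mask)
print(top_values)
```

With n = 2, the two largest distinct values are 5 and 4, so both 4s are kept along with the 5.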