This is sample code to extract a tarball (tar.gz) and load the data into a NumPy array. You may also load the file into a pandas DataFrame. Continue reading “Extract and load data directly from a tarball”
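As a rough sketch of the idea (not the code from the full post), a member of a tar.gz archive can be read straight from the archive without unpacking it to disk. The helper name `load_rows_from_tarball`, the file name `data.csv` and the sample values are assumptions made for illustration. The example uses only the standard library and returns a list of rows; in practice that list could be handed to `np.array`.

```python
import csv
import io
import tarfile
import tempfile
from pathlib import Path


def load_rows_from_tarball(tar_path, member_name):
    """Read a numeric CSV member of a .tar.gz archive into a list of float rows."""
    with tarfile.open(tar_path, "r:gz") as tar:
        raw = tar.extractfile(member_name)  # file-like object; nothing is written to disk
        if raw is None:
            raise ValueError(f"{member_name} is not a regular file in the archive")
        text = io.TextIOWrapper(raw, encoding="utf-8")
        return [[float(value) for value in row] for row in csv.reader(text)]


# Build a small throwaway tarball so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"          # hypothetical data file
    csv_path.write_text("1,2,3\n4,5,6\n")
    tar_path = Path(tmp) / "data.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(csv_path, arcname="data.csv")

    rows = load_rows_from_tarball(tar_path, "data.csv")

print(rows)
```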
When creating training data for an RNN, the target sequence is the input sequence itself, shifted by one position. Please refer to the diagram below.
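A minimal sketch of the shift, assuming the common setup where the target at each step is the next element of the sequence. The function name `make_shifted_pairs` and the toy sequence are made up for illustration.

```python
def make_shifted_pairs(sequence):
    """Split a sequence into inputs and targets offset by one position."""
    inputs = sequence[:-1]   # every element except the last
    targets = sequence[1:]   # every element except the first
    return inputs, targets


inputs, targets = make_shifted_pairs(list("hello"))

# Each input element is paired with the element that follows it.
for x, y in zip(inputs, targets):
    print(x, "->", y)
```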
enumerate() is a useful function for building an iterator to use with a for loop. Here we explain different ways of using enumerate() with a Python list.
enumerate() acts as an iterator yielding a tuple (index, element) when applied to a list.
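A short illustration of this behavior; the list `fruits` is an arbitrary example.

```python
fruits = ["apple", "banana", "cherry"]

# enumerate() yields (index, element) tuples.
pairs = list(enumerate(fruits))

# The usual pattern: unpack the tuple directly in the for loop.
for index, fruit in enumerate(fruits):
    print(index, fruit)

# The optional start argument changes the first index.
numbered = list(enumerate(fruits, start=1))
```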
Extend() when acting on a list
The get() method is useful when looking up a key in a dictionary. It returns a predefined default value (-1 in the example below) if the key is not present in the dictionary; otherwise it returns the value associated with the key.
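A small example with -1 as the default; the dictionary `ages` and its keys are made up for illustration.

```python
ages = {"alice": 30, "bob": 25}

known = ages.get("alice", -1)    # key present: returns the stored value
missing = ages.get("carol", -1)  # key absent: returns the default, -1
no_default = ages.get("carol")   # without a default, get() returns None

print(known, missing, no_default)
```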
With only a command-line interface to a server, it is often hard to quickly scan through its contents. This can be circumvented by running JupyterLab (or Jupyter Notebook) on the server and accessing it from a client machine. I presume you have already installed JupyterLab (or Jupyter Notebook) on the server. JupyterLab is the better option, as it comes with a file navigator, a spreadsheet viewer (faster than Excel; it reminds me of Sublime Text) and an image viewer. Check out this video for the latest feature updates in JupyterLab.
This post will be updated periodically with interesting tips and tricks for the IPython Notebook.
This post describes how to extract the top feature names from a supervised learning classifier in sklearn.
Note: The training datasets X_train and y_train are pandas DataFrames with column names.
After fitting/training a classifier clf, the feature scores can be accessed (the attribute varies depending on the classifier used).
- For example, for logistic regression it is the magnitude of the coefficients, which can be accessed as clf.coef_
- For DecisionTree, it is clf.feature_importances_
Sort the scores in descending order using np.argsort() (reversed with [::-1]) and pass the result as an index to the column names of X_train.
# For a decision tree classifier
from sklearn.tree import DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
importances = clf.feature_importances_

# Print the top 5 features of the fitted classifier
print(X_train.columns[np.argsort(importances)[::-1]][:5])

# Or sort (name, importance) pairs by importance, highest first
print(sorted(zip(X_train.columns, importances), key=lambda x: x[1], reverse=True)[:5])
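The ranking step can also be tried on its own, without a fitted model. The sketch below ranks features by the magnitude of signed coefficients, as one would for logistic regression. The feature names and coefficient values are invented for illustration and do not come from any real classifier.

```python
# Hypothetical feature names and signed coefficients (stand-ins for
# X_train.columns and one row of a fitted model's coefficients).
feature_names = ["age", "income", "height", "weight"]
coefficients = [0.8, -1.5, 0.1, 0.4]

# Rank by absolute value so that strong negative coefficients count too.
ranked = sorted(
    zip(feature_names, coefficients),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

top_two = [name for name, _ in ranked[:2]]
print(top_two)
```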
This function concatenates the k longest unique strings in an array into a single string.
def longest_kconsec(strarr, k):
    # k must be between 1 and the number of strings
    if 0 < k <= len(strarr):
        # Drop duplicates, then keep the k longest strings
        longest = sorted(set(strarr), key=len, reverse=True)[:k]
        return ''.join(longest)
    return ''

l = ['r', 'd', 'wdfe', 'ff', 'gfg', 'wfdew', 'd', 'ff']
longest_kconsec(l, 2)  # returns 'wfdewwdfe'
The chart shows the gender difference in school performance across the states of India. Full project report
- 1st Encoding (lines): median values of performance for boys and girls
- 2nd Encoding (colored bars): difference in median values
- 3rd Encoding (circle size): count of values used to find median
Continue reading “Horizontal bar chart with 3 encodings using matplotlib”
This is a tutorial on making a filled bar chart with the fill color mapped to data. The chart was created for this project.
Often we want to know how a function is written in an imported package. This post explains how to examine the source code of a function/class.
To know where the package is installed:
For the package pandas:
import pandas
pandas.__file__
To examine the source code of a given function or class, import the inspect module.
import inspect as insp
import pandas

# Print the path to the source file
print(insp.getsourcefile(pandas.DataFrame))

# getsourcelines() returns the source lines and the starting line number
lines, start_line = insp.getsourcelines(pandas.DataFrame)
print(''.join(lines))
The inspect module can be found here.
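The same approach works for any pure-Python function. As a quick sketch that does not need pandas installed, here it is applied to json.dumps from the standard library; the choice of json.dumps is arbitrary.

```python
import inspect
import json

# Path of the file where json.dumps is defined.
source_file = inspect.getsourcefile(json.dumps)

# Full source code of the function as a single string.
source = inspect.getsource(json.dumps)

print(source_file)
print(source.splitlines()[0])  # first line of the function definition
```

Note that inspect can only retrieve source for objects written in Python; for built-ins implemented in C, getsource raises a TypeError.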
Viewing the source code from IPython Notebook
Append ? to the function name inside an IPython Notebook cell to view its docstring, and ?? to view the entire source code.
import pandas
pandas.DataFrame?   # shows the docstring
pandas.DataFrame??  # shows the source code and docstring