Horizontal bar chart with 3 encodings using matplotlib

The chart shows the gender difference in school performance across the states of India. See the full project report for details.

  • 1st Encoding (lines): median values of performance for boys and girls
  • 2nd Encoding (colored bars): difference in median values
  • 3rd Encoding (circle size): number of values used to compute each median

Structure of G_perfomance (pandas dataframe):

State      Boy      Girl      diff
AN     34.0550  37.10125  -3.04625
AP     32.0000  32.65500  -0.65500
AR     32.0875  31.33500   0.75250
BR     37.0000  35.00000   2.00000
CG     33.9825  33.45500   0.52750

Structure of G_count (pandas dataframe):

State   Boy  Girl  ratio
AN      971   956    1.0
AP     3450  4093    1.0
AR     2438  2543    1.0
BR     3407  3757    1.0
CG     3346  3401    1.0
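
The two frames above could be built from a long-format table with one row per student. The following is only a minimal sketch: the column names `State`, `Gender` and `Score` and the sample values are assumptions, not taken from the original data, and the `ratio` column is left out because the post does not say how it is computed.

```python
import pandas as pd

# Hypothetical long-format data: one row per student (names and values are assumptions).
raw = pd.DataFrame({
    "State":  ["AN", "AN", "AN", "AN", "AP", "AP", "AP"],
    "Gender": ["Boy", "Girl", "Boy", "Girl", "Boy", "Girl", "Girl"],
    "Score":  [30.0, 36.0, 38.0, 40.0, 32.0, 31.0, 35.0],
})

grouped = raw.groupby(["State", "Gender"])["Score"]

# Median score per state, one column per gender, plus the Boys - Girls difference.
G_perfomance = grouped.median().unstack("Gender")
G_perfomance["diff"] = G_perfomance["Boy"] - G_perfomance["Girl"]

# Number of values behind each median.
G_count = grouped.count().unstack("Gender")

print(G_perfomance)
print(G_count)
```

Both frames end up indexed by state, which is what the plotting code relies on when it uses `G_perfomance.index` for the y tick labels.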

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig = plt.figure(figsize=(5, 14), dpi=100)

ax = fig.add_subplot(111)
ax2 = ax.twiny()

# bar plots on first axis ax (thin bars act as lines)
# girls' values are negated so that their bars extend to the left of zero
ax.barh(np.arange(len(G_perfomance["Boy"])),
        width=G_perfomance["Boy"], height=0.1, color="k",
        align="center", alpha=0.25, linewidth=0)
ax.barh(np.arange(len(G_perfomance["Girl"])),
        width=-G_perfomance["Girl"], height=0.1, color="k",
        align="center", alpha=0.25, linewidth=0)

# scatter plots on first axis ax with marker size mapped to sample size
ax.scatter(x=G_perfomance["Boy"],
           y=np.arange(len(G_perfomance["diff"])),
           s=G_count["Boy"] * 0.1,
           color="k", alpha=0.5)
ax.scatter(x=-G_perfomance["Girl"],
           y=np.arange(len(G_perfomance["diff"])),
           s=G_count["Girl"] * 0.1,
           color="k", alpha=0.5)

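# First x-axis
ax.set_xlim(-60, 60)
# fix the tick positions before relabeling, then drop the "-" from the labels
xticks = np.arange(-60, 61, 20)
ax.set_xticks(xticks)
ax.set_xticklabels([str(abs(x)) for x in xticks])
ax.set_xlabel("Median performance")

# empty scatter plots serve as legend entries for marker size (s = count * 0.1)
for a in [100, 500]:
    ax.scatter([], [], c="k", alpha=0.5, s=a, label="{0}".format(a * 10))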

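# Second x-axis
# _COLORS is not defined in this snippet; the definition here is an assumption
# based on the legend: red where boys' median is higher, blue where girls' is higher
_COLORS = ["red" if d > 0 else "blue" for d in G_perfomance["diff"]]
ax2.barh(np.arange(len(G_perfomance["diff"])),
         width=G_perfomance["diff"], height=0.75, align="center",
         color=_COLORS)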

ax2.set_xlim(-10, 10)
ax2.grid(False)
ax2.set_xlabel("Median performance difference (Boys - Girls)")

# y-axis
ax.set_ylim(-1, len(G_perfomance.index)+2)
plt.yticks(np.arange(len(G_perfomance.index)),list(G_perfomance.index))
plt.axvline(x= 0, color='k', linewidth = 0.75, ymax = 0.94)

# legend
red_patch = mpatches.Patch(color='red', label='Boys Perform Better')
blue_patch = mpatches.Patch(color='blue', label='Girls Perform Better')
plt.legend(handles=[blue_patch, red_patch], loc=2, ncol =1, mode = "expand")
ax.legend(loc=1,ncol=2)

# annotation patch
tboy = ax.text(50, -2.2, "Boys", ha="center", va="center", rotation=0,
               size=10, color="w",
               bbox=dict(boxstyle="rarrow,pad=0.3", fc="grey", ec="b", lw=0))
tgirl = ax.text(-50, -2.2, "Girls", ha="center", va="center", rotation=0,
                size=10, color="w",
                bbox=dict(boxstyle="larrow,pad=0.3", fc="grey", ec="b", lw=0))

plt.show()
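
The code above removes the minus sign from the first x-axis by rewriting the tick labels as strings. An alternative that keeps the labels correct if the limits change later is a tick formatter. The following is a minimal sketch using matplotlib's `FuncFormatter`; the helper name `abs_label` is hypothetical and not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def abs_label(value, pos=None):
    """Format a tick value without its sign, e.g. -40.0 -> '40'."""
    return "{:g}".format(abs(value))


fig, ax = plt.subplots()
ax.set_xlim(-60, 60)
# Every major tick label on the x-axis is now produced by abs_label.
ax.xaxis.set_major_formatter(FuncFormatter(abs_label))

fig.canvas.draw()  # force the tick labels to be generated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

With this approach the tick positions are still chosen by matplotlib, so there is no need to call `set_xticks` first.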
