Commit d228a67

committed Apr 30, 2018
Progress prior to solution lecture
1 parent 57d8f47 commit d228a67

1 file changed

+134
-0
lines changed

@@ -0,0 +1,134 @@
1+
{
2+
"nbformat": 4,
3+
"nbformat_minor": 0,
4+
"metadata": {
5+
"colab": {
6+
"name": "Troy Bradley Copy of Statistics - Coding Challenge #1.ipynb",
7+
"version": "0.3.2",
8+
"provenance": []
9+
},
10+
"kernelspec": {
11+
"name": "python3",
12+
"display_name": "Python 3"
13+
}
14+
},
15+
"cells": [
16+
{
17+
"cell_type": "markdown",
18+
"metadata": {
19+
"id": "view-in-github",
20+
"colab_type": "text"
21+
},
22+
"source": [
23+
"[View in Colaboratory](https://colab.research.google.com/github/bitcointroy/MLcodechallenges/blob/master/Troy_Bradley_Copy_of_Statistics_Coding_Challenge_1.ipynb)"
24+
]
25+
},
26+
{
27+
"metadata": {
28+
"id": "Vim6ATB8ADUI",
29+
"colab_type": "text"
30+
},
31+
"cell_type": "markdown",
32+
"source": [
33+
"# Statistics Coding Challenge #1\n",
34+
"\n",
35+
"In this coding challenge, we are going to use the \"Accidental Drug Related Deaths 2012-2017 (State of Connecticut)\" data set available from the Data.Gov website (https://catalog.data.gov/dataset?groups=local&organization_type=State+Government#topic=local_navigation). \n",
36+
"\n",
37+
"There are 2 main objectives you need to accomplish:\n",
38+
"\n",
39+
"1) First, treat missing values for the \"Death City\" attribute: replace any missing value in \"Death City\" with the city that has recorded the highest number of deaths.\n",
40+
"\n",
41+
"For each city, do the following:\n",
42+
"\n",
43+
"2) Compute summary statistics for the *age* attribute:\n",
44+
"\n",
45+
"\n",
46+
"a) Mean\n",
47+
"\n",
48+
"b) Median\n",
49+
"\n",
50+
"c) 25th, 50th and 75th percentiles using [np.percentile](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.percentile.html)\n",
51+
"\n",
52+
"d) Examine any outliers in the data using [Boxplots](https://matplotlib.org/2.1.1/gallery/statistics/boxplot_demo.html)\n",
53+
"\n",
54+
"e) Construct a Bee Swarm plot to highlight the distribution of age for each city using [seaborn.swarmplot](https://seaborn.pydata.org/generated/seaborn.swarmplot.html)\n",
55+
"\n",
56+
"\n"
57+
]
58+
},
59+
{
60+
"metadata": {
61+
"id": "vchq-N1O9Nlk",
62+
"colab_type": "code",
63+
"colab": {
64+
"base_uri": "https://localhost:8080/",
65+
"height": 629
66+
},
67+
"outputId": "8fe82ce0-4ad0-477b-a058-fe274282149e"
68+
},
69+
"cell_type": "code",
70+
"source": [
71+
"# LAMBDA SCHOOL\n",
72+
"#\n",
73+
"# MACHINE LEARNING\n",
74+
"#\n",
75+
"# MIT LICENSE\n",
76+
"\n",
77+
"import pandas as pd\n",
78+
"\n",
79+
"# from google.colab import files\n",
80+
"\n",
81+
"dataset = pd.read_csv('https://data.ct.gov/api/views/rybz-nyjw/rows.csv?accessType=DOWNLOAD')\n",
82+
"# dataset2 = files.upload('https://data.ct.gov/api/views/rybz-nyjw/rows.csv?accessType=DOWNLOAD')\n",
83+
"\n",
84+
"print(dataset.head(5))\n",
85+
"# print(dataset2.head)"
86+
],
87+
"execution_count": 6,
88+
"outputs": [
89+
{
90+
"output_type": "stream",
91+
"text": [
92+
" CaseNumber Date Sex Race Age Residence City Residence State \\\n",
93+
"0 14-9876 06/28/2014 NaN NaN NaN NaN NaN \n",
94+
"1 12-16897 11/30/2012 Male White 45.0 NaN NaN \n",
95+
"2 13-11849 08/12/2013 Male White 30.0 NEW HAVEN NaN \n",
96+
"3 14-17578 11/23/2014 Male White 27.0 NAUGATUCK NaN \n",
97+
"4 12-11497 08/14/2012 Male White 21.0 ENFIELD NaN \n",
98+
"\n",
99+
" Residence County Death City Death State \\\n",
100+
"0 NaN NaN NaN \n",
101+
"1 NaN NEW HAVEN NaN \n",
102+
"2 NaN NEW HAVEN NaN \n",
103+
"3 NaN NEW MILFORD NaN \n",
104+
"4 NaN ENFIELD NaN \n",
105+
"\n",
106+
" ... Benzodiazepine Methadone Amphet \\\n",
107+
"0 ... Y NaN NaN \n",
108+
"1 ... NaN NaN NaN \n",
109+
"2 ... NaN Y NaN \n",
110+
"3 ... NaN NaN NaN \n",
111+
"4 ... NaN NaN NaN \n",
112+
"\n",
113+
" Tramad Morphine (not heroin) Other Any Opioid MannerofDeath \\\n",
114+
"0 NaN NaN NaN NaN Accident \n",
115+
"1 NaN NaN NaN NaN Accident \n",
116+
"2 NaN NaN NaN NaN Accident \n",
117+
"3 NaN NaN NaN NaN Accident \n",
118+
"4 NaN NaN NaN NaN Accident \n",
119+
"\n",
120+
" AmendedMannerofDeath DeathLoc \n",
121+
"0 NaN CT\\n(41.544654, -72.651713) \n",
122+
"1 NaN NEW HAVEN, CT\\n(41.308252, -72.924161) \n",
123+
"2 NaN NEW HAVEN, CT\\n(41.308252, -72.924161) \n",
124+
"3 NaN NEW MILFORD, CT\\n(41.576633, -73.408713) \n",
125+
"4 NaN ENFIELD, CT\\n(41.976501, -72.591985) \n",
126+
"\n",
127+
"[5 rows x 32 columns]\n"
128+
],
129+
"name": "stdout"
130+
}
131+
]
132+
}
133+
]
134+
}
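
The notebook in this commit stops after loading the data, so the two objectives in the challenge text are not yet implemented. The following is a minimal sketch of one way they might be approached, not the author's solution. It assumes the column names `Death City` and `Age` shown in the printed output above. The toy rows and the helper names `fill_death_city` and `age_summary_by_city` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the Connecticut data set; only the
# column names "Death City" and "Age" come from the notebook output.
toy = pd.DataFrame({
    "Death City": ["NEW HAVEN", "NEW HAVEN", None, "ENFIELD",
                   "ENFIELD", "NEW HAVEN", np.nan],
    "Age": [45.0, 30.0, 27.0, 21.0, 51.0, 60.0, np.nan],
})


def fill_death_city(df):
    """Return a copy with missing "Death City" values set to the most frequent city."""
    out = df.copy()
    # value_counts() ignores missing values, so idxmax() is the modal city.
    top_city = out["Death City"].value_counts().idxmax()
    out["Death City"] = out["Death City"].fillna(top_city)
    return out


def age_summary_by_city(df):
    """Mean, median and 25th/50th/75th percentiles of Age for each city."""
    rows = {}
    for city, ages in df.groupby("Death City")["Age"]:
        values = ages.dropna().to_numpy()  # np.percentile does not skip NaN
        if values.size == 0:
            continue  # no usable ages recorded for this city
        p25, p50, p75 = np.percentile(values, [25, 50, 75])
        rows[city] = {
            "mean": values.mean(),
            "median": np.median(values),
            "p25": p25,
            "p50": p50,
            "p75": p75,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


filled = fill_death_city(toy)
summary = age_summary_by_city(filled)
print(summary)
```

With the real data, `toy` would be replaced by the `dataset` frame loaded in the notebook. The box plot and bee swarm plot from parts d) and e) could then be drawn from the same filled frame, for example with `seaborn.boxplot(x="Death City", y="Age", data=filled)` and `seaborn.swarmplot(x="Death City", y="Age", data=filled)`, probably restricted to a handful of cities so the axis stays readable.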

0 commit comments