Analysis - Academy of Py

As per the District and School Summary, students are doing better in reading rather than math. But the overall passing rate is not very satisfactory for all the schools. Going forward, more attention is required for getting better scores in Math to improve the overall scores and performance of the schools.
Top performing schools are mostly charter schools. There is a significant difference in overall passing rates of the charter and district schools - Overall passing rate : Charter -> 91% & District -> 71%
The overall performance of students is very good from the schools which have less budget (<585) as compared to the schools which are spending more on students.
The smaller and medium sized school's performance is much better than the large sized schools in Math. Performance in Math: Small and medium size -> 89% - 91% whereas large size -> 67%. This may be due to the reason that the student population in charter schools is less than the district schools.

# Dependencies
import pandas as pd
import numpy as np

# Load csv files
schools = 'Resources/schools_complete.csv'
students = 'Resources/students_complete.csv'

# Read school file with pandas
school_df = pd.read_csv(schools)

# Check columns of schools data 
school_df.columns

# Rename columns in school dataset
school = school_df.rename(columns={'name' : 'School','type' : 'Type','size' : 'Size','budget' : 'Budget'})
school.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School ID	School	Type	Size	Budget
0	0	Huang High School	District	2917	1910635
1	1	Figueroa High School	District	2949	1884411
2	2	Shelton High School	Charter	1761	1056600
3	3	Hernandez High School	District	4635	3022020
4	4	Griffin High School	Charter	1468	917500

# Read student file with pandas
student_df = pd.read_csv(students)

# Check columns of students data 
student_df.columns

# Rename columns in student dataset
student = student_df.rename(columns={'name' : 'Student','gender' : 'Gender','grade' : 'Grade',
                           'school' : 'School','reading_score' : 'Reading_Score', 'math_score' : 'Math_Score'})
student.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Student ID	Student	Gender	Grade	School	Reading_Score	Math_Score
0	0	Paul Bradley	M	9th	Huang High School	66	79
1	1	Victor Smith	M	12th	Huang High School	94	61
2	2	Kevin Rodriguez	M	12th	Huang High School	90	60
3	3	Dr. Richard Scott	M	12th	Huang High School	67	58
4	4	Bonnie Ray	F	9th	Huang High School	97	84

# Merge school and student dataframes
complete_data = pd.merge(school, student, on=('School'))
complete_data.head()

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School	Type	Size	Budget	Student ID	Student	Gender	Grade	Reading_Score	Math_Score
0	Huang High School	District	2917	1910635	0	Paul Bradley	M	9th	66	79
1	Huang High School	District	2917	1910635	1	Victor Smith	M	12th	94	61
2	Huang High School	District	2917	1910635	2	Kevin Rodriguez	M	12th	90	60
3	Huang High School	District	2917	1910635	3	Dr. Richard Scott	M	12th	67	58
4	Huang High School	District	2917	1910635	4	Bonnie Ray	F	9th	97	84

District Summary

# Find the total schools, students and budget from school dataframe
Total_Schools = complete_data["School"].nunique()
Total_Students = complete_data["Student ID"].nunique()
Total_Budget = school["Budget"].sum()

# Find average Math and Reading score
avg_math_score = complete_data["Math_Score"].mean()
avg_reading_score = complete_data["Reading_Score"].mean()

# Calculate the total count of passing Math, reading and overall passing count
count_passing_math = complete_data[complete_data["Math_Score"] > 70].count()["School"]
count_passing_reading = complete_data[complete_data["Reading_Score"] > 70].count()["School"]
overall_passing_count = complete_data[(complete_data["Math_Score"] > 70) & (complete_data["Reading_Score"] > 70)].count()["School"]

# Calculate Percentage Passing Math / Reading and overall passing both
percent_passing_math = (count_passing_math / Total_Students) * 100
percent_passing_reading = (count_passing_reading / Total_Students) * 100
percent_passing_both = (overall_passing_count / Total_Students) * 100

# District Summary Table
District_Summary = pd.DataFrame({"Total Schools":[Total_Schools], "Total Students":[Total_Students],
                                "Total Budget":[Total_Budget], "Average Math Score":[avg_math_score],
                                 "Average Reading Score":[avg_reading_score], "% Passing Math":percent_passing_math,
                                 "% Passing Reading":percent_passing_reading, "Overall Passing Rate": percent_passing_both})

District_Summary = District_Summary[["Total Schools", "Total Students", "Total Budget", "Average Math Score",
                                 "Average Reading Score", "% Passing Math", "% Passing Reading", "Overall Passing Rate"]]

District_Summary

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Total Schools	Total Students	Total Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
0	15	39170	24649428	78.985371	81.87784	72.392137	82.971662	60.801634

School Summary

# Determine the school type
school_types = school.set_index(["School"])["Type"]

# Calculate total student count per school
per_school_counts = complete_data["School"].value_counts()

# Calculate per school and per student budget
per_school_budget = complete_data.groupby(["School"]).mean()["Budget"]
per_student_budget = per_school_budget/ per_school_counts

# calculate average math and reading scores
avg_math_score = complete_data.groupby(["School"]).mean()["Math_Score"]
avg_reading_score = complete_data.groupby(["School"]).mean()["Reading_Score"]

# Count of schools passing Math and Reading
school_passing_math =  complete_data[complete_data["Math_Score"] > 70].groupby("School").count()["Student"]
school_passing_reading =  complete_data[complete_data["Reading_Score"] > 70].groupby("School").count()["Student"]

# Calculate percent passing Math / Reading and Overall passing rate
percent_passing_math = school_passing_math / per_school_counts * 100
percent_passing_reading = school_passing_reading / per_school_counts * 100
overall_passing_rate = (percent_passing_math + percent_passing_reading) / 2

# Convert to data frame
School_wise_Summary = pd.DataFrame({"School Type":school_types, "Total Students":per_school_counts,
                               "Total School Budget":per_school_budget, "Per Student Budget":per_student_budget,
                                "Average Math Score":avg_math_score, "Average Reading Score":avg_reading_score,
                                 "% Passing Math":percent_passing_math, "% Passing Reading":percent_passing_reading,
                                 "Overall Passing Rate":overall_passing_rate})            

# Data munging
School_wise_Summary = School_wise_Summary[["School Type","Total Students","Total School Budget","Per Student Budget",
                                "Average Math Score","Average Reading Score","% Passing Math","% Passing Reading",
                                 "Overall Passing Rate"]]

School_wise_Summary["Total School Budget"] = School_wise_Summary["Total School Budget"].map("${:,.2f}".format)
School_wise_Summary["Per Student Budget"] = School_wise_Summary["Per Student Budget"].map("${:,.2f}".format)

# Display the data frame
School_wise_Summary

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Bailey High School	District	4976	$3,124,928.00	$628.00	77.048432	81.033963	64.630225	79.300643	71.965434
Cabrera High School	Charter	1858	$1,081,356.00	$582.00	83.061895	83.975780	89.558665	93.864370	91.711518
Figueroa High School	District	2949	$1,884,411.00	$639.00	76.711767	81.158020	63.750424	78.433367	71.091896
Ford High School	District	2739	$1,763,916.00	$644.00	77.102592	80.746258	65.753925	77.510040	71.631982
Griffin High School	Charter	1468	$917,500.00	$625.00	83.351499	83.816757	89.713896	93.392371	91.553134
Hernandez High School	District	4635	$3,022,020.00	$652.00	77.289752	80.934412	64.746494	78.187702	71.467098
Holden High School	Charter	427	$248,087.00	$581.00	83.803279	83.814988	90.632319	92.740047	91.686183
Huang High School	District	2917	$1,910,635.00	$655.00	76.629414	81.182722	63.318478	78.813850	71.066164
Johnson High School	District	4761	$3,094,650.00	$650.00	77.072464	80.966394	63.852132	78.281874	71.067003
Pena High School	Charter	962	$585,858.00	$609.00	83.839917	84.044699	91.683992	92.203742	91.943867
Rodriguez High School	District	3999	$2,547,363.00	$637.00	76.842711	80.744686	64.066017	77.744436	70.905226
Shelton High School	Charter	1761	$1,056,600.00	$600.00	83.359455	83.725724	89.892107	92.617831	91.254969
Thomas High School	Charter	1635	$1,043,130.00	$638.00	83.418349	83.848930	90.214067	92.905199	91.559633
Wilson High School	Charter	2283	$1,319,574.00	$578.00	83.274201	83.989488	90.932983	93.254490	92.093736
Wright High School	Charter	1800	$1,049,400.00	$583.00	83.682222	83.955000	90.277778	93.444444	91.861111

Top Performing Schools (By Passing Rate)

# Top 5 performing schools(By passing rates)
Top_Schools = School_wise_Summary.sort_values(["Overall Passing Rate"], ascending = False)
Top_Schools.head(5)

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Wilson High School	Charter	2283	$1,319,574.00	$578.00	83.274201	83.989488	90.932983	93.254490	92.093736
Pena High School	Charter	962	$585,858.00	$609.00	83.839917	84.044699	91.683992	92.203742	91.943867
Wright High School	Charter	1800	$1,049,400.00	$583.00	83.682222	83.955000	90.277778	93.444444	91.861111
Cabrera High School	Charter	1858	$1,081,356.00	$582.00	83.061895	83.975780	89.558665	93.864370	91.711518
Holden High School	Charter	427	$248,087.00	$581.00	83.803279	83.814988	90.632319	92.740047	91.686183

Bottom Performing Schools (By Passing Rate)

# Bottom 5 performing schools(By passing rates)
Bottom_Schools = School_wise_Summary.sort_values(["Overall Passing Rate"], ascending = True)
Bottom_Schools.head(5)

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Rodriguez High School	District	3999	$2,547,363.00	$637.00	76.842711	80.744686	64.066017	77.744436	70.905226
Huang High School	District	2917	$1,910,635.00	$655.00	76.629414	81.182722	63.318478	78.813850	71.066164
Johnson High School	District	4761	$3,094,650.00	$650.00	77.072464	80.966394	63.852132	78.281874	71.067003
Figueroa High School	District	2949	$1,884,411.00	$639.00	76.711767	81.158020	63.750424	78.433367	71.091896
Hernandez High School	District	4635	$3,022,020.00	$652.00	77.289752	80.934412	64.746494	78.187702	71.467098

Math Scores By Grade

# Calculate math score by grade
nineth_grade_score =  complete_data[complete_data["Grade"] == "9th"].groupby("School").mean()["Math_Score"]
tenth_grade_score =  complete_data[complete_data["Grade"] == "10th"].groupby("School").mean()["Math_Score"]
eleventh_grade_score =  complete_data[complete_data["Grade"] == "11th"].groupby("School").mean()["Math_Score"]
twelveth_grade_score =  complete_data[complete_data["Grade"] == "12th"].groupby("School").mean()["Math_Score"]

# Math score by grade in table form
math_score_by_grade_df = pd.DataFrame({"9th":nineth_grade_score, "10th":tenth_grade_score,
                               "11th":eleventh_grade_score,"12th":twelveth_grade_score})            

# Data munging
Math_Score_by_Grade = math_score_by_grade_df[["9th", "10th", "11th", "12th"]]
Math_Score_by_Grade.index.name = None

# Display the data frame
Math_Score_by_Grade

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	9th	10th	11th	12th
Bailey High School	77.083676	76.996772	77.515588	76.492218
Cabrera High School	83.094697	83.154506	82.765560	83.277487
Figueroa High School	76.403037	76.539974	76.884344	77.151369
Ford High School	77.361345	77.672316	76.918058	76.179963
Griffin High School	82.044010	84.229064	83.842105	83.356164
Hernandez High School	77.438495	77.337408	77.136029	77.186567
Holden High School	83.787402	83.429825	85.000000	82.855422
Huang High School	77.027251	75.908735	76.446602	77.225641
Johnson High School	77.187857	76.691117	77.491653	76.863248
Pena High School	83.625455	83.372000	84.328125	84.121547
Rodriguez High School	76.859966	76.612500	76.395626	77.690748
Shelton High School	83.420755	82.917411	83.383495	83.778976
Thomas High School	83.590022	83.087886	83.498795	83.497041
Wilson High School	83.085578	83.724422	83.195326	83.035794
Wright High School	83.264706	84.010288	83.836782	83.644986

Reading Scores By Grade

# Reading score by grade
nineth_grade_score =  complete_data[complete_data["Grade"] == "9th"].groupby("School").mean()["Reading_Score"]
tenth_grade_score =  complete_data[complete_data["Grade"] == "10th"].groupby("School").mean()["Reading_Score"]
eleventh_grade_score =  complete_data[complete_data["Grade"] == "11th"].groupby("School").mean()["Reading_Score"]
twelveth_grade_score =  complete_data[complete_data["Grade"] == "12th"].groupby("School").mean()["Reading_Score"]

# Reading score by grade in table form
reading_score_by_grade_df = pd.DataFrame({"9th":nineth_grade_score, "10th":tenth_grade_score,
                               "11th":eleventh_grade_score,"12th":twelveth_grade_score})            

# Data munging
Reading_Score_by_Grade = reading_score_by_grade_df[["9th", "10th", "11th", "12th"]]
Reading_Score_by_Grade.index.name = None

# Display the data frame
Reading_Score_by_Grade

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	9th	10th	11th	12th
Bailey High School	81.303155	80.907183	80.945643	80.912451
Cabrera High School	83.676136	84.253219	83.788382	84.287958
Figueroa High School	81.198598	81.408912	80.640339	81.384863
Ford High School	80.632653	81.262712	80.403642	80.662338
Griffin High School	83.369193	83.706897	84.288089	84.013699
Hernandez High School	80.866860	80.660147	81.396140	80.857143
Holden High School	83.677165	83.324561	83.815534	84.698795
Huang High School	81.290284	81.512386	81.417476	80.305983
Johnson High School	81.260714	80.773431	80.616027	81.227564
Pena High School	83.807273	83.612000	84.335938	84.591160
Rodriguez High School	80.993127	80.629808	80.864811	80.376426
Shelton High School	84.122642	83.441964	84.373786	82.781671
Thomas High School	83.728850	84.254157	83.585542	83.831361
Wilson High School	83.939778	84.021452	83.764608	84.317673
Wright High School	83.833333	83.812757	84.156322	84.073171

Scores By School Spending

# Scores by school spending
spending_bins = [0, 585, 615, 645, 675]
spending_ranges = ["<585", "585-615", "615-645", "645-675"]

# Categorize spending based on spending bins
School_wise_Summary["Spending Ranges (Per Student)"] = pd.cut(per_student_budget, spending_bins, labels = spending_ranges)


spending_math_score = School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['Average Math Score']
Spending_reading_score = School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['Average Reading Score']
spending_passing_math =  School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['% Passing Math']
spending_passing_reading =  School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['% Passing Reading']
overall_passing_rate =  (spending_math_score + Spending_reading_score) / 2


# Scores by School Spending in table form
Spending_Score = pd.DataFrame({"Average Math Score":spending_math_score, "Average Reading Score":Spending_reading_score,
                               "% Passing Math":spending_passing_math,"% Passing Reading":spending_passing_reading,
                                    "Overall Passing Rate":overall_passing_rate}) 

# Data munging
Spending_Score = Spending_Score[["Average Math Score", "Average Reading Score",
                               "% Passing Math","% Passing Reading","Overall Passing Rate"]]

# Display the data frame
Spending_Score

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Spending Ranges (Per Student)
585-615	83.599686	83.885211	90.788049	92.410786	83.742449
615-645	79.079225	81.891436	73.021426	83.214343	80.485330
645-675	76.997210	81.027843	63.972368	78.427809	79.012526
<585	83.455399	83.933814	90.350436	93.325838	83.694607

Scores By School Size

# Scores by school size
size_bins = [0, 1000, 2000, 5000]
size_ranges = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
School_wise_Summary["School Size"] = pd.cut(School_wise_Summary["Total Students"], size_bins, labels = size_ranges)

avg_math_score = School_wise_Summary.groupby(["School Size"]).mean()['Average Math Score']
avg_reading_score = School_wise_Summary.groupby(["School Size"]).mean()['Average Reading Score']
percent_passing_math =  School_wise_Summary.groupby(["School Size"]).mean()['% Passing Math']
percent_passing_reading =  School_wise_Summary.groupby(["School Size"]).mean()['% Passing Reading']
overall_passing_rate = School_wise_Summary.groupby(["School Size"]).mean()['Overall Passing Rate']


# Scores by School Size in table form
Size_Score = pd.DataFrame({"Average Math Score":avg_math_score, "Average Reading Score":avg_reading_score,
                               "% Passing Math":percent_passing_math,"% Passing Reading":percent_passing_reading,
                                    "Overall Passing Rate":overall_passing_rate})   

# Data munging
Size_Score = Size_Score[["Average Math Score", "Average Reading Score",
                               "% Passing Math","% Passing Reading","Overall Passing Rate"]]

# Display the data frame
Size_Score

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
School Size
Large (2000-5000)	77.746417	81.344493	67.631335	80.190800	73.911067
Medium (1000-2000)	83.374684	83.864438	89.931303	93.244843	91.588073
Small (<1000)	83.821598	83.929843	91.158155	92.471895	91.815025

Scores By School Type

avg_math_score = School_wise_Summary.groupby(["School Type"]).mean()['Average Math Score']
avg_reading_score = School_wise_Summary.groupby(["School Type"]).mean()['Average Reading Score']
percent_passing_math =  School_wise_Summary.groupby(["School Type"]).mean()['% Passing Math']
percent_passing_reading =  School_wise_Summary.groupby(["School Type"]).mean()['% Passing Reading']
overall_passing_rate = School_wise_Summary.groupby(["School Type"]).mean()['Overall Passing Rate']

# Scores by School Type in table form
Type_Score = pd.DataFrame({"Average Math Score":avg_math_score, "Average Reading Score":avg_reading_score,
                               "% Passing Math":percent_passing_math,"% Passing Reading":percent_passing_reading,
                                    "Overall Passing Rate":overall_passing_rate})            
# Data munging
Type_Score = Type_Score[["Average Math Score", "Average Reading Score",
                               "% Passing Math","% Passing Reading","Overall Passing Rate"]]
# Display the data frame
Type_Score

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
School Type
Charter	83.473852	83.896421	90.363226	93.052812	91.708019
District	76.956733	80.966636	64.302528	78.324559	71.313543

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Resources		Resources
Academy of Py.ipynb		Academy of Py.ipynb
README.md		README.md
ridge_regression.ipynb		ridge_regression.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analysis - Academy of Py

District Summary

School Summary

Top Performing Schools (By Passing Rate)

Bottom Performing Schools (By Passing Rate)

Math Scores By Grade

Reading Scores By Grade

Scores By School Spending

Scores By School Size

Scores By School Type

About

Releases

Packages

Languages

ruchichandra/Academy-of-Py

Folders and files

Latest commit

History

Repository files navigation

Analysis - Academy of Py

District Summary

School Summary

Top Performing Schools (By Passing Rate)

Bottom Performing Schools (By Passing Rate)

Math Scores By Grade

Reading Scores By Grade

Scores By School Spending

Scores By School Size

Scores By School Type

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages