Skip to content

Analysis of school data (mock) utilizing the Pandas library.

Notifications You must be signed in to change notification settings

ruchichandra/Academy-of-Py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analysis - Academy of Py

  • As per the District and School Summary, students are doing better in reading rather than math. But the overall passing rate is not very satisfactory for all the schools. Going forward, more attention is required for getting better scores in Math to improve the overall scores and performance of the schools.

  • Top performing schools are mostly charter schools. There is a significant difference in overall passing rates of the charter and district schools - Overall passing rate : Charter -> 91% & District -> 71%

  • The overall performance of students is very good from the schools which have less budget (<585) as compared to the schools which are spending more on students.

  • The smaller and medium sized school's performance is much better than the large sized schools in Math. Performance in Math: Small and medium size -> 89% - 91% whereas large size -> 67%. This may be due to the reason that the student population in charter schools is less than the district schools.

# Dependencies
import pandas as pd
import numpy as np

# Load csv files
schools = 'Resources/schools_complete.csv'
students = 'Resources/students_complete.csv'
# Read school file with pandas
school_df = pd.read_csv(schools)

# Check columns of schools data 
school_df.columns

# Rename columns in school dataset
school = school_df.rename(columns={'name' : 'School','type' : 'Type','size' : 'Size','budget' : 'Budget'})
school.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School ID School Type Size Budget
0 0 Huang High School District 2917 1910635
1 1 Figueroa High School District 2949 1884411
2 2 Shelton High School Charter 1761 1056600
3 3 Hernandez High School District 4635 3022020
4 4 Griffin High School Charter 1468 917500
# Read student file with pandas
student_df = pd.read_csv(students)

# Check columns of students data 
student_df.columns

# Rename columns in student dataset
student = student_df.rename(columns={'name' : 'Student','gender' : 'Gender','grade' : 'Grade',
                           'school' : 'School','reading_score' : 'Reading_Score', 'math_score' : 'Math_Score'})
student.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Student ID Student Gender Grade School Reading_Score Math_Score
0 0 Paul Bradley M 9th Huang High School 66 79
1 1 Victor Smith M 12th Huang High School 94 61
2 2 Kevin Rodriguez M 12th Huang High School 90 60
3 3 Dr. Richard Scott M 12th Huang High School 67 58
4 4 Bonnie Ray F 9th Huang High School 97 84
# Merge school and student dataframes
complete_data = pd.merge(school, student, on=('School'))
complete_data.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School ID School Type Size Budget Student ID Student Gender Grade Reading_Score Math_Score
0 0 Huang High School District 2917 1910635 0 Paul Bradley M 9th 66 79
1 0 Huang High School District 2917 1910635 1 Victor Smith M 12th 94 61
2 0 Huang High School District 2917 1910635 2 Kevin Rodriguez M 12th 90 60
3 0 Huang High School District 2917 1910635 3 Dr. Richard Scott M 12th 67 58
4 0 Huang High School District 2917 1910635 4 Bonnie Ray F 9th 97 84

District Summary

# Find the total schools, students and budget from school dataframe
Total_Schools = complete_data["School"].nunique()
Total_Students = complete_data["Student ID"].nunique()
Total_Budget = school["Budget"].sum()

# Find average Math and Reading score
avg_math_score = complete_data["Math_Score"].mean()
avg_reading_score = complete_data["Reading_Score"].mean()

# Calculate the total count of passing Math, reading and overall passing count
count_passing_math = complete_data[complete_data["Math_Score"] > 70].count()["School"]
count_passing_reading = complete_data[complete_data["Reading_Score"] > 70].count()["School"]
overall_passing_count = complete_data[(complete_data["Math_Score"] > 70) & (complete_data["Reading_Score"] > 70)].count()["School"]

# Calculate Percentage Passing Math / Reading and overall passing both
percent_passing_math = (count_passing_math / Total_Students) * 100
percent_passing_reading = (count_passing_reading / Total_Students) * 100
percent_passing_both = (overall_passing_count / Total_Students) * 100

# District Summary Table
District_Summary = pd.DataFrame({"Total Schools":[Total_Schools], "Total Students":[Total_Students],
                                "Total Budget":[Total_Budget], "Average Math Score":[avg_math_score],
                                 "Average Reading Score":[avg_reading_score], "% Passing Math":percent_passing_math,
                                 "% Passing Reading":percent_passing_reading, "Overall Passing Rate": percent_passing_both})

District_Summary = District_Summary[["Total Schools", "Total Students", "Total Budget", "Average Math Score",
                                 "Average Reading Score", "% Passing Math", "% Passing Reading", "Overall Passing Rate"]]

District_Summary
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Total Schools Total Students Total Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
0 15 39170 24649428 78.985371 81.87784 72.392137 82.971662 60.801634

School Summary

# Determine the school type
school_types = school.set_index(["School"])["Type"]

# Calculate total student count per school
per_school_counts = complete_data["School"].value_counts()

# Calculate per school and per student budget
per_school_budget = complete_data.groupby(["School"]).mean()["Budget"]
per_student_budget = per_school_budget/ per_school_counts

# calculate average math and reading scores
avg_math_score = complete_data.groupby(["School"]).mean()["Math_Score"]
avg_reading_score = complete_data.groupby(["School"]).mean()["Reading_Score"]

# Count of schools passing Math and Reading
school_passing_math =  complete_data[complete_data["Math_Score"] > 70].groupby("School").count()["Student"]
school_passing_reading =  complete_data[complete_data["Reading_Score"] > 70].groupby("School").count()["Student"]

# Calculate percent passing Math / Reading and Overall passing rate
percent_passing_math = school_passing_math / per_school_counts * 100
percent_passing_reading = school_passing_reading / per_school_counts * 100
overall_passing_rate = (percent_passing_math + percent_passing_reading) / 2

# Convert to data frame
School_wise_Summary = pd.DataFrame({"School Type":school_types, "Total Students":per_school_counts,
                               "Total School Budget":per_school_budget, "Per Student Budget":per_student_budget,
                                "Average Math Score":avg_math_score, "Average Reading Score":avg_reading_score,
                                 "% Passing Math":percent_passing_math, "% Passing Reading":percent_passing_reading,
                                 "Overall Passing Rate":overall_passing_rate})            

# Data munging
School_wise_Summary = School_wise_Summary[["School Type","Total Students","Total School Budget","Per Student Budget",
                                "Average Math Score","Average Reading Score","% Passing Math","% Passing Reading",
                                 "Overall Passing Rate"]]

School_wise_Summary["Total School Budget"] = School_wise_Summary["Total School Budget"].map("${:,.2f}".format)
School_wise_Summary["Per Student Budget"] = School_wise_Summary["Per Student Budget"].map("${:,.2f}".format)

# Display the data frame
School_wise_Summary
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Bailey High School District 4976 $3,124,928.00 $628.00 77.048432 81.033963 64.630225 79.300643 71.965434
Cabrera High School Charter 1858 $1,081,356.00 $582.00 83.061895 83.975780 89.558665 93.864370 91.711518
Figueroa High School District 2949 $1,884,411.00 $639.00 76.711767 81.158020 63.750424 78.433367 71.091896
Ford High School District 2739 $1,763,916.00 $644.00 77.102592 80.746258 65.753925 77.510040 71.631982
Griffin High School Charter 1468 $917,500.00 $625.00 83.351499 83.816757 89.713896 93.392371 91.553134
Hernandez High School District 4635 $3,022,020.00 $652.00 77.289752 80.934412 64.746494 78.187702 71.467098
Holden High School Charter 427 $248,087.00 $581.00 83.803279 83.814988 90.632319 92.740047 91.686183
Huang High School District 2917 $1,910,635.00 $655.00 76.629414 81.182722 63.318478 78.813850 71.066164
Johnson High School District 4761 $3,094,650.00 $650.00 77.072464 80.966394 63.852132 78.281874 71.067003
Pena High School Charter 962 $585,858.00 $609.00 83.839917 84.044699 91.683992 92.203742 91.943867
Rodriguez High School District 3999 $2,547,363.00 $637.00 76.842711 80.744686 64.066017 77.744436 70.905226
Shelton High School Charter 1761 $1,056,600.00 $600.00 83.359455 83.725724 89.892107 92.617831 91.254969
Thomas High School Charter 1635 $1,043,130.00 $638.00 83.418349 83.848930 90.214067 92.905199 91.559633
Wilson High School Charter 2283 $1,319,574.00 $578.00 83.274201 83.989488 90.932983 93.254490 92.093736
Wright High School Charter 1800 $1,049,400.00 $583.00 83.682222 83.955000 90.277778 93.444444 91.861111

Top Performing Schools (By Passing Rate)

# Top 5 performing schools(By passing rates)
Top_Schools = School_wise_Summary.sort_values(["Overall Passing Rate"], ascending = False)
Top_Schools.head(5)
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Wilson High School Charter 2283 $1,319,574.00 $578.00 83.274201 83.989488 90.932983 93.254490 92.093736
Pena High School Charter 962 $585,858.00 $609.00 83.839917 84.044699 91.683992 92.203742 91.943867
Wright High School Charter 1800 $1,049,400.00 $583.00 83.682222 83.955000 90.277778 93.444444 91.861111
Cabrera High School Charter 1858 $1,081,356.00 $582.00 83.061895 83.975780 89.558665 93.864370 91.711518
Holden High School Charter 427 $248,087.00 $581.00 83.803279 83.814988 90.632319 92.740047 91.686183

Bottom Performing Schools (By Passing Rate)

# Bottom 5 performing schools(By passing rates)
Bottom_Schools = School_wise_Summary.sort_values(["Overall Passing Rate"], ascending = True)
Bottom_Schools.head(5)
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Rodriguez High School District 3999 $2,547,363.00 $637.00 76.842711 80.744686 64.066017 77.744436 70.905226
Huang High School District 2917 $1,910,635.00 $655.00 76.629414 81.182722 63.318478 78.813850 71.066164
Johnson High School District 4761 $3,094,650.00 $650.00 77.072464 80.966394 63.852132 78.281874 71.067003
Figueroa High School District 2949 $1,884,411.00 $639.00 76.711767 81.158020 63.750424 78.433367 71.091896
Hernandez High School District 4635 $3,022,020.00 $652.00 77.289752 80.934412 64.746494 78.187702 71.467098

Math Scores By Grade

# Calculate math score by grade
nineth_grade_score =  complete_data[complete_data["Grade"] == "9th"].groupby("School").mean()["Math_Score"]
tenth_grade_score =  complete_data[complete_data["Grade"] == "10th"].groupby("School").mean()["Math_Score"]
eleventh_grade_score =  complete_data[complete_data["Grade"] == "11th"].groupby("School").mean()["Math_Score"]
twelveth_grade_score =  complete_data[complete_data["Grade"] == "12th"].groupby("School").mean()["Math_Score"]

# Math score by grade in table form
math_score_by_grade_df = pd.DataFrame({"9th":nineth_grade_score, "10th":tenth_grade_score,
                               "11th":eleventh_grade_score,"12th":twelveth_grade_score})            

# Data munging
Math_Score_by_Grade = math_score_by_grade_df[["9th", "10th", "11th", "12th"]]
Math_Score_by_Grade.index.name = None

# Display the data frame
Math_Score_by_Grade
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
9th 10th 11th 12th
Bailey High School 77.083676 76.996772 77.515588 76.492218
Cabrera High School 83.094697 83.154506 82.765560 83.277487
Figueroa High School 76.403037 76.539974 76.884344 77.151369
Ford High School 77.361345 77.672316 76.918058 76.179963
Griffin High School 82.044010 84.229064 83.842105 83.356164
Hernandez High School 77.438495 77.337408 77.136029 77.186567
Holden High School 83.787402 83.429825 85.000000 82.855422
Huang High School 77.027251 75.908735 76.446602 77.225641
Johnson High School 77.187857 76.691117 77.491653 76.863248
Pena High School 83.625455 83.372000 84.328125 84.121547
Rodriguez High School 76.859966 76.612500 76.395626 77.690748
Shelton High School 83.420755 82.917411 83.383495 83.778976
Thomas High School 83.590022 83.087886 83.498795 83.497041
Wilson High School 83.085578 83.724422 83.195326 83.035794
Wright High School 83.264706 84.010288 83.836782 83.644986

Reading Scores By Grade

# Reading score by grade
nineth_grade_score =  complete_data[complete_data["Grade"] == "9th"].groupby("School").mean()["Reading_Score"]
tenth_grade_score =  complete_data[complete_data["Grade"] == "10th"].groupby("School").mean()["Reading_Score"]
eleventh_grade_score =  complete_data[complete_data["Grade"] == "11th"].groupby("School").mean()["Reading_Score"]
twelveth_grade_score =  complete_data[complete_data["Grade"] == "12th"].groupby("School").mean()["Reading_Score"]

# Reading score by grade in table form
reading_score_by_grade_df = pd.DataFrame({"9th":nineth_grade_score, "10th":tenth_grade_score,
                               "11th":eleventh_grade_score,"12th":twelveth_grade_score})            

# Data munging
Reading_Score_by_Grade = reading_score_by_grade_df[["9th", "10th", "11th", "12th"]]
Reading_Score_by_Grade.index.name = None

# Display the data frame
Reading_Score_by_Grade
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
9th 10th 11th 12th
Bailey High School 81.303155 80.907183 80.945643 80.912451
Cabrera High School 83.676136 84.253219 83.788382 84.287958
Figueroa High School 81.198598 81.408912 80.640339 81.384863
Ford High School 80.632653 81.262712 80.403642 80.662338
Griffin High School 83.369193 83.706897 84.288089 84.013699
Hernandez High School 80.866860 80.660147 81.396140 80.857143
Holden High School 83.677165 83.324561 83.815534 84.698795
Huang High School 81.290284 81.512386 81.417476 80.305983
Johnson High School 81.260714 80.773431 80.616027 81.227564
Pena High School 83.807273 83.612000 84.335938 84.591160
Rodriguez High School 80.993127 80.629808 80.864811 80.376426
Shelton High School 84.122642 83.441964 84.373786 82.781671
Thomas High School 83.728850 84.254157 83.585542 83.831361
Wilson High School 83.939778 84.021452 83.764608 84.317673
Wright High School 83.833333 83.812757 84.156322 84.073171

Scores By School Spending

# Scores by school spending
spending_bins = [0, 585, 615, 645, 675]
spending_ranges = ["<585", "585-615", "615-645", "645-675"]

# Categorize spending based on spending bins
School_wise_Summary["Spending Ranges (Per Student)"] = pd.cut(per_student_budget, spending_bins, labels = spending_ranges)


spending_math_score = School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['Average Math Score']
Spending_reading_score = School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['Average Reading Score']
spending_passing_math =  School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['% Passing Math']
spending_passing_reading =  School_wise_Summary.groupby(["Spending Ranges (Per Student)"]).mean()['% Passing Reading']
overall_passing_rate =  (spending_math_score + Spending_reading_score) / 2


# Scores by School Spending in table form
Spending_Score = pd.DataFrame({"Average Math Score":spending_math_score, "Average Reading Score":Spending_reading_score,
                               "% Passing Math":spending_passing_math,"% Passing Reading":spending_passing_reading,
                                    "Overall Passing Rate":overall_passing_rate}) 

# Data munging
Spending_Score = Spending_Score[["Average Math Score", "Average Reading Score",
                               "% Passing Math","% Passing Reading","Overall Passing Rate"]]

# Display the data frame
Spending_Score
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Spending Ranges (Per Student)
585-615 83.599686 83.885211 90.788049 92.410786 83.742449
615-645 79.079225 81.891436 73.021426 83.214343 80.485330
645-675 76.997210 81.027843 63.972368 78.427809 79.012526
<585 83.455399 83.933814 90.350436 93.325838 83.694607

Scores By School Size

# Scores by school size
size_bins = [0, 1000, 2000, 5000]
size_ranges = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
School_wise_Summary["School Size"] = pd.cut(School_wise_Summary["Total Students"], size_bins, labels = size_ranges)

avg_math_score = School_wise_Summary.groupby(["School Size"]).mean()['Average Math Score']
avg_reading_score = School_wise_Summary.groupby(["School Size"]).mean()['Average Reading Score']
percent_passing_math =  School_wise_Summary.groupby(["School Size"]).mean()['% Passing Math']
percent_passing_reading =  School_wise_Summary.groupby(["School Size"]).mean()['% Passing Reading']
overall_passing_rate = School_wise_Summary.groupby(["School Size"]).mean()['Overall Passing Rate']


# Scores by School Size in table form
Size_Score = pd.DataFrame({"Average Math Score":avg_math_score, "Average Reading Score":avg_reading_score,
                               "% Passing Math":percent_passing_math,"% Passing Reading":percent_passing_reading,
                                    "Overall Passing Rate":overall_passing_rate})   

# Data munging
Size_Score = Size_Score[["Average Math Score", "Average Reading Score",
                               "% Passing Math","% Passing Reading","Overall Passing Rate"]]

# Display the data frame
Size_Score
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
School Size
Large (2000-5000) 77.746417 81.344493 67.631335 80.190800 73.911067
Medium (1000-2000) 83.374684 83.864438 89.931303 93.244843 91.588073
Small (<1000) 83.821598 83.929843 91.158155 92.471895 91.815025

Scores By School Type

avg_math_score = School_wise_Summary.groupby(["School Type"]).mean()['Average Math Score']
avg_reading_score = School_wise_Summary.groupby(["School Type"]).mean()['Average Reading Score']
percent_passing_math =  School_wise_Summary.groupby(["School Type"]).mean()['% Passing Math']
percent_passing_reading =  School_wise_Summary.groupby(["School Type"]).mean()['% Passing Reading']
overall_passing_rate = School_wise_Summary.groupby(["School Type"]).mean()['Overall Passing Rate']

# Scores by School Type in table form
Type_Score = pd.DataFrame({"Average Math Score":avg_math_score, "Average Reading Score":avg_reading_score,
                               "% Passing Math":percent_passing_math,"% Passing Reading":percent_passing_reading,
                                    "Overall Passing Rate":overall_passing_rate})            
# Data munging
Type_Score = Type_Score[["Average Math Score", "Average Reading Score",
                               "% Passing Math","% Passing Reading","Overall Passing Rate"]]
# Display the data frame
Type_Score
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
School Type
Charter 83.473852 83.896421 90.363226 93.052812 91.708019
District 76.956733 80.966636 64.302528 78.324559 71.313543

About

Analysis of school data (mock) utilizing the Pandas library.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published