Appointments Prediction¶

Introduction¶

This application performs a prediction on the number of weekly appointments at CDL Columbia University. To achieve the task, we need to perform some data cleaning and feature engineering in order to create relevant features that will help the LSTM model.

Documentation¶

Execute this notebook everytime you need to predict the next three months of weekly number of appointments.

If you are using Google Colab: Click on Runtime > Run All

Before running this notebook it's recommended to run the Appointments Api application, in order to update the 3 files that this application will use: data appointments, students data, and data appointments with custom values.

Please, remember that in the default values all the files should be saved in your local drive in a directory called CDL Api Files. If you manually changed the path of the directory, please adjust the following paths accordingly.

Once you run this notebook, wait for the Google Drive authentication. After that, you may switch to another tab, but don't close the tab before the runtime is over.

When the runtime is over, you will find the new predictions files in csv format in a new subfolder titled as today's date inside CDL Api Files/appointments_prediction_results.

Link Google Drive¶

In [ ]:
from google.colab import drive
import pandas as pd
drive.mount('/content/drive')
Mounted at /content/drive

Read Appointments Data¶

In [ ]:
data_appointments = pd.read_csv('/content/drive/MyDrive/CDL Api Files/data_appointments.csv').drop(['Unnamed: 0'], axis=1)
data_appointments.head()
Out[ ]:
Id_x AppointmentBlockId StartTime EndTime StudentId_x StudentName Topic PreferredLocationId PreferredLocationName Location ... Term Program Shortcut YOE Employment Status at Beginning of Program Desired Industry Graduation Term Attendance AppointmentId StudentId UNI
0 220007404103835 200006052546409 2023-05-15T13:30:00.000 2023-05-15T14:00:00.000 540016052247406 Liam Hiester NaN 1.000760e+13 Virtual NaN ... Summer APAN 0 - 3 Years Yes Finance, Consulting, Pharmaceuticals, Technology Spring 2023 No 220007404103835 540016052247406 llh2154
1 220007404103834 200006052546407 2023-05-08T13:30:00.000 2023-05-08T14:00:00.000 540016052247406 Liam Hiester NaN 1.000760e+13 Virtual NaN ... Spring APAN 0 - 3 Years Yes Finance, Consulting, Pharmaceuticals, Technology Spring 2023 No 220007404103834 540016052247406 llh2154
2 220007404104491 200006052487245 2023-05-02T16:00:00.000 2023-05-02T16:30:00.000 540016051804044 Zilu Zhang NaN 1.000760e+13 Virtual NaN ... Spring STRAT COMM 0 - 3 Years No Technology, Marketing Spring 2023 No 220007404104491 540016051804044 zz2635
3 220007404103087 200006052548044 2023-05-02T15:45:00.000 2023-05-02T16:15:00.000 540016052574956 Sofia Ruiz de la Concha NaN 1.000760e+13 Virtual NaN ... Spring ERM 3 - 5 Years No Other Fall 2024 No 220007404103087 540016052574956 sr4013
4 220007404103800 200006052546410 2023-05-01T13:30:00.000 2023-05-01T14:00:00.000 540016052247406 Liam Hiester NaN 1.000760e+13 Virtual NaN ... Spring APAN 0 - 3 Years Yes Finance, Consulting, Pharmaceuticals, Technology Spring 2023 No 220007404103800 540016052247406 llh2154

5 rows × 42 columns

Read Students Data¶

In [ ]:
students = pd.read_csv('/content/drive/MyDrive/CDL Api Files/students.csv').drop(['Unnamed: 0'], axis=1)
display(len(students))
list(students.columns)
13591
Out[ ]:
['RoleId',
 'StudentGroups',
 'FullName',
 'IsLgbtq',
 'SelfIdentifiedGenderDescription',
 'ParentEducationLevelId',
 'ParentEducationLevelName',
 'DoesParentHaveJD',
 'AbaGraduateId',
 'IsMultipleEnrollmentLinkedAccount',
 'OutcomeStatusInternship',
 'OutcomeStatusPostGraduation',
 'ReportingCategoryMbaCseaPostGraduation',
 'ReportingCategoryMbaCseaInternship',
 'Id',
 'FirstName',
 'MiddleName',
 'LastName',
 'EmailAddress',
 'GraduationYearId',
 'GraduationClass',
 'GraduationTerm',
 'StudentId',
 'IsAlumni',
 'IncludeInResumeBook',
 'PreferredEmailAddress',
 'JoinDate',
 'IsEnrolled',
 'LinkedInProfileUrl',
 'IsTransferStudent',
 'DeclineToStateIsTransferStudent',
 'HasPhoto',
 'AssignedAdvisor',
 'AssignedAdvisor2',
 'AssignedAdvisor3',
 'AssignedAdvisor4',
 'AssignedAdvisor5',
 'SubInfoDisplay',
 'CountryOfCitizenship1',
 'CountryOfCitizenship2',
 'CountryOfCitizenship',
 'DualCountryOfCitizenship',
 'PreferredConsolidatedIndustry',
 'PreferredConsolidatedIndustry2',
 'PreferredConsolidatedIndustry3',
 'PreferredConsolidatedIndustry4',
 'PreferredConsolidatedIndustry5',
 'PreferredConsolidatedJobFunction',
 'PreferredConsolidatedJobFunction2',
 'PreferredConsolidatedJobFunction3',
 'PreferredConsolidatedJobFunction4',
 'PreferredConsolidatedJobFunction5',
 'PreferredCountry',
 'PreferredCountry2',
 'PreferredCountry3',
 'PreferredCountry4',
 'PreferredCountry5',
 'PreferredCity',
 'PreferredCity2',
 'PreferredCity3',
 'PreferredCity4',
 'PreferredCity5',
 'College',
 'College2',
 'College3',
 'Program',
 'ConsolidatedMajor1',
 'ConsolidatedMajor2',
 'ConsolidatedMajor3',
 'ConsolidatedMajor4',
 'ConsolidatedMajor5',
 'DegreeLevel',
 'Degree1',
 'Degree2',
 'Degree3',
 'Ethnicity1',
 'Ethnicity2',
 'Placeability',
 'Engagement',
 'FirstLanguageSpoken',
 'SecondLanguageSpoken',
 'ThirdLanguageSpoken',
 'FirstLanguageWritten',
 'SecondLanguageWritten',
 'ThirdLanguageWritten',
 'LanguageSpoken1',
 'LanguageSpoken2',
 'LanguageSpoken3',
 'LanguageWritten1',
 'LanguageWritten2',
 'LanguageWritten3',
 'ConsolidatedWorkAuthorization',
 'CustomAttributeValues',
 'LastLoginDate',
 'Gender',
 'Gpa',
 'SchoolStartDate',
 'MilitaryBackground',
 'ShouldSyncToExternalCalendar',
 'ShouldSyncFromExternalCalendar',
 'HasDualCitizenship',
 'Phone1',
 'ProBonoPlacements']

Read Appointments Data with custom field¶

In [ ]:
appointments_report = pd.read_excel('/content/drive/MyDrive/CDL Api Files/appointment_report.xlsx')
display(len(appointments_report))
appointments_report.head()
/usr/local/lib/python3.10/dist-packages/openpyxl/styles/stylesheet.py:226: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")
6556
Out[ ]:
Student Email Address Career Adviser Date and Time Appointment Date First Name Last Name Email Address.1 Preferred Location Career Adviser.1 ... Phone Number Graduation Term Desired Industry Desired City Attendance Country of Citizenship Graduate GPA Employment Status at Beginning of Program YOE No Show
0 Jia Hu jh4390@columbia.edu Tiya McIver 04/25/2023, 3:00PM - 3:30PM EDT 04/25/2023, 3:00PM EDT Jia Hu jh4390@columbia.edu Tiya McIver Virtual Room: https://columbiasps.... Tiya McIver ... NaN Fall 2022 Finance New York - NY NaN China Mainland NaN No 0 - 3 Years NaN
1 Jia Hu jh4390@columbia.edu Tiya McIver 04/20/2023, 3:30PM - 4:00PM EDT 04/20/2023, 3:30PM EDT Jia Hu jh4390@columbia.edu Tiya McIver Virtual Room: https://columbiasps.... Tiya McIver ... NaN Fall 2022 Finance New York - NY NaN China Mainland NaN No 0 - 3 Years NaN
2 Philip Spiler pms2179@columbia.edu Dorlene Curwen 04/18/2023, 2:15PM - 2:45PM EDT 04/18/2023, 2:15PM EDT Philip Spiler pms2179@columbia.edu Dorlene Curwen Virtual Room: https://columbias... Dorlene Curwen ... NaN Spring 2026 Sports New York - NY NaN United States (USA) NaN Yes 3 - 5 Years NaN
3 Nan Chen nc3019@columbia.edu Tiya McIver 04/14/2023, 9:30AM - 10:00AM EDT 04/14/2023, 9:30AM EDT Nan Chen nc3019@columbia.edu Tiya McIver Virtual Room: https://columbiasps.... Tiya McIver ... NaN Fall 2024 Finance NaN NaN NaN NaN No 0 - 3 Years NaN
4 Rongrong Chen rc3533@columbia.edu Tiya McIver 04/14/2023, 9:00AM - 9:30AM EDT 04/14/2023, 9:00AM EDT Rongrong Chen rc3533@columbia.edu Tiya McIver Virtual Room: https://columbiasps.... Tiya McIver ... NaN Spring 2027 Technology New York - NY NaN China Mainland NaN No 0 - 3 Years NaN

5 rows × 24 columns

This code manipulates data related to student appointments and merging it with other student data. Here's a line-by-line breakdown of what each line is doing:

The first line selects a subset of columns from the "appointments_report" dataframe and drops any duplicates based on the "Student Id" column.

The second line merges the "students" dataframe with the modified "appointments_report" dataframe based on the "StudentId" and "Student Id" columns. It uses a left join and drops the "GraduationTerm" column from the resulting dataframe.

The third line drops any columns from the merged dataframe that contain all null values.

The fourth line displays the length of the merged dataframe.

The final line displays a list of the column names in the merged dataframe.

In [ ]:
appointments_report = appointments_report[['Student Id', 'YOE', 'Employment Status at Beginning of Program', 'Desired Industry', 'Graduation Term']].drop_duplicates(subset='Student Id')
students_merged = students.merge(appointments_report, left_on='StudentId', right_on='Student Id', how='left').drop('GraduationTerm', axis=1)
students_merged.dropna(axis=1, how='all')
display(len(students_merged))
list(students_merged.columns)
13591
Out[ ]:
['RoleId',
 'StudentGroups',
 'FullName',
 'IsLgbtq',
 'SelfIdentifiedGenderDescription',
 'ParentEducationLevelId',
 'ParentEducationLevelName',
 'DoesParentHaveJD',
 'AbaGraduateId',
 'IsMultipleEnrollmentLinkedAccount',
 'OutcomeStatusInternship',
 'OutcomeStatusPostGraduation',
 'ReportingCategoryMbaCseaPostGraduation',
 'ReportingCategoryMbaCseaInternship',
 'Id',
 'FirstName',
 'MiddleName',
 'LastName',
 'EmailAddress',
 'GraduationYearId',
 'GraduationClass',
 'StudentId',
 'IsAlumni',
 'IncludeInResumeBook',
 'PreferredEmailAddress',
 'JoinDate',
 'IsEnrolled',
 'LinkedInProfileUrl',
 'IsTransferStudent',
 'DeclineToStateIsTransferStudent',
 'HasPhoto',
 'AssignedAdvisor',
 'AssignedAdvisor2',
 'AssignedAdvisor3',
 'AssignedAdvisor4',
 'AssignedAdvisor5',
 'SubInfoDisplay',
 'CountryOfCitizenship1',
 'CountryOfCitizenship2',
 'CountryOfCitizenship',
 'DualCountryOfCitizenship',
 'PreferredConsolidatedIndustry',
 'PreferredConsolidatedIndustry2',
 'PreferredConsolidatedIndustry3',
 'PreferredConsolidatedIndustry4',
 'PreferredConsolidatedIndustry5',
 'PreferredConsolidatedJobFunction',
 'PreferredConsolidatedJobFunction2',
 'PreferredConsolidatedJobFunction3',
 'PreferredConsolidatedJobFunction4',
 'PreferredConsolidatedJobFunction5',
 'PreferredCountry',
 'PreferredCountry2',
 'PreferredCountry3',
 'PreferredCountry4',
 'PreferredCountry5',
 'PreferredCity',
 'PreferredCity2',
 'PreferredCity3',
 'PreferredCity4',
 'PreferredCity5',
 'College',
 'College2',
 'College3',
 'Program',
 'ConsolidatedMajor1',
 'ConsolidatedMajor2',
 'ConsolidatedMajor3',
 'ConsolidatedMajor4',
 'ConsolidatedMajor5',
 'DegreeLevel',
 'Degree1',
 'Degree2',
 'Degree3',
 'Ethnicity1',
 'Ethnicity2',
 'Placeability',
 'Engagement',
 'FirstLanguageSpoken',
 'SecondLanguageSpoken',
 'ThirdLanguageSpoken',
 'FirstLanguageWritten',
 'SecondLanguageWritten',
 'ThirdLanguageWritten',
 'LanguageSpoken1',
 'LanguageSpoken2',
 'LanguageSpoken3',
 'LanguageWritten1',
 'LanguageWritten2',
 'LanguageWritten3',
 'ConsolidatedWorkAuthorization',
 'CustomAttributeValues',
 'LastLoginDate',
 'Gender',
 'Gpa',
 'SchoolStartDate',
 'MilitaryBackground',
 'ShouldSyncToExternalCalendar',
 'ShouldSyncFromExternalCalendar',
 'HasDualCitizenship',
 'Phone1',
 'ProBonoPlacements',
 'Student Id',
 'YOE',
 'Employment Status at Beginning of Program',
 'Desired Industry',
 'Graduation Term']

Map graduation date to numeric values¶

The first line groups the "students_merged" dataframe by "Graduation Term" and counts the number of unique "Student Ids" in each group. It then creates a new dataframe called "students_term" with the "Graduation Term" and "grad_term_count" columns.

The second line creates a Python dictionary called "term_map" that maps each string value of "Graduation Term" to a corresponding integer value.

The third line replaces the string values in the "Graduation Term" column of the "students_term" dataframe with their corresponding integer values using the "term_map" dictionary.

The fourth line replaces the string values in the "Graduation Term" column of the "data_appointments" dataframe with their corresponding integer values using the "term_map" dictionary.

The fifth line fills any missing values in the "Graduation Term" column of the "data_appointments" dataframe using the "ffill" method, which propagates the last non-null value forward to the missing values.

The sixth line prints the unique values in the "Graduation Term" column of the "data_appointments" dataframe.

In [ ]:
students_term = students_merged.groupby('Graduation Term')['Student Id'].count().reset_index(name='grad_term_count')

term_map = {
    'Spring 2009': 1, 'Spring 2011': 2, 'Spring 2012': 3, 
    'Fall 2013': 4, 'Spring 2014': 5, 'Spring 2016': 6,
    'Fall 2016': 7, 'Spring 2017': 8, 'Spring 2018': 9, 'Summer 2018': 10,
    'Spring 2019': 11, 'Summer 2019': 12, 'Fall 2019': 13,
    'Spring 2020': 14, 'Summer 2020': 15, 'Fall 2020': 16,
    'Spring 2021': 17, 'Summer 2021': 18, 'Fall 2021': 19,
    'Spring 2022': 20, 'Summer 2022': 21, 'Fall 2022': 22,
    'Spring 2023': 23, 'Summer 2023': 24, 'Fall 2023': 25,
    'Spring 2024': 26, 'Summer 2024': 27, 'Fall 2024': 28,
    'Spring 2025': 29, 'Summer 2025': 30, 'Fall 2025': 31,
    'Spring 2026': 32, 'Summer 2026': 33, 'Fall 2026': 34,
    'Spring 2027': 35, 'Summer 2027': 36, 'Fall 2027': 37
}

students_term['Graduation Term'] = students_term['Graduation Term'].replace(term_map)
data_appointments['Graduation Term'] = data_appointments['Graduation Term'].replace(term_map)
data_appointments['Graduation Term'] = data_appointments['Graduation Term'].fillna(method='ffill')
data_appointments['Graduation Term'].unique()

students_term.head()
Out[ ]:
Graduation Term grad_term_count
0 4 1
1 7 1
2 13 5
3 16 32
4 19 135

The first line merges the "students_merged" dataframe with a subset of columns from the "data_appointments" dataframe (specifically, the "UNI" and "Program Shortcut" columns). It performs a left join based on the "StudentId" and "UNI" columns.

The second line groups the resulting "students_merged" dataframe by both "Graduation Term" and "Program Shortcut", counts the number of unique "Student Ids" in each group, and creates a new dataframe called "students_programs" with "Graduation Term", "Program Shortcut", and "Student Id" columns.

The third line pivots the "students_programs" dataframe, with "Graduation Term" as the index, "Program Shortcut" as the columns, and "Student Id" as the values. It uses the "sum" aggregation function to combine any duplicate values.

The fourth line converts the resulting pivoted dataframe back to a regular dataframe, with "Graduation Term", program names as columns, and student count as values.

The fifth line replaces the string values in the "Graduation Term" column of the "students_programs" dataframe with their corresponding integer values using the "term_map" dictionary.

The sixth line fills any missing values in the "students_programs" dataframe with zero.

In [ ]:
students_merged = students_merged.merge(data_appointments[['UNI', 'Program Shortcut']], left_on='StudentId', right_on='UNI', how='left')
students_programs = students_merged.groupby(['Graduation Term', 'Program Shortcut'])['Student Id'].count().reset_index()
students_programs = pd.pivot_table(students_programs, values='Student Id', index='Graduation Term', columns='Program Shortcut', aggfunc='sum')
students_programs = pd.DataFrame(students_programs).reset_index()
students_programs['Graduation Term'] = students_programs['Graduation Term'].replace(term_map)
students_programs.fillna(0, inplace=True)
students_programs
Out[ ]:
Program Shortcut Graduation Term ACTU APAN BIET CNAD ERM HCM IKNS INSURANCE NECR NMED NON-DEGREE NOPM SPORT STRAT COMM SUMA SUSC TEMT WEALTH
0 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
1 7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 13 0.0 3.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 6.0 0.0 0.0 0.0 0.0 1.0 0.0
3 16 0.0 11.0 2.0 1.0 31.0 0.0 0.0 0.0 3.0 0.0 0.0 7.0 0.0 15.0 1.0 0.0 25.0 0.0
4 19 7.0 196.0 0.0 3.0 43.0 7.0 6.0 0.0 13.0 10.0 0.0 8.0 10.0 54.0 20.0 0.0 79.0 3.0
5 22 41.0 1034.0 7.0 39.0 423.0 3.0 6.0 6.0 22.0 0.0 2.0 23.0 97.0 169.0 104.0 16.0 89.0 5.0
6 25 0.0 7.0 1.0 3.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 9.0 4.0 0.0 0.0 0.0
7 28 9.0 605.0 8.0 14.0 319.0 23.0 8.0 0.0 21.0 3.0 0.0 25.0 50.0 74.0 160.0 2.0 99.0 1.0
8 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0
9 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
11 5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0
12 6 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 2.0 2.0 0.0 0.0 3.0 0.0 0.0 4.0 0.0
13 8 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 1.0 0.0 0.0 1.0 4.0 1.0 2.0 0.0 0.0 0.0
14 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 6.0 0.0 12.0 1.0 0.0 1.0 0.0
15 11 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 6.0 1.0 0.0 7.0 2.0 2.0 1.0 0.0 4.0 0.0
16 14 3.0 6.0 0.0 1.0 2.0 0.0 2.0 0.0 2.0 0.0 0.0 2.0 4.0 1.0 5.0 0.0 7.0 0.0
17 17 0.0 10.0 2.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 30.0 0.0 2.0 0.0
18 20 14.0 148.0 7.0 31.0 141.0 2.0 9.0 0.0 18.0 8.0 2.0 4.0 12.0 3.0 67.0 3.0 2.0 0.0
19 23 17.0 222.0 12.0 5.0 42.0 14.0 26.0 0.0 30.0 4.0 24.0 12.0 53.0 28.0 113.0 6.0 26.0 6.0
20 26 0.0 35.0 0.0 5.0 4.0 1.0 5.0 0.0 0.0 0.0 0.0 0.0 5.0 9.0 30.0 0.0 5.0 0.0
21 29 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 26.0 0.0 0.0 0.0
22 32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 13.0 0.0 6.0 0.0 0.0 0.0
23 35 0.0 30.0 0.0 3.0 11.0 1.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 2.0 0.0
24 10 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
25 12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0
26 15 0.0 2.0 0.0 2.0 0.0 8.0 0.0 0.0 1.0 1.0 0.0 3.0 4.0 12.0 0.0 0.0 1.0 0.0
27 18 0.0 28.0 1.0 1.0 15.0 6.0 0.0 0.0 2.0 0.0 0.0 20.0 0.0 18.0 0.0 3.0 17.0 0.0
28 21 0.0 129.0 37.0 3.0 81.0 55.0 0.0 0.0 5.0 12.0 0.0 94.0 23.0 27.0 48.0 1.0 162.0 0.0
29 24 0.0 29.0 5.0 2.0 2.0 9.0 1.0 0.0 0.0 0.0 4.0 71.0 0.0 8.0 23.0 0.0 3.0 0.0
30 27 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0
31 33 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0

The first line groups the "students_merged" dataframe by both "Graduation Term" and "YOE", counts the number of unique "Student Ids" in each group, and creates a new dataframe called "students_yoe" with "Graduation Term", "YOE", and "Student Id" columns.

The second line pivots the "students_yoe" dataframe, with "Graduation Term" as the index, "YOE" as the columns, and "Student Id" as the values. It uses the "sum" aggregation function to combine any duplicate values.

The third line converts the resulting pivoted dataframe back to a regular dataframe, with "Graduation Term", "YOE" values as columns, and student count as values.

The fourth line fills any missing values in the "students_yoe" dataframe with zero.

The fifth line replaces the string values in the "Graduation Term" column of the "students_yoe" dataframe with their corresponding integer values using the "term_map" dictionary.

In [ ]:
students_yoe = students_merged.groupby(['Graduation Term', 'YOE'])['Student Id'].count().reset_index()
students_yoe = pd.pivot_table(students_yoe, values='Student Id', index='Graduation Term', columns='YOE', aggfunc='sum')
students_yoe = pd.DataFrame(students_yoe).reset_index()
students_yoe.fillna(0, inplace=True)
students_yoe['Graduation Term'] = students_yoe['Graduation Term'].replace(term_map)
students_yoe
Out[ ]:
YOE Graduation Term 0 - 3 Years 3 - 5 Years 5+ Years
0 4 0.0 0.0 1.0
1 7 0.0 1.0 0.0
2 13 7.0 0.0 0.0
3 16 50.0 14.0 24.0
4 19 218.0 114.0 113.0
5 22 1604.0 235.0 236.0
6 25 41.0 3.0 2.0
7 28 1123.0 164.0 134.0
8 1 0.0 0.0 3.0
9 2 0.0 0.0 1.0
10 3 0.0 0.0 1.0
11 5 0.0 0.0 3.0
12 6 0.0 1.0 8.0
13 8 2.0 0.0 10.0
14 9 0.0 3.0 19.0
15 11 1.0 7.0 17.0
16 14 6.0 14.0 10.0
17 17 8.0 17.0 10.0
18 20 274.0 109.0 79.0
19 23 340.0 114.0 184.0
20 26 30.0 13.0 56.0
21 29 11.0 2.0 13.0
22 32 13.0 4.0 2.0
23 35 47.0 4.0 1.0
24 12 0.0 1.0 1.0
25 15 3.0 7.0 20.0
26 18 38.0 26.0 30.0
27 21 375.0 124.0 168.0
28 24 76.0 32.0 40.0
29 27 3.0 5.0 3.0
30 33 5.0 0.0 0.0

This code groups the "data_appointments" dataframe by the "Date" column, counts the number of unique "StudentName" values in each group, and creates a new Series called "appointments_time" with the "Date" values as the index and the count of "StudentName" values as the values.

The second line converts this Series to a pandas DataFrame object, with "Date" and "StudentName" columns. The "reset_index()" method is called to reset the index and create a new integer index, so that the "Date" values become a regular column.

In [ ]:
appointments_time = data_appointments.groupby('Date')['StudentName'].count()
appointments_time = pd.DataFrame(appointments_time).reset_index()
appointments_time
Out[ ]:
Date StudentName
0 2021-04-29 1
1 2021-05-03 6
2 2021-05-04 3
3 2021-05-05 6
4 2021-05-06 6
... ... ...
505 2023-04-26 1
506 2023-05-01 1
507 2023-05-02 2
508 2023-05-08 1
509 2023-05-15 1

510 rows × 2 columns

This code creates a list of dates between "2021-04-29" and the current date, excluding the last 7 days. It then creates a pandas DataFrame object called "dates" with a single column named "Date" that contains the dates in the list.

The "data_appointments" DataFrame is modified by converting the "Date" column to a datetime format using the "pd.to_datetime()" method. Similarly, the "appointments_time" DataFrame's "Date" column is converted to a datetime format.

The "dates" DataFrame is then merged with the "appointments_time" DataFrame on the "Date" column, and with the "data_appointments" DataFrame on the "Date" column and "Term" column. The "how='left'" parameter is specified in both cases to keep all dates in the "dates" DataFrame, even if there is no corresponding data in the other DataFrames.

The "fillna()" method is called to replace missing values in the "Term" column with the previous value (using the "ffill" method), and missing values in the "StudentName" column with 0.

Finally, the "drop_duplicates()" method is called to remove any duplicate rows in the "dates" DataFrame. The resulting "dates" DataFrame can be used for further analysis or visualization.

In [ ]:
from datetime import date
dates = pd.date_range(start='2021-04-29', end=date.today(), freq='D')[:-7]
dates = pd.DataFrame({'Date': pd.to_datetime(dates)})
data_appointments['Date'] = pd.to_datetime(data_appointments['Date'])
appointments_time['Date'] = pd.to_datetime(appointments_time['Date'])
dates = dates.merge(appointments_time, on='Date', how='left')
dates = dates.merge(data_appointments[['Date', 'Term']], on='Date', how='left')
dates['Term'] = dates['Term'].fillna(method='ffill')
dates['StudentName'] = dates['StudentName'].fillna(0)
dates.drop_duplicates(inplace=True)
dates.tail()
Out[ ]:
Date StudentName Term
6881 2023-04-22 0.0 Spring
6882 2023-04-23 0.0 Spring
6883 2023-04-24 6.0 Spring
6889 2023-04-25 4.0 Spring
6893 2023-04-26 1.0 Spring

The above code merges the data_appointments dataframe with students_term, students_programs, and students_yoe dataframes based on the Graduation Term column.

The first line of code merges data_appointments and students_term dataframes. The resulting dataframe contains all columns from both dataframes and is merged on the Graduation Term column using a left join.

The second line of code merges the resulting dataframe from the first merge with students_programs dataframe. The resulting dataframe contains all columns from both dataframes and is merged on the Graduation Term column using a left join.

The third line of code merges the resulting dataframe from the second merge with students_yoe dataframe. The resulting dataframe contains all columns from both dataframes and is merged on the Graduation Term column using a left join.

In [ ]:
appts_df = data_appointments.merge(students_term, on='Graduation Term', how='left')
appts_df = appts_df.merge(students_programs, on='Graduation Term', how='left')
appts_df = appts_df.merge(students_yoe, on='Graduation Term', how='left')
display(len(appts_df))
6677

This code merges two dataframes dates and appts_df on the common column Date using a left join, and assigns the resulting merged dataframe to appts. The display(len(appts)) statement is then used to display the number of rows in the merged dataframe.

In [ ]:
appts = dates.merge(appts_df, on='Date', how='left')
display(len(appts))
appts.tail()
6894
Out[ ]:
Date StudentName_x Term_x Id_x AppointmentBlockId StartTime EndTime StudentId_x StudentName_y Topic ... NOPM SPORT STRAT COMM SUMA SUSC TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years
6889 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T19:00:00.000 2023-04-25T19:30:00.000 5.400161e+14 Jia Hu NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6890 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T16:00:00.000 2023-04-25T16:30:00.000 5.400161e+14 Zilu Zhang NaN ... 12.0 53.0 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0
6891 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T14:00:00.000 2023-04-25T14:30:00.000 5.400161e+14 Catalina Ramos NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6892 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T14:00:00.000 2023-04-25T14:30:00.000 5.400161e+14 Cheng Ye NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6893 2023-04-26 1.0 Spring 2.200074e+14 2.000061e+14 2023-04-26T18:15:00.000 2023-04-26T18:45:00.000 5.400161e+14 Ran Wang NaN ... 12.0 53.0 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0

5 rows × 66 columns

In [ ]:
dates.tail()
Out[ ]:
Date StudentName Term
6881 2023-04-22 0.0 Spring
6882 2023-04-23 0.0 Spring
6883 2023-04-24 6.0 Spring
6889 2023-04-25 4.0 Spring
6893 2023-04-26 1.0 Spring
In [ ]:
appts.tail()
Out[ ]:
Date StudentName_x Term_x Id_x AppointmentBlockId StartTime EndTime StudentId_x StudentName_y Topic ... NOPM SPORT STRAT COMM SUMA SUSC TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years
6889 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T19:00:00.000 2023-04-25T19:30:00.000 5.400161e+14 Jia Hu NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6890 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T16:00:00.000 2023-04-25T16:30:00.000 5.400161e+14 Zilu Zhang NaN ... 12.0 53.0 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0
6891 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T14:00:00.000 2023-04-25T14:30:00.000 5.400161e+14 Catalina Ramos NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6892 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T14:00:00.000 2023-04-25T14:30:00.000 5.400161e+14 Cheng Ye NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6893 2023-04-26 1.0 Spring 2.200074e+14 2.000061e+14 2023-04-26T18:15:00.000 2023-04-26T18:45:00.000 5.400161e+14 Ran Wang NaN ... 12.0 53.0 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0

5 rows × 66 columns

Drop duplicates¶

In [ ]:
appts.drop_duplicates(subset=['Date'], inplace=True)
appts.tail()
Out[ ]:
Date StudentName_x Term_x Id_x AppointmentBlockId StartTime EndTime StudentId_x StudentName_y Topic ... NOPM SPORT STRAT COMM SUMA SUSC TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years
6881 2023-04-22 0.0 Spring NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6882 2023-04-23 0.0 Spring NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6883 2023-04-24 6.0 Spring 2.200074e+14 2.000061e+14 2023-04-24T20:00:00.000 2023-04-24T20:30:00.000 5.400161e+14 Yuchong Liu NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6889 2023-04-25 4.0 Spring 2.200074e+14 2.000061e+14 2023-04-25T19:00:00.000 2023-04-25T19:30:00.000 5.400161e+14 Jia Hu NaN ... 23.0 97.0 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0
6893 2023-04-26 1.0 Spring 2.200074e+14 2.000061e+14 2023-04-26T18:15:00.000 2023-04-26T18:45:00.000 5.400161e+14 Ran Wang NaN ... 12.0 53.0 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0

5 rows × 66 columns

The first line selects specific columns from the appts DataFrame and reorders them. These columns are Date, StudentName_x, Graduation Term, Term_y, and several program and year-of-experience columns.

The second line fills in missing values in the DataFrame using the previous value in the same column.

The third line creates dummy variables for the Term_y column and drops the first dummy variable.

The fourth line merges the updated appts DataFrame with the dates DataFrame on the Date column, with missing values filled in using the previous value in the same column.

In [ ]:
appts = appts[['Date', 'StudentName_x', 'Graduation Term', 'Term_y',
                      'grad_term_count', 'ACTU', 'APAN', 'BIET', 'CNAD', 'ERM', 'HCM', 'IKNS',
                      'INSURANCE', 'NECR', 'NMED', 'NON-DEGREE', 'NOPM', 'SPORT',
                      'STRAT COMM', 'SUMA', 'SUSC', 'TEMT', 'WEALTH', '0 - 3 Years',
                      '3 - 5 Years', '5+ Years']]

appts = appts.fillna(method='ffill')
appts = pd.get_dummies(appts, columns=['Term_y'], drop_first=True)
appts = dates.merge(appts, on='Date', how='left')

appts.tail()
Out[ ]:
Date StudentName Term StudentName_x Graduation Term grad_term_count ACTU APAN BIET CNAD ... STRAT COMM SUMA SUSC TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years Term_y_Spring Term_y_Summer
723 2023-04-22 0.0 Spring 0.0 23.0 221.0 17.0 222.0 12.0 5.0 ... 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0 1 0
724 2023-04-23 0.0 Spring 0.0 23.0 221.0 17.0 222.0 12.0 5.0 ... 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0 1 0
725 2023-04-24 6.0 Spring 6.0 22.0 567.0 41.0 1034.0 7.0 39.0 ... 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0 1 0
726 2023-04-25 4.0 Spring 4.0 22.0 567.0 41.0 1034.0 7.0 39.0 ... 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0 1 0
727 2023-04-26 1.0 Spring 1.0 23.0 221.0 17.0 222.0 12.0 5.0 ... 28.0 113.0 6.0 26.0 6.0 340.0 114.0 184.0 1 0

5 rows × 29 columns

This is a Python function conv_to_utc that takes a Pandas Series (or column) of datetime values as input, and converts them to the number of days since the Unix epoch (January 1, 1970 at 00:00:00 UTC).

The function first converts the datetime values to string format using the strftime method and the %Y-%m-%d format code, which gives the date in the format "YYYY-MM-DD". It then uses datetime.strptime to convert the date string to a datetime object.

Next, the function calculates the number of days between the datetime object and the Unix epoch, which is defined as January 1, 1970 at 00:00:00 UTC. To do this, it first creates a new datetime object for the epoch, and then calculates the difference between the input datetime object and the epoch using the - operator. This gives a timedelta object, which represents the difference between two datetime objects as a duration.

Finally, the function extracts the number of days from the timedelta object using the days attribute, and returns it as the output.

In [ ]:
from datetime import datetime, timedelta
def conv_to_utc(col):
  date_str = col.strftime('%Y-%m-%d')
  date_obj = datetime.strptime(date_str, '%Y-%m-%d')
  epoch = datetime.utcfromtimestamp(0)
  delta = date_obj - epoch
  days_since_epoch = delta.days
  return days_since_epoch

#appts['Date'] = appts['Date'].apply(conv_to_utc)
#appts['Date']

his is a function called conv_from_utc that takes in the number of days since the Unix epoch (January 1, 1970) as an argument, and returns a date string in the format of "YYYY-MM-DD".

The function first converts the Unix epoch into a datetime object called epoch. Then it creates a timedelta object representing the number of days since the epoch that was passed in as an argument. It adds this timedelta object to epoch to get a new datetime object representing the desired date.

It then converts this datetime object into a Unix timestamp, and converts that timestamp back into a datetime object using the datetime.fromtimestamp() method. Finally, it formats this datetime object as a string in the desired format using the strftime() method, and returns that string.

In [ ]:
def conv_from_utc(days_since_epoch):
    epoch = datetime.utcfromtimestamp(0)
    delta = timedelta(days=days_since_epoch)
    date_obj = (epoch + delta).timestamp()
    date_obj = datetime.fromtimestamp(date_obj)
    date_str = date_obj.strftime('%Y-%m-%d')
    return date_str

This code creates a new dataframe called appts_model by resampling the appts dataframe by 7 days and summing the values for each column. The on parameter specifies that the resampling is to be done based on the Date column. The resulting appts_model dataframe will have a new index with dates resampled to 7-day intervals.

The code then modifies the Term_y_Spring and Term_y_Summer columns by applying a lambda function to each value. The lambda function checks if the value is greater than 0 and returns 1 if true and 0 otherwise. This effectively converts the columns to binary indicators.

In [ ]:
appts_model = pd.DataFrame(appts)
appts_model = appts_model.resample('7D', on='Date').sum().reset_index()
appts_model['Term_y_Spring'] = appts_model['Term_y_Spring'].apply(lambda val: 1 if val > 0 else 0)
appts_model['Term_y_Summer'] = appts_model['Term_y_Summer'].apply(lambda val: 1 if val > 0 else 0)
#appts_model['Date'] = appts_model['Date'].apply(conv_to_utc)
appts_model.tail()
<ipython-input-19-6b9419386967>:2: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  appts_model = appts_model.resample('7D', on='Date').sum().reset_index()
Out[ ]:
Date StudentName StudentName_x Graduation Term grad_term_count ACTU APAN BIET CNAD ERM ... STRAT COMM SUMA SUSC TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years Term_y_Spring Term_y_Summer
99 2023-03-23 52.0 52.0 180.0 3486.0 86.0 4087.0 48.0 110.0 2033.0 ... 557.0 904.0 29.0 601.0 10.0 7257.0 1081.0 936.0 1 0
100 2023-03-30 76.0 76.0 165.0 2730.0 166.0 4158.0 54.0 144.0 1692.0 ... 646.0 702.0 62.0 418.0 28.0 6656.0 1100.0 1212.0 1 0
101 2023-04-06 32.0 32.0 162.0 3185.0 148.0 4637.0 74.0 151.0 2031.0 ... 736.0 700.0 53.0 706.0 20.0 7651.0 1271.0 1257.0 1 0
102 2023-04-13 51.0 51.0 185.0 1474.0 60.0 1430.0 81.0 35.0 537.0 ... 185.0 553.0 22.0 341.0 19.0 2578.0 638.0 857.0 1 0
103 2023-04-20 28.0 28.0 164.0 2593.0 159.0 3561.0 70.0 112.0 1333.0 ... 524.0 820.0 58.0 381.0 35.0 5691.0 1090.0 1342.0 1 0

5 rows × 28 columns

The above code defines three functions extract_year, extract_month, and extract_day that are used to extract year, month, and day from the Date column of the appts_model DataFrame.

Then, these functions are applied to the Date column using the apply() method to create three new columns: Year, Month, and Day. The Year, Month, and Day columns contain the respective values extracted from the Date column.

In [ ]:
def extract_year(col):
  year = col.year
  return int(year)

def extract_month(col):
  month = col.month
  return int(month)

def extract_day(col):
  day = col.day
  return int(day)

appts_model['Year'] = appts_model['Date'].apply(extract_year)
appts_model['Month'] = appts_model['Date'].apply(extract_month)
appts_model['Day'] = appts_model['Date'].apply(extract_day)
appts_model.head()
Out[ ]:
Date StudentName StudentName_x Graduation Term grad_term_count ACTU APAN BIET CNAD ERM ... TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years Term_y_Spring Term_y_Summer Year Month Day
0 2021-04-29 16.0 16.0 116.0 227.0 14.0 193.0 15.0 33.0 154.0 ... 12.0 0.0 316.0 184.0 139.0 1 0 2021 4 29
1 2021-05-06 32.0 32.0 132.0 254.0 0.0 147.0 18.0 9.0 127.0 ... 115.0 0.0 378.0 158.0 212.0 1 0 2021 5 6
2 2021-05-13 57.0 57.0 126.0 569.0 42.0 505.0 28.0 97.0 531.0 ... 98.0 0.0 1010.0 395.0 339.0 1 1 2021 5 13
3 2021-05-20 32.0 32.0 134.0 1032.0 48.0 1421.0 55.0 50.0 642.0 ... 408.0 8.0 2423.0 547.0 629.0 0 1 2021 5 20
4 2021-05-27 35.0 35.0 134.0 661.0 59.0 638.0 35.0 128.0 599.0 ... 43.0 0.0 1228.0 496.0 390.0 0 1 2021 5 27

5 rows × 31 columns

The code imports the StandardScaler class from the sklearn.preprocessing module.

Then, it creates a StandardScaler object and assigns it to the variable scaler.

Finally, it applies the fit_transform() method of the scaler object to the appts_model dataframe, scaling all the columns except the first two (Date and StudentName_x) using the mean and standard deviation of each column.

The fit_transform() method returns a new array that has the same shape as the original dataframe, with the scaled values replacing the original values.

In [ ]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
appts_model.iloc[:, 2:] = scaler.fit_transform(appts_model.iloc[:, 2:])
appts_model.head()
Out[ ]:
Date StudentName StudentName_x Graduation Term grad_term_count ACTU APAN BIET CNAD ERM ... TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years Term_y_Spring Term_y_Summer Year Month Day
0 2021-04-29 16.0 -1.346267 -1.810215 -1.899306 -1.562777 -1.721080 -1.253025 -1.418312 -1.687410 ... -2.232138 -1.688486 -1.789549 -2.282362 -2.517994 1.264911 -0.666667 -1.18699 -0.748706 1.524250
1 2021-05-06 32.0 -0.898945 -1.037659 -1.872285 -1.781209 -1.748882 -1.175641 -1.811657 -1.726154 ... -1.705912 -1.688486 -1.765871 -2.361839 -2.301070 1.264911 -0.666667 -1.18699 -0.459243 -1.087502
2 2021-05-13 57.0 -0.200005 -1.327367 -1.557042 -1.125914 -1.532512 -0.917694 -0.369391 -1.146432 ... -1.792765 -1.688486 -1.524510 -1.637371 -1.923682 1.264911 1.500000 -1.18699 -0.459243 -0.292621
3 2021-05-20 32.0 -0.898945 -0.941089 -1.093683 -1.032300 -0.978893 -0.221239 -1.139692 -0.987152 ... -0.208977 -0.879258 -0.984886 -1.172734 -1.061931 -0.790569 1.500000 -1.18699 -0.459243 0.502260
4 2021-05-27 35.0 -0.815072 -0.941089 -1.464970 -0.860675 -1.452128 -0.737132 0.138679 -1.048855 ... -2.073760 -1.688486 -1.441256 -1.328632 -1.772133 -0.790569 1.500000 -1.18699 -0.459243 1.297141

5 rows × 31 columns

Data Modeling¶

This code trains a LSTM neural network using Keras to predict the number of student appointments. Here's what each part does:

X = appts_model.drop(['StudentName', 'StudentName_x','Date'], axis=1).values: Extract the feature matrix X by removing the columns that represent the target variable and the date from the appts_model DataFrame.

y = appts_model['StudentName'].values: Extract the target variable y from the appts_model DataFrame.

X = X.reshape((X.shape[0], 1, X.shape[1])): Reshape the feature matrix to match the input shape of a LSTM model.

split_idx = int(len(X) * 0.8): Define the index to split the data into training and testing sets. In this case, 80% of the data will be used for training and 20% for testing.

X_train, X_test = X[:split_idx], X[split_idx:]: Split the feature matrix into training and testing sets.

y_train, y_test = y[:split_idx], y[split_idx:]: Split the target variable into training and testing sets.

model = Sequential(): Define the neural network model as a sequential model.

model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, X.shape[2]))): Add a LSTM layer with 50 neurons, ReLU activation function, and return_sequences=True parameter.

This layer expects an input shape of 1 time step and X.shape[2] features.

for n in range(1):: Add a loop that repeats once to add a second LSTM layer with 20 neurons and ReLU activation function.

model.add(Dropout(0.1)): Add a dropout layer with a 10% dropout rate.

model.add(Dense(20, activation='relu')): Add a fully connected layer with 20 neurons and ReLU activation function.

model.add(Dense(1)): Add a final fully connected layer with a single output neuron.

opt = Adam(learning_rate=0.01): Define the optimizer as Adam with a learning rate of 0.01.

model.compile(optimizer=opt, loss='mean_absolute_error', metrics=['mae']): Compile the model with the mean absolute error as the loss function and the mean absolute error as the evaluation metric.

early_stopping = EarlyStopping(monitor='val_loss', patience=200): Define the early stopping callback to monitor the validation loss and stop training if it doesn't improve for 200 epochs.

history = model.fit(X_train, y_train, epochs=2000, verbose=1, validation_data=(X_test, y_test), callbacks=[early_stopping]): Train the model for 2000 epochs, with early stopping callback, and print the training and validation loss. y_pred = model.predict(X_test): Predict the target variable for the testing set.

x_pred = model.predict(X_train): Predict the target variable for the training set.

mae = np.mean(np.abs(y_pred - y_test)): Calculate the mean absolute error between the predicted and actual values for the testing set.

print("MAE: ", mae): Print the mean absolute error.

model.save("best_model.h5"): Save the trained model to a file called "best_model.h5".

In [ ]:
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
import numpy as np


# Prepare the data
X = appts_model.drop(['StudentName', 'StudentName_x','Date'], axis=1).values
#X = appts_model[['Month', 'Day']].values
y = appts_model['StudentName'].values

# Reshape X to match LSTM input shape
X = X.reshape((X.shape[0], 1, X.shape[1]))

# Split the data into training and testing sets
split_idx = int(len(X) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]

# Define the LSTM model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, X.shape[2])))
for n in range(1):
    model.add(LSTM(20, activation='relu', return_sequences=False))
    model.add(Dropout(0.1))
model.add(Dense(20, activation='relu'))
model.add(Dense(1))

# Compile the model with an appropriate learning rate and metric
opt = Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss='mean_absolute_error', metrics=['mae'])

# Define early stopping criteria
early_stopping = EarlyStopping(monitor='val_loss', patience=200)

# Train the model with early stopping
history = model.fit(X_train, y_train, epochs=2000, verbose=1, validation_data=(X_test, y_test), callbacks=[early_stopping])

# Use the best model for predictions
y_pred = model.predict(X_test)
x_pred = model.predict(X_train)

# Calculate the MAE
mae = np.mean(np.abs(y_pred - y_test))

# Print the MAE
print("MAE: ", mae)

# Save the model
model.save("best_model.h5")
Epoch 1/2000
3/3 [==============================] - 2s 136ms/step - loss: 65.6594 - mae: 65.6594 - val_loss: 57.9173 - val_mae: 57.9173
Epoch 2/2000
3/3 [==============================] - 0s 14ms/step - loss: 65.3888 - mae: 65.3888 - val_loss: 57.6825 - val_mae: 57.6825
Epoch 3/2000
3/3 [==============================] - 0s 13ms/step - loss: 64.1830 - mae: 64.1830 - val_loss: 57.2135 - val_mae: 57.2135
Epoch 4/2000
3/3 [==============================] - 0s 15ms/step - loss: 60.7769 - mae: 60.7769 - val_loss: 56.1112 - val_mae: 56.1112
Epoch 5/2000
3/3 [==============================] - 0s 13ms/step - loss: 57.1792 - mae: 57.1792 - val_loss: 53.8096 - val_mae: 53.8096
Epoch 6/2000
3/3 [==============================] - 0s 14ms/step - loss: 55.2383 - mae: 55.2383 - val_loss: 50.4756 - val_mae: 50.4756
Epoch 7/2000
3/3 [==============================] - 0s 14ms/step - loss: 49.4992 - mae: 49.4992 - val_loss: 46.4241 - val_mae: 46.4241
Epoch 8/2000
3/3 [==============================] - 0s 17ms/step - loss: 44.9667 - mae: 44.9667 - val_loss: 42.7488 - val_mae: 42.7488
Epoch 9/2000
3/3 [==============================] - 0s 14ms/step - loss: 39.8870 - mae: 39.8870 - val_loss: 42.5257 - val_mae: 42.5257
Epoch 10/2000
3/3 [==============================] - 0s 13ms/step - loss: 40.1450 - mae: 40.1450 - val_loss: 41.1010 - val_mae: 41.1010
Epoch 11/2000
3/3 [==============================] - 0s 14ms/step - loss: 38.4705 - mae: 38.4705 - val_loss: 38.5406 - val_mae: 38.5406
Epoch 12/2000
3/3 [==============================] - 0s 13ms/step - loss: 36.4090 - mae: 36.4090 - val_loss: 35.4776 - val_mae: 35.4776
Epoch 13/2000
3/3 [==============================] - 0s 13ms/step - loss: 33.1722 - mae: 33.1722 - val_loss: 32.3042 - val_mae: 32.3042
Epoch 14/2000
3/3 [==============================] - 0s 15ms/step - loss: 32.5614 - mae: 32.5614 - val_loss: 29.4547 - val_mae: 29.4547
Epoch 15/2000
3/3 [==============================] - 0s 17ms/step - loss: 29.9416 - mae: 29.9416 - val_loss: 26.0844 - val_mae: 26.0844
Epoch 16/2000
3/3 [==============================] - 0s 14ms/step - loss: 29.9382 - mae: 29.9382 - val_loss: 26.9616 - val_mae: 26.9616
Epoch 17/2000
3/3 [==============================] - 0s 13ms/step - loss: 27.7822 - mae: 27.7822 - val_loss: 32.5226 - val_mae: 32.5226
Epoch 18/2000
3/3 [==============================] - 0s 13ms/step - loss: 25.0008 - mae: 25.0008 - val_loss: 33.1078 - val_mae: 33.1078
Epoch 19/2000
3/3 [==============================] - 0s 15ms/step - loss: 25.3558 - mae: 25.3558 - val_loss: 31.9796 - val_mae: 31.9796
Epoch 20/2000
3/3 [==============================] - 0s 14ms/step - loss: 23.2345 - mae: 23.2345 - val_loss: 31.1476 - val_mae: 31.1476
Epoch 21/2000
3/3 [==============================] - 0s 14ms/step - loss: 22.0640 - mae: 22.0640 - val_loss: 27.8523 - val_mae: 27.8523
Epoch 22/2000
3/3 [==============================] - 0s 13ms/step - loss: 22.8327 - mae: 22.8327 - val_loss: 25.4486 - val_mae: 25.4486
Epoch 23/2000
3/3 [==============================] - 0s 14ms/step - loss: 21.7929 - mae: 21.7929 - val_loss: 26.6549 - val_mae: 26.6549
Epoch 24/2000
3/3 [==============================] - 0s 15ms/step - loss: 21.1745 - mae: 21.1745 - val_loss: 32.8938 - val_mae: 32.8938
Epoch 25/2000
3/3 [==============================] - 0s 14ms/step - loss: 22.0353 - mae: 22.0353 - val_loss: 34.9448 - val_mae: 34.9448
Epoch 26/2000
3/3 [==============================] - 0s 14ms/step - loss: 19.4157 - mae: 19.4157 - val_loss: 30.9210 - val_mae: 30.9210
Epoch 27/2000
3/3 [==============================] - 0s 13ms/step - loss: 19.8537 - mae: 19.8537 - val_loss: 31.9370 - val_mae: 31.9370
Epoch 28/2000
3/3 [==============================] - 0s 16ms/step - loss: 18.5745 - mae: 18.5745 - val_loss: 35.4408 - val_mae: 35.4408
Epoch 29/2000
3/3 [==============================] - 0s 14ms/step - loss: 17.5455 - mae: 17.5455 - val_loss: 41.9922 - val_mae: 41.9922
Epoch 30/2000
3/3 [==============================] - 0s 13ms/step - loss: 18.3806 - mae: 18.3806 - val_loss: 42.3071 - val_mae: 42.3071
Epoch 31/2000
3/3 [==============================] - 0s 14ms/step - loss: 20.1847 - mae: 20.1847 - val_loss: 39.8727 - val_mae: 39.8727
Epoch 32/2000
3/3 [==============================] - 0s 17ms/step - loss: 17.8307 - mae: 17.8307 - val_loss: 38.7233 - val_mae: 38.7233
Epoch 33/2000
3/3 [==============================] - 0s 13ms/step - loss: 18.0392 - mae: 18.0392 - val_loss: 36.1398 - val_mae: 36.1398
Epoch 34/2000
3/3 [==============================] - 0s 14ms/step - loss: 18.1370 - mae: 18.1370 - val_loss: 35.5407 - val_mae: 35.5407
Epoch 35/2000
3/3 [==============================] - 0s 14ms/step - loss: 17.5399 - mae: 17.5399 - val_loss: 36.6247 - val_mae: 36.6247
Epoch 36/2000
3/3 [==============================] - 0s 15ms/step - loss: 15.7147 - mae: 15.7147 - val_loss: 36.1848 - val_mae: 36.1848
Epoch 37/2000
3/3 [==============================] - 0s 14ms/step - loss: 17.0706 - mae: 17.0706 - val_loss: 31.9961 - val_mae: 31.9961
Epoch 38/2000
3/3 [==============================] - 0s 13ms/step - loss: 15.8502 - mae: 15.8502 - val_loss: 31.8947 - val_mae: 31.8947
Epoch 39/2000
3/3 [==============================] - 0s 14ms/step - loss: 15.1236 - mae: 15.1236 - val_loss: 37.1315 - val_mae: 37.1315
Epoch 40/2000
3/3 [==============================] - 0s 14ms/step - loss: 14.5542 - mae: 14.5542 - val_loss: 39.9920 - val_mae: 39.9920
Epoch 41/2000
3/3 [==============================] - 0s 12ms/step - loss: 15.7683 - mae: 15.7683 - val_loss: 38.6527 - val_mae: 38.6527
Epoch 42/2000
3/3 [==============================] - 0s 13ms/step - loss: 14.7443 - mae: 14.7443 - val_loss: 34.1499 - val_mae: 34.1499
Epoch 43/2000
3/3 [==============================] - 0s 12ms/step - loss: 14.6164 - mae: 14.6164 - val_loss: 35.0073 - val_mae: 35.0073
Epoch 44/2000
3/3 [==============================] - 0s 22ms/step - loss: 15.7462 - mae: 15.7462 - val_loss: 41.1161 - val_mae: 41.1161
Epoch 45/2000
3/3 [==============================] - 0s 13ms/step - loss: 14.0941 - mae: 14.0941 - val_loss: 37.8386 - val_mae: 37.8386
Epoch 46/2000
3/3 [==============================] - 0s 12ms/step - loss: 13.1793 - mae: 13.1793 - val_loss: 36.0567 - val_mae: 36.0567
Epoch 47/2000
3/3 [==============================] - 0s 14ms/step - loss: 13.6378 - mae: 13.6378 - val_loss: 36.8020 - val_mae: 36.8020
Epoch 48/2000
3/3 [==============================] - 0s 12ms/step - loss: 14.1050 - mae: 14.1050 - val_loss: 35.7387 - val_mae: 35.7387
Epoch 49/2000
3/3 [==============================] - 0s 13ms/step - loss: 14.8178 - mae: 14.8178 - val_loss: 36.7813 - val_mae: 36.7813
Epoch 50/2000
3/3 [==============================] - 0s 16ms/step - loss: 13.1547 - mae: 13.1547 - val_loss: 37.8106 - val_mae: 37.8106
Epoch 51/2000
3/3 [==============================] - 0s 16ms/step - loss: 13.2360 - mae: 13.2360 - val_loss: 39.1768 - val_mae: 39.1768
Epoch 52/2000
3/3 [==============================] - 0s 12ms/step - loss: 12.6343 - mae: 12.6343 - val_loss: 42.2396 - val_mae: 42.2396
Epoch 53/2000
3/3 [==============================] - 0s 17ms/step - loss: 11.7763 - mae: 11.7763 - val_loss: 46.0133 - val_mae: 46.0133
Epoch 54/2000
3/3 [==============================] - 0s 15ms/step - loss: 11.8244 - mae: 11.8244 - val_loss: 45.6585 - val_mae: 45.6585
Epoch 55/2000
3/3 [==============================] - 0s 15ms/step - loss: 11.7305 - mae: 11.7305 - val_loss: 43.4166 - val_mae: 43.4166
Epoch 56/2000
3/3 [==============================] - 0s 14ms/step - loss: 11.1510 - mae: 11.1510 - val_loss: 43.9059 - val_mae: 43.9059
Epoch 57/2000
3/3 [==============================] - 0s 14ms/step - loss: 13.5887 - mae: 13.5887 - val_loss: 48.0962 - val_mae: 48.0962
Epoch 58/2000
3/3 [==============================] - 0s 23ms/step - loss: 12.0542 - mae: 12.0542 - val_loss: 45.7443 - val_mae: 45.7443
Epoch 59/2000
3/3 [==============================] - 0s 15ms/step - loss: 12.1945 - mae: 12.1945 - val_loss: 45.7496 - val_mae: 45.7496
Epoch 60/2000
3/3 [==============================] - 0s 16ms/step - loss: 11.5942 - mae: 11.5942 - val_loss: 42.9063 - val_mae: 42.9063
Epoch 61/2000
3/3 [==============================] - 0s 15ms/step - loss: 12.3560 - mae: 12.3560 - val_loss: 46.3837 - val_mae: 46.3837
Epoch 62/2000
3/3 [==============================] - 0s 18ms/step - loss: 11.2488 - mae: 11.2488 - val_loss: 51.5675 - val_mae: 51.5675
Epoch 63/2000
3/3 [==============================] - 0s 15ms/step - loss: 12.7946 - mae: 12.7946 - val_loss: 45.8392 - val_mae: 45.8392
Epoch 64/2000
3/3 [==============================] - 0s 13ms/step - loss: 12.7588 - mae: 12.7588 - val_loss: 49.2444 - val_mae: 49.2444
Epoch 65/2000
3/3 [==============================] - 0s 12ms/step - loss: 12.5600 - mae: 12.5600 - val_loss: 57.1721 - val_mae: 57.1721
Epoch 66/2000
3/3 [==============================] - 0s 19ms/step - loss: 11.5631 - mae: 11.5631 - val_loss: 49.9618 - val_mae: 49.9618
Epoch 67/2000
3/3 [==============================] - 0s 13ms/step - loss: 10.1033 - mae: 10.1033 - val_loss: 42.8820 - val_mae: 42.8820
Epoch 68/2000
3/3 [==============================] - 0s 13ms/step - loss: 10.2092 - mae: 10.2092 - val_loss: 42.8750 - val_mae: 42.8750
Epoch 69/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.7382 - mae: 10.7382 - val_loss: 53.8710 - val_mae: 53.8710
Epoch 70/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.2029 - mae: 10.2029 - val_loss: 55.7346 - val_mae: 55.7346
Epoch 71/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.6345 - mae: 10.6345 - val_loss: 51.5551 - val_mae: 51.5551
Epoch 72/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.8427 - mae: 10.8427 - val_loss: 44.3041 - val_mae: 44.3041
Epoch 73/2000
3/3 [==============================] - 0s 13ms/step - loss: 12.0354 - mae: 12.0354 - val_loss: 43.5346 - val_mae: 43.5346
Epoch 74/2000
3/3 [==============================] - 0s 14ms/step - loss: 11.8039 - mae: 11.8039 - val_loss: 46.8963 - val_mae: 46.8963
Epoch 75/2000
3/3 [==============================] - 0s 23ms/step - loss: 10.6882 - mae: 10.6882 - val_loss: 49.7387 - val_mae: 49.7387
Epoch 76/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.6896 - mae: 10.6896 - val_loss: 52.4447 - val_mae: 52.4447
Epoch 77/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.4314 - mae: 10.4314 - val_loss: 46.7270 - val_mae: 46.7270
Epoch 78/2000
3/3 [==============================] - 0s 13ms/step - loss: 11.5726 - mae: 11.5726 - val_loss: 42.0226 - val_mae: 42.0226
Epoch 79/2000
3/3 [==============================] - 0s 13ms/step - loss: 9.8880 - mae: 9.8880 - val_loss: 50.7565 - val_mae: 50.7565
Epoch 80/2000
3/3 [==============================] - 0s 14ms/step - loss: 9.6103 - mae: 9.6103 - val_loss: 51.0823 - val_mae: 51.0823
Epoch 81/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.4422 - mae: 10.4422 - val_loss: 48.6071 - val_mae: 48.6071
Epoch 82/2000
3/3 [==============================] - 0s 14ms/step - loss: 9.3867 - mae: 9.3867 - val_loss: 50.0550 - val_mae: 50.0550
Epoch 83/2000
3/3 [==============================] - 0s 16ms/step - loss: 8.8887 - mae: 8.8887 - val_loss: 47.1068 - val_mae: 47.1068
Epoch 84/2000
3/3 [==============================] - 0s 14ms/step - loss: 9.1134 - mae: 9.1134 - val_loss: 47.5599 - val_mae: 47.5599
Epoch 85/2000
3/3 [==============================] - 0s 14ms/step - loss: 9.3343 - mae: 9.3343 - val_loss: 52.2930 - val_mae: 52.2930
Epoch 86/2000
3/3 [==============================] - 0s 13ms/step - loss: 10.2298 - mae: 10.2298 - val_loss: 52.1765 - val_mae: 52.1765
Epoch 87/2000
3/3 [==============================] - 0s 16ms/step - loss: 9.7367 - mae: 9.7367 - val_loss: 51.8713 - val_mae: 51.8713
Epoch 88/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.4726 - mae: 10.4726 - val_loss: 48.2432 - val_mae: 48.2432
Epoch 89/2000
3/3 [==============================] - 0s 16ms/step - loss: 8.5060 - mae: 8.5060 - val_loss: 55.9073 - val_mae: 55.9073
Epoch 90/2000
3/3 [==============================] - 0s 12ms/step - loss: 9.2348 - mae: 9.2348 - val_loss: 50.7694 - val_mae: 50.7694
Epoch 91/2000
3/3 [==============================] - 0s 12ms/step - loss: 8.7689 - mae: 8.7689 - val_loss: 46.5178 - val_mae: 46.5178
Epoch 92/2000
3/3 [==============================] - 0s 16ms/step - loss: 9.3534 - mae: 9.3534 - val_loss: 53.4171 - val_mae: 53.4171
Epoch 93/2000
3/3 [==============================] - 0s 13ms/step - loss: 10.8286 - mae: 10.8286 - val_loss: 50.3070 - val_mae: 50.3070
Epoch 94/2000
3/3 [==============================] - 0s 12ms/step - loss: 9.8411 - mae: 9.8411 - val_loss: 41.9002 - val_mae: 41.9002
Epoch 95/2000
3/3 [==============================] - 0s 12ms/step - loss: 9.1196 - mae: 9.1196 - val_loss: 45.0728 - val_mae: 45.0728
Epoch 96/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.3401 - mae: 8.3401 - val_loss: 55.8644 - val_mae: 55.8644
Epoch 97/2000
3/3 [==============================] - 0s 21ms/step - loss: 9.3364 - mae: 9.3364 - val_loss: 53.9418 - val_mae: 53.9418
Epoch 98/2000
3/3 [==============================] - 0s 12ms/step - loss: 9.4609 - mae: 9.4609 - val_loss: 45.3852 - val_mae: 45.3852
Epoch 99/2000
3/3 [==============================] - 0s 12ms/step - loss: 9.9319 - mae: 9.9319 - val_loss: 43.0055 - val_mae: 43.0055
Epoch 100/2000
3/3 [==============================] - 0s 16ms/step - loss: 8.9935 - mae: 8.9935 - val_loss: 46.0197 - val_mae: 46.0197
Epoch 101/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.6514 - mae: 7.6514 - val_loss: 45.5071 - val_mae: 45.5071
Epoch 102/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.6725 - mae: 7.6725 - val_loss: 47.5667 - val_mae: 47.5667
Epoch 103/2000
3/3 [==============================] - 0s 15ms/step - loss: 9.3475 - mae: 9.3475 - val_loss: 51.4599 - val_mae: 51.4599
Epoch 104/2000
3/3 [==============================] - 0s 15ms/step - loss: 8.2146 - mae: 8.2146 - val_loss: 59.5677 - val_mae: 59.5677
Epoch 105/2000
3/3 [==============================] - 0s 15ms/step - loss: 9.3032 - mae: 9.3032 - val_loss: 58.1018 - val_mae: 58.1018
Epoch 106/2000
3/3 [==============================] - 0s 15ms/step - loss: 8.9479 - mae: 8.9479 - val_loss: 50.4879 - val_mae: 50.4879
Epoch 107/2000
3/3 [==============================] - 0s 21ms/step - loss: 9.7014 - mae: 9.7014 - val_loss: 49.3719 - val_mae: 49.3719
Epoch 108/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.7604 - mae: 8.7604 - val_loss: 50.2776 - val_mae: 50.2776
Epoch 109/2000
3/3 [==============================] - 0s 14ms/step - loss: 9.1010 - mae: 9.1010 - val_loss: 47.4385 - val_mae: 47.4385
Epoch 110/2000
3/3 [==============================] - 0s 15ms/step - loss: 8.7268 - mae: 8.7268 - val_loss: 52.1213 - val_mae: 52.1213
Epoch 111/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.6154 - mae: 7.6154 - val_loss: 53.5929 - val_mae: 53.5929
Epoch 112/2000
3/3 [==============================] - 0s 13ms/step - loss: 7.3597 - mae: 7.3597 - val_loss: 52.2027 - val_mae: 52.2027
Epoch 113/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.0514 - mae: 8.0514 - val_loss: 50.1243 - val_mae: 50.1243
Epoch 114/2000
3/3 [==============================] - 0s 17ms/step - loss: 8.2966 - mae: 8.2966 - val_loss: 52.7593 - val_mae: 52.7593
Epoch 115/2000
3/3 [==============================] - 0s 18ms/step - loss: 7.8944 - mae: 7.8944 - val_loss: 51.6669 - val_mae: 51.6669
Epoch 116/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.9033 - mae: 7.9033 - val_loss: 47.8277 - val_mae: 47.8277
Epoch 117/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.4776 - mae: 8.4776 - val_loss: 45.8913 - val_mae: 45.8913
Epoch 118/2000
3/3 [==============================] - 0s 18ms/step - loss: 9.0770 - mae: 9.0770 - val_loss: 48.1003 - val_mae: 48.1003
Epoch 119/2000
3/3 [==============================] - 0s 22ms/step - loss: 7.8054 - mae: 7.8054 - val_loss: 43.4427 - val_mae: 43.4427
Epoch 120/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.7750 - mae: 8.7750 - val_loss: 43.7592 - val_mae: 43.7592
Epoch 121/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.2736 - mae: 8.2736 - val_loss: 50.0069 - val_mae: 50.0069
Epoch 122/2000
3/3 [==============================] - 0s 25ms/step - loss: 8.0372 - mae: 8.0372 - val_loss: 52.0640 - val_mae: 52.0640
Epoch 123/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.2649 - mae: 7.2649 - val_loss: 52.6505 - val_mae: 52.6505
Epoch 124/2000
3/3 [==============================] - 0s 13ms/step - loss: 7.2975 - mae: 7.2975 - val_loss: 49.8391 - val_mae: 49.8391
Epoch 125/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.0272 - mae: 7.0272 - val_loss: 50.7343 - val_mae: 50.7343
Epoch 126/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.4023 - mae: 7.4023 - val_loss: 49.5731 - val_mae: 49.5731
Epoch 127/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.6924 - mae: 7.6924 - val_loss: 50.9214 - val_mae: 50.9214
Epoch 128/2000
3/3 [==============================] - 0s 15ms/step - loss: 8.5295 - mae: 8.5295 - val_loss: 50.0794 - val_mae: 50.0794
Epoch 129/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.7992 - mae: 7.7992 - val_loss: 41.3225 - val_mae: 41.3225
Epoch 130/2000
3/3 [==============================] - 0s 14ms/step - loss: 10.5170 - mae: 10.5170 - val_loss: 42.3643 - val_mae: 42.3643
Epoch 131/2000
3/3 [==============================] - 0s 13ms/step - loss: 8.5295 - mae: 8.5295 - val_loss: 46.5348 - val_mae: 46.5348
Epoch 132/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.6599 - mae: 7.6599 - val_loss: 46.1024 - val_mae: 46.1024
Epoch 133/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.3747 - mae: 7.3747 - val_loss: 48.9075 - val_mae: 48.9075
Epoch 134/2000
3/3 [==============================] - 0s 16ms/step - loss: 6.5468 - mae: 6.5468 - val_loss: 53.3925 - val_mae: 53.3925
Epoch 135/2000
3/3 [==============================] - 0s 17ms/step - loss: 8.0911 - mae: 8.0911 - val_loss: 51.5424 - val_mae: 51.5424
Epoch 136/2000
3/3 [==============================] - 0s 16ms/step - loss: 8.9849 - mae: 8.9849 - val_loss: 48.5263 - val_mae: 48.5263
Epoch 137/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.9064 - mae: 7.9064 - val_loss: 54.1236 - val_mae: 54.1236
Epoch 138/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.3686 - mae: 8.3686 - val_loss: 51.0214 - val_mae: 51.0214
Epoch 139/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.1817 - mae: 7.1817 - val_loss: 48.4314 - val_mae: 48.4314
Epoch 140/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.6221 - mae: 7.6221 - val_loss: 54.4150 - val_mae: 54.4150
Epoch 141/2000
3/3 [==============================] - 0s 13ms/step - loss: 6.6205 - mae: 6.6205 - val_loss: 54.7625 - val_mae: 54.7625
Epoch 142/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.4504 - mae: 7.4504 - val_loss: 52.7956 - val_mae: 52.7956
Epoch 143/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.0234 - mae: 8.0234 - val_loss: 51.5886 - val_mae: 51.5886
Epoch 144/2000
3/3 [==============================] - 0s 20ms/step - loss: 6.7870 - mae: 6.7870 - val_loss: 48.5788 - val_mae: 48.5788
Epoch 145/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.0250 - mae: 7.0250 - val_loss: 45.4811 - val_mae: 45.4811
Epoch 146/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.4842 - mae: 7.4842 - val_loss: 43.7979 - val_mae: 43.7979
Epoch 147/2000
3/3 [==============================] - 0s 25ms/step - loss: 8.1806 - mae: 8.1806 - val_loss: 46.8272 - val_mae: 46.8272
Epoch 148/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.3420 - mae: 7.3420 - val_loss: 45.6751 - val_mae: 45.6751
Epoch 149/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.1533 - mae: 7.1533 - val_loss: 46.7933 - val_mae: 46.7933
Epoch 150/2000
3/3 [==============================] - 0s 15ms/step - loss: 6.7921 - mae: 6.7921 - val_loss: 50.9273 - val_mae: 50.9273
Epoch 151/2000
3/3 [==============================] - 0s 18ms/step - loss: 6.9409 - mae: 6.9409 - val_loss: 48.8998 - val_mae: 48.8998
Epoch 152/2000
3/3 [==============================] - 0s 15ms/step - loss: 8.1230 - mae: 8.1230 - val_loss: 46.4866 - val_mae: 46.4866
Epoch 153/2000
3/3 [==============================] - 0s 15ms/step - loss: 6.5463 - mae: 6.5463 - val_loss: 49.7782 - val_mae: 49.7782
Epoch 154/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.7665 - mae: 7.7665 - val_loss: 48.9684 - val_mae: 48.9684
Epoch 155/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.8801 - mae: 7.8801 - val_loss: 47.2146 - val_mae: 47.2146
Epoch 156/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.9400 - mae: 7.9400 - val_loss: 49.3399 - val_mae: 49.3399
Epoch 157/2000
3/3 [==============================] - 0s 15ms/step - loss: 8.1681 - mae: 8.1681 - val_loss: 44.3276 - val_mae: 44.3276
Epoch 158/2000
3/3 [==============================] - 0s 14ms/step - loss: 6.8002 - mae: 6.8002 - val_loss: 43.4104 - val_mae: 43.4104
Epoch 159/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.2185 - mae: 7.2185 - val_loss: 48.1937 - val_mae: 48.1937
Epoch 160/2000
3/3 [==============================] - 0s 19ms/step - loss: 6.7969 - mae: 6.7969 - val_loss: 50.2245 - val_mae: 50.2245
Epoch 161/2000
3/3 [==============================] - 0s 13ms/step - loss: 7.2662 - mae: 7.2662 - val_loss: 50.5404 - val_mae: 50.5404
Epoch 162/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.0823 - mae: 7.0823 - val_loss: 45.0135 - val_mae: 45.0135
Epoch 163/2000
3/3 [==============================] - 0s 17ms/step - loss: 5.3986 - mae: 5.3986 - val_loss: 50.6354 - val_mae: 50.6354
Epoch 164/2000
3/3 [==============================] - 0s 18ms/step - loss: 7.0827 - mae: 7.0827 - val_loss: 51.7896 - val_mae: 51.7896
Epoch 165/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.7857 - mae: 7.7857 - val_loss: 45.5520 - val_mae: 45.5520
Epoch 166/2000
3/3 [==============================] - 0s 18ms/step - loss: 8.1538 - mae: 8.1538 - val_loss: 42.2963 - val_mae: 42.2963
Epoch 167/2000
3/3 [==============================] - 0s 13ms/step - loss: 8.0918 - mae: 8.0918 - val_loss: 47.1810 - val_mae: 47.1810
Epoch 168/2000
3/3 [==============================] - 0s 16ms/step - loss: 9.7394 - mae: 9.7394 - val_loss: 49.0362 - val_mae: 49.0362
Epoch 169/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.0714 - mae: 8.0714 - val_loss: 38.1750 - val_mae: 38.1750
Epoch 170/2000
3/3 [==============================] - 0s 16ms/step - loss: 9.7255 - mae: 9.7255 - val_loss: 38.6910 - val_mae: 38.6910
Epoch 171/2000
3/3 [==============================] - 0s 18ms/step - loss: 7.4218 - mae: 7.4218 - val_loss: 51.3892 - val_mae: 51.3892
Epoch 172/2000
3/3 [==============================] - 0s 14ms/step - loss: 9.1759 - mae: 9.1759 - val_loss: 50.8813 - val_mae: 50.8813
Epoch 173/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.3182 - mae: 7.3182 - val_loss: 39.9601 - val_mae: 39.9601
Epoch 174/2000
3/3 [==============================] - 0s 15ms/step - loss: 6.7631 - mae: 6.7631 - val_loss: 43.5258 - val_mae: 43.5258
Epoch 175/2000
3/3 [==============================] - 0s 17ms/step - loss: 6.9990 - mae: 6.9990 - val_loss: 50.6018 - val_mae: 50.6018
Epoch 176/2000
3/3 [==============================] - 0s 13ms/step - loss: 8.5642 - mae: 8.5642 - val_loss: 48.1345 - val_mae: 48.1345
Epoch 177/2000
3/3 [==============================] - 0s 14ms/step - loss: 6.3842 - mae: 6.3842 - val_loss: 42.1444 - val_mae: 42.1444
Epoch 178/2000
3/3 [==============================] - 0s 13ms/step - loss: 7.2369 - mae: 7.2369 - val_loss: 42.6923 - val_mae: 42.6923
Epoch 179/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.3038 - mae: 7.3038 - val_loss: 44.8732 - val_mae: 44.8732
Epoch 180/2000
3/3 [==============================] - 0s 26ms/step - loss: 7.1911 - mae: 7.1911 - val_loss: 40.4869 - val_mae: 40.4869
Epoch 181/2000
3/3 [==============================] - 0s 12ms/step - loss: 6.2947 - mae: 6.2947 - val_loss: 46.4633 - val_mae: 46.4633
Epoch 182/2000
3/3 [==============================] - 0s 19ms/step - loss: 6.9016 - mae: 6.9016 - val_loss: 51.9065 - val_mae: 51.9065
Epoch 183/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.5087 - mae: 7.5087 - val_loss: 44.3242 - val_mae: 44.3242
Epoch 184/2000
3/3 [==============================] - 0s 15ms/step - loss: 6.8675 - mae: 6.8675 - val_loss: 41.0084 - val_mae: 41.0084
Epoch 185/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.1289 - mae: 7.1289 - val_loss: 44.8715 - val_mae: 44.8715
Epoch 186/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.4899 - mae: 7.4899 - val_loss: 48.9164 - val_mae: 48.9164
Epoch 187/2000
3/3 [==============================] - 0s 14ms/step - loss: 6.5070 - mae: 6.5070 - val_loss: 48.7849 - val_mae: 48.7849
Epoch 188/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.9294 - mae: 7.9294 - val_loss: 44.5779 - val_mae: 44.5779
Epoch 189/2000
3/3 [==============================] - 0s 25ms/step - loss: 6.7593 - mae: 6.7593 - val_loss: 42.8806 - val_mae: 42.8806
Epoch 190/2000
3/3 [==============================] - 0s 18ms/step - loss: 6.8462 - mae: 6.8462 - val_loss: 47.8771 - val_mae: 47.8771
Epoch 191/2000
3/3 [==============================] - 0s 16ms/step - loss: 6.7201 - mae: 6.7201 - val_loss: 51.1026 - val_mae: 51.1026
Epoch 192/2000
3/3 [==============================] - 0s 13ms/step - loss: 6.9480 - mae: 6.9480 - val_loss: 47.1931 - val_mae: 47.1931
Epoch 193/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.0101 - mae: 7.0101 - val_loss: 46.4540 - val_mae: 46.4540
Epoch 194/2000
3/3 [==============================] - 0s 15ms/step - loss: 6.9024 - mae: 6.9024 - val_loss: 50.3397 - val_mae: 50.3397
Epoch 195/2000
3/3 [==============================] - 0s 14ms/step - loss: 6.5681 - mae: 6.5681 - val_loss: 42.1270 - val_mae: 42.1270
Epoch 196/2000
3/3 [==============================] - 0s 14ms/step - loss: 8.1130 - mae: 8.1130 - val_loss: 39.4059 - val_mae: 39.4059
Epoch 197/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.4489 - mae: 7.4489 - val_loss: 48.2340 - val_mae: 48.2340
Epoch 198/2000
3/3 [==============================] - 0s 15ms/step - loss: 6.5331 - mae: 6.5331 - val_loss: 47.9230 - val_mae: 47.9230
Epoch 199/2000
3/3 [==============================] - 0s 14ms/step - loss: 7.2790 - mae: 7.2790 - val_loss: 42.0083 - val_mae: 42.0083
Epoch 200/2000
3/3 [==============================] - 0s 15ms/step - loss: 7.0790 - mae: 7.0790 - val_loss: 47.2796 - val_mae: 47.2796
Epoch 201/2000
3/3 [==============================] - 0s 16ms/step - loss: 7.1935 - mae: 7.1935 - val_loss: 54.3979 - val_mae: 54.3979
Epoch 202/2000
3/3 [==============================] - 0s 17ms/step - loss: 7.6016 - mae: 7.6016 - val_loss: 53.0855 - val_mae: 53.0855
Epoch 203/2000
3/3 [==============================] - 0s 13ms/step - loss: 7.6932 - mae: 7.6932 - val_loss: 49.1075 - val_mae: 49.1075
Epoch 204/2000
3/3 [==============================] - 0s 13ms/step - loss: 6.9903 - mae: 6.9903 - val_loss: 47.6789 - val_mae: 47.6789
Epoch 205/2000
3/3 [==============================] - 0s 12ms/step - loss: 6.1540 - mae: 6.1540 - val_loss: 49.1183 - val_mae: 49.1183
Epoch 206/2000
3/3 [==============================] - 0s 14ms/step - loss: 6.0819 - mae: 6.0819 - val_loss: 46.5438 - val_mae: 46.5438
Epoch 207/2000
3/3 [==============================] - 0s 19ms/step - loss: 5.5868 - mae: 5.5868 - val_loss: 44.6183 - val_mae: 44.6183
Epoch 208/2000
3/3 [==============================] - 0s 13ms/step - loss: 6.4061 - mae: 6.4061 - val_loss: 38.2505 - val_mae: 38.2505
Epoch 209/2000
3/3 [==============================] - 0s 13ms/step - loss: 7.2082 - mae: 7.2082 - val_loss: 39.6067 - val_mae: 39.6067
Epoch 210/2000
3/3 [==============================] - 0s 12ms/step - loss: 7.1549 - mae: 7.1549 - val_loss: 43.9873 - val_mae: 43.9873
Epoch 211/2000
3/3 [==============================] - 0s 22ms/step - loss: 6.4218 - mae: 6.4218 - val_loss: 41.0495 - val_mae: 41.0495
Epoch 212/2000
3/3 [==============================] - 0s 12ms/step - loss: 6.4829 - mae: 6.4829 - val_loss: 42.6283 - val_mae: 42.6283
Epoch 213/2000
3/3 [==============================] - 0s 12ms/step - loss: 5.4856 - mae: 5.4856 - val_loss: 44.6421 - val_mae: 44.6421
Epoch 214/2000
3/3 [==============================] - 0s 12ms/step - loss: 5.3503 - mae: 5.3503 - val_loss: 46.5398 - val_mae: 46.5398
Epoch 215/2000
3/3 [==============================] - 0s 13ms/step - loss: 6.4040 - mae: 6.4040 - val_loss: 47.6458 - val_mae: 47.6458
Epoch 216/2000
3/3 [==============================] - 0s 12ms/step - loss: 6.6471 - mae: 6.6471 - val_loss: 49.0750 - val_mae: 49.0750
Epoch 217/2000
3/3 [==============================] - 0s 17ms/step - loss: 6.0777 - mae: 6.0777 - val_loss: 53.9519 - val_mae: 53.9519
Epoch 218/2000
3/3 [==============================] - 0s 26ms/step - loss: 6.2945 - mae: 6.2945 - val_loss: 47.7473 - val_mae: 47.7473
Epoch 219/2000
3/3 [==============================] - 0s 17ms/step - loss: 6.2076 - mae: 6.2076 - val_loss: 42.1212 - val_mae: 42.1212
Epoch 220/2000
3/3 [==============================] - 0s 21ms/step - loss: 7.5077 - mae: 7.5077 - val_loss: 47.3427 - val_mae: 47.3427
Epoch 221/2000
3/3 [==============================] - 0s 25ms/step - loss: 7.5085 - mae: 7.5085 - val_loss: 48.1796 - val_mae: 48.1796
Epoch 222/2000
3/3 [==============================] - 0s 19ms/step - loss: 6.5414 - mae: 6.5414 - val_loss: 42.0083 - val_mae: 42.0083
1/1 [==============================] - 0s 261ms/step
3/3 [==============================] - 0s 4ms/step
MAE:  45.891181115390495

This code creates a directory path in Google Drive to save the results of an appointments prediction model run.

First, the date class from the datetime module is imported. Then, today's date is obtained using the today() method of the date class and assigned to the today variable.

The path variable is created by formatting a string that includes the current date as a subdirectory of a larger directory drive/My Drive/CDL Api Files/appointments_prediction_results/.

Finally, the !mkdir command is used to create the directory specified by path if it does not already exist. The -p option creates intermediate directories if necessary, and the $ before the variable name is used to substitute the value of the variable in the shell command.

In [ ]:
from datetime import date
today = date.today()
path = f"drive/My Drive/CDL Api Files/appointments_prediction_results/{str(today)}"
!mkdir -p "$path"

Train Predictions Plot¶

In [ ]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

preds_train = pd.DataFrame({
    'Date': appts_model['Date'][:split_idx],
    'Predictions': x_pred,
    'Observed': appts_model['StudentName'][:split_idx]
})

preds_train.to_csv('train_pred.csv')
!cp train_pred.csv "$path"

fig, ax = plt.subplots(figsize=(12,8))
plt.title('Train Predictions')
ax.plot('Date', 'Predictions', data=preds_train, label='Predictions')
ax.plot('Date', 'Observed', data=preds_train, label='Observed', linewidth=0.5) # set alpha to 0.5 for the Observed line
plt.xticks(rotation=60)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=3))
ax.legend()
plt.show()

Test Predictions Plot¶

In [ ]:
import matplotlib.pyplot as plt
preds_test = pd.DataFrame({
    'Date': appts_model['Date'][split_idx:],
    'Predictions': y_pred,
    'Observed': appts_model['StudentName'][split_idx:]
})

preds_test.to_csv('test_pred.csv')
!cp test_pred.csv "$path"

fig, ax = plt.subplots(figsize=(12,8))
plt.title('Test Predictions')
ax.plot('Date', 'Predictions', data=preds_test, label='Predictions')
ax.plot('Date', 'Observed', data=preds_test, label='Observed', linewidth=0.5) # set alpha to 0.5 for the Observed line
plt.xticks(rotation=60)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
ax.legend()
plt.show()

Predict Next 3 Months¶

In [ ]:
# Predict next 3 months
# Prepare the data
new_data = pd.DataFrame(appts).tail(90)
display(new_data.head())

def next_3months(date):
  date += timedelta(days=90)
  date = date.strftime('%Y-%m-%d')
  date = pd.to_datetime(date)
  return date

def extract_year(col):
  #col = datetime.strptime(col, '%Y-%m-%d')
  year = col.year
  return int(year)

def extract_month(col):
  #col = datetime.strptime(col, '%Y-%m-%d')
  month = col.month
  return int(month)

def extract_day(col):
  #col = datetime.strptime(col, '%Y-%m-%d')
  day = col.day
  return int(day)

#new_data['Date'] = pd.to_datetime(new_data['Date'])
new_data['Date'] = new_data['Date'].apply(next_3months)
new_data = new_data.resample('7D', on='Date').sum().reset_index()
new_data['Year'] = new_data['Date'].apply(extract_year)
new_data['Month'] = new_data['Date'].apply(extract_month)
new_data['Day'] = new_data['Date'].apply(extract_day)
new_data.iloc[:, 2:] = scaler.transform(new_data.iloc[:, 2:])
new_data.head()

X = new_data.drop(['StudentName', 'StudentName_x','Date'], axis=1).values
y = appts_model['StudentName'].tail(90).values

#Reshape X to match LSTM input shape
X = X.reshape((X.shape[0], 1, X.shape[1]))
X.shape
new_pred = model.predict(X)
new_pred = [x[0] for x in new_pred]
new_pred[:5]
Date StudentName Term StudentName_x Graduation Term grad_term_count ACTU APAN BIET CNAD ... STRAT COMM SUMA SUSC TEMT WEALTH 0 - 3 Years 3 - 5 Years 5+ Years Term_y_Spring Term_y_Summer
638 2023-01-27 19.0 Spring 19.0 35.0 25.0 0.0 30.0 0.0 3.0 ... 0.0 0.0 1.0 2.0 0.0 47.0 4.0 1.0 1 0
639 2023-01-28 0.0 Spring 0.0 35.0 25.0 0.0 30.0 0.0 3.0 ... 0.0 0.0 1.0 2.0 0.0 47.0 4.0 1.0 1 0
640 2023-01-29 0.0 Spring 0.0 35.0 25.0 0.0 30.0 0.0 3.0 ... 0.0 0.0 1.0 2.0 0.0 47.0 4.0 1.0 1 0
641 2023-01-30 12.0 Spring 12.0 28.0 575.0 9.0 605.0 8.0 14.0 ... 74.0 160.0 2.0 99.0 1.0 1123.0 164.0 134.0 1 0
642 2023-01-31 29.0 Spring 29.0 22.0 567.0 41.0 1034.0 7.0 39.0 ... 169.0 104.0 16.0 89.0 5.0 1604.0 235.0 236.0 1 0

5 rows × 29 columns

1/1 [==============================] - 0s 16ms/step
<ipython-input-47-47ba913aff7d>:29: FutureWarning:

The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.

Out[ ]:
[126.090775, 142.50012, 126.13736, 125.77501, 144.65273]

Next 3 Months Prediction Plot¶

In [ ]:
import matplotlib.pyplot as plt
new_preds = pd.DataFrame({
    'Date': new_data['Date'],
    'Predictions': new_pred
})

new_preds.to_csv('new_preds.csv')
!cp new_preds.csv "$path"

fig, ax = plt.subplots(figsize=(12,8))
plt.title('Next 3 Months Predictions')
ax.plot('Date', 'Predictions', data=new_preds, label='Predictions')
plt.xticks(rotation=60)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
ax.legend()
plt.show()

Predictions Combined¶

In [ ]:
date_new = pd.concat([appts_model['Date'], new_data['Date']], axis=0)
len_zeros = int(len(date_new)-len(appts_model['StudentName']))
observed_new = pd.concat([appts_model['StudentName'], pd.Series(np.zeros(len_zeros))], axis=0)
predictions_new = pd.concat([preds_train['Predictions'], preds_test['Predictions'], new_preds['Predictions']])

predictions_concat = pd.DataFrame({
    'Date': date_new,
    'Observed': observed_new,
    'Predictions': predictions_new
})

import plotly.graph_objs as go
import pandas as pd

# Create a trace for the observed values
trace_observed = go.Scatter(x=predictions_concat['Date'], y=predictions_concat['Observed'], name='Observed')

# Create a trace for the predictions
trace_predictions = go.Scatter(x=predictions_concat['Date'], y=predictions_concat['Predictions'], name='Predictions')

# Create a layout for the graph
layout = go.Layout(
    title='Next 3 Months Predictions',
    xaxis=dict(title='Date', tickangle=60),
    yaxis=dict(title='Values'),
)

# Create a figure and add the traces and layout
fig = go.Figure(data=[trace_observed, trace_predictions], layout=layout)

# Show the graph
fig.show()
In [ ]:
# # Define the LSTM model for Date column
# from keras.models import Sequential
# from keras.layers import Dense, LSTM, Dropout
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import OneHotEncoder

# # Prepare the data for the LSTM
# X_date = appts_model['Date'].values.reshape((-1, 1))
# y = appts_model['StudentName_x'].values

# # Reshape X_date to match LSTM input shape
# X_date = X_date.reshape((X_date.shape[0], 1, X_date.shape[1]))

# # Split the data into training and testing sets
# X_train, X_test, y_train, y_test = train_test_split(appts_model['Date'], appts_model.drop(['Date', 'StudentName_x'], axis=1), test_size=0.2, shuffle=False)

# # Encode the Date column
# date_encoder = OneHotEncoder(sparse=False)
# X_date_train_encoded = date_encoder.fit_transform(X_train.values.reshape(-1, 1))
# X_date_test_encoded = date_encoder.transform(X_test.values.reshape(-1, 1))

# # Reshape the encoded Date features for LSTM input shape
# X_date_train_encoded = X_date_train_encoded.reshape(X_date_train_encoded.shape[0], 1, X_date_train_encoded.shape[1])
# X_date_test_encoded = X_date_test_encoded.reshape(X_date_test_encoded.shape[0], 1, X_date_test_encoded.shape[1])

# # Define the LSTM model
# lstm_model = Sequential()
# lstm_model.add(LSTM(64, input_shape=(1, X_date_train_encoded.shape[2]), activation='relu'))
# lstm_model.add(Dense(1, activation='linear'))
# lstm_model.compile(optimizer='adam', loss='mse')

# # Train the LSTM model
# lstm_model.fit(X_date_train_encoded, y_train['StudentName_x'], epochs=50, batch_size=32)

# # Define the other model
# other_model = Sequential()
# other_model.add(Dense(128, input_dim=X_other_train.shape[1], activation='relu'))
# other_model.add(Dense(64, activation='relu'))
# other_model.add(Dense(32, activation='relu'))
# other_model.add(Dense(1, activation='linear'))
# other_model.compile(optimizer='adam', loss='mse')

# # Train the other model
# other_model.fit(X_other_train, y_train['StudentName_x'], epochs=50, batch_size=32)

# # Evaluate both models on the test data
# lstm_score = lstm_model.evaluate(X_date_test_encoded, y_test['StudentName_x'])
# other_score = other_model.evaluate(X_other_test, y_test['StudentName_x'])

# print(f"LSTM model test loss: {lstm_score:.4f}")
# print(f"Other model test loss: {other_score:.4f}")

# # Concatenate the encoded Date features with the other features
# X_train_final = np.concatenate([X_date_train_encoded, X_other_train], axis=2)
# X_test_final = np.concatenate([X_date_test_encoded, X_other_test], axis=2)

# # Define the final model with LSTM layer and other layers
# final_model = Sequential()
# final_model.add(LSTM(64, input_shape=(1, X_train_final.shape[2]), activation='relu'))
# final_model.add(Dense(128, activation='relu'))
# final_model.add(Dense(64, activation='relu'))
# final_model.add(Dense(32, activation='relu'))
# final_model.add(Dense(1, activation='linear'))
# final_model.compile(optimizer='adam', loss='mse')

# # Train the final model
# final_model.fit(X_train_final, y_train['StudentName_x'], epochs=50, batch_size=32)

# # Evaluate the final model on the test data
# final_score = final_model.evaluate(X_test_final, y_test['StudentName_x'])
# print(f"Final model test loss: {final_score:.4f}")