Titanic - Machine Learning from Disaster
For this project I built an interactive UI with Flask and HTML for a Titanic survival prediction model. I also collected and cleaned the data and applied classification models to predict whether a passenger would survive. In addition, I used a chi-squared test to analyze whether survival rates differed across passenger classes. The project also includes data visualizations that break down survival rates by gender, age, and class.

Dataset used and information on the competition: https://lnkd.in/gWpjjZNh
Titanic Survival Analysis

The histogram above shows the age distribution of male (blue) and female (pink) passengers on the Titanic. The majority of passengers were between 20 and 40 years old. Males were more frequent in this age range, particularly in their early 20s. There are fewer passengers above the age of 50, and the gender distribution is relatively even at older ages.
The age distribution suggests that younger adults (especially males) were the most common passengers, but this chart on its own says nothing about survival. Survival rates are explored in the visuals below.


The overall survival pie chart shows that 58.9% of the passengers did not survive the disaster, while 41.1% survived, so the majority of passengers perished.
The survival-by-gender chart breaks the same outcome down by gender, with each slice expressed as a share of all passengers. Women who survived make up 26.4% of passengers and men who survived 14.7%. Men who did not survive account for nearly half of all passengers (49.7%), while women who did not survive account for only 9.2%. Women therefore had a markedly higher survival rate than men, consistent with the "women and children first" protocol during the evacuation.
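The four gender percentages sum to 100%, which suggests they are shares of all passengers rather than survival rates within each gender. Assuming that reading, the within-gender survival rates can be recovered with a short calculation. The dictionary and function names below are illustrative and not taken from the project code.

```python
# Shares of ALL passengers, in percent, as reported for the gender chart.
# Assumption: each figure is a share of the whole passenger list.
joint_share = {
    ("female", "survived"): 26.4,
    ("female", "died"): 9.2,
    ("male", "survived"): 14.7,
    ("male", "died"): 49.7,
}


def survival_rate(sex: str) -> float:
    """Return survivors of one sex as a percentage of all passengers of that sex."""
    survived = joint_share[(sex, "survived")]
    total = survived + joint_share[(sex, "died")]
    return 100.0 * survived / total


for sex in ("female", "male"):
    print(f"{sex}: {survival_rate(sex):.1f}% survived")
```

Under this assumption, roughly three in four women survived compared with fewer than one in four men.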
Machine Learning analysis

As an additional feature for this project, I created an interactive user interface (UI) using Flask and HTML, which lets users enter details such as ticket class, age, gender, and the number of siblings and parents aboard. When the form is submitted, my trained machine learning model processes these inputs and predicts whether the user would have survived the Titanic disaster.
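The step between the HTML form and the model can be sketched as a small validation function that turns the submitted strings into one numeric feature row. This is a minimal sketch under assumptions: the form field names, the feature order (borrowed from the Kaggle columns Pclass, Sex, Age, SibSp, Parch), and the 0/1 encoding of gender are hypothetical, since the write-up does not show the project's actual code.

```python
from typing import List, Mapping

# Assumed feature order, based on the Kaggle Titanic column names.
FEATURES = ("Pclass", "Sex", "Age", "SibSp", "Parch")


def form_to_features(form: Mapping[str, str]) -> List[float]:
    """Validate raw form strings and return one numeric feature row."""
    pclass = int(form["pclass"])
    if pclass not in (1, 2, 3):
        raise ValueError("ticket class must be 1, 2 or 3")

    sex = form["sex"].strip().lower()
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")

    age = float(form["age"])
    if not 0 <= age <= 120:
        raise ValueError("age must be between 0 and 120")

    sibsp = int(form["sibsp"])
    parch = int(form["parch"])
    if sibsp < 0 or parch < 0:
        raise ValueError("family counts cannot be negative")

    # Assumed encoding: female = 1, male = 0. It must match the encoding
    # that was used when the model was trained.
    sex_code = 1.0 if sex == "female" else 0.0
    return [float(pclass), sex_code, age, float(sibsp), float(parch)]
```

In a Flask view, `request.form` is a mapping of field names to strings, so it could be passed to a function like this directly, and the resulting row handed to the trained model's prediction method.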
The UI enhances the user experience by making the model accessible and interactive. Instead of simply observing static predictions, users can engage with the model and see real-time predictions based on their own hypothetical or personalized input. This adds an interactive, educational layer to the project, helping users understand how factors like class, age, and family size impacted survival chances in this historical event.
When I tried the model with my own hypothetical inputs, it predicted that I wouldn't have survived the Titanic disaster, an outcome that reflects how strongly these features influenced survival. This illustrates how data can provide insights not only at a broad, collective level but also at an individual scale, allowing users to make personalized assessments.
Statistical analysis

This part of my portfolio set out to answer a key question about bias among the economic classes aboard the Titanic: "Is there a statistically significant difference in the survival rates of passengers across different ticket classes on the Titanic?"
​
To answer this question, I conducted a chi-squared test. This test measures the difference between the observed and expected frequencies of the outcomes of a set of events or variables. In the case of the Titanic, if class had no effect on survival, we would expect surviving passengers to be distributed equally across all passenger classes. This leads to the following hypotheses:
​
Null Hypothesis (H0): The distribution of surviving passengers is equal across passenger classes.
​
Alternative Hypothesis (H1): The distribution of surviving passengers is unequal across passenger classes.
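The comparison of observed and expected frequencies can be sketched in a few lines. This sketch assumes the test was run as a test of independence on a table of counts with one row per passenger class and one column per outcome (survived, did not survive). The write-up does not state the exact setup, so the function names and table layout here are illustrative.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if survival were independent of class.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(observed: Sequence[Sequence[float]]) -> int:
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)
```

With three classes and two outcomes, `degrees_of_freedom` returns 2, and the statistic grows as the observed counts move further from what independence would predict.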
​
​
Key details from the test:
- Significance Level: 0.05
- Degrees of Freedom: 2
- Chi-square Table Value: 5.991
- Chi-square Value from Dataset: 29.39
​
Since the calculated Chi-square value (29.39) is much larger than the table value (5.991), the test results lead to rejecting the null hypothesis at a 5% significance level. This provides strong evidence that there is a significant difference in the distribution of surviving passengers across different passenger classes.
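The decision rule above can be checked without a lookup table. For 2 degrees of freedom, the chi-squared distribution has the closed-form tail probability exp(-x / 2), so both the critical value and the p-value follow directly. The sketch below uses only the figures reported here; the function names are illustrative, and the closed form applies to 2 degrees of freedom only.

```python
import math


def critical_value_df2(alpha: float) -> float:
    """Critical value of the chi-squared distribution with 2 degrees of freedom."""
    # With 2 degrees of freedom, P(X > x) = exp(-x / 2).
    # Solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    return -2.0 * math.log(alpha)


def p_value_df2(statistic: float) -> float:
    """Tail probability P(X > statistic) for 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


alpha = 0.05
statistic = 29.39  # chi-square value reported for the dataset

critical = critical_value_df2(alpha)
reject_null = statistic > critical

print(f"critical value: {critical:.3f}")
print(f"p-value: {p_value_df2(statistic):.2e}")
print(f"reject H0: {reject_null}")
```

The computed critical value matches the table value of 5.991, and the p-value is far below 0.05, which agrees with rejecting the null hypothesis.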
​
This suggests that class played a significant role in determining survival, with certain classes likely having a better chance of survival than others. This finding aligns with the historical context of the Titanic, where passengers from higher classes had better access to lifeboats and safety provisions.
​