This project is from 2022 SoCal RUG Data Science Hackathon.
Team: Taeho Lee, Rohit Cherian Bijoy, Michelle Ng, Chloe Ko, Jiwon Ko
Tool: Python, Tableau
This project won the “Best Analysis” award in this hackathon.
Presentation
We are team Lasagna. (Lasagna was so good!)
We have selected the top 5 California cities with the highest population as well as Irvine since we’re here, and we will be comparing 5-year estimates retrieved from the US Census in years 2015 and 2020.
The graph on the left shows the changes in number of units available in the respective price ranges from 2015 to 2020. We were able to see that housing units have generally increased in price over the years, with a lot more units being rented out at $2000 and above in Irvine, San Jose, and San Diego.
On the right, we tried to identify the relationship between the value of a housing unit and the price it is rented out at. Our hypothesis is that as the house value increases, the rent price increases too. However, that is not the case in Irvine unfortunately, and our conclusion is that there is a housing monopoly run by Irvine Company Apartments.
We also wanted to see the relationship between income and housing. As shown from the graphs above, we can see that the incomes increased from 2015 to 2020, and so did the housing costs. Out of the 6 cities, Irvine is the most costly city to live in, in terms of housing.
Zooming in income, we realized that high rent is driven by the fact that income growth is disparate - In irvine people earning > 200K grew by 53% where as people earning between 15-25k experienced negative growth of 20%.
Our next question was why is there such a disparate growth in income? Our hypothesis was that it had to do something with population migration - what we found especially in high rent areas is that they’re characterized by people moving in from abroad and less people moving in from another internal state.
With all of these factors, we built a regression model that takes into account the rate of change to predict the potential change in rent - we get a pretty accurate model and we’re able to estimate change in rent for Anaheim, Long Beach and Tustin.
Personal Feelings
I was fortunate to participate in the Hackathon with other great professionals. I learned a lot from other team’s great presentations as well.
It was not easy to find an insight in such a short time, but I got valuable lessons thanks to the time constraint: Narrowing down questions, story-telling, and strong technical skills are the most important for the good data scientist.
Thank you every organizers, mentors (SoCal RUG (John Peach, Pablo Barajas, Javier Orraca, Tommy Steed, FSA)) and Merage Analytics Club for hosting and organizing this great event.