DockWatch
Real-time BCycle wait time prediction based on usage patterns and environmental factors.
Short on time? Skip the scrolling and hit play, and I'll walk you through it all in this video!
Prefer a quick read? Check out the highlights in this short presentation!
Boulder BCycle is a popular and convenient bike-share service in Boulder, Colorado. It offers electric bikes at stations throughout the city, providing a sustainable and efficient way to commute. The service also caters to the local community: students ride free for up to one hour per trip. This student-friendly policy makes it an attractive choice for University of Colorado Boulder students, who frequently use Boulder BCycle to commute between their residences and the university. The service plays a key role in the city's transport system, supporting sustainability goals and meeting student needs.
Boulder BCycle riders often run into reliability issues and struggle to predict whether a station will have a bike available. This project therefore aims to improve the BCycle experience by analyzing usage patterns and estimating the wait time for a bike at each station. By analyzing historical data and real-time inputs, I give riders the insight they need to decide when and where to pick up a bike. The goal is to make BCycle a more convenient, efficient, reliable, and accessible mode of transportation for the Boulder community.
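The README does not spell out how a station's waiting time is derived. As a minimal sketch, assume a rider who arrives at an empty station waits until the next periodic snapshot that shows at least one bike. The `Snapshot` tuple layout and the `wait_times` helper below are hypothetical, not the project's code.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical record: (snapshot time, bikes available at that moment).
Snapshot = Tuple[datetime, int]


def wait_times(snapshots: List[Snapshot]) -> List[Optional[float]]:
    """Estimate, for each snapshot in time order, how many minutes a rider
    arriving then would wait for a bike.

    0.0 means a bike was available; None means the station never refilled
    before the data ends. Results follow the sorted order of `snapshots`.
    """
    ordered = sorted(snapshots)
    result: List[Optional[float]] = [None] * len(ordered)
    next_available: Optional[datetime] = None
    # Walk backwards so each empty snapshot already knows when bikes reappear.
    for i in range(len(ordered) - 1, -1, -1):
        ts, bikes = ordered[i]
        if bikes > 0:
            next_available = ts
            result[i] = 0.0
        elif next_available is not None:
            result[i] = (next_available - ts).total_seconds() / 60
    return result


if __name__ == "__main__":
    start = datetime(2023, 10, 2, 8, 0)
    snaps = [(start + timedelta(minutes=3 * k), bikes)
             for k, bikes in enumerate([2, 0, 0, 1, 0])]
    print(wait_times(snaps))  # → [0.0, 6.0, 3.0, 0.0, None]
```

Walking the snapshots backwards lets each empty reading find the next refill in a single pass. With 3-minute snapshots, the resulting waits are multiples of 3 minutes, so they are only as precise as the polling interval.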
Data Collection: Relevant data were collected from three API sources:
GBFS: The GBFS (General Bikeshare Feed Specification) API was used to collect live station status data. Because this API only reports the current station status and provides no historical data, it was polled frequently, pulling a fresh snapshot every 3 minutes to build up a history.
OpenMeteo API: The OpenMeteo API was used to collect historical and current weather information.
CU Boulder Class Search: The CU class calendar was extracted to record the classes scheduled on each day.
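A minimal sketch of collecting from the two public APIs above, using only the standard library. The feed URL is a placeholder (the real one is listed in the system's `gbfs.json` discovery file), and the helper names, the chosen weather variables, and the injectable `fetch`/`store` callbacks are assumptions for illustration. The GBFS field names (`last_updated`, `station_id`, `num_bikes_available`, `num_docks_available`) follow the GBFS `station_status` specification.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Placeholder URL (assumption): the real Boulder BCycle station_status feed is
# listed in the system's gbfs.json discovery file and is not reproduced here.
STATION_STATUS_URL = "https://example.com/gbfs/en/station_status.json"
POLL_INTERVAL_S = 180  # one pull every 3 minutes


def fetch_json(url: str) -> dict:
    """Download and decode a JSON document."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def parse_station_status(payload: dict) -> List[Dict]:
    """Flatten a GBFS station_status payload into one row per station."""
    snapshot_ts = payload["last_updated"]  # POSIX seconds, per the GBFS spec
    return [
        {
            "snapshot_ts": snapshot_ts,
            "station_id": s["station_id"],
            "bikes": s["num_bikes_available"],
            "docks": s["num_docks_available"],
        }
        for s in payload["data"]["stations"]
    ]


def poll(fetch: Callable[[], dict], store: Callable[[List[Dict]], None],
         iterations: int, interval_s: float = POLL_INTERVAL_S) -> None:
    """Fetch, flatten and store the feed `iterations` times, sleeping in between."""
    for i in range(iterations):
        store(parse_station_status(fetch()))
        if i < iterations - 1:
            time.sleep(interval_s)


def open_meteo_archive_url(lat: float, lon: float, start: str, end: str) -> str:
    """Build an Open-Meteo historical-weather request for hourly variables."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,  # YYYY-MM-DD
        "end_date": end,
        "hourly": "temperature_2m,precipitation",  # variable choice is an assumption
        "timezone": "America/Denver",
    }
    return "https://archive-api.open-meteo.com/v1/archive?" + urlencode(params)
```

Injecting `fetch` and `store` keeps the loop testable without network access; on a long-running server it could be called as `poll(lambda: fetch_json(STATION_STATUS_URL), store=..., iterations=...)`. For current weather, Open-Meteo's forecast endpoint (`https://api.open-meteo.com/v1/forecast`) accepts the same `latitude`, `longitude`, and `hourly` parameters.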
Data Storage: The data collection script ran without interruption on Amazon EC2, and the results were stored in a PostgreSQL database.
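The project's actual table layout is not documented; the schema below is a hypothetical sketch of how periodic station snapshots might be stored. To keep it runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and `store_rows` is a hypothetical helper.

```python
import sqlite3
from typing import Dict, Iterable

# Hypothetical schema; sqlite3 stands in for PostgreSQL so the sketch runs locally.
SCHEMA = """
CREATE TABLE IF NOT EXISTS station_status (
    snapshot_ts INTEGER NOT NULL,  -- feed timestamp, Unix seconds
    station_id  TEXT    NOT NULL,
    bikes       INTEGER NOT NULL,  -- bikes available
    docks       INTEGER NOT NULL,  -- empty docks available
    PRIMARY KEY (snapshot_ts, station_id)
)
"""


def store_rows(conn: sqlite3.Connection, rows: Iterable[Dict]) -> None:
    """Insert snapshot rows, silently skipping any (snapshot_ts, station_id) already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO station_status (snapshot_ts, station_id, bikes, docks) "
        "VALUES (:snapshot_ts, :station_id, :bikes, :docks)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    row = {"snapshot_ts": 1700000000, "station_id": "a", "bikes": 3, "docks": 7}
    store_rows(conn, [row, row])  # the repeated snapshot is ignored
    print(conn.execute("SELECT COUNT(*) FROM station_status").fetchone()[0])  # → 1
```

In PostgreSQL the same de-duplication would be written `INSERT ... ON CONFLICT DO NOTHING`, and a driver such as psycopg2 uses `%(name)s` placeholders rather than `:name`. The composite primary key matters because consecutive polls can return a feed whose timestamp has not changed.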
Data Preprocessing: A Python pipeline was set up to perform the following preprocessing steps:
Handling missing values
Creating synthetic features
Normalization
Label encoding
Joining tables
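The steps above can be sketched with the standard library alone. The helper names, forward-fill as the missing-value strategy, min-max scaling as the normalization method, and the hour/weekday/weekend synthetic features are all assumptions; the README does not say which techniques were used.

```python
from datetime import datetime
from typing import Dict, List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the last observed one."""
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out


def add_time_features(row: Dict) -> Dict:
    """Derive hypothetical synthetic features from the snapshot timestamp."""
    ts: datetime = row["ts"]
    return {**row, "hour": ts.hour, "weekday": ts.weekday(),
            "is_weekend": int(ts.weekday() >= 5)}


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def label_encode(labels: List[str]) -> Dict[str, int]:
    """Map each distinct label (e.g. a station name) to an integer, in sorted order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def join_weather(rows: List[Dict], weather_by_hour: Dict[datetime, Dict]) -> List[Dict]:
    """Left-join each snapshot to the weather record for the hour it falls in."""
    joined = []
    for row in rows:
        hour_key = row["ts"].replace(minute=0, second=0, microsecond=0)
        joined.append({**row, **weather_by_hour.get(hour_key, {})})
    return joined
```

Label encoding turns categorical values such as station names into integers that models can consume, and the join attaches hourly weather to each snapshot by truncating its timestamp to the hour.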
Analysis & Modeling: The preprocessed data was then loaded into Tableau for exploratory data analysis (EDA) and visualization, and into a separate Python pipeline for modeling.
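The README does not describe how the data was split for modeling. Because station snapshots form a time series, one common choice, assumed here, is a chronological split so models are evaluated on periods later than those they were trained on. `chronological_split` is a hypothetical helper.

```python
from datetime import datetime
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], key: Callable[[T], datetime],
                        test_fraction: float = 0.2) -> Tuple[List[T], List[T]]:
    """Sort rows by time and hold out the most recent `test_fraction` as the test set."""
    ordered = sorted(rows, key=key)
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]
```

A random shuffle would let later snapshots leak into training, which can overstate accuracy on time-ordered data.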
Tools and Algorithms Used
Python, Tableau, SQL, PostgreSQL, AWS EC2
Data Streaming, API, Random Forest Classifier, Support Vector Machine, XGBoost
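Since the models listed are classifiers, continuous wait times presumably had to be grouped into discrete classes first. The bin edges and labels below are hypothetical, chosen only to illustrate the step.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical class boundaries in minutes (not taken from the project).
BIN_EDGES = [0.0, 10.0, 30.0]
BIN_LABELS = ["no wait", "up to 10 min", "10-30 min", "over 30 min"]


def wait_class(minutes: float, edges: Sequence[float] = BIN_EDGES) -> int:
    """Return the class index: 0 for no wait, then (0, 10], (10, 30], and > 30."""
    return bisect_left(edges, minutes)


def to_classes(waits: Sequence[float]) -> List[int]:
    """Convert a sequence of wait times (minutes) into integer class labels."""
    return [wait_class(w) for w in waits]
```

The integer labels can serve as the target `y` for scikit-learn's `RandomForestClassifier` or `SVC`, or for XGBoost's `XGBClassifier`, all of which expose a `fit(X, y)` method.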
Stations with lower wait-time variance achieved higher accuracy, with the highest accuracy, 89%, recorded at the C4C station.
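The relationship above can be checked by computing, for each station, the variance of its observed wait times alongside the classifier's accuracy there. The record layout and the `per_station_summary` helper are hypothetical, and the values in the checks are illustrative, not project results.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (station, observed wait in minutes, true class, predicted class).
Record = Tuple[str, float, int, int]


def per_station_summary(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {station: (population variance of wait times, prediction accuracy)}."""
    waits: Dict[str, List[float]] = defaultdict(list)
    hits: Dict[str, int] = defaultdict(int)
    for station, wait, y_true, y_pred in records:
        waits[station].append(wait)
        hits[station] += int(y_true == y_pred)
    return {s: (pvariance(w), hits[s] / len(w)) for s, w in waits.items()}
```

Sorting stations by the first value and comparing it with the second shows whether stations with steadier wait times are also the easier ones to predict.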