Self-directed · MovieLens dataset
Movie Recommender
Collaborative filtering, evaluated honestly.
- Year: 2026
- Role: Full-stack + data
- Status: Shipped
- Type: Self-directed
A full-stack recommender on MovieLens (610 users, 100K+ ratings) comparing user-based and item-based collaborative filtering. Measured, not just shipped.
The problem
Most student recommenders stop at “it returns movies.” I wanted one that could answer a harder question: which collaborative-filtering approach is actually better on this data, and by how much?
Approach
- Built a FastAPI + PostgreSQL backend serving top-K neighbor predictions over a normalized schema (users, movies, ratings, tags, links).
- Implemented both user-based and item-based collaborative filtering with Pearson and cosine similarity over mean-centered ratings.
- Cleaned the data the way it actually needs cleaning: dedupe, keep-latest on repeat ratings, filter low-activity users/items, and a time-aware train/test split so the test set represents future behavior.
- Added a baseline correction (global mean + user bias + item bias) and evaluated with RMSE / MAE plus Precision@K rather than eyeballing results.
- Built the frontend in React 19 + Vite + Tailwind against the REST API.
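The item-based variant described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the function names (`mean_center`, `item_vectors`, `cosine`, `predict`), the nested-dict rating layout, and the choice to keep only positively correlated neighbors are all assumptions made for the example. It computes cosine similarity over user-mean-centered ratings (often called adjusted cosine) and predicts from the top-K most similar items the user has already rated.

```python
from math import sqrt


def mean_center(ratings):
    """ratings: {user: {item: rating}} -> same shape, each user's mean subtracted."""
    centered = {}
    for user, items in ratings.items():
        mu = sum(items.values()) / len(items)
        centered[user] = {item: r - mu for item, r in items.items()}
    return centered


def item_vectors(centered):
    """Invert {user: {item: r}} into {item: {user: r}}."""
    vecs = {}
    for user, items in centered.items():
        for item, r in items.items():
            vecs.setdefault(item, {})[user] = r
    return vecs


def cosine(a, b):
    """Cosine similarity over the users both items share; 0.0 if undefined."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, ratings, k=2):
    """Predict a rating for a known user from the k most similar rated items.

    Recomputes centering on every call for clarity; a real service would cache it.
    """
    centered = mean_center(ratings)
    vecs = item_vectors(centered)
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    if item not in vecs:
        return user_mean  # cold-start item: fall back to the user's mean
    neighbors = [
        (cosine(vecs[item], vecs[other]), dev)
        for other, dev in centered[user].items()
        if other != item
    ]
    # Assumption for this sketch: keep only positively correlated neighbors.
    top = sorted((n for n in neighbors if n[0] > 0), key=lambda n: n[0], reverse=True)[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        return user_mean
    return user_mean + sum(sim * dev for sim, dev in top) / weight
```

The user-based variant is the same computation with the roles of users and items swapped: compare user vectors, then weight the neighbors' deviations for the target item.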
Impact
- Side-by-side, measured comparison of user- vs item-based CF on a held-out, time-aware split.
- Clean separation: typed REST API, normalized Postgres schema, reproducible preprocessing.
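To make the evaluation pipeline concrete, here is a hedged sketch of the preprocessing and scoring steps: keep-latest deduplication, a time-aware split, the global mean + user bias + item bias baseline, and the three metrics. It is illustrative only. The tuple layout `(user, item, rating, timestamp)`, the function names, the single global cutoff, and the unregularized biases are assumptions for the example, and plain Python is used for portability even though the project's stack lists NumPy and Pandas.

```python
from collections import defaultdict
from math import sqrt


def keep_latest(rows):
    """Keep only the newest rating per (user, item) pair."""
    latest = {}
    for user, item, rating, ts in rows:
        key = (user, item)
        if key not in latest or ts > latest[key][3]:
            latest[key] = (user, item, rating, ts)
    return list(latest.values())


def time_split(rows, test_fraction=0.2):
    """Sort by timestamp; the most recent fraction becomes the test set."""
    ordered = sorted(rows, key=lambda r: r[3])
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


def fit_baseline(train):
    """Return predict(user, item) = global mean + user bias + item bias.

    Biases are plain averages of residuals (no regularization) in this sketch.
    """
    mu = sum(r[2] for r in train) / len(train)
    user_res = defaultdict(list)
    for user, _, rating, _ in train:
        user_res[user].append(rating - mu)
    b_u = {u: sum(v) / len(v) for u, v in user_res.items()}
    item_res = defaultdict(list)
    for user, item, rating, _ in train:
        item_res[item].append(rating - mu - b_u[user])
    b_i = {i: sum(v) / len(v) for i, v in item_res.items()}
    # Unseen users or items fall back to a zero bias.
    return lambda user, item: mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


def rmse(pairs):
    """pairs: iterable of (predicted, actual)."""
    pairs = list(pairs)
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def mae(pairs):
    """pairs: iterable of (predicted, actual)."""
    pairs = list(pairs)
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k
```

Typical use would be `train, test = time_split(keep_latest(rows))`, fitting on `train` only, then scoring `(predict(u, i), r)` pairs from `test`. Splitting by time means the model never sees ratings made after the ones it is asked to predict.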
Stack
Frontend
React 19 · Vite · Tailwind
Backend
FastAPI · PostgreSQL · REST
Data / ML
NumPy · Pandas · Collaborative filtering · RMSE / MAE / P@K