From 64d62e79ce8b8d8b6cdfb3820e38e53c0661d762 Mon Sep 17 00:00:00 2001
From: mkelvers
Date: Sat, 6 Jun 2026 15:54:10 +0200
Subject: [PATCH] refactor: remove docs folder

---
 docs/recommendation-architecture.md | 183 ----------------------------
 1 file changed, 183 deletions(-)
 delete mode 100644 docs/recommendation-architecture.md

diff --git a/docs/recommendation-architecture.md b/docs/recommendation-architecture.md
deleted file mode 100644
index c2a8434..0000000
--- a/docs/recommendation-architecture.md
+++ /dev/null
@@ -1,183 +0,0 @@
-# Recommendation Architecture
-
-This document defines the long-term shape of the `Top Pick for You`
-recommendation system.
-The goal is to keep the current implementation simple enough to operate inside
-the existing Go application while preserving a clean path toward a larger
-recommender system.
-
-## Current Serving Model
-
-The current `Top Pick for You` implementation is a bounded hybrid ranker:
-
-- builds weighted seeds from user watch history
-- uses Jikan recommendation edges as collaborative candidates
-- uses watchlist-derived genres, themes, studios, and demographics as profile
-  search candidates
-- excludes anime already present in the watchlist
-- boosts candidates that match user taste signals
-- reranks the final list to reduce genre pileups
-
-The online request path stays intentionally small:
-
-1. load recent watchlist state
-2. derive strong seeds
-3. build a weighted taste profile from those seeds
-4. fetch bounded collaborative and profile-search candidate sets
-5. score candidates
-6. rerank for diversity
-7. return top results
-
-## Target System Shape
-
-The future recommender should keep four stable layers:
-
-1. event collection
-2. feature aggregation
-3. candidate generation
-4. ranking and reranking
-
-That separation matters more than the specific model used at each stage.
-
-## Event Collection
-
-Recommendations should eventually be driven by behavior events, not only by
-watchlist state.
-
-Important events:
-
-- `impression`
-- `click`
-- `add_to_watchlist`
-- `start_watch`
-- `progress_update`
-- `complete`
-- `drop`
-- `hide_recommendation`
-- `search`
-
-Event capture should preserve:
-
-- `user_id`
-- `anime_id`
-- `event_type`
-- `occurred_at`
-- `source`
-- contextual metadata as JSON
-
-## Feature Aggregation
-
-Online requests should not recompute the full user profile from raw events.
-Instead, background jobs should maintain aggregated feature snapshots.
-
-Useful profile features:
-
-- genre affinity
-- theme affinity
-- studio affinity
-- demographic affinity
-- completion rate by genre
-- abandonment rate by genre
-- preference for airing vs finished anime
-- preference for recent vs older anime
-- short-term interest profile
-- long-term stable taste profile
-
-These features should eventually live in a durable profile snapshot table so
-the serving path remains cheap.
-
-## Candidate Generation
-
-Candidate generation should be modular. Each source should produce:
-
-- `anime_id`
-- `source`
-- `source_score`
-- explanation metadata
-
-Primary candidate sources:
-
-- item-item recommendation edges
-- related anime and sequel chains
-- content-similar anime from genres, themes, studios, and demographics
-- trending titles inside the user taste envelope
-- seasonal titles aligned with recent behavior
-- editorial or promoted rails when needed
-
-Candidate generation should stay bounded. Ranking the full catalog online is
-not a viable long-term approach.
-
-## Ranking
-
-The current ranker is heuristic by design. That is the correct starting point.
-
-Near-term ranking inputs:
-
-- collaborative recommendation weight
-- watch history status weight
-- recency decay
-- progress-based engagement
-- genre overlap
-- theme overlap
-- studio overlap
-- demographic overlap
-- airing or freshness alignment
-- popularity moderation
-
-The ranking API should remain stable even if the scoring model changes later.
-That allows a future move to gradient-boosted trees or other learned rankers
-without rewriting candidate generation or serving.
-
-## Reranking
-
-The final serving stage should apply product constraints that raw ranking will
-not handle well on its own:
-
-- genre diversity
-- franchise caps
-- duplicate suppression
-- hide or negative-feedback suppression
-- maturity filtering
-- freshness and exploration budget
-
-This is intentionally a separate concern from relevance scoring.
-
-## Data Tables
-
-The first recommendation-specific schema additions should support:
-
-- append-only event capture
-- recommendation impression tracking
-- cached user profile snapshots
-
-These tables are created in migration `024_add_recommendation_foundation.sql`.
-
-## Roadmap
-
-### V1
-
-- bounded hybrid ranker in request path
-- uses watchlist history and Jikan metadata
-- no offline jobs required
-
-### V2
-
-- capture user recommendation and watch behavior events
-- persist user profile snapshots
-- precompute candidate caches
-- add explicit feedback controls such as hide or not interested
-
-### V3
-
-- split retrieval from ranking
-- precompute similarity graphs and user candidate pools
-- run offline evaluation on impressions, clicks, starts, and completes
-- introduce learned ranking only when enough behavior data exists
-
-## Operational Rules
-
-- keep request-time fanout bounded
-- keep scoring explainable
-- log recommendation impressions before introducing heavier models
-- prefer replaceable modules over one large recommendation function
-- treat data collection as the foundation for later ML, not an optional extra