refactor: remove docs folder

2026-06-06 15:54:10 +02:00
parent b85b29aa13
commit 4bcfc8fdb7

@@ -1,183 +0,0 @@
# Recommendation Architecture
This document defines the long-term shape of the `Top Pick for You`
recommendation system.
The goal is to keep the current implementation simple enough to operate inside
the existing Go application while preserving a clean path toward a larger
recommender system.
## Current Serving Model
The current `Top Pick for You` implementation is a bounded hybrid ranker:
- builds weighted seeds from user watch history
- uses Jikan recommendation edges as collaborative candidates
- searches by watchlist-derived genres, themes, studios, and demographics to
find profile-search candidates
- excludes anime already present in the watchlist
- boosts candidates that match user taste signals
- reranks the final list to reduce genre pileups
The online request path stays intentionally small:
1. load recent watchlist state
2. derive strong seeds
3. build a weighted taste profile from those seeds
4. fetch bounded collaborative and profile-search candidate sets
5. score candidates
6. rerank for diversity
7. return top results
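The request path above can be sketched as a chain of replaceable stages. This is a minimal illustration, not the application's actual code: `Stages`, `TopPicks`, `demoStages`, and every type and signature here are hypothetical, and step 3 (the taste profile) is folded into the seeds for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// All types and stage signatures below are hypothetical.
type WatchItem struct {
	AnimeID int64
	Status  string
}

type Seed struct {
	AnimeID int64
	Weight  float64
}

type Candidate struct {
	AnimeID int64
	Score   float64
}

// Stages groups the replaceable steps of the request path.
type Stages struct {
	LoadWatchlist func(userID int64) []WatchItem      // step 1
	DeriveSeeds   func(items []WatchItem) []Seed      // steps 2-3
	Fetch         func(seeds []Seed, limit int) []Candidate // step 4
	Score         func(c Candidate, seeds []Seed) float64   // step 5
	Rerank        func(sorted []Candidate) []Candidate      // step 6
}

// TopPicks runs the stages in order and returns at most n results (step 7).
func TopPicks(s Stages, userID int64, fetchLimit, n int) []Candidate {
	items := s.LoadWatchlist(userID)
	seeds := s.DeriveSeeds(items)

	inWatchlist := make(map[int64]bool, len(items))
	for _, it := range items {
		inWatchlist[it.AnimeID] = true
	}

	var scored []Candidate
	for _, c := range s.Fetch(seeds, fetchLimit) {
		if inWatchlist[c.AnimeID] {
			continue // exclude anime already present in the watchlist
		}
		c.Score = s.Score(c, seeds)
		scored = append(scored, c)
	}
	sort.SliceStable(scored, func(i, j int) bool { return scored[i].Score > scored[j].Score })

	out := s.Rerank(scored)
	if n < 0 {
		n = 0
	}
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// demoStages returns in-memory stubs so the sketch runs without a database or Jikan.
func demoStages() Stages {
	return Stages{
		LoadWatchlist: func(int64) []WatchItem {
			return []WatchItem{{AnimeID: 1, Status: "completed"}}
		},
		DeriveSeeds: func(items []WatchItem) []Seed {
			seeds := make([]Seed, 0, len(items))
			for _, it := range items {
				seeds = append(seeds, Seed{AnimeID: it.AnimeID, Weight: 1})
			}
			return seeds
		},
		Fetch: func(_ []Seed, limit int) []Candidate {
			all := []Candidate{{AnimeID: 1}, {AnimeID: 2}, {AnimeID: 3}}
			if limit >= 0 && limit < len(all) {
				all = all[:limit]
			}
			return all
		},
		Score:  func(c Candidate, _ []Seed) float64 { return float64(c.AnimeID) },
		Rerank: func(sorted []Candidate) []Candidate { return sorted },
	}
}

func main() {
	for _, c := range TopPicks(demoStages(), 42, 10, 2) {
		fmt.Println(c.AnimeID, c.Score)
	}
}
```

Keeping each stage behind a function value means a stub, a cache, or a later learned model can be swapped in without touching `TopPicks`.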
## Target System Shape
The future recommender should keep four stable layers:
1. event collection
2. feature aggregation
3. candidate generation
4. ranking and reranking
That separation matters more than the specific model used at each stage.
## Event Collection
Recommendations should eventually be driven by behavior events, not only by
watchlist state.
Important events:
- `impression`
- `click`
- `add_to_watchlist`
- `start_watch`
- `progress_update`
- `complete`
- `drop`
- `hide_recommendation`
- `search`
Event capture should preserve:
- `user_id`
- `anime_id`
- `event_type`
- `occurred_at`
- `source`
- contextual metadata as JSON
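One possible shape for an event record is sketched below. The field names follow the list above; the Go types, the `Validate` rules, and the sample values (including the `top_pick_rail` source) are assumptions made for illustration, not the schema from the migration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Event is a hypothetical append-only behavior record.
type Event struct {
	UserID     int64           `json:"user_id"`
	AnimeID    int64           `json:"anime_id"` // may be 0 for events such as search
	EventType  string          `json:"event_type"`
	OccurredAt time.Time       `json:"occurred_at"`
	Source     string          `json:"source"`
	Metadata   json.RawMessage `json:"metadata,omitempty"`
}

// knownEventTypes lists the event types named in this document.
var knownEventTypes = map[string]bool{
	"impression": true, "click": true, "add_to_watchlist": true,
	"start_watch": true, "progress_update": true, "complete": true,
	"drop": true, "hide_recommendation": true, "search": true,
}

// Validate rejects events that cannot be aggregated later.
func (e Event) Validate() error {
	switch {
	case e.UserID == 0:
		return errors.New("missing user_id")
	case !knownEventTypes[e.EventType]:
		return fmt.Errorf("unknown event_type %q", e.EventType)
	case e.OccurredAt.IsZero():
		return errors.New("missing occurred_at")
	case len(e.Metadata) > 0 && !json.Valid(e.Metadata):
		return errors.New("metadata is not valid JSON")
	}
	return nil
}

// sampleEvent builds an illustrative click event with made-up values.
func sampleEvent() Event {
	return Event{
		UserID:     42,
		AnimeID:    5114,
		EventType:  "click",
		OccurredAt: time.Date(2026, 6, 6, 12, 0, 0, 0, time.UTC),
		Source:     "top_pick_rail",
		Metadata:   json.RawMessage(`{"position":3}`),
	}
}

func main() {
	e := sampleEvent()
	if err := e.Validate(); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	line, err := json.Marshal(e)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(line))
}
```

Validating at capture time keeps the event log clean enough for later aggregation jobs to trust without re-checking every row.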
## Feature Aggregation
Online requests should not recompute the full user profile from raw events.
Instead, background jobs should maintain aggregated feature snapshots.
Useful profile features:
- genre affinity
- theme affinity
- studio affinity
- demographic affinity
- completion rate by genre
- abandonment rate by genre
- preference for airing vs finished anime
- preference for recent vs older anime
- short-term interest profile
- long-term stable taste profile
These features should eventually live in a durable profile snapshot table so
the serving path remains cheap.
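As a rough illustration of what a background job might compute, the sketch below folds watch entries into per-genre counters and derives completion and abandonment rates from them. `WatchEntry`, `GenreStats`, `ProfileSnapshot`, and `BuildSnapshot` are hypothetical names, and the input is assumed to be already reduced from raw events.

```go
package main

import (
	"fmt"
	"sort"
)

// WatchEntry is a hypothetical, already-reduced view of one anime in a user's history.
type WatchEntry struct {
	Genres    []string
	Completed bool
	Dropped   bool
}

// GenreStats holds the per-genre counters a snapshot would persist.
type GenreStats struct {
	Seen      int
	Completed int
	Dropped   int
}

// CompletionRate is the share of seen titles in this genre that were completed.
func (s GenreStats) CompletionRate() float64 {
	if s.Seen == 0 {
		return 0
	}
	return float64(s.Completed) / float64(s.Seen)
}

// AbandonmentRate is the share of seen titles in this genre that were dropped.
func (s GenreStats) AbandonmentRate() float64 {
	if s.Seen == 0 {
		return 0
	}
	return float64(s.Dropped) / float64(s.Seen)
}

// ProfileSnapshot is what the serving path would read instead of raw events.
type ProfileSnapshot struct {
	UserID int64
	Genres map[string]GenreStats
}

// BuildSnapshot aggregates entries into per-genre counters.
func BuildSnapshot(userID int64, entries []WatchEntry) ProfileSnapshot {
	snap := ProfileSnapshot{UserID: userID, Genres: map[string]GenreStats{}}
	for _, e := range entries {
		for _, g := range e.Genres {
			s := snap.Genres[g]
			s.Seen++
			if e.Completed {
				s.Completed++
			}
			if e.Dropped {
				s.Dropped++
			}
			snap.Genres[g] = s
		}
	}
	return snap
}

func main() {
	snap := BuildSnapshot(42, []WatchEntry{
		{Genres: []string{"action", "drama"}, Completed: true},
		{Genres: []string{"action"}, Dropped: true},
	})
	genres := make([]string, 0, len(snap.Genres))
	for g := range snap.Genres {
		genres = append(genres, g)
	}
	sort.Strings(genres) // map iteration order is random; sort for stable output
	for _, g := range genres {
		s := snap.Genres[g]
		fmt.Printf("%s completion=%.2f abandonment=%.2f\n", g, s.CompletionRate(), s.AbandonmentRate())
	}
}
```

Storing counters rather than precomputed rates lets the job update a snapshot incrementally as new events arrive.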
## Candidate Generation
Candidate generation should be modular. Each source should produce:
- `anime_id`
- `source`
- `source_score`
- explanation metadata
Primary candidate sources:
- item-item recommendation edges
- related anime and sequel chains
- content-similar anime from genres, themes, studios, and demographics
- trending titles inside the user taste envelope
- seasonal titles aligned with recent behavior
- editorial or promoted rails when needed
Candidate generation should stay bounded. Ranking the full catalog online is
not a viable long-term approach.
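A modular, bounded candidate stage could look roughly like the following. The `CandidateSource` interface, the `Gather` helper, and the in-memory `staticSource` are hypothetical. The sketch also assumes source scores are comparable across sources, which a real system would need to normalize.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate carries the fields each source is expected to produce.
type Candidate struct {
	AnimeID     int64
	Source      string
	SourceScore float64
	Explanation string
}

// CandidateSource is a hypothetical interface every source would implement.
type CandidateSource interface {
	Name() string
	Candidates(userID int64, limit int) []Candidate
}

// staticSource is an in-memory stand-in for a real source such as item-item edges.
type staticSource struct {
	name  string
	items []Candidate
}

func (s staticSource) Name() string { return s.name }

func (s staticSource) Candidates(_ int64, limit int) []Candidate {
	if limit < 0 {
		limit = 0
	}
	if limit > len(s.items) {
		limit = len(s.items)
	}
	return s.items[:limit]
}

// Gather asks each source for at most perSource candidates, keeps the
// highest-scoring entry per anime, and truncates the result to maxTotal.
func Gather(sources []CandidateSource, userID int64, perSource, maxTotal int) []Candidate {
	best := map[int64]Candidate{}
	for _, src := range sources {
		for _, c := range src.Candidates(userID, perSource) {
			if cur, ok := best[c.AnimeID]; !ok || c.SourceScore > cur.SourceScore {
				best[c.AnimeID] = c
			}
		}
	}
	out := make([]Candidate, 0, len(best))
	for _, c := range best {
		out = append(out, c)
	}
	// Sort by score, then by ID, so the output is deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].SourceScore != out[j].SourceScore {
			return out[i].SourceScore > out[j].SourceScore
		}
		return out[i].AnimeID < out[j].AnimeID
	})
	if maxTotal < 0 {
		maxTotal = 0
	}
	if len(out) > maxTotal {
		out = out[:maxTotal]
	}
	return out
}

func main() {
	sources := []CandidateSource{
		staticSource{name: "item_item", items: []Candidate{
			{AnimeID: 10, Source: "item_item", SourceScore: 0.9, Explanation: "recommended from a seed"},
			{AnimeID: 11, Source: "item_item", SourceScore: 0.4, Explanation: "recommended from a seed"},
		}},
		staticSource{name: "content", items: []Candidate{
			{AnimeID: 11, Source: "content", SourceScore: 0.7, Explanation: "shares genres"},
			{AnimeID: 12, Source: "content", SourceScore: 0.2, Explanation: "shares studio"},
		}},
	}
	for _, c := range Gather(sources, 42, 50, 2) {
		fmt.Println(c.AnimeID, c.Source, c.SourceScore)
	}
}
```

Both `perSource` and `maxTotal` cap the work, so adding a new source cannot silently grow request-time fanout.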
## Ranking
The current ranker is heuristic by design. That is the right starting point:
heuristic scores stay explainable and need no behavior data to train on.
Near-term ranking inputs:
- collaborative recommendation weight
- watch history status weight
- recency decay
- progress-based engagement
- genre overlap
- theme overlap
- studio overlap
- demographic overlap
- airing or freshness alignment
- popularity moderation
The ranking API should remain stable even if the scoring model changes later.
That allows a future move to gradient-boosted trees or other learned rankers
without rewriting candidate generation or serving.
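To illustrate how a stable ranking API can hide the scoring model, the sketch below puts a heuristic scorer behind a small interface. The `Scorer` interface, the `Features` fields, the weights, and the exponential half-life form of recency decay are all assumptions, not the current ranker's actual formula. Only three of the inputs listed above are modeled.

```go
package main

import (
	"fmt"
	"math"
)

// Features is a hypothetical, precomputed feature vector for one candidate.
type Features struct {
	CollaborativeWeight float64 // summed weight of recommendation edges from seeds
	GenreOverlap        float64 // 0..1 overlap with the user's genre profile
	ThemeOverlap        float64 // 0..1 overlap with the user's theme profile
	DaysSinceSeed       float64 // age of the most recent contributing seed
}

// Scorer is the stable boundary: a learned model could implement it later.
type Scorer interface {
	Score(f Features) float64
}

// HeuristicScorer combines features with hand-tuned weights.
type HeuristicScorer struct {
	HalfLifeDays float64 // collaborative signal halves every HalfLifeDays
	WCollab      float64
	WGenre       float64
	WTheme       float64
}

// recencyDecay returns 1 for a fresh seed and 0.5 after one half-life.
func (h HeuristicScorer) recencyDecay(days float64) float64 {
	if h.HalfLifeDays <= 0 {
		return 1 // decay disabled
	}
	return math.Pow(0.5, days/h.HalfLifeDays)
}

// Score is a weighted sum, so each term can be logged as an explanation.
func (h HeuristicScorer) Score(f Features) float64 {
	collab := h.WCollab * f.CollaborativeWeight * h.recencyDecay(f.DaysSinceSeed)
	return collab + h.WGenre*f.GenreOverlap + h.WTheme*f.ThemeOverlap
}

func main() {
	var s Scorer = HeuristicScorer{HalfLifeDays: 30, WCollab: 1, WGenre: 0.5, WTheme: 0.25}
	fresh := Features{CollaborativeWeight: 2, GenreOverlap: 1, DaysSinceSeed: 0}
	stale := Features{CollaborativeWeight: 2, GenreOverlap: 1, DaysSinceSeed: 90}
	fmt.Printf("fresh=%.3f stale=%.3f\n", s.Score(fresh), s.Score(stale))
}
```

Because callers depend only on `Scorer`, replacing `HeuristicScorer` with a gradient-boosted model would not change candidate generation or serving code.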
## Reranking
The final serving stage should apply product constraints that raw ranking will
not handle well on its own:
- genre diversity
- franchise caps
- duplicate suppression
- hide or negative-feedback suppression
- maturity filtering
- freshness and exploration budget
This is intentionally a separate concern from relevance scoring.
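Several of these constraints can be applied in a single greedy pass over the score-sorted list. The sketch below covers genre caps, franchise caps, duplicate suppression, and hidden items. `Ranked`, `Rerank`, and the single-primary-genre simplification are assumptions; maturity filtering and the exploration budget are left out.

```go
package main

import "fmt"

// Ranked is a hypothetical scored item entering the rerank stage.
type Ranked struct {
	AnimeID   int64
	Franchise string // empty when the anime belongs to no franchise
	Genre     string // primary genre only, to keep the sketch short
	Score     float64
}

// Rerank walks a list already sorted by descending score and keeps items
// greedily while enforcing per-genre and per-franchise caps. Hidden items and
// repeated IDs are skipped.
func Rerank(sorted []Ranked, hidden map[int64]bool, maxPerGenre, maxPerFranchise, limit int) []Ranked {
	if limit <= 0 {
		return nil
	}
	seen := map[int64]bool{}
	genreCount := map[string]int{}
	franchiseCount := map[string]int{}
	out := make([]Ranked, 0, limit)

	for _, r := range sorted {
		if len(out) >= limit {
			break
		}
		if hidden[r.AnimeID] || seen[r.AnimeID] {
			continue // negative feedback or duplicate
		}
		if genreCount[r.Genre] >= maxPerGenre {
			continue // genre diversity
		}
		if r.Franchise != "" && franchiseCount[r.Franchise] >= maxPerFranchise {
			continue // franchise cap
		}
		seen[r.AnimeID] = true
		genreCount[r.Genre]++
		if r.Franchise != "" {
			franchiseCount[r.Franchise]++
		}
		out = append(out, r)
	}
	return out
}

func main() {
	sorted := []Ranked{
		{AnimeID: 1, Franchise: "x", Genre: "action", Score: 0.9},
		{AnimeID: 2, Franchise: "x", Genre: "action", Score: 0.8},
		{AnimeID: 3, Franchise: "y", Genre: "action", Score: 0.7},
		{AnimeID: 4, Franchise: "z", Genre: "comedy", Score: 0.6},
	}
	for _, r := range Rerank(sorted, map[int64]bool{4: true}, 2, 1, 10) {
		fmt.Println(r.AnimeID, r.Genre, r.Score)
	}
}
```

The function never touches scores, which keeps the product constraints independent of whichever relevance model produced the input order.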
## Data Tables
The first recommendation-specific schema additions should support:
- append-only event capture
- recommendation impression tracking
- cached user profile snapshots
These tables are created in migration `024_add_recommendation_foundation.sql`.
## Roadmap
### V1
- bounded hybrid ranker in request path
- uses watchlist history and Jikan metadata
- no offline jobs required
### V2
- capture user recommendation and watch behavior events
- persist user profile snapshots
- precompute candidate caches
- add explicit feedback controls such as hide or not interested
### V3
- split retrieval from ranking
- precompute similarity graphs and user candidate pools
- run offline evaluation on impressions, clicks, starts, and completes
- introduce learned ranking only when enough behavior data exists
## Operational Rules
- keep request-time fanout bounded
- keep scoring explainable
- log recommendation impressions before introducing heavier models
- prefer replaceable modules over one large recommendation function
- treat data collection as the foundation for later ML, not an optional extra