docs: add recommendation architecture document
docs/recommendation-architecture.md (Normal file, +179 lines)
@@ -0,0 +1,179 @@
# Recommendation Architecture

This document defines the long-term shape of the `For You` discovery system.
The goal is to keep the current implementation simple enough to operate inside
the existing Go application while preserving a clean path toward a larger
recommender system.

## Current Serving Model

The current `For You` implementation is a bounded hybrid ranker that:

- builds weighted seeds from user watch history
- uses Jikan recommendation edges as collaborative candidates
- excludes anime already present in the watchlist
- boosts candidates that match user taste signals
- reranks the final list to reduce genre pileups

The online request path stays intentionally small:

1. load recent watchlist state
2. derive strong seeds
3. fetch a bounded candidate set
4. score candidates
5. rerank for diversity
6. return top results

## Target System Shape

The future recommender should keep four stable layers:

1. event collection
2. feature aggregation
3. candidate generation
4. ranking and reranking

That separation matters more than the specific model used at each stage.

## Event Collection

Recommendations should eventually be driven by behavior events, not only by
watchlist state.

Important events:

- `impression`
- `click`
- `add_to_watchlist`
- `start_watch`
- `progress_update`
- `complete`
- `drop`
- `hide_recommendation`
- `search`

Event capture should preserve:

- `user_id`
- `anime_id`
- `event_type`
- `occurred_at`
- `source`
- contextual metadata as JSON

## Feature Aggregation

Online requests should not recompute the full user profile from raw events.
Instead, background jobs should maintain aggregated feature snapshots.

Useful profile features:

- genre affinity
- theme affinity
- studio affinity
- demographic affinity
- completion rate by genre
- abandonment rate by genre
- preference for airing vs finished anime
- preference for recent vs older anime
- short-term interest profile
- long-term stable taste profile

These features should eventually live in a durable profile snapshot table so
the serving path remains cheap.

## Candidate Generation

Candidate generation should be modular. Each source should produce:

- `anime_id`
- `source`
- `source_score`
- explanation metadata

Primary candidate sources:

- item-item recommendation edges
- related anime and sequel chains
- content-similar anime from genres, themes, studios, and demographics
- trending titles inside the user taste envelope
- seasonal titles aligned with recent behavior
- editorial or promoted rails when needed

Candidate generation should stay bounded. Ranking the full catalog online is
not a viable long-term approach.

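Given the per-source contract above, merging could look like the following sketch:

- Dedupe by `anime_id`.
- Keep every contributing source, so ranking and explanations can use them.
- Cut the pool to a fixed size.

It assumes source scores are already normalized to a comparable range, which this document does not specify. `Explanation` is a plain string standing in for the unspecified explanation metadata, and `MergedCandidate` and `MergeCandidates` are hypothetical names.

```go
package main

import "sort"

// Candidate follows the per-source contract: anime_id, source, source_score,
// and explanation metadata (a free-form string here, by assumption).
type Candidate struct {
	AnimeID     int
	Source      string
	SourceScore float64
	Explanation string
}

// MergedCandidate keeps every source that proposed a title.
type MergedCandidate struct {
	AnimeID   int
	BestScore float64
	Sources   []Candidate
}

// MergeCandidates dedupes by anime_id and keeps the top `limit` titles by
// their best source score, breaking ties by anime_id for determinism.
func MergeCandidates(lists [][]Candidate, limit int) []MergedCandidate {
	byID := map[int]*MergedCandidate{}
	for _, list := range lists {
		for _, c := range list {
			m, ok := byID[c.AnimeID]
			if !ok {
				m = &MergedCandidate{AnimeID: c.AnimeID}
				byID[c.AnimeID] = m
			}
			if len(m.Sources) == 0 || c.SourceScore > m.BestScore {
				m.BestScore = c.SourceScore
			}
			m.Sources = append(m.Sources, c)
		}
	}
	out := make([]MergedCandidate, 0, len(byID))
	for _, m := range byID {
		out = append(out, *m)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].BestScore != out[j].BestScore {
			return out[i].BestScore > out[j].BestScore
		}
		return out[i].AnimeID < out[j].AnimeID
	})
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

Adding a new source then only means appending another list. The merge and the bound stay the same.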
## Ranking

The current ranker is heuristic by design. That is the right starting point
while there is not yet enough behavior data to train a learned model.

Near-term ranking inputs:

- collaborative recommendation weight
- watch history status weight
- recency decay
- progress-based engagement
- genre overlap
- theme overlap
- studio overlap
- demographic overlap
- airing or freshness alignment
- popularity moderation

The ranking API should remain stable even if the scoring model changes later.
That allows a future move to gradient-boosted trees or other learned rankers
without rewriting candidate generation or serving.

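A stable ranking contract plus a heuristic implementation might look like the sketch below.

- `Scorer` is the part meant to stay fixed.
- `LinearScorer` is one replaceable implementation. It returns per-feature contributions so scores stay explainable.
- Popularity moderation is expressed as a log-damped feature paired with a negative weight.

The feature names (`collaborative`, `genre_overlap`, `popularity_log`) and the weights are illustrative assumptions.

```go
package main

import (
	"math"
	"sort"
)

// Scorer is the stable ranking contract. A learned model could implement the
// same method later without changing callers.
type Scorer interface {
	// Score returns the total plus each feature's contribution.
	Score(features map[string]float64) (float64, map[string]float64)
}

// LinearScorer is a weighted sum over named features; names and weights are
// illustrative.
type LinearScorer struct {
	Weights map[string]float64
}

func (l LinearScorer) Score(features map[string]float64) (float64, map[string]float64) {
	// Visit weights in sorted order so float summation is deterministic.
	names := make([]string, 0, len(l.Weights))
	for name := range l.Weights {
		names = append(names, name)
	}
	sort.Strings(names)

	total := 0.0
	contrib := make(map[string]float64, len(names))
	for _, name := range names {
		v, ok := features[name]
		if !ok {
			continue // missing features contribute nothing
		}
		c := l.Weights[name] * v
		contrib[name] = c
		total += c
	}
	return total, contrib
}

// PopularityFeature damps raw popularity with log1p; paired with a negative
// weight it keeps hits from dominating.
func PopularityFeature(members int) float64 {
	return math.Log1p(float64(members))
}
```

A gradient-boosted model would implement the same `Score` method. It might report contributions from a method such as SHAP values, or return an empty map.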
## Reranking

The final serving stage should apply product constraints that raw ranking will
not handle well on its own:

- genre diversity
- franchise caps
- duplicate suppression
- hide or negative-feedback suppression
- maturity filtering
- freshness and exploration budget

This is intentionally a separate concern from relevance scoring.

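A greedy reranker covering four of these constraints could look like the sketch below:

- **Genre diversity.** Each pick's score is penalized for genres already shown.
- **Franchise caps.** A title is skipped once its franchise hits the cap.
- **Duplicate suppression.** Only the first occurrence of each `AnimeID` is kept.
- **Hide suppression.** Hidden titles are dropped before ranking.

Maturity filtering and the exploration budget are left out. The `Item` type, the penalty, and the cap values are hypothetical. Input is assumed to be sorted by descending relevance, so the first duplicate kept is the best one.

```go
package main

// Item is a scored candidate with the attributes reranking needs.
type Item struct {
	AnimeID   int
	Score     float64
	Genres    []string
	Franchise string // empty when unknown
}

// Rerank greedily picks the best remaining item after a genre-repetition
// penalty, skipping hidden titles, duplicates, and franchises at their cap.
func Rerank(items []Item, hidden map[int]bool, genrePenalty float64, franchiseCap, limit int) []Item {
	seen := map[int]bool{}
	pool := make([]Item, 0, len(items))
	for _, it := range items {
		if hidden[it.AnimeID] || seen[it.AnimeID] {
			continue
		}
		seen[it.AnimeID] = true
		pool = append(pool, it)
	}

	genreCount := map[string]int{}
	franchiseCount := map[string]int{}
	var out []Item
	for len(out) < limit && len(pool) > 0 {
		best, bestScore := -1, 0.0
		for i, it := range pool {
			if it.Franchise != "" && franchiseCount[it.Franchise] >= franchiseCap {
				continue
			}
			adj := it.Score
			for _, g := range it.Genres {
				adj -= genrePenalty * float64(genreCount[g])
			}
			if best == -1 || adj > bestScore {
				best, bestScore = i, adj
			}
		}
		if best == -1 {
			break // everything left is over its franchise cap
		}
		pick := pool[best]
		pool = append(pool[:best], pool[best+1:]...)
		out = append(out, pick)
		for _, g := range pick.Genres {
			genreCount[g]++
		}
		if pick.Franchise != "" {
			franchiseCount[pick.Franchise]++
		}
	}
	return out
}
```

Keeping this as a separate pass means relevance scores are never edited to encode product rules. The constraints stay visible and individually tunable.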
## Data Tables

The first recommendation-specific schema additions should support:

- append-only event capture
- recommendation impression tracking
- cached user profile snapshots

These tables are created in migration `024_add_recommendation_foundation.sql`.

## Roadmap

### V1

- bounded hybrid ranker in the request path
- uses watchlist history and Jikan metadata
- no offline jobs required

### V2

- capture user recommendation and watch behavior events
- persist user profile snapshots
- precompute candidate caches
- add explicit feedback controls such as "hide" or "not interested"

### V3

- split retrieval from ranking
- precompute similarity graphs and user candidate pools
- run offline evaluation on impressions, clicks, starts, and completes
- introduce learned ranking only when enough behavior data exists

## Operational Rules

- keep request-time fanout bounded
- keep scoring explainable
- log recommendation impressions before introducing heavier models
- prefer replaceable modules over one large recommendation function
- treat data collection as the foundation for later ML, not an optional extra

@@ -0,0 +1,62 @@
-- +goose Up
CREATE TABLE IF NOT EXISTS recommendation_event (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    anime_id INTEGER,
    event_type TEXT NOT NULL,
    source TEXT,
    metadata_json TEXT,
    occurred_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY(user_id) REFERENCES user(id) ON DELETE CASCADE,
    FOREIGN KEY(anime_id) REFERENCES anime(id) ON DELETE SET NULL
);

CREATE INDEX IF NOT EXISTS idx_recommendation_event_user_occurred_at
    ON recommendation_event(user_id, occurred_at DESC);

CREATE INDEX IF NOT EXISTS idx_recommendation_event_user_event_type_occurred_at
    ON recommendation_event(user_id, event_type, occurred_at DESC);

CREATE INDEX IF NOT EXISTS idx_recommendation_event_anime_occurred_at
    ON recommendation_event(anime_id, occurred_at DESC);

CREATE TABLE IF NOT EXISTS recommendation_impression (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    anime_id INTEGER NOT NULL,
    rail TEXT NOT NULL,
    position INTEGER NOT NULL,
    request_id TEXT,
    metadata_json TEXT,
    occurred_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY(user_id) REFERENCES user(id) ON DELETE CASCADE,
    FOREIGN KEY(anime_id) REFERENCES anime(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_recommendation_impression_user_occurred_at
    ON recommendation_impression(user_id, occurred_at DESC);

CREATE INDEX IF NOT EXISTS idx_recommendation_impression_request_id
    ON recommendation_impression(request_id);

CREATE TABLE IF NOT EXISTS recommendation_profile_snapshot (
    user_id TEXT PRIMARY KEY,
    profile_json TEXT NOT NULL,
    source_window_start DATETIME,
    source_window_end DATETIME,
    computed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY(user_id) REFERENCES user(id) ON DELETE CASCADE
);

-- +goose Down
DROP TABLE IF EXISTS recommendation_profile_snapshot;
DROP INDEX IF EXISTS idx_recommendation_impression_request_id;
DROP INDEX IF EXISTS idx_recommendation_impression_user_occurred_at;
DROP TABLE IF EXISTS recommendation_impression;
DROP INDEX IF EXISTS idx_recommendation_event_anime_occurred_at;
DROP INDEX IF EXISTS idx_recommendation_event_user_event_type_occurred_at;
DROP INDEX IF EXISTS idx_recommendation_event_user_occurred_at;
DROP TABLE IF EXISTS recommendation_event;