Recommendation Architecture

This document defines the long-term shape of the Top Pick for You recommendation system. The goal is to keep the current implementation simple enough to operate inside the existing Go application while preserving a clean path toward a larger recommender system.

Current Serving Model

The current Top Pick for You implementation is a bounded hybrid ranker that:

  • builds weighted seeds from user watch history
  • uses Jikan recommendation edges as collaborative candidates
  • uses watchlist-derived genres, themes, studios, and demographics as profile search candidates
  • excludes anime already present in the watchlist
  • boosts candidates that match user taste signals
  • reranks the final list to reduce genre pileups

The online request path stays intentionally small:

  1. load recent watchlist state
  2. derive strong seeds
  3. build a weighted taste profile from those seeds
  4. fetch bounded collaborative and profile-search candidate sets
  5. score candidates
  6. rerank for diversity
  7. return top results
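Steps 2 and 3 of the request path can be sketched as follows. This is a minimal illustration, not the shipped code: the `WatchEntry` and `Seed` types, the status names, the status weights, and the half-life decay are all assumptions chosen for the example.

```go
package main

import (
	"math"
	"sort"
)

// WatchEntry is a hypothetical view of one watchlist row.
type WatchEntry struct {
	AnimeID         int
	Status          string  // assumed values: "completed", "watching", "on_hold", "dropped"
	DaysSinceUpdate float64 // age of the last update, in days
	Genres          []string
}

// Seed is a watched title plus the weight it contributes to the profile.
type Seed struct {
	AnimeID int
	Weight  float64
	Genres  []string
}

// statusWeight holds illustrative weights only; real values need tuning.
var statusWeight = map[string]float64{
	"completed": 1.0,
	"watching":  0.8,
	"on_hold":   0.3,
}

// DeriveSeeds keeps entries with a positive status weight, applies an
// exponential recency decay, and returns at most maxSeeds strongest seeds.
func DeriveSeeds(entries []WatchEntry, halfLifeDays float64, maxSeeds int) []Seed {
	if maxSeeds < 0 {
		maxSeeds = 0
	}
	seeds := make([]Seed, 0, len(entries))
	for _, e := range entries {
		w, ok := statusWeight[e.Status]
		if !ok {
			continue // statuses without a weight (e.g. dropped) are not seeds here
		}
		decay := 1.0
		if halfLifeDays > 0 && e.DaysSinceUpdate > 0 {
			decay = math.Pow(0.5, e.DaysSinceUpdate/halfLifeDays)
		}
		seeds = append(seeds, Seed{AnimeID: e.AnimeID, Weight: w * decay, Genres: e.Genres})
	}
	sort.SliceStable(seeds, func(i, j int) bool { return seeds[i].Weight > seeds[j].Weight })
	if len(seeds) > maxSeeds {
		seeds = seeds[:maxSeeds]
	}
	return seeds
}

// BuildGenreProfile sums seed weights per genre and normalizes them to sum to 1.
func BuildGenreProfile(seeds []Seed) map[string]float64 {
	profile := make(map[string]float64)
	var total float64
	for _, s := range seeds {
		for _, g := range s.Genres {
			profile[g] += s.Weight
			total += s.Weight
		}
	}
	if total == 0 {
		return profile
	}
	for g := range profile {
		profile[g] /= total
	}
	return profile
}
```

The same shape would extend to themes, studios, and demographics by adding one weighted map per signal.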

Target System Shape

The future recommender should keep four stable layers:

  1. event collection
  2. feature aggregation
  3. candidate generation
  4. ranking and reranking

That separation matters more than the specific model used at each stage.
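One way to picture the separation on the serving side is a pipeline whose stages are swappable function values. The sketch below is illustrative only: `Retriever`, `Ranker`, `Reranker`, and `Pipeline` are hypothetical names, and event collection and feature aggregation are assumed to run outside the request path and feed these stages through stored data.

```go
package main

// Retriever returns a bounded pool of candidate anime IDs for a user.
type Retriever func(userID int64, limit int) []int64

// Ranker orders a candidate pool by relevance, best first.
type Ranker func(userID int64, ids []int64) []int64

// Reranker applies product constraints and returns at most n IDs.
type Reranker func(ids []int64, n int) []int64

// Pipeline wires the serving-side layers together. Each field can be
// replaced independently without touching the others.
type Pipeline struct {
	Retrieve Retriever
	Rank     Ranker
	Rerank   Reranker
	PoolSize int // upper bound on candidates fetched per request
}

// Recommend runs retrieval, ranking, and reranking in order.
func (p Pipeline) Recommend(userID int64, n int) []int64 {
	pool := p.Retrieve(userID, p.PoolSize)
	ranked := p.Rank(userID, pool)
	return p.Rerank(ranked, n)
}
```

Swapping a heuristic `Ranker` for a learned one would then change a single field assignment rather than the request handler.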

Event Collection

Recommendations should eventually be driven by behavior events, not only by watchlist state.

Important events:

  • impression
  • click
  • add_to_watchlist
  • start_watch
  • progress_update
  • complete
  • drop
  • hide_recommendation
  • search

Event capture should preserve:

  • user_id
  • anime_id
  • event_type
  • occurred_at
  • source
  • contextual metadata as JSON
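The fields above map naturally onto a single Go struct. The following is a sketch under assumptions: the integer ID types, the JSON tags, the `NewEvent` helper, and the validation rules are not taken from the existing codebase.

```go
package main

import (
	"encoding/json"
	"errors"
	"time"
)

// EventType names one of the behavior events listed above.
type EventType string

const (
	EventImpression         EventType = "impression"
	EventClick              EventType = "click"
	EventAddToWatchlist     EventType = "add_to_watchlist"
	EventStartWatch         EventType = "start_watch"
	EventProgressUpdate     EventType = "progress_update"
	EventComplete           EventType = "complete"
	EventDrop               EventType = "drop"
	EventHideRecommendation EventType = "hide_recommendation"
	EventSearch             EventType = "search"
)

var knownEventTypes = map[EventType]bool{
	EventImpression: true, EventClick: true, EventAddToWatchlist: true,
	EventStartWatch: true, EventProgressUpdate: true, EventComplete: true,
	EventDrop: true, EventHideRecommendation: true, EventSearch: true,
}

// Event is one append-only behavior record. ID types are assumed to be int64.
type Event struct {
	UserID     int64           `json:"user_id"`
	AnimeID    int64           `json:"anime_id"`
	EventType  EventType       `json:"event_type"`
	OccurredAt time.Time       `json:"occurred_at"`
	Source     string          `json:"source"`
	Metadata   json.RawMessage `json:"metadata,omitempty"` // contextual metadata, stored as raw JSON
}

// NewEvent is a hypothetical constructor that stamps the event with the current UTC time.
func NewEvent(userID, animeID int64, t EventType, source, metadataJSON string) Event {
	return Event{
		UserID:     userID,
		AnimeID:    animeID,
		EventType:  t,
		OccurredAt: time.Now().UTC(),
		Source:     source,
		Metadata:   json.RawMessage(metadataJSON),
	}
}

// Validate rejects events that would be useless for later aggregation.
func (e Event) Validate() error {
	if e.UserID <= 0 {
		return errors.New("event: user_id is required")
	}
	if !knownEventTypes[e.EventType] {
		return errors.New("event: unknown event_type")
	}
	if e.OccurredAt.IsZero() {
		return errors.New("event: occurred_at is required")
	}
	if len(e.Metadata) > 0 && !json.Valid(e.Metadata) {
		return errors.New("event: metadata must be valid JSON")
	}
	return nil
}
```

Keeping `Metadata` as raw JSON lets new context fields be recorded without a schema change, at the cost of validating their shape later in the aggregation jobs.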

Feature Aggregation

Online requests should not recompute the full user profile from raw events. Instead, background jobs should maintain aggregated feature snapshots.

Useful profile features:

  • genre affinity
  • theme affinity
  • studio affinity
  • demographic affinity
  • completion rate by genre
  • abandonment rate by genre
  • preference for airing vs finished anime
  • preference for recent vs older anime
  • short-term interest profile
  • long-term stable taste profile

These features should eventually live in a durable profile snapshot table so the serving path remains cheap.
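As an illustration of what a background job might maintain, the sketch below folds per-title outcomes into per-genre counters and derives completion and abandonment rates from them. `TitleOutcome`, `GenreStats`, `ProfileSnapshot`, and `Aggregate` are hypothetical names, and the real snapshot table may store different fields.

```go
package main

// TitleOutcome is a hypothetical, already-joined view of one title the
// user started: its genres plus the latest terminal event, if any.
type TitleOutcome struct {
	Genres  []string
	Outcome string // "complete", "drop", or "" while still in progress
}

// GenreStats holds the raw counters that the rates are derived from.
type GenreStats struct {
	Started   int
	Completed int
	Dropped   int
}

// CompletionRate is completed / started, or 0 when nothing was started.
func (s GenreStats) CompletionRate() float64 {
	if s.Started == 0 {
		return 0
	}
	return float64(s.Completed) / float64(s.Started)
}

// AbandonmentRate is dropped / started, or 0 when nothing was started.
func (s GenreStats) AbandonmentRate() float64 {
	if s.Started == 0 {
		return 0
	}
	return float64(s.Dropped) / float64(s.Started)
}

// ProfileSnapshot is what the serving path would read instead of raw events.
type ProfileSnapshot struct {
	UserID int64
	Genres map[string]GenreStats
}

// Aggregate builds a snapshot from outcomes. A title counts once toward
// every genre it carries.
func Aggregate(userID int64, outcomes []TitleOutcome) ProfileSnapshot {
	snap := ProfileSnapshot{UserID: userID, Genres: make(map[string]GenreStats)}
	for _, o := range outcomes {
		for _, g := range o.Genres {
			s := snap.Genres[g]
			s.Started++
			switch o.Outcome {
			case "complete":
				s.Completed++
			case "drop":
				s.Dropped++
			}
			snap.Genres[g] = s
		}
	}
	return snap
}
```

Storing counters rather than precomputed rates keeps the snapshot cheap to update incrementally as new events arrive.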

Candidate Generation

Candidate generation should be modular. Each source should produce:

  • anime_id
  • source
  • source_score
  • explanation metadata

Primary candidate sources:

  • item-item recommendation edges
  • related anime and sequel chains
  • content-similar anime from genres, themes, studios, and demographics
  • trending titles inside the user taste envelope
  • seasonal titles aligned with recent behavior
  • editorial or promoted rails when needed

Candidate generation should stay bounded. Ranking the full catalog online is not a viable long-term approach.
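A modular, bounded candidate stage could look like the sketch below. The `Source` interface, the `Gather` function, and the `StaticSource` stand-in are hypothetical, and the merge assumes that source scores are already normalized to a comparable scale, which real sources may not satisfy.

```go
package main

import "sort"

// Candidate carries the four fields every source should produce.
type Candidate struct {
	AnimeID     int64
	Source      string
	SourceScore float64
	Explanation string // explanation metadata, simplified to a string here
}

// Source is one pluggable candidate generator.
type Source interface {
	Name() string
	Candidates(userID int64, limit int) ([]Candidate, error)
}

// Gather asks each source for at most perSourceLimit candidates, keeps the
// best-scoring entry per anime, and caps the merged pool at maxTotal.
func Gather(sources []Source, userID int64, perSourceLimit, maxTotal int) []Candidate {
	if perSourceLimit < 0 {
		perSourceLimit = 0
	}
	if maxTotal < 0 {
		maxTotal = 0
	}
	best := make(map[int64]Candidate)
	for _, src := range sources {
		cands, err := src.Candidates(userID, perSourceLimit)
		if err != nil {
			continue // a failing source shrinks the pool instead of failing the request
		}
		if len(cands) > perSourceLimit {
			cands = cands[:perSourceLimit] // enforce the bound even if a source ignores it
		}
		for _, c := range cands {
			if prev, ok := best[c.AnimeID]; !ok || c.SourceScore > prev.SourceScore {
				best[c.AnimeID] = c
			}
		}
	}
	pool := make([]Candidate, 0, len(best))
	for _, c := range best {
		pool = append(pool, c)
	}
	// Sort by score descending, then by ID so the output is deterministic.
	sort.Slice(pool, func(i, j int) bool {
		if pool[i].SourceScore != pool[j].SourceScore {
			return pool[i].SourceScore > pool[j].SourceScore
		}
		return pool[i].AnimeID < pool[j].AnimeID
	})
	if len(pool) > maxTotal {
		pool = pool[:maxTotal]
	}
	return pool
}

// StaticSource is a stand-in source backed by a fixed slice.
type StaticSource struct {
	SourceName string
	Items      []Candidate
}

func (s StaticSource) Name() string { return s.SourceName }

func (s StaticSource) Candidates(userID int64, limit int) ([]Candidate, error) {
	if limit < 0 {
		limit = 0
	}
	if limit > len(s.Items) {
		limit = len(s.Items)
	}
	return s.Items[:limit], nil
}
```

With this shape, the worst-case ranking input is `len(sources) * perSourceLimit` candidates, which is the bound the serving path depends on.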

Ranking

The current ranker is heuristic by design. That is the correct starting point until enough behavior data exists to train and evaluate a learned model.

Near-term ranking inputs:

  • collaborative recommendation weight
  • watch history status weight
  • recency decay
  • progress-based engagement
  • genre overlap
  • theme overlap
  • studio overlap
  • demographic overlap
  • airing or freshness alignment
  • popularity moderation

The ranking API should remain stable even if the scoring model changes later. That allows a future move to gradient-boosted trees or other learned rankers without rewriting candidate generation or serving.
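A stable ranking API can be as small as one interface. In the sketch below, `Features`, `Scorer`, `HeuristicScorer`, `ScorerFunc`, and `Rank` are hypothetical names, the feature set is a subset of the inputs listed above, and every coefficient is an illustrative placeholder rather than a tuned value.

```go
package main

import (
	"math"
	"sort"
)

// Features is the per-candidate input to scoring. Values are assumed to be in [0, 1]
// except DaysSinceSeedActivity.
type Features struct {
	CollaborativeWeight   float64
	GenreOverlap          float64
	ThemeOverlap          float64
	StudioOverlap         float64
	DemographicOverlap    float64
	DaysSinceSeedActivity float64
	Popularity            float64 // 1 = most popular
}

// Scorer is the stable contract; heuristic and learned models both implement it.
type Scorer interface {
	Score(f Features) float64
}

// ScorerFunc adapts a plain function to the Scorer interface.
type ScorerFunc func(Features) float64

func (fn ScorerFunc) Score(f Features) float64 { return fn(f) }

// HeuristicScorer combines the inputs with hand-picked weights.
type HeuristicScorer struct {
	HalfLifeDays      float64 // recency decay half-life
	PopularityPenalty float64 // popularity moderation strength
}

func (h HeuristicScorer) Score(f Features) float64 {
	decay := 1.0
	if h.HalfLifeDays > 0 && f.DaysSinceSeedActivity > 0 {
		decay = math.Pow(0.5, f.DaysSinceSeedActivity/h.HalfLifeDays)
	}
	overlap := 0.4*f.GenreOverlap + 0.3*f.ThemeOverlap +
		0.2*f.StudioOverlap + 0.1*f.DemographicOverlap
	score := 0.5*f.CollaborativeWeight*decay + 0.5*overlap
	return score - h.PopularityPenalty*f.Popularity
}

// Item pairs a candidate with its features; Scored pairs it with its score.
type Item struct {
	AnimeID  int64
	Features Features
}

type Scored struct {
	AnimeID int64
	Score   float64
}

// Rank scores every item and returns them best first, ties broken by ID.
func Rank(s Scorer, items []Item) []Scored {
	out := make([]Scored, 0, len(items))
	for _, it := range items {
		out = append(out, Scored{AnimeID: it.AnimeID, Score: s.Score(it.Features)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].AnimeID < out[j].AnimeID
	})
	return out
}
```

Because `Rank` depends only on `Scorer`, a gradient-boosted model could later be wrapped in a type with the same `Score` method and passed in unchanged.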

Reranking

The final serving stage should apply product constraints that raw ranking will not handle well on its own:

  • genre diversity
  • franchise caps
  • duplicate suppression
  • hide or negative-feedback suppression
  • maturity filtering
  • freshness and exploration budget

This is intentionally a separate concern from relevance scoring.
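Several of these constraints fit a single greedy pass over the relevance-ordered list. The sketch below covers genre diversity, franchise caps, duplicate suppression, and hide suppression; the `Ranked` and `Constraints` types are hypothetical, and using one primary genre per title is a simplifying assumption.

```go
package main

// Ranked is one item from the ranker, already sorted best first.
type Ranked struct {
	AnimeID      int64
	Franchise    string // empty when the title has no franchise grouping
	PrimaryGenre string
	Score        float64
}

// Constraints are product rules applied after relevance scoring.
type Constraints struct {
	MaxPerGenre     int            // 0 disables the genre cap
	MaxPerFranchise int            // 0 disables the franchise cap
	Hidden          map[int64]bool // titles the user hid or rejected
	Limit           int            // number of results to return
}

// Rerank walks the list once and keeps the first items that satisfy every
// constraint, so relative relevance order is preserved among kept items.
func Rerank(ranked []Ranked, c Constraints) []Ranked {
	if c.Limit <= 0 {
		return nil
	}
	out := make([]Ranked, 0, c.Limit)
	seen := make(map[int64]bool)
	perGenre := make(map[string]int)
	perFranchise := make(map[string]int)
	for _, r := range ranked {
		if len(out) >= c.Limit {
			break
		}
		if seen[r.AnimeID] || c.Hidden[r.AnimeID] {
			continue // duplicate or negative-feedback suppression
		}
		if c.MaxPerGenre > 0 && perGenre[r.PrimaryGenre] >= c.MaxPerGenre {
			continue // genre diversity
		}
		if c.MaxPerFranchise > 0 && r.Franchise != "" &&
			perFranchise[r.Franchise] >= c.MaxPerFranchise {
			continue // franchise cap
		}
		seen[r.AnimeID] = true
		perGenre[r.PrimaryGenre]++
		if r.Franchise != "" {
			perFranchise[r.Franchise]++
		}
		out = append(out, r)
	}
	return out
}
```

The function never touches scores, which keeps relevance and product policy independently changeable.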

Data Tables

The first recommendation-specific schema additions should support:

  • append-only event capture
  • recommendation impression tracking
  • cached user profile snapshots

These tables are created in migration 024_add_recommendation_foundation.sql.

Roadmap

V1

  • bounded hybrid ranker in request path
  • uses watchlist history and Jikan metadata
  • no offline jobs required

V2

  • capture recommendation-interaction and watch-behavior events per user
  • persist user profile snapshots
  • precompute candidate caches
  • add explicit feedback controls such as hide or not interested

V3

  • split retrieval from ranking
  • precompute similarity graphs and user candidate pools
  • run offline evaluation on impressions, clicks, starts, and completes
  • introduce learned ranking only when enough behavior data exists

Operational Rules

  • keep request-time fanout bounded
  • keep scoring explainable
  • log recommendation impressions before introducing heavier models
  • prefer replaceable modules over one large recommendation function
  • treat data collection as the foundation for later ML, not an optional extra