Social Graph-based Content Recommendation System

A content recommendation solution that uses graph embeddings from a social network

From “What should we watch?” to a recommendation experiment

Everyone asks for a few good shows. I treated that recurring question as a product hypothesis: if a social watchlist app can surface titles people actually want to watch together, the idea is worth building. The experiment was not “ship a perfect recommender” but to learn whether the loop of data → model → API → user feedback could run end-to-end, and where it would break in production.

What I built, at a high level

Offline artifacts, online latency

Training stays offline; serving loads versioned weights and graph bundles from storage, builds the model (a bipartite GNN), materializes node embeddings once, then answers requests with a link predictor over precomputed movie vectors. Startup follows the production path implemented in the inference engine: load weights, load graph tensors and ID mappings, then precompute caches.

# inference/engine.py — production wiring (excerpt)
async def set_production_model(self, version: str, metadata: dict) -> None:
    await self.load_model(version, metadata)
    graph_data = await self.load_graph_data(version)
    await self.precompute_embeddings(version, graph_data)
    self.production_version = version
    # ...

precompute_embeddings runs a single forward pass to fill cached_user_emb and cached_movie_emb, and derives cold_start_emb as the mean user embedding, i.e., the default “anonymous” user vector used when the requester is not a known training user.

# inference/engine.py — precompute_embeddings (excerpt)
user_emb, movie_emb = model.get_node_embeddings(
    user_features=user_features,
    movie_features=movie_features,
    edge_index=edge_index,
    return_all=True,
)
self.cached_user_emb = user_emb
self.cached_movie_emb = movie_emb
self.cold_start_emb = user_emb.mean(dim=0)

At scoring time, predict_for_user encodes the actual resolution order for a request: a real index into the training graph if it exists, else a proxy tensor passed in from the API, else the population mean. The model’s link_predictor then scores only the candidate internal movie indices supplied with the request.

# inference/engine.py — predict_for_user (excerpt)
if user_idx is not None and user_idx < self.cached_user_emb.shape[0]:
    user_emb = self.cached_user_emb[user_idx]
elif user_embedding is not None:
    user_emb = user_embedding.to(self.device)
else:
    user_emb = self.cold_start_emb
# ...
scores = model.link_predictor(user_emb_expanded, movie_emb).squeeze(-1)

That execution model is why the product can stay responsive: one heavy precompute at load time, then a batched tensor operation sized to the candidate list per request.
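
The link predictor itself is not excerpted above. As a rough sketch only, assuming a common shape for this kind of scoring head (the repo’s actual depth, width, and activation are unknown), it could be a small MLP over concatenated user and movie vectors:

# hypothetical link-prediction head (not the repo's actual architecture)
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    def __init__(self, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),  # concat(user, movie) in
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),            # one relevance logit out
        )

    def forward(self, user_emb: torch.Tensor, movie_emb: torch.Tensor) -> torch.Tensor:
        # both inputs are (num_candidates, emb_dim); the engine expands the
        # single user vector to the candidate count before calling this
        return self.mlp(torch.cat([user_emb, movie_emb], dim=-1))

Whatever the real head looks like, the contract visible in the excerpt is the same: candidate-sized tensors in, one score per candidate out, so per-request cost stays linear in the candidate list.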

Two-stage recall, then GNN rank

The recommendations route does not score the whole database. It builds SQL filters (genres, years, minimum rating, excluding watched rows), caps the working set at _CANDIDATE_POOL_SIZE, and only then calls the GNN path. If the engine is not ready, the same filters still apply, but ordering falls back to the Trakt rating stored in the database.
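
The filter builder itself is not excerpted in this post. A minimal sketch of what it plausibly looks like with SQLAlchemy, where the column names (Movie.year, Movie.trakt_rating, Movie.genre) and import path are assumptions, not repo facts:

# hypothetical sketch of _build_movie_filters (column names assumed)
from sqlalchemy import not_

from app.models import Movie  # ORM model; import path is a guess

def _build_movie_filters(
    exclude_watched: bool,
    watched_ids: set[int],
    year_min: int | None = None,
    year_max: int | None = None,
    min_rating: float | None = None,
    genres: list[str] | None = None,
) -> list:
    filters = []
    if exclude_watched and watched_ids:
        filters.append(not_(Movie.id.in_(watched_ids)))
    if year_min is not None:
        filters.append(Movie.year >= year_min)
    if year_max is not None:
        filters.append(Movie.year <= year_max)
    if min_rating is not None:
        filters.append(Movie.trakt_rating >= min_rating)
    if genres:
        filters.append(Movie.genre.in_(genres))  # exact condition is schema-dependent
    return filters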

The scorer splits each batch into movies whose trakt_id maps into the training graph versus everything else. Mapped items are scored by inference_engine.predict_for_user. The remainder pads the list: unmapped rows are sorted by trakt_rating and normalized to a 0–1-style score, so the user still gets a full list even when the graph does not cover every catalog title.

# api/routers/recommendations.py — _score_movies_with_gnn (excerpt)
for movie in candidate_movies:
    if movie.trakt_id is not None:
        internal_idx = inference_engine.trakt_id_to_movie_idx.get(movie.trakt_id)
        if internal_idx is not None:
            gnn_movies.append(movie)
            gnn_indices.append(internal_idx)
            continue
    fallback_movies.append(movie)
# ...
if gnn_indices:
    top_indices, top_scores = await inference_engine.predict_for_user(
        user_idx=user_idx,
        candidate_movie_indices=gnn_indices,
        top_k=limit,
        user_embedding=user_embedding,
    )
# ...
if len(scored) < limit:
    fallback_movies.sort(key=lambda m: float(m.trakt_rating or 0), reverse=True)
    for movie in fallback_movies[: limit - len(scored)]:
        scored.append((movie, float(movie.trakt_rating or 0) / 100.0))

So, the execution here is literally partition → GNN on the mapped slice → rating padding → rank with coverage repair.
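
One branch in this pipeline has no excerpt: the engine-not-ready fallback mentioned earlier. A hedged sketch of how the route plausibly wires recall, readiness, and ranking together; _fetch_candidates, is_ready, and the _score_movies_with_gnn signature are assumptions:

# hypothetical route wiring (helper names and signatures assumed)
candidate_movies = await _fetch_candidates(db, filters, _CANDIDATE_POOL_SIZE)
if inference_engine.is_ready():
    scored = await _score_movies_with_gnn(
        candidate_movies, user_idx, user_embedding, limit
    )
else:
    # same recall filters, but ordering degrades to the stored Trakt rating
    candidate_movies.sort(key=lambda m: float(m.trakt_rating or 0), reverse=True)
    scored = [
        (m, float(m.trakt_rating or 0) / 100.0) for m in candidate_movies[:limit]
    ]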

Cold start and social priors

The API comment in the router states the core tension: the model was trained on a MovieLens-era graph, so real users typically do not have a stable internal_user_idx. Cold start is not an abstract slide here; it is user_idx is None plus whatever proxy embedding you can construct.

The router loads optional embedding lists from User.preferences, gathers mutually accepted friends’ stored embeddings, and blends them with an explicit weight. The result is a real blend_embeddings helper, not a hypothetical social layer.

# api/routers/recommendations.py
def blend_embeddings(
    self_emb: list[float] | None,
    friend_embs: list[list[float]],
    friend_weight: float = 0.3,
) -> torch.Tensor | None:
    # ...
    friends_mean = friends_tensor.mean(dim=0)
    if self_emb:
        self_tensor = torch.tensor(self_emb, dtype=torch.float32)
        return (1 - friend_weight) * self_tensor + friend_weight * friends_mean
    return friends_mean
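
For concreteness, a toy invocation with the default weight; the values are invented, and the elided body is assumed to stack friend_embs into friends_tensor:

# illustrative call with toy values: 70% own taste, 30% friends' mean
self_emb = [0.2, -0.1, 0.5]
friend_embs = [[0.0, 0.4, 0.1], [0.6, 0.0, 0.3]]
blended = blend_embeddings(self_emb, friend_embs, friend_weight=0.3)
# mean(friends) = [0.3, 0.2, 0.2]
# blended = 0.7 * self + 0.3 * mean -> tensor([0.2300, -0.0100, 0.4100])

The asymmetry is deliberate: a user with no embedding of their own inherits the friends’ mean outright, which is the social prior doing all the work.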

If there is still no vector, the route may tighten recall using preferred_genres from preferences when the client did not pass explicit genre filters, so that cold users see something coherent rather than a random 500-title slice. If that tightening yields zero rows, the code backs off to the looser filter set, so the response is never empty for that reason alone.

# api/routers/recommendations.py — query path (excerpt)
query = select(Movie)
if filters:
    query = query.where(and_(*filters))
query = query.limit(_CANDIDATE_POOL_SIZE)
# ...
if not candidate_movies and cold_start_genre_filter_applied:
    fallback_filters = _build_movie_filters(
        exclude_watched=exclude_watched,
        watched_ids=watched_ids,
        year_min=year_min,
        year_max=year_max,
        min_rating=min_rating,
    )
    # re-query without the cold-start genre tightening
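
The tightening itself is not excerpted. A plausible sketch, where preferences is assumed to be a JSON column on the user row and every name here (current_user, genres, Movie.genre) is an assumption rather than a repo fact:

# hypothetical cold-start genre tightening (field names assumed)
cold_start_genre_filter_applied = False
if user_embedding is None and not genres:  # client sent no explicit genre filter
    preferred = (current_user.preferences or {}).get("preferred_genres") or []
    if preferred:
        filters.append(Movie.genre.in_(preferred))  # condition is schema-dependent
        cold_start_genre_filter_applied = True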

Together, these branches are the executable definition of cold-start policy in this project.

Experiments and attribution

When a running ABExperiment exists, users are assigned to a variant via a stable hash of the user and experiment IDs; impressions increment on the variant row, and each recommendation write includes model_id, experiment_id, and variant_id in RecommendationHistory. That is how execution ties traffic → variant → logged slate without inventing a separate analytics pipeline first.
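
The assignment helper is not excerpted. A minimal sketch of the stable-hash idea with equal-weight variants; the function name and inputs are hypothetical:

# hypothetical stable variant assignment: the same (user, experiment) pair
# always hashes to the same bucket, so repeat visits see a consistent variant
import hashlib

def assign_variant(user_id: str, experiment_id: str, variant_ids: list[str]) -> str:
    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode()).hexdigest()
    return variant_ids[int(digest, 16) % len(variant_ids)]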

What actually blocks the next chapter

Cold start and domain shift show up directly in the code path above: missing graph user index, proxy embeddings living in JSON preferences, genre tightening, and Trakt padding are all compensations for not yet having product-scale interaction data aligned with the training graph.

Compute shows up in configuration as EC2 Spot-oriented training defaults (instance family, max spot bid, checkpoint interval): the project was designed knowing that retrain loops cost money, not just curiosity.

# api/config.py (excerpt)
training_instance_type: str = "g4dn.xlarge"
training_max_spot_price: float = 0.20
training_checkpoint_interval: int = 5

Those are the practical brakes on “iterate the graph overnight”: credits cap how often you can refresh weights and how aggressively you can explore alternatives, so progress shifts toward instrumented serving and honest fallbacks, which is exactly what this codebase encodes.

Summary

Turning “any good shows?” into an experiment meant measuring whether the system could run using storage-backed artifacts, precomputed embeddings, bounded candidate scoring, cold-start policy in the router, and experiment logging in the database. The remaining work is fundamentally data and budget: close the gap between MovieLens-era training users and real ToGather behavior, and buy enough training cycles to make that gap shrink. Until then, the most valuable output is not a leaderboard score; it is applied ML discipline baked into how requests actually execute.