Comparing changes

base repository: ctjlewis/the-algorithm
base: main
head repository: twitter/the-algorithm
compare: main
Able to merge. These branches can be automatically merged.

Commits on Mar 31, 2023

  1. 2dbdfe1
  2. ee5e7fc

Commits on Apr 1, 2023

  1. 9115361

Commits on Apr 4, 2023

  1. [home-mixer] fix has_gte_10k_favs typo

    Fixes twitter#384, closes twitter#242, closes twitter#362, closes twitter#572, closes twitter#577, closes twitter#650, closes twitter#745, closes twitter#935, closes twitter#1076, closes twitter#1079, closes twitter#1105, closes twitter#1561
    twitter-team committed Apr 4, 2023
    d1cab28
  2. Merge pull request twitter#550 from MrAuro/improve-navi-docs

    (docs): Improve README file for Navi
    dzhao authored Apr 4, 2023
    9f0afc0
  3. Update README.md

    dzhao authored Apr 4, 2023
    e8147d8
  4. Merge pull request twitter#452 from pouriya/refactor-dr_transform

    Refactor `navi/dr_transform`
    dzhao authored Apr 4, 2023
    36588c6
  5. [minor] Fix grammar + typo issues

    Closes twitter#557, closes twitter#678, closes twitter#748, closes twitter#806, closes twitter#818, closes twitter#842, closes twitter#866, closes twitter#948, closes twitter#1024, closes twitter#1313, closes twitter#1458, closes twitter#1461, closes twitter#1465, closes twitter#1491, closes twitter#1503, closes twitter#1539, closes twitter#1611
    twitter-team committed Apr 4, 2023
    bb09560

Commits on Apr 5, 2023

  1. [VF] updates includes addressing Ukraine labels

    twitter-team committed Apr 5, 2023
    3f69746
  2. [docs] Fix broken navi link in README

    twitter-team committed Apr 5, 2023
    3496189
  3. [cr-mixer/home-mixer] Remove getLinearRankingParams in EarlybirdTensorflowBasedSimilarityEngine
    
    Remove unused ranking params which are specified by services when making an Earlybird relevance search.
    
    For cr-mixer: since we always set useTensorflowRanking = true in EarlybirdSimilarityEngineRouter, we will only ever use the TensorFlowBasedScoringFunction for ranking search results. That function doesn't rely on any of the linear params specified in getLinearRankingParams, nor the boosts because we set applyBoosts = false in the request. These parameters are therefore strictly redundant.
    
    The parameters in home-mixer can be removed for essentially the same reason—the parameters are redundant given that we use the Tensorflow scoring function and don't apply boosts.
    twitter-team committed Apr 5, 2023
    138bb51 View commit details
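
The redundancy argument in this commit message can be illustrated with a minimal sketch. Every name below (`RankingRequest`, `score`, `tensorflow_score`, `linear_score`) is a hypothetical stand-in, not Earlybird's actual API, and the scoring formulas are arbitrary. The sketch assumes only what the message states: with `useTensorflowRanking = true` the TensorFlow-based scoring function is always chosen, and with `applyBoosts = false` no boosts are applied, so the linear parameters cannot affect the result.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RankingRequest:
    # Hypothetical request shape; the fields mirror the flags named in the commit message.
    use_tensorflow_ranking: bool = True
    apply_boosts: bool = False
    linear_params: Dict[str, float] = field(default_factory=dict)


def tensorflow_score(features: Dict[str, float]) -> float:
    # Stand-in for a model-based scorer: it never reads linear_params.
    return 0.5 * features.get("favs", 0.0) + 0.25 * features.get("replies", 0.0)


def linear_score(features: Dict[str, float], params: Dict[str, float]) -> float:
    # Stand-in for a linear scorer driven by per-feature weights.
    return sum(params.get(name, 0.0) * value for name, value in features.items())


def score(request: RankingRequest, features: Dict[str, float]) -> float:
    if request.use_tensorflow_ranking:
        base = tensorflow_score(features)
    else:
        base = linear_score(features, request.linear_params)
    # Boosts apply only when requested; 2.0 is an arbitrary illustrative factor.
    return base * 2.0 if request.apply_boosts else base
```

With the default flags, two requests that differ only in `linear_params` produce the same score, which is why those parameters could be dropped from the request.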

Commits on Apr 14, 2023

  1. Open-sourcing Topic Social Proof Service

    Topic Social Proof Service (TSPS) delivers highly relevant topics tailored to a user's interests by analyzing topic preferences, such as following or unfollowing, and employing semantic annotations and other machine learning models.
    twitter-team committed Apr 14, 2023
    94ff4ca
  2. Open-sourcing User Signal Service

    User Signal Service (USS) is a centralized online platform that supplies comprehensive data on user actions and behaviors on Twitter. This service stores information on both explicit signals, such as Favorites, Retweets, and replies, and implicit signals like Tweet clicks, profile visits, and more.
    twitter-team committed Apr 14, 2023
    f1b5c32
  3. Open-sourcing Unified User Actions

    Unified User Action (UUA) is a centralized, real-time stream of user actions on Twitter, consumed by various product, ML, and marketing teams. UUA makes sure all internal teams consume the uniformed user actions data in an accurate and fast way.
    twitter-team committed Apr 14, 2023
    617c8c7
  4. [opensource] Update README to include all new modules

    Since the first batch of open sourcing, we have added the following components:
    - User signal service
    - Unified user actions
    - Topic social proof service
    
    Update the README to include these.
    twitter-team committed Apr 14, 2023
    6e5c875

Commits on Apr 28, 2023

  1. Latest navi open source refresh

    latest code change including the global thread pool
    
    Closes twitter#452
    Closes twitter#505
    twitter-team committed Apr 28, 2023
    4df87a2
  2. [Medium][UUA] Clean up BCE in UUA

    This is to clean up the BCE adapters and services in UUA since BCE no longer exists.
    twitter-team committed Apr 28, 2023
    23fa75d
  3. improvements from external prs

    -fix corner case where dr converter failed when initializing
    
    Closes twitter#550
    twitter-team committed Apr 28, 2023
    31e82d6
  4. User Signals in Candidate Sourcing Stage

    Add the overview readme about how Twitter uses user signals in candidate retrieval.
    twitter-team committed Apr 28, 2023
    b5e849b
  5. Open-sourcing Timelines Aggregation Framework

    Open sourcing Aggregation Framework, a config-driven Summingbird based framework for generating real-time and batch aggregate features to be consumed by ML models.
    twitter-team committed Apr 28, 2023
    197bf2c
  6. Open-sourcing Representation Manager

    Representation Manager (RMS) serves as a centralized embedding management system, providing SimClusters or other embeddings as facade of the underlying storage or services.
    twitter-team committed Apr 28, 2023
    43cdcf2
  7. Open-sourcing Representation Scorer

    Representation Scorer (RSX) serves as a centralized scoring system, offering SimClusters or other embedding-based scoring solutions as machine learning features.
    twitter-team committed Apr 28, 2023
    5edbbee
  8. 90d7ea3

Commits on May 19, 2023

  1. Open-sourcing Tweetypie

    Tweetypie is the core Tweet service that handles the reading and writing of Tweet data.
    twitter-team committed May 19, 2023
    01dbfee
  2. Open-sourcing pushservice

    Pushservice is the main recommendation service we use to surface recommendations to our users via notifications. It fetches candidates from various sources, ranks them in order of relevance, and applies filters to determine the best one to send.
    twitter-team committed May 19, 2023
    b389c3d

Commits on May 22, 2023

  1. README updates

    - renames pushservice readme.md to README.md
    - Minor changes to main README.md
    twitter-team committed May 22, 2023
    fb54d8b

Commits on Jul 13, 2023

  1. [opensource] Update home mixer with latest changes

    twitter-team committed Jul 13, 2023
    72eda9a
Showing 1,940 changed files with 174,682 additions and 4,424 deletions.
68 changes: 49 additions & 19 deletions README.md
@@ -1,36 +1,66 @@
# Twitter Recommendation Algorithm
# Twitter's Recommendation Algorithm

The Twitter Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
diagram below illustrates how major services and jobs interconnect.
Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for serving feeds of Tweets and other content across all Twitter product surfaces (e.g. For You Timeline, Search, Explore, Notifications). For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm).

## Architecture

Product surfaces at Twitter are built on a shared set of data, models, and software frameworks. The shared components included in this repository are listed below:

| Type | Component | Description |
|------------|------------|------------|
| Data | [tweetypie](tweetypie/server/README.md) | Core Tweet service that handles the reading and writing of Tweet data. |
| | [unified-user-actions](unified_user_actions/README.md) | Real-time stream of user actions on Twitter. |
| | [user-signal-service](user-signal-service/README.md) | Centralized platform to retrieve explicit (e.g. likes, replies) and implicit (e.g. profile visits, tweet clicks) user signals. |
| Model | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. |
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
| | [topic-social-proof](topic-social-proof/README.md) | Identifies topics related to individual Tweets. |
| | [representation-scorer](representation-scorer/README.md) | Compute scores between pairs of entities (Users, Tweets, etc.) using embedding similarity. |
| Software framework | [navi](navi/README.md) | High performance, machine learning model serving written in Rust. |
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
| | [timelines-aggregation-framework](timelines/data_processing/ml_util/aggregation_framework/README.md) | Framework for generating aggregate features in batch or real time. |
| | [representation-manager](representation-manager/README.md) | Service to retrieve embeddings (i.e. SimClusters and TwHIN). |
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |

The product surfaces currently included in this repository are the For You Timeline and Recommended Notifications.

### For You Timeline

The diagram below illustrates how major services and jobs interconnect to construct a For You Timeline.

![](docs/system-diagram.png)

These are the main components of the Recommendation Algorithm included in this repository:
The core components of the For You Timeline included in this repository are listed below:

| Type | Component | Description |
|------------|------------|------------|
| Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict likelihood of a Twitter User interacting with another User. |
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
| Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
| | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos) |
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos). |
| | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light ranker model used by search index (Earlybird) to rank Tweets. |
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light Ranker model used by search index (Earlybird) to rank Tweets. |
| | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md) |
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md). |
| | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
| | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
| Software framework | [navi](navi/navi/README.md) | High performance, machine learning model serving written in Rust. |
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |

We include Bazel BUILD files for most components, but not a top level BUILD or WORKSPACE file.
### Recommended Notifications

The core components of Recommended Notifications included in this repository are listed below:

| Type | Component | Description |
|------------|------------|------------|
| Service | [pushservice](pushservice/README.md) | Main recommendation service at Twitter used to surface recommendations to our users via notifications. |
| Ranking | [pushservice-light-ranker](pushservice/src/main/python/models/light_ranking/README.md) | Light Ranker model used by pushservice to rank Tweets. Bridges candidate generation and heavy ranking by pre-selecting highly-relevant candidates from the initial huge candidate pool. |
| | [pushservice-heavy-ranker](pushservice/src/main/python/models/heavy_ranking/README.md) | Multi-task learning model to predict the probabilities that the target users will open and engage with the sent notifications. |
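
The light-ranker and heavy-ranker rows above describe a two-stage pattern: a cheap model shortlists candidates from a large pool, and an expensive model scores only that shortlist. A minimal sketch of the pattern follows. The function name `two_stage_rank` and its parameters are hypothetical, and the scorers are plain callables rather than the actual pushservice models.

```python
from typing import Callable, List, Tuple

Candidate = str


def two_stage_rank(
    candidates: List[Candidate],
    light_score: Callable[[Candidate], float],
    heavy_score: Callable[[Candidate], float],
    keep: int,
) -> List[Tuple[Candidate, float]]:
    # Stage 1: a cheap scorer trims the large pool to the `keep` best candidates.
    shortlist = sorted(candidates, key=light_score, reverse=True)[:keep]
    # Stage 2: the expensive scorer runs only on the shortlist.
    scored = [(candidate, heavy_score(candidate)) for candidate in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

One consequence of this design is that a candidate the heavy scorer would rate highly can still be lost if the light scorer drops it in stage 1, so the light scorer is usually tuned for recall rather than precision.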

## Build and test code

We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file. We plan to add a more complete build and test system in the future.

## Contributing

51 changes: 51 additions & 0 deletions RETREIVAL_SIGNALS.md
@@ -0,0 +1,51 @@
# Signals for Candidate Sources

## Overview

The candidate sourcing stage within the Twitter Recommendation algorithm narrows the pool of candidate items from approximately 1 billion to just a few thousand. This process uses Twitter user behavior as the primary input for the algorithm. This document enumerates the signals used during the candidate sourcing phase.

| Signals | Description |
| :-------------------- | :-------------------------------------------------------------------- |
| Author Follow | The accounts which the user explicitly follows. |
| Author Unfollow | The accounts which the user recently unfollowed. |
| Author Mute | The accounts which the user has muted. |
| Author Block | The accounts which the user has blocked. |
| Tweet Favorite | The tweets on which the user clicked the like button. |
| Tweet Unfavorite | The tweets on which the user clicked the unlike button. |
| Retweet | The tweets which the user retweeted. |
| Quote Tweet | The tweets which the user retweeted with comments. |
| Tweet Reply | The tweets to which the user replied. |
| Tweet Share | The tweets on which the user clicked the share button. |
| Tweet Bookmark | The tweets on which the user clicked the bookmark button. |
| Tweet Click | The tweets which the user clicked to view the tweet detail page. |
| Tweet Video Watch | The video tweets which the user watched for a certain number of seconds or percentage. |
| Tweet Don't like | The tweets on which the user clicked the "Not interested in this tweet" button. |
| Tweet Report | The tweets on which the user clicked the "Report Tweet" button. |
| Notification Open | The push notification tweets which the user opened. |
| Ntab click | The tweets which the user clicked on the Notifications page. |
| User AddressBook | The account identifiers of the authors in the user's address book. |

## Usage Details

Twitter uses these user signals as training labels and/or ML features in each candidate sourcing algorithm. The following table shows how they are used in each component.

| Signals | USS | SimClusters | TwHIN | UTEG | FRS | Light Ranking |
| :-------------------- | :----------------- | :----------------- | :----------------- | :----------------- | :----------------- | :----------------- |
| Author Follow | Features | Features / Labels | Features / Labels | Features | Features / Labels | N/A |
| Author Unfollow | Features | N/A | N/A | N/A | N/A | N/A |
| Author Mute | Features | N/A | N/A | N/A | Features | N/A |
| Author Block | Features | N/A | N/A | N/A | Features | N/A |
| Tweet Favorite | Features | Features | Features / Labels | Features | Features / Labels | Features / Labels |
| Tweet Unfavorite | Features | Features | N/A | N/A | N/A | N/A |
| Retweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels |
| Quote Tweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels |
| Tweet Reply | Features | N/A | Features | Features | Features / Labels | Features |
| Tweet Share | Features | N/A | N/A | N/A | Features | N/A |
| Tweet Bookmark | Features | N/A | N/A | N/A | N/A | N/A |
| Tweet Click | Features | N/A | N/A | N/A | Features | Labels |
| Tweet Video Watch | Features | Features | N/A | N/A | N/A | Labels |
| Tweet Don't like | Features | N/A | N/A | N/A | N/A | N/A |
| Tweet Report | Features | N/A | N/A | N/A | N/A | N/A |
| Notification Open | Features | Features | Features | N/A | Features | N/A |
| Ntab click | Features | Features | Features | N/A | Features | N/A |
| User AddressBook | N/A | N/A | N/A | N/A | Features | N/A |
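
The usage table can also be read as a lookup structure. The sketch below encodes a subset of its rows and answers "which components use this signal as a feature or as a label". The names `COMPONENTS`, `USAGE`, and `components_using`, and the "F"/"L" encoding, are assumptions made for illustration and are not part of the repository.

```python
from typing import Dict, List

COMPONENTS = ["USS", "SimClusters", "TwHIN", "UTEG", "FRS", "Light Ranking"]

# A subset of the table's rows; "F" = Features, "L" = Labels, "" = N/A.
USAGE: Dict[str, List[str]] = {
    "Author Follow":     ["F", "FL", "FL", "F", "FL", ""],
    "Tweet Favorite":    ["F", "F", "FL", "F", "FL", "FL"],
    "Tweet Click":       ["F", "", "", "", "F", "L"],
    "Tweet Video Watch": ["F", "F", "", "", "", "L"],
    "User AddressBook":  ["", "", "", "", "F", ""],
}


def components_using(signal: str, role: str) -> List[str]:
    """Return the components that use `signal` in the given role ("F" or "L")."""
    return [name for name, cell in zip(COMPONENTS, USAGE[signal]) if role in cell]
```

For example, `components_using("Tweet Click", "L")` returns `["Light Ranking"]`, matching the table row in which Tweet Click serves as a label only for Light Ranking.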
2 changes: 1 addition & 1 deletion ann/src/main/python/dataflow/faiss_index_bq_dataset.py
@@ -91,7 +91,7 @@ def parse_metric(config):
   elif metric_str == "linf":
     return faiss.METRIC_Linf
   else:
-    raise Exception(f"Uknown metric: {metric_str}")
+    raise Exception(f"Unknown metric: {metric_str}")


def run_pipeline(argv=[]):
4 changes: 2 additions & 2 deletions cr-mixer/README.md
@@ -2,6 +2,6 @@

CR-Mixer is a candidate generation service proposed as part of the Personalization Strategy vision for Twitter. Its aim is to speed up the iteration and development of candidate generation and light ranking. The service acts as a lightweight coordinating layer that delegates candidate generation tasks to underlying compute services. It focuses on Twitter's candidate generation use cases and offers a centralized platform for fetching, mixing, and managing candidate sources and light rankers. The overarching goal is to increase the speed and ease of testing and developing candidate generation pipelines, ultimately delivering more value to Twitter users.

-CR-Mixer act as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
+CR-Mixer acts as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.

-CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
+CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
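
The four steps described in the README text above (source signal extraction, candidate generation, filtering, ranking) can be sketched as a single function. This is an illustration under stated assumptions, not CR-Mixer's implementation: `generate_candidates` and all of its parameters are hypothetical, the external stores and services are replaced by plain callables, filtering is reduced to deduplication, and caching is omitted.

```python
from typing import Callable, Iterable, List

Signal = str
Candidate = str


def generate_candidates(
    user_id: str,
    fetch_signals: Callable[[str], List[Signal]],
    candidate_sources: Iterable[Callable[[Signal], List[Candidate]]],
    light_score: Callable[[Candidate], float],
    max_results: int,
) -> List[Candidate]:
    # Step 1: source signal extraction.
    signals = fetch_signals(user_id)

    # Step 2: candidate generation, delegated to each underlying source.
    candidates: List[Candidate] = []
    for source in candidate_sources:
        for signal in signals:
            candidates.extend(source(signal))

    # Step 3: filtering; here only deduplication, preserving first-seen order.
    deduped = list(dict.fromkeys(candidates))

    # Step 4: light ranking, then truncation to the requested size.
    return sorted(deduped, key=light_score, reverse=True)[:max_results]
```

Passing the sources and the scorer in as arguments mirrors the "configurator and delegator" role described above: the coordinating function holds no retrieval or scoring logic of its own.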