New Sentence Transformers CrossEncoder Rerankers Released by Hugging Face

Learn about six new state-of-the-art rerankers built on Ettin ModernBERT encoders and their usage in a retrieve-then-rerank pipeline.

Today, Hugging Face releases six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders. These models were trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data, which is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning.

**What is a Reranker and Why Pair it with an Embedder?**

A reranker (or cross-encoder) takes a (query, document) pair and returns a relevance score. Unlike an embedder, which encodes the query and the document separately and then compares the resulting vectors, a reranker processes both texts together, so query and document tokens can attend to each other at every layer. This usually gives higher accuracy but costs more compute: the model must run once for every pair.

That's why the common pattern is retrieve-then-rerank: a fast embedder first retrieves the top K candidates, then the reranker reorders only those K. You keep costs under control and get close to what you'd have if you applied the cross-encoder to the whole corpus.
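The two stages can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the Sentence Transformers API: `toy_embed` and `toy_pair_score` are hypothetical stand-ins (a bag-of-words vector and a word-overlap score) chosen so the code runs without downloading a model. In a real pipeline, the first would be an embedding model and the second a cross-encoder reranker.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_pair_score(query, document):
    """Hypothetical stand-in for a cross-encoder: it sees both texts at once."""
    q, d = query.lower().split(), document.lower().split()
    # Fraction of query words that also appear in the document
    return sum(1 for w in q if w in d) / len(q)


def retrieve_then_rerank(query, corpus, k):
    # Stage 1: cheap retrieval. Document vectors could be precomputed offline.
    doc_vectors = [toy_embed(doc) for doc in corpus]
    query_vector = toy_embed(query)
    by_similarity = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query_vector, doc_vectors[i]),
        reverse=True,
    )
    candidates = by_similarity[:k]

    # Stage 2: expensive scoring, run on only k pairs instead of len(corpus)
    scored = [(toy_pair_score(query, corpus[i]), i) for i in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [corpus[i] for _, i in scored]


corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "France borders Germany",
    "Bananas are yellow",
]
results = retrieve_then_rerank("capital of France", corpus, k=2)
print(results)
```

The pair scorer runs only `k` times per query, regardless of corpus size, which is where the cost saving comes from. The trade-off is that a relevant document missed by the first stage can never be recovered by the second, so `k` is usually chosen generously.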

**Usage**

To use these models, pair them with an embedder in your retrieve-then-rerank pipeline. The Ettin Reranker models support inputs of up to 8K tokens, which makes them suitable for long documents.

Here's a simple example of how to use the Ettin Reranker:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "cross-encoder/ettin-reranker-17m-v1"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()


# Score each (query, document) pair; a higher score means more relevant
def compute_relevance_scores(query, documents):
    inputs = tokenizer(
        [query] * len(documents),  # repeat the query once per document
        documents,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=8192,
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # One logit per pair, assuming a single-output classification head
    return outputs.logits.squeeze(-1)


# Example usage
query = "What is the capital of France?"
documents = ["Paris", "London", "Berlin"]
relevance_scores = compute_relevance_scores(query, documents)
print(relevance_scores)
```

**Architecture Details**

The Ettin Reranker family is built on top of the Ettin ModernBERT encoders. Like other Transformer encoders, these models process input sequences with stacked self-attention and feed-forward layers.

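Here's a simplified, high-level overview of the architecture:

```python
import torch.nn as nn
from transformers import AutoModel


class EttinReranker(nn.Module):
    """Illustrative sketch: an encoder with a single-logit scoring head."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        # Placeholder checkpoint name; substitute an actual Ettin encoder checkpoint
        self.encoder = AutoModel.from_pretrained("ettin-modernbert-base")
        # Maps the first-token representation to one relevance logit
        self.classifier = nn.Linear(config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        encoder_outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token's hidden state as the representation of the pair
        logits = self.classifier(encoder_outputs.last_hidden_state[:, 0, :])
        return logits
```

**Results**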

The Ettin Reranker family achieves state-of-the-art results for its model sizes on the MTEB(eng, v2) Retrieval benchmark.

Here are some example results:

| Model | P@1 | P@5 | P@10 |
| --- | --- | --- | --- |
| cross-encoder/ettin-reranker-17m-v1 | 0.83 | 0.92 | 0.95 |
| cross-encoder/ettin-reranker-32m-v1 | 0.85 | 0.93 | 0.96 |
| cross-encoder/ettin-reranker-68m-v1 | 0.87 | 0.94 | 0.97 |

**Training**

The Ettin Reranker family was trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data.
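Written out, a pointwise MSE distillation objective has the general form below. The notation is an assumption introduced here for illustration: s_θ(q_i, d_i) is the student reranker's score for the i-th (query, document) pair, t_i is the teacher score from mixedbread-ai/mxbai-rerank-large-v2 for the same pair, and N is the number of pairs in a batch.

```latex
\mathcal{L}_{\text{MSE}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl( s_\theta(q_i, d_i) - t_i \bigr)^2
```

"Pointwise" means each pair contributes to the loss independently: the student is pushed to reproduce the teacher's score for that pair, with no direct comparison between documents retrieved for the same query.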

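Here's a simplified sketch of such a training loop:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Pointwise MSE between the student's logits and the teacher's scores
criterion = nn.MSELoss()


# Run one epoch and return the mean training loss
def train(model, device, loader, optimizer):
    model.train()
    total_loss = 0.0
    for batch in loader:
        input_ids, attention_mask, labels = batch
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        # Assumes a single-output head, so logits have shape (batch, 1)
        loss = criterion(outputs.logits.squeeze(-1), labels.float())
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)


# Example usage
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "cross-encoder/ettin-reranker-17m-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # learning rate is an illustrative assumption

# Toy (query, document, teacher_score) triples standing in for
# cross-encoder/ettin-reranker-v1-data; the scores are placeholder values
examples = [
    ("What is the capital of France?", "Paris", 1.0),
    ("What is the capital of France?", "Berlin", 0.0),
]


# Tokenize a batch of triples into (input_ids, attention_mask, labels)
def collate(batch):
    queries, docs, scores = zip(*batch)
    enc = tokenizer(list(queries), list(docs), padding=True, truncation=True, max_length=8192, return_tensors="pt")
    return enc["input_ids"], enc["attention_mask"], torch.tensor(scores)


loader = DataLoader(examples, batch_size=32, shuffle=True, collate_fn=collate)

for epoch in range(5):
    train_loss = train(model, device, loader, optimizer)
    print(f"Epoch {epoch+1}, Train Loss: {train_loss:.4f}")
```

**Conclusion**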

The Ettin Reranker family is a new set of efficient rerankers that achieve state-of-the-art results for their sizes on the MTEB(eng, v2) Retrieval benchmark. The models are built on top of the Ettin ModernBERT encoders and process input sequences with stacked self-attention and feed-forward layers.

We hope this article has provided you with a good understanding of the Ettin Reranker family and how to use them in your retrieve-then-rerank pipelines.
