BM25

What is BM25?

BM25 (Best Matching 25) is a formula that scores how well a document matches a search query. It looks at how often query words appear in a document, how rare those words are across all documents, and adjusts for the document’s length, giving a single relevance number.
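
For readers who want the math, one widely used form of the formula is sketched below. The notation is an assumption made for illustration and does not come from this article: f(q_i, D) is how often query term q_i occurs in document D, |D| is the length of D in words, avgdl is the average document length in the collection, N is the number of documents, n(q_i) is the number of documents containing q_i, and k_1 and b are tuning parameters. Implementations differ in details, especially in how IDF is defined; the version here is the non-negative variant.

```latex
\[
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
\]
\[
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
\]
```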

Let's break it down

BM25: a name for a specific ranking formula used in information retrieval.
Best Matching: means it tries to find the most relevant matches.
25: a number identifying this variant within a series of Best Matching formulas; earlier ones, such as BM11 and BM15, came before it.
Term frequency: counts how many times a query word shows up in a document.
Inverse document frequency: measures how uncommon a word is across the whole collection; rare words get more weight.
Document length normalization: reduces the advantage long documents have just because they contain more words, balancing the score.
Score: a single number that tells you how good the match is; higher = better.
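
The pieces above can be put together in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name bm25_scores, the toy documents, and the defaults k1=1.2 and b=0.75 are assumptions chosen for illustration, and the IDF shown is one common non-negative variant among several in use.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: above 1 for longer-than-average documents.
        norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            # Inverse document frequency: rare terms get more weight.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency, saturating as the count grows.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

# Toy collection (hypothetical data), tokenized by a plain whitespace split.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat around the garden all afternoon".split(),
    "stock prices fell sharply on monday".split(),
]
scores = bm25_scores("cat mat".split(), docs)
# Document indices ordered from best match to worst.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

The first document contains both query words and is short, so it ranks first; the second contains only "cat" and is longer than average; the third contains neither word and scores zero.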

Why does it matter?

Because it turns a messy collection of text into an ordered list of the most useful results, helping people find what they need quickly, whether they’re searching the web, a library, or an online store.

Where is it used?

Search platforms like Elasticsearch and Apache Solr to rank the documents they index.
E-commerce sites to surface the most relevant products when shoppers type a query.
Digital libraries and academic databases to retrieve the most pertinent research papers.
Recommendation systems that match user queries to relevant content or media.

Good things about it

Simple to understand and implement compared to deep-learning models.
Fast to compute, making it suitable for real-time search.
Works well with short, keyword-based queries, which are common in many applications.
Has tunable parameters (k1, b) that let you adapt it to different data sets.
Proven effectiveness; it’s a strong baseline that more complex methods do not always beat.
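
To see what the two tunable parameters do, the sketch below isolates the term-frequency part of the score. The helper name tf_component and the sample lengths are hypothetical, and the defaults k1=1.2 and b=0.75 are commonly used starting values rather than anything prescribed here.

```python
def tf_component(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of BM25 for one term in one document."""
    norm = 1 - b + b * doc_len / avg_len
    return tf * (k1 + 1) / (tf + k1 * norm)

# k1 controls saturation: each extra occurrence adds less than the last,
# and the value always stays below k1 + 1.
for tf in (1, 2, 5, 20):
    print(tf, round(tf_component(tf, doc_len=100, avg_len=100), 3))

# b controls length normalization: with b=0 document length is ignored,
# with b=1 it is applied in full.
for b in (0.0, 0.75, 1.0):
    short = tf_component(3, doc_len=50, avg_len=100, b=b)
    long_ = tf_component(3, doc_len=400, avg_len=100, b=b)
    print(b, round(short, 3), round(long_, 3))
```

Raising k1 lets repeated words keep adding to the score for longer; raising b penalizes long documents more strongly.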

Not-so-good things

Ignores the meaning and order of words, so it can miss documents that are relevant but use different wording, such as synonyms.
Requires good tokenization and preprocessing; poor handling of language nuances hurts performance.
Parameter tuning can be tricky; wrong settings may degrade results.
Less effective for very large vocabularies or queries with many rare terms.
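
Two of these limitations are easy to demonstrate. The sketch below uses hypothetical helper names (naive_tokens, normalized_tokens) and a deliberately simple tokenizer, assumed only for illustration; real systems usually add stemming, stop-word handling, and language-specific rules.

```python
import re
from collections import Counter

def naive_tokens(text):
    # Whitespace split only: case and punctuation are kept as-is.
    return text.split()

def normalized_tokens(text):
    # Lowercase, then keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

# Word order is lost: both sentences produce the same term counts,
# so BM25 cannot tell them apart.
a = Counter(normalized_tokens("dog bites man"))
b = Counter(normalized_tokens("man bites dog"))
print(a == b)

# Preprocessing matters: without normalization, the token "Laptops,"
# does not match the query term "laptops".
doc = "Laptops, tablets and phones"
print("laptops" in naive_tokens(doc))
print("laptops" in normalized_tokens(doc))

# Meaning is ignored: a query for "notebooks" finds no matching term,
# even though the document is about laptops.
print("notebooks" in normalized_tokens(doc))
```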