Relearning Ranking

Bringing Back the Old School

Nicholas Khami

Founder/CEO
Trieve

  • We serve over 1M re-ranking queries every week

Twitter: @skeptrune
LinkedIn: nicholas-khami
GitHub: github.com/skeptrunedev

Agenda | Old to New

  • What is Learning to Rank?
  • Types of LTR
  • Well known implementations
    ------------------------------
  • Cross Encoders
  • Cross Encoders vs. LTR
  • Vector DBs are kind of cooked

Old School Learning to Rank

What is LTR?

  • Multiple scores exist per retrieved doc (upvotes, bm25, cos_dist, backlinks)
  • LTR means using an ML model to sort results by a weighted combination of those scores (sketched below)
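
A minimal sketch of how an LTR model turns those per-doc signals into an ordering, in Python. The Doc fields, helper names, and weight values are illustrative assumptions, not output from any real system; a trained model would learn the weights from labeled data.

    from dataclasses import dataclass

    @dataclass
    class Doc:
        doc_id: str
        upvotes: float
        bm25: float
        cos_dist: float
        backlinks: float

    # Hypothetical learned weights; an LTR model fits these from labeled judgments.
    WEIGHTS = {"upvotes": 0.2, "bm25": 0.5, "cos_dist": -0.4, "backlinks": 0.1}

    def ltr_score(doc: Doc) -> float:
        """Weighted combination of per-doc signals; higher is better."""
        return (
            WEIGHTS["upvotes"] * doc.upvotes
            + WEIGHTS["bm25"] * doc.bm25
            + WEIGHTS["cos_dist"] * doc.cos_dist   # negative weight: smaller distance is better
            + WEIGHTS["backlinks"] * doc.backlinks
        )

    def rerank(docs: list[Doc]) -> list[Doc]:
        return sorted(docs, key=ltr_score, reverse=True)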

How to train?

Supervised learning with labeled ground truth data (see the training sketch after this list)
  1. Good: pointwise regression
  2. Better: pairwise classification
  3. Best: holistic listwise ordering
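
A minimal training sketch for the "best" option above, assuming LightGBM's LambdaMART-style LGBMRanker (which optimizes a listwise metric through pairwise gradients). The feature matrix, relevance labels, and query grouping below are toy data standing in for real logged judgments.

    import numpy as np
    from lightgbm import LGBMRanker

    # 6 candidate docs across 2 queries, 4 features each
    # (e.g. upvotes, bm25, cos_dist, backlinks).
    X = np.random.rand(6, 4)
    y = np.array([2, 1, 0, 1, 0, 0])  # graded relevance label per doc
    group = [3, 3]                    # docs 0-2 belong to query 1, docs 3-5 to query 2

    ranker = LGBMRanker(objective="lambdarank", n_estimators=100)
    ranker.fit(X, y, group=group)

    # At query time: score one query's candidates and sort by predicted relevance.
    scores = ranker.predict(X[:3])
    ranking = np.argsort(-scores)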

Well known LTR systems

  • Google combines backlinks, full-text relevance, and other signals
  • Facebook combines content similarity with closeness in your network
  • Reddit combines upvotes and relevance in search

New School Neural Re-ranker Models

What is a "re-ranker"?

  • Technically, a cross-encoder
  • A transformer network that scores each query<->doc pair
  • Closest to pointwise regression in the LTR framework (sketch below)
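
A minimal sketch of cross-encoder re-ranking using the sentence-transformers library; the checkpoint name is a commonly used public model and the query/doc strings are just examples.

    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    query = "how do re-rankers work?"
    docs = [
        "Cross-encoders encode the query and document together to produce a relevance score.",
        "Bi-encoders embed the query and document separately and compare the vectors.",
    ]

    # Each (query, doc) pair runs through the transformer jointly -- pointwise, like LTR regression.
    scores = model.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)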

Why use cross-encoders?

  • Usually worse than LTR
  • Main reason to use one is a lack of labeled training data
  • PSA: latency-optimize with Trieve Vector Inference

Vector DBs are kind of cooked

  OpenSearch/Solr have LTR
+ Vector DBs don't have LTR
+ OpenSearch/Solr have vectors and more
----
= Vector DBs are cooked

Questions

Thank you!