Discover your next Milestone.

Choose from industry-vetted challenges. Build locally, push to GitHub, and earn cryptographic proof of your engineering skills.

LLM Evaluation Framework

genai · Intermediate · 365d access
149 onwards

Define test cases, score LLM outputs on accuracy, faithfulness, and tone. Build a regression tracker that alerts you when a prompt change breaks a passing test.

  • Design a rigorous evaluation test suite with multiple scoring dimensions
  • Use an LLM-as-judge to automatically score other LLM outputs
  • Track performance across prompt versions and detect regressions
  • Calculate inter-rater agreement between automated and human evaluators
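
As a rough illustration of the last two objectives, the sketch below flags tests that passed under one prompt version and fail under the next, and computes Cohen's kappa between an automated judge and a human rater. The function names, the pass/fail labels, and the dictionary layout are assumptions made for this example; they are not taken from the challenge materials.

```python
from collections import Counter


def find_regressions(baseline, candidate):
    """Return ids of tests that passed in `baseline` but not in `candidate`.

    Both arguments map a test id to True (pass) or False (fail).
    A test missing from `candidate` is treated as failing.
    """
    return sorted(
        test_id
        for test_id, passed in baseline.items()
        if passed and not candidate.get(test_id, False)
    )


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical results for two prompt versions.
v1 = {"t1": True, "t2": True, "t3": False}
v2 = {"t1": True, "t2": False, "t3": True}
regressions = find_regressions(v1, v2)

# Hypothetical verdicts from an LLM judge and a human on the same four outputs.
judge = ["pass", "pass", "fail", "fail"]
human = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(judge, human)
```

Here `regressions` lists only `t2`, since `t3` improved rather than regressed. Kappa corrects raw agreement for chance, so it is usually a stricter signal than a plain match rate when deciding whether an automated judge can stand in for human review.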

Multi-Document RAG with Reranking

genai · Intermediate · 365d access
149 onwards

Search across multiple document collections, rerank chunks by relevance using a cross-encoder, and generate a cited, structured answer from multiple sources.

  • Index and query documents from multiple distinct sources in a single vector store
  • Understand why initial vector search returns noisy results and why reranking helps
  • Integrate Cohere Rerank to score retrieved chunks by true relevance
  • Instruct an LLM to cite specific source documents in its generated answers
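
The retrieve, rerank, and cite flow described above can be sketched as follows. To keep the example self-contained, a simple word-overlap score stands in for the cross-encoder or hosted reranker; `overlap_score`, `rerank`, `build_cited_prompt`, and the chunk dictionary shape are all hypothetical names chosen for illustration, not the API of Cohere Rerank or any other library.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text.

    This is only a placeholder for a real cross-encoder or rerank service.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, chunks, score_fn=overlap_score, top_k=2):
    """Sort retrieved chunks by relevance to the query and keep the best top_k."""
    ranked = sorted(chunks, key=lambda c: score_fn(query, c["text"]), reverse=True)
    return ranked[:top_k]


def build_cited_prompt(query, chunks):
    """Number each chunk so the model can cite sources as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] {c['source']}: {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Hypothetical chunks returned by an initial vector search.
retrieved = [
    {"source": "a.md", "text": "cats sleep a lot"},
    {"source": "b.md", "text": "reranking improves retrieval precision"},
    {"source": "c.md", "text": "vector retrieval returns noisy results"},
]
query = "why does reranking improve retrieval"
top_chunks = rerank(query, retrieved)
prompt = build_cited_prompt(query, top_chunks)
```

Passing the scorer in as `score_fn` keeps the pipeline unchanged when the placeholder is swapped for a real reranking model.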

Context Engineering Pipeline

genai · Intermediate · 365d access
149 onwards

Build a RAG system that dynamically assembles context based on query type — metadata filters, recency weighting, source diversity rules, and dynamic prompt templates.

  • Build a rigorous evaluation framework with a fixed test set and baseline score
  • Add document metadata and use it to filter retrieved chunks before answer generation
  • Classify query intent and apply different prompt templates for each type
  • Measure the impact of each RAG improvement quantitatively against a baseline
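
A minimal sketch of query-dependent context assembly, under stated assumptions: intent is guessed with keyword rules (a real pipeline might use a classifier or an LLM call), each chunk carries a `year` metadata field, and the three intents and their templates are invented for this example.

```python
# Hypothetical prompt templates, one per query intent.
TEMPLATES = {
    "comparison": "Compare the options using only this context:\n{context}\nQuestion: {query}",
    "recent": "Prefer the newest information in this context:\n{context}\nQuestion: {query}",
    "factual": "Answer concisely from this context:\n{context}\nQuestion: {query}",
}


def classify_intent(query):
    """Keyword-based intent guess; a stand-in for a trained classifier."""
    q = query.lower()
    if any(marker in q for marker in ("compare", "difference", " vs ")):
        return "comparison"
    if any(marker in q for marker in ("latest", "recent", "newest")):
        return "recent"
    return "factual"


def select_chunks(chunks, intent, min_year=None):
    """Filter chunks by metadata, then order newest first for 'recent' queries."""
    kept = [c for c in chunks if min_year is None or c["year"] >= min_year]
    if intent == "recent":
        kept = sorted(kept, key=lambda c: c["year"], reverse=True)
    return kept


def assemble_prompt(query, chunks, min_year=None):
    """Pick a template by intent and fill it with the selected chunks."""
    intent = classify_intent(query)
    selected = select_chunks(chunks, intent, min_year=min_year)
    context = "\n".join(f"({c['year']}) {c['text']}" for c in selected)
    return TEMPLATES[intent].format(context=context, query=query)


# Hypothetical retrieved chunks with a `year` metadata field.
docs = [
    {"year": 2021, "text": "Release 1.0 notes"},
    {"year": 2024, "text": "Release 3.0 notes"},
    {"year": 2023, "text": "Release 2.0 notes"},
]
prompt = assemble_prompt("What is the latest release?", docs, min_year=2022)
```

With `min_year=2022` the 2021 chunk is dropped before generation, and because the query is classified as "recent" the remaining chunks are ordered newest first. Scoring each rule like this against a fixed test set is what makes the baseline comparison in the last objective possible.
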

12k+ Verified Developers
150+ Active Projects
450+ Companies Hiring
14 Days Avg. Completion

Got questions?

Every challenge includes detailed documentation, technical constraints, and automated evaluation scripts to ensure you have everything you need to succeed.