
Discover your next Milestone.

Choose from industry-vetted challenges. Build locally, push to GitHub, and earn cryptographic proof of your engineering skills.

Production RAG with Hybrid Search

genai · Advanced · 365d access
199 onwards

Combine dense vector search with BM25 keyword search. Add query rewriting, hypothetical document embeddings (HyDE), streaming responses, and a full eval suite.

  • Combine dense vector search and BM25 keyword search using Reciprocal Rank Fusion
  • Implement query rewriting to improve retrieval quality for ambiguous queries
  • Apply HyDE (Hypothetical Document Embedding) to boost semantic search precision
  • Build a streaming FastAPI + Streamlit layer on top of a production RAG system
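
Reciprocal Rank Fusion, used in the first bullet to merge the dense and BM25 result lists, can be sketched in a few lines. The document IDs and the two toy rankings below are hypothetical; the constant k=60 is the value from the original RRF paper:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs: each document earns
    1 / (k + rank) per list it appears in, then lists are merged
    by total score. Documents ranked well by both retrievers win."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense search and BM25 disagree; fusion rewards the overlap.
dense = ["d3", "d1", "d2"]
bm25 = ["d1", "d4", "d3"]
fused = reciprocal_rank_fusion([dense, bm25])
```

Note that RRF needs only ranks, not raw scores, which is exactly why it works for fusing retrievers whose scores live on incomparable scales.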

Fine-tune a Small LLM on Custom Data

genai · Advanced · 365d access
199 onwards

Prepare an instruction-tuning dataset, fine-tune Phi-2 or Mistral 7B using LoRA/QLoRA on free Colab GPUs, and rigorously evaluate the fine-tuned model against the base.

  • Curate and format a high-quality instruction-tuning dataset in JSONL format
  • Understand LoRA and QLoRA — how parameter-efficient fine-tuning works
  • Fine-tune a small open-source LLM (Phi-2 or Mistral 7B) on Google Colab for free
  • Evaluate a fine-tuned model rigorously against its base model using an LLM judge
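
The first bullet's JSONL formatting step can be sketched as below. The `{"instruction", "input", "output"}` schema is the common Alpaca-style convention, an assumption here; adjust the keys to whatever your training script expects:

```python
import json

def to_jsonl(examples):
    """Serialize instruction/response pairs to JSONL: one JSON object
    per line, whitespace-stripped, in Alpaca-style schema."""
    lines = []
    for ex in examples:
        record = {
            "instruction": ex["instruction"].strip(),
            "input": ex.get("input", "").strip(),  # optional context field
            "output": ex["output"].strip(),
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

sample = [{
    "instruction": "Summarize the text.",
    "input": "LoRA freezes base weights and trains low-rank adapters.",
    "output": "LoRA trains small adapter matrices instead of full weights.",
}]
jsonl = to_jsonl(sample)
```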

LLM Evaluation Framework

genai · Intermediate · 365d access
149 onwards

Define test cases, score LLM outputs on accuracy, faithfulness, and tone. Build a regression tracker that alerts you when a prompt change breaks a passing test.

  • Design a rigorous evaluation test suite with multiple scoring dimensions
  • Use an LLM-as-judge to automatically score other LLM outputs
  • Track performance across prompt versions and detect regressions
  • Calculate inter-rater agreement between automated and human evaluators
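
The regression tracking described above reduces to a score diff between prompt versions. A minimal sketch, assuming the judge has already produced per-test scores (the test IDs and numbers below are hypothetical):

```python
def find_regressions(baseline, candidate, threshold=0.05):
    """Compare per-test scores of two prompt versions; return the IDs
    of tests whose score dropped by more than `threshold`. In a real
    tracker this is what fires the alert."""
    return [
        test_id
        for test_id, base_score in baseline.items()
        if candidate.get(test_id, 0.0) < base_score - threshold
    ]

v1 = {"faithfulness_01": 0.90, "tone_02": 0.80, "accuracy_03": 0.70}
v2 = {"faithfulness_01": 0.90, "tone_02": 0.60, "accuracy_03": 0.75}
regressed = find_regressions(v1, v2)
```

The `threshold` absorbs judge noise so small score jitter does not trigger false alarms.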

Multi-Document RAG with Reranking

genai · Intermediate · 365d access
149 onwards

Search across multiple document collections, rerank chunks by relevance using a cross-encoder, and generate a cited, structured answer from multiple sources.

  • Index and query documents from multiple distinct sources in a single vector store
  • Understand why initial vector search returns noisy results and why reranking helps
  • Integrate Cohere Rerank to score retrieved chunks by true relevance
  • Instruct an LLM to cite specific source documents in its generated answers
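
The reranking step above is just "score every candidate against the query, keep the best". A sketch with a pluggable scorer, where `overlap_score` is a deliberately naive stand-in for a real cross-encoder or the Cohere Rerank API:

```python
def rerank(query, chunks, score_fn, top_k=3):
    """Re-order retrieved chunks by a cross-encoder-style relevance
    score and keep the top_k. `score_fn(query, chunk)` stands in for
    a real reranker model call."""
    return sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)[:top_k]

def overlap_score(query, chunk):
    """Toy scorer: count query terms that appear in the chunk."""
    q_terms = set(query.lower().split())
    return sum(1 for t in chunk.lower().split() if t in q_terms)

chunks = [
    "The warranty covers parts for two years.",
    "Shipping takes five business days.",
    "Warranty claims require proof of purchase.",
]
top = rerank("warranty claim process", chunks, overlap_score, top_k=2)
```

The point of the two-stage design: vector search is cheap but noisy over millions of chunks, so you run the expensive pairwise scorer only on the short candidate list it returns.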

Context Engineering Pipeline

genai · Intermediate · 365d access
149 onwards

Build a RAG system that dynamically assembles context based on query type — metadata filters, recency weighting, source diversity rules, and dynamic prompt templates.

  • Build a rigorous evaluation framework with a fixed test set and baseline score
  • Add document metadata and use it to filter retrieved chunks before answer generation
  • Classify query intent and apply different prompt templates for each type
  • Measure the impact of each RAG improvement quantitatively against a baseline
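
The intent-routing bullet can be sketched as a classifier that picks a prompt template per query type. The keyword heuristic and the three template strings below are illustrative assumptions; production systems often swap in a small classifier or an LLM call:

```python
TEMPLATES = {
    "comparison": "Compare the options using the context:\n{context}\n\nQ: {question}",
    "summary": "Summarize the relevant context:\n{context}\n\nRequest: {question}",
    "factual": "Answer using only the context below:\n{context}\n\nQ: {question}",
}

def classify_intent(question):
    """Crude keyword routing: good enough to demonstrate the pipeline."""
    q = question.lower()
    if any(w in q for w in ("compare", "versus", " vs ", "difference")):
        return "comparison"
    if any(w in q for w in ("summarize", "overview", "tl;dr")):
        return "summary"
    return "factual"

def build_prompt(question, context):
    """Assemble the final prompt from the intent-specific template."""
    return TEMPLATES[classify_intent(question)].format(
        context=context, question=question
    )
```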

Long Document Summarizer

genai · Beginner · 365d access
99 onwards

Summarize documents that exceed the context window using chunking and map-reduce patterns. Handle PDFs, articles, and reports of any length intelligently.

  • Ingest documents from multiple input types: pasted text, .txt files, and PDFs
  • Implement map-reduce summarization to handle arbitrarily long documents
  • Control output structure using prompt instructions (bullets, Q&A, executive summary)
  • Estimate and display approximate API cost per operation
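
The map-reduce pattern from the second bullet, as a skeleton. `summarize_fn` stands in for the LLM call so the control flow is visible; the stub "LLM" below just returns the first sentence of its input:

```python
def map_reduce_summarize(document, summarize_fn, chunk_size=1000):
    """Map: summarize each chunk independently (each fits the context
    window). Reduce: summarize the concatenated chunk summaries."""
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    partials = [summarize_fn(c) for c in chunks]   # map step
    return summarize_fn("\n".join(partials))        # reduce step

fake_llm = lambda text: text.split(".")[0] + "."    # stub: first sentence
summary = map_reduce_summarize("First point. Detail. " * 200,
                               fake_llm, chunk_size=50)
```

For very long documents the reduce input can itself exceed the window, in which case the reduce step is applied recursively.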

Prompt Engineering Lab

genai · Beginner · 365d access
99 onwards

Run the same query through 3 different prompt templates and score outputs by quality. Build a systematic prompt testing and iteration tool — the skill every AI engineer needs.

  • Design and compare multiple prompt templates for the same task systematically
  • Use concurrent API calls to run prompt variants in parallel
  • Build a scoring system to quantitatively evaluate prompt quality
  • Identify which prompt patterns (chain-of-thought, few-shot, direct) work best for different query types
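
Running variants in parallel (second bullet) is a natural fit for a thread pool, since LLM calls are I/O-bound. The three template strings are illustrative, and `call_llm` is a stub standing in for a real chat-completion call:

```python
from concurrent.futures import ThreadPoolExecutor

PROMPT_VARIANTS = {
    "direct": "Answer: {query}",
    "cot": "Think step by step, then answer: {query}",
    "few_shot": "Q: What is 2+2? A: 4\nQ: {query} A:",
}

def run_variants(query, call_llm):
    """Fire every prompt variant concurrently and collect results
    keyed by variant name."""
    with ThreadPoolExecutor(max_workers=len(PROMPT_VARIANTS)) as pool:
        futures = {
            name: pool.submit(call_llm, tmpl.format(query=query))
            for name, tmpl in PROMPT_VARIANTS.items()
        }
        return {name: f.result() for name, f in futures.items()}

echo = lambda prompt: f"[{len(prompt)} chars]"   # stub LLM
results = run_variants("What is RAG?", echo)
```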

Chat with Your PDF

genai · Beginner · 365d access
99 onwards

Upload a PDF, ask questions about it, get accurate answers. Build your first RAG pipeline end-to-end — chunking, embedding, vector storage, and retrieval.

  • Chunk documents into overlapping segments for effective retrieval
  • Generate and store vector embeddings using OpenAI's Embeddings API
  • Build and query a FAISS vector index for similarity search
  • Create a RetrievalQA chain that grounds answers in document content
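
The overlapping-chunk step from the first bullet can be sketched with plain string slicing (real pipelines usually split on tokens or sentence boundaries instead of characters):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size windows that overlap, so a sentence
    cut at one chunk's boundary still appears whole in the next."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 10, chunk_size=40, overlap=10)
```

Each chunk then gets embedded and stored; the overlap is what keeps retrieval from missing answers that straddle a boundary.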

AI FAQ Bot for a Topic

genai · Foundation · 365d access
99 onwards

Pick a topic (company policy, a book, a subject), load the content, and answer user questions about it using basic prompt stuffing. Your intuition-builder for why RAG exists.

  • Understand the prompt stuffing technique and how it works
  • Identify the practical limits of putting large documents into a prompt
  • Observe and document LLM hallucination behavior on out-of-scope questions
  • Write a prompt that instructs the model to cite its sources
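
Prompt stuffing is exactly what it sounds like: put the whole document into the prompt and hope it fits. A minimal sketch; the character budget is a rough assumption (~4 characters per token is a common rule of thumb), and hitting it is precisely the limitation that motivates RAG:

```python
def stuff_prompt(question, document, max_chars=12000):
    """Build a single prompt containing the entire document, truncating
    when it blows past a rough context budget. Returns the prompt and
    a flag telling the caller that content was dropped."""
    truncated = len(document) > max_chars
    context = document[:max_chars]
    prompt = (
        "Answer using ONLY the context below. If the answer is not "
        "in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return prompt, truncated

prompt, truncated = stuff_prompt("What is the refund policy?", "x" * 20000)
```

The "ONLY the context" instruction is the simple hallucination guard from the last bullet; watching the model ignore it on out-of-scope questions is the point of the exercise.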

Text Summarizer Tool

genai · Foundation · 365d access
99 onwards

Feed in a long article or paste any text and get a clean summary. Learn API calls, basic prompt construction, and how to handle long inputs by chunking.

  • Build prompt templates that produce structured, useful summaries
  • Solve the context window problem using text chunking
  • Implement map-reduce summarization for documents of any length
  • Give users control over output format and tone through prompt engineering
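
The last bullet's format-and-tone control is pure prompt construction. A small sketch; the three format strings are hypothetical examples of the instructions you might attach:

```python
FORMATS = {
    "bullets": "Summarize as 3-5 bullet points.",
    "qa": "Summarize as question/answer pairs.",
    "executive": "Summarize as a one-paragraph executive summary.",
}

def summary_prompt(text, fmt="bullets", tone="neutral"):
    """Prepend format and tone instructions to the summarization prompt,
    giving the user control without touching any model code."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return f"{FORMATS[fmt]} Use a {tone} tone.\n\nText:\n{text}"
```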

Simple AI Chatbot

genai · Foundation · 365d access
99 onwards

Build a chatbot using direct OpenAI API calls — maintain conversation history, handle multi-turn context, and display responses cleanly. Your very first LLM integration.

  • Make your first OpenAI API call and understand the request/response structure
  • Maintain multi-turn conversation context using the messages array
  • Build a Streamlit chat UI with session state and message history
  • Use system prompts to give an LLM a persona and behavioral constraints
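
Multi-turn context is nothing more than an ever-growing list of `{"role", "content"}` dicts, resent in full on every call. A sketch with a stub in place of the real API call (the echo lambda is hypothetical; in practice `call_llm` would wrap `client.chat.completions.create(...)`):

```python
class Conversation:
    """Maintain the messages array the Chat Completions API expects:
    a system prompt first, then alternating user/assistant turns."""

    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text, call_llm):
        self.messages.append({"role": "user", "content": user_text})
        reply = call_llm(self.messages)   # model sees the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply

convo = Conversation("You are a terse pirate.")
echo = lambda msgs: f"Arr, heard {len(msgs)} messages."  # stub LLM
convo.ask("Hello!", echo)
convo.ask("How are you?", echo)
```

Because the whole history is resent each turn, long conversations eventually need truncation or summarization to stay inside the context window.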

12k+ Verified Developers
150+ Active Projects
450+ Companies Hiring
14 Days Avg. Completion

Got questions?

Every challenge includes detailed documentation, technical constraints, and automated evaluation scripts to ensure you have everything you need to succeed.