Discover your next Milestone.

Choose from industry-vetted challenges. Build locally, push to GitHub, and earn cryptographic proof of your engineering skills.

Feedback & Survey Analyzer

fullstack · Intermediate · 365-day access
149 onwards

Collect open-ended survey responses, cluster them by theme using embeddings, detect sentiment per cluster, and generate an executive summary report automatically.

  • Collect open-ended survey responses through a full-stack form
  • Embed text responses and cluster them using k-means to discover latent themes
  • Auto-generate cluster labels using an LLM given representative examples
  • Prompt an LLM to synthesize a data-driven executive summary from structured analysis
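The embedding-and-clustering step can be sketched with a bare-bones k-means (Lloyd's algorithm). The 2-D points below are stand-ins for real embedding vectors; in the challenge itself you would embed each response with an embedding model and cluster the resulting high-dimensional vectors the same way:

```python
import math
import random

def kmeans(points, k, iters=25, seed=42):
    """Minimal Lloyd's algorithm: assign each point to its nearest
    centroid, then recompute each centroid as the mean of its members."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids

# Toy "embeddings": two well-separated latent themes.
responses = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),   # theme A
             (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]   # theme B
labels, _ = kmeans(responses, k=2)
```

With real embeddings you would then pass a few representative responses per cluster to an LLM to produce the cluster labels.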

Smart Flashcard Generator

fullstack · Intermediate · 365-day access
149 onwards

Upload lecture notes or textbook PDFs. Auto-generate flashcards, quiz yourself with spaced repetition scheduling, and let the AI adapt difficulty based on your performance.

  • Extract and chunk text from PDF uploads for downstream AI processing
  • Use structured prompt output to auto-generate flashcard Q&A pairs from raw content
  • Build an interactive quiz UI with card flip animations
  • Implement the SM-2 spaced repetition algorithm to schedule card reviews
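The SM-2 scheduling bullet boils down to one small update rule. This follows the published SuperMemo-2 algorithm, where `quality` is the learner's 0-5 self-rating of recall:

```python
def sm2_review(quality, reps, interval, ease):
    """One SM-2 update; returns the new (reps, interval_days, ease)."""
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease.
        return 0, 1, ease
    # Ease factor update from the SM-2 paper, floored at 1.3.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    reps += 1
    if reps == 1:
        interval = 1        # first successful review: see it again tomorrow
    elif reps == 2:
        interval = 6        # second: six days later
    else:
        interval = round(interval * ease)
    return reps, interval, ease

# A new card (ease starts at 2.5) answered perfectly three times in a row.
state = (0, 0, 2.5)
for q in (5, 5, 5):
    state = sm2_review(q, *state)
```

After three perfect reviews the card is scheduled 17 days out; a failed review at any point resets the interval to 1 day without discarding the learned ease factor.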

AI-Powered Personal Finance Tracker

fullstack · Intermediate · 365-day access
149 onwards

Upload bank statements, auto-categorize transactions with an LLM classifier, visualize spending trends, and chat with your own financial data through a RAG interface.

  • Parse CSV bank statement files and normalize transaction data
  • Use batch LLM calls to auto-categorize transactions at scale
  • Build a RAG interface that answers natural language questions about personal financial data
  • Build financial visualizations: category breakdown, monthly trends, merchant leaderboard
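The parse-and-normalize step might look like the minimal sketch below. The column names (`Date`, `Description`, `Amount`) and the date format are assumptions; every bank's export differs, so a real pipeline needs a per-bank column mapping:

```python
import csv
import io
from datetime import datetime

def parse_statement(csv_text):
    """Normalize rows from a bank CSV export into a uniform schema."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            # ISO dates sort and group cleanly for the trend charts.
            "date": datetime.strptime(row["Date"], "%d/%m/%Y").date().isoformat(),
            "merchant": row["Description"].strip().upper(),
            # Integer cents avoid float rounding drift across aggregations.
            "amount": round(float(row["Amount"].replace(",", "")) * 100),
        })
    return rows

sample = """Date,Description,Amount
03/01/2025,  starbucks #1912 ,-4.50
04/01/2025,Payroll Deposit,"2,100.00"
"""
txns = parse_statement(sample)
```

The normalized rows then feed both the LLM categorizer and the visualizations.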

Feature Store for ML Pipelines

data science & ml · Intermediate · 365-day access
149 onwards

Build a lightweight feature store that computes, caches, and serves ML features. Connect it to both a training pipeline and a real-time prediction API.

  • Understand the purpose of a feature store and the training/serving skew problem
  • Compute, version, and store ML features in Redis (real-time) and PostgreSQL (historical)
  • Connect a feature store to both a training pipeline and a live inference API
  • Verify consistency between features used in training and features used in serving
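The consistency check in the last bullet can be sketched as a straight value comparison between what the offline (historical) store logged at training time and what the online (real-time) store serves at inference time. Plain dicts stand in here for PostgreSQL and Redis:

```python
def find_skew(offline_features, online_features, tol=1e-6):
    """Return feature names whose offline and online values disagree
    beyond `tol` -- the training/serving skew a feature store exists
    to prevent. Missing online features also count as skew."""
    skewed = []
    for name, offline_value in offline_features.items():
        online_value = online_features.get(name)
        if online_value is None or abs(offline_value - online_value) > tol:
            skewed.append(name)
    return sorted(skewed)

# Illustrative feature vectors for one entity at one timestamp.
offline = {"avg_txn_7d": 42.0, "txn_count_30d": 18, "days_since_signup": 410}
online = {"avg_txn_7d": 42.0, "txn_count_30d": 17, "days_since_signup": 410}
```

Running this check over a sample of entities after each pipeline change is a cheap guard against silent drift between the two stores.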

Sentiment Classifier: LSTM vs. LLM

data science & ml · Intermediate · 365-day access
149 onwards

Train an LSTM on real product reviews. Run the same data through a zero-shot LLM classifier. Compare accuracy, latency, and cost — understand where each belongs.

  • Build, train, and evaluate an LSTM text classifier in PyTorch
  • Implement a zero-shot LLM classifier and measure its performance
  • Compare trained models vs LLMs on accuracy, latency, and cost per prediction
  • Understand the trade-off space: when to use fine-tuned models vs zero-shot LLMs
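The cost side of the comparison is simple arithmetic once you have measured token counts. The counts and per-million-token prices below are hypothetical placeholders, not any provider's actual rates:

```python
def llm_cost_per_prediction(prompt_tokens, output_tokens,
                            price_in_per_mtok, price_out_per_mtok):
    """Dollar cost of one zero-shot classification call, given
    per-million-token input and output prices."""
    return (prompt_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000

# Hypothetical: 180-token prompt (instructions + review), 1-token label,
# at $0.50 / $1.50 per million input / output tokens.
cost = llm_cost_per_prediction(180, 1, 0.50, 1.50)
```

Multiplying by expected request volume makes the trade-off concrete: the trained LSTM has near-zero marginal cost per prediction, while the LLM's cost scales linearly with traffic.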

LLM Output Evaluation Dataset Builder

data science & ml · Intermediate · 365-day access
149 onwards

Generate a benchmark dataset by prompting an LLM across many scenarios, score outputs on multiple criteria, and produce a structured eval report with failure analysis.

  • Design a multi-dimensional scoring rubric for evaluating LLM outputs
  • Generate structured evaluation datasets using async LLM API calls
  • Build an LLM-as-judge pipeline that scores model responses automatically
  • Calculate inter-rater agreement between human and automated scoring
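Inter-rater agreement between the human and automated scores is commonly measured with Cohen's kappa, which corrects raw agreement for the agreement you would expect by chance. A minimal sketch over illustrative pass/fail labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two label sequences:
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters independently pick
    # the same label, from their marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

human = ["pass", "pass", "fail", "pass", "fail", "pass"]
judge = ["pass", "pass", "fail", "fail", "fail", "pass"]
```

A kappa near 1.0 means the LLM judge tracks human judgment closely; values much below ~0.6 suggest the judge prompt or rubric needs work before its scores can be trusted at scale.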

LLM Evaluation Framework

genai · Intermediate · 365-day access
149 onwards

Define test cases and score LLM outputs on accuracy, faithfulness, and tone. Build a regression tracker that alerts you when a prompt change breaks a passing test.

  • Design a rigorous evaluation test suite with multiple scoring dimensions
  • Use an LLM-as-judge to automatically score other LLM outputs
  • Track performance across prompt versions and detect regressions
  • Calculate inter-rater agreement between automated and human evaluators
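The regression tracker reduces to comparing per-test scores across prompt versions and flagging cases that used to pass. The test names, scores, and 0.7 pass threshold below are illustrative:

```python
def find_regressions(baseline_scores, new_scores, threshold=0.7):
    """Flag test cases that passed under the baseline prompt version
    but fall below the threshold under the new one. Scores are
    per-test floats in [0, 1], e.g. from an LLM judge."""
    return sorted(
        case for case, old in baseline_scores.items()
        if old >= threshold and new_scores.get(case, 0.0) < threshold
    )

# Hypothetical eval runs for prompt v1 vs v2.
v1 = {"greeting_tone": 0.90, "refund_accuracy": 0.80, "jailbreak_refusal": 0.60}
v2 = {"greeting_tone": 0.95, "refund_accuracy": 0.50, "jailbreak_refusal": 0.75}
```

Wiring this into CI so a non-empty result fails the build is what turns an eval suite into a regression tracker.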

Multi-Document RAG with Reranking

genai · Intermediate · 365-day access
149 onwards

Search across multiple document collections, rerank chunks by relevance using a cross-encoder, and generate a cited, structured answer from multiple sources.

  • Index and query documents from multiple distinct sources in a single vector store
  • Understand why initial vector search returns noisy results and why reranking helps
  • Integrate Cohere Rerank to score retrieved chunks by true relevance
  • Instruct an LLM to cite specific source documents in its generated answers
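The reranking step follows a retrieve-then-rescore shape. In the sketch below a token-overlap scorer stands in for a real cross-encoder such as Cohere Rerank, which scores each query-chunk pair jointly instead of comparing precomputed embeddings:

```python
def rerank(query, chunks, top_n=2):
    """Re-order retrieved chunks by a relevance score and keep the
    best few. score() is a lexical stand-in for a cross-encoder."""
    def score(chunk):
        query_tokens = set(query.lower().split())
        chunk_tokens = set(chunk.lower().split())
        return len(query_tokens & chunk_tokens) / len(query_tokens)

    return sorted(chunks, key=score, reverse=True)[:top_n]

# Chunks as they might come back from a first-pass vector search.
retrieved = [
    "Our refund policy allows returns within 30 days.",
    "The office cafeteria menu changes weekly.",
    "Refund requests within 30 days require the original receipt.",
]
top = rerank("what is the refund policy", retrieved)
```

The pattern is the point: over-retrieve from the vector store, rescore with a stronger but slower model, and pass only the top few chunks to the LLM for the cited answer.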

Context Engineering Pipeline

genai · Intermediate · 365-day access
149 onwards

Build a RAG system that dynamically assembles context based on query type — metadata filters, recency weighting, source diversity rules, and dynamic prompt templates.

  • Build a rigorous evaluation framework with a fixed test set and baseline score
  • Add document metadata and use it to filter retrieved chunks before answer generation
  • Classify query intent and apply different prompt templates for each type
  • Measure the impact of each RAG improvement quantitatively against a baseline
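The intent-classification bullet can be sketched with a keyword heuristic standing in for an LLM classifier; the intent labels and prompt templates below are illustrative:

```python
# Hypothetical intent labels and templates; a production system would
# classify intent with an LLM call or a small trained classifier.
TEMPLATES = {
    "factual": "Answer concisely using only the context below.\n{context}\nQ: {query}",
    "comparison": "Compare the options in the context in a table.\n{context}\nQ: {query}",
    "summary": "Summarize the context in three bullet points.\n{context}\nQ: {query}",
}

def classify_intent(query):
    """Keyword heuristic standing in for a real intent classifier."""
    tokens = set(query.lower().split())
    if tokens & {"vs", "versus", "compare", "difference"}:
        return "comparison"
    if tokens & {"summarize", "summary", "overview"}:
        return "summary"
    return "factual"

def build_prompt(query, context):
    """Assemble the final prompt from the template matching the intent."""
    return TEMPLATES[classify_intent(query)].format(context=context, query=query)
```

The same router is a natural place to attach the other per-intent behaviors: metadata filters, recency weighting, and source-diversity rules applied before the context is assembled.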

Async AI Job Queue

backend · Intermediate · 365-day access
149 onwards

Build a queue where users submit long AI tasks — document analysis, batch summarization — and poll for results. Handle failures, retries, dead letters, and status webhooks.

  • Understand the job queue pattern and when to use async processing
  • Set up and connect Celery with Redis as a message broker
  • Build APIs for job submission, status polling, and result retrieval
  • Implement automatic retries with exponential backoff for failed tasks
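Celery supports the retry bullet declaratively via its task retry options; the standalone sketch below shows the mechanism itself: the delay doubles on each attempt, with a little jitter so simultaneously failing tasks don't retry in lockstep:

```python
import random
import time

def retry_with_backoff(task, max_retries=4, base_delay=0.01):
    """Run task(), retrying failures with exponentially growing delays.
    After the final attempt the exception propagates, at which point a
    real queue would move the job to a dead-letter queue."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface (or dead-letter) the error
            # Exponential backoff: base * 2^attempt, plus jitter.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            time.sleep(delay)

# A task that fails twice, then succeeds, as transient AI API errors do.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"
```

The tiny `base_delay` here keeps the sketch fast; real queues start at seconds, not milliseconds.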

Tool Server for AI Agents

backend · Intermediate · 365-day access
149 onwards

Expose web search, code execution, and calculator as standardized tools via a REST API. Connect it to an agent and watch it call your tools autonomously.

  • Design a standardized tool API that exposes capabilities to AI agents
  • Implement real tool functions: web search, calculation, and datetime
  • Understand the JSON Schema format for describing tool inputs
  • Connect a LangChain agent to external tools via a REST API
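A tool description in this style is typically a name, a human-readable description, and a JSON Schema for the inputs, plus a dispatcher that routes the agent's calls. The exact envelope varies by provider, so treat this shape as illustrative:

```python
TOOLS = [
    {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {  # JSON Schema describing the tool's input
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Expression like '2 * (3 + 4)'",
                }
            },
            "required": ["expression"],
        },
    },
]

def dispatch(name, arguments):
    """Route an agent's tool call to the matching implementation."""
    if name == "calculator":
        # eval() on untrusted input is unsafe even with empty builtins;
        # a real server would use a restricted expression parser.
        return str(eval(arguments["expression"], {"__builtins__": {}}))
    raise ValueError(f"unknown tool: {name}")
```

The agent framework reads the schemas to decide when and how to call each tool; the server's only contract is to validate the arguments against the schema and return a string result.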

Rate-limited LLM API Gateway

backend · Intermediate · 365-day access
149 onwards

Build a gateway that sits in front of any LLM API and enforces per-user token-bucket rate limits. Essential infrastructure for every production AI product.

  • Understand the token bucket algorithm and when to use it over other rate limiting approaches
  • Implement per-user token bucket rate limiting using Redis atomic operations
  • Rate limit by LLM token consumption, not just request count
  • Write load tests using Locust to verify rate limiting under concurrent traffic
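The token bucket itself fits in a few lines. This in-memory sketch shows the refill-then-spend logic; a production gateway would run the same logic atomically in Redis (e.g. as a Lua script) so that every gateway instance shares one bucket per user:

```python
import time

class TokenBucket:
    """Per-user bucket holding up to `capacity` tokens, refilled
    continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        """Spend `cost` tokens if available; return whether the
        request is allowed. `cost` can be the request's LLM token
        count, not just 1, to limit by actual consumption."""
        now = time.monotonic()
        # Lazy refill based on elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_rate,
        )
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=100, refill_rate=10)  # 100-token burst, 10/s refill
```

Allowing bursts up to `capacity` while enforcing the long-run `refill_rate` is exactly why the token bucket suits spiky LLM traffic better than a fixed-window counter.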
12k+ Verified Developers
150+ Active Projects
450+ Companies Hiring
14 Days Avg. Completion

Got questions?

Every challenge includes detailed documentation, technical constraints, and automated evaluation scripts to ensure you have everything you need to succeed.