Collect open-ended survey responses, cluster them by theme using embeddings, detect sentiment per cluster, and generate an executive summary report automatically.
Collect open-ended survey responses through a full-stack form
Embed text responses and cluster them using k-means to discover latent themes
Auto-generate cluster labels using an LLM given representative examples
Prompt an LLM to synthesize a data-driven executive summary from structured analysis
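The embed-and-cluster step can be sketched without any ML library: a minimal k-means over toy 2-D vectors standing in for real sentence embeddings (which would come from an embedding model or API). The deterministic first-k initialization is a simplification; real code would use k-means++ or multiple restarts.

```python
import math

def kmeans(points, k, iters=20):
    # Naive deterministic init: the first k points become centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels

# Toy 2-D "embeddings" with two visibly separated themes.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
_, labels = kmeans(points, k=2)
```

The per-cluster sentiment and LLM labeling steps would then run over the responses grouped by `labels`.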
Upload lecture notes or textbook PDFs. Auto-generate flashcards, quiz yourself with spaced repetition scheduling, and let the AI adapt difficulty based on your performance.
Extract and chunk text from PDF uploads for downstream AI processing
Use structured prompt output to auto-generate flashcard Q&A pairs from raw content
Build an interactive quiz UI with card flip animations
Implement the SM-2 spaced repetition algorithm to schedule card reviews
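The SM-2 scheduling step is small enough to show in full; the formulas below follow the published SuperMemo-2 description (quality rated 0-5, easiness factor floored at 1.3, EF unchanged on a failed recall).

```python
def sm2(quality, reps, interval, ef):
    """One SM-2 review step.

    quality: 0-5 self-rating; reps: consecutive successful reviews;
    interval: days until the next review; ef: easiness factor.
    Returns the updated (reps, interval, ef).
    """
    if quality < 3:
        # Failed recall: restart the repetition sequence, EF unchanged.
        return 0, 1, ef
    reps += 1
    if reps == 1:
        interval = 1
    elif reps == 2:
        interval = 6
    else:
        interval = round(interval * ef)
    # Adjust the easiness factor, floored at 1.3 per the original algorithm.
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return reps, interval, ef

# A fresh card reviewed three times with perfect recall (quality 5):
state = (0, 0, 2.5)
for _ in range(3):
    state = sm2(5, *state)
```

After three perfect reviews the intervals follow the familiar 1, 6, then interval-times-EF progression.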
Upload bank statements, auto-categorize transactions with an LLM classifier, visualize spending trends, and chat with your own financial data through a RAG interface.
Parse CSV bank statement files and normalize transaction data
Use batch LLM calls to auto-categorize transactions at scale
Build a RAG interface that answers natural language questions about personal financial data
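The parse-and-normalize step might look like the sketch below. The column names (`Date`, `Description`, `Amount`) and the sign convention (negative means debit) are assumptions about the export format, and the batch LLM categorization call is omitted.

```python
import csv
import io

RAW = """Date,Description,Amount
2024-01-03,COFFEE SHOP #112,-4.50
2024-01-04,PAYROLL ACME INC,2500.00
2024-01-05,GROCERY MART,-82.19
"""

def parse_statement(text):
    """Normalize CSV rows into dicts with typed amounts and a direction flag."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        amount = float(row["Amount"])
        rows.append({
            "date": row["Date"],
            "description": row["Description"].title(),
            "amount": amount,
            "direction": "debit" if amount < 0 else "credit",
        })
    return rows

transactions = parse_statement(RAW)
```

The normalized dicts are what you would batch into LLM categorization calls and index for the RAG interface.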
Build a lightweight feature store that computes, caches, and serves ML features. Connect it to both a training pipeline and a real-time prediction API.
Understand the purpose of a feature store and the training/serving skew problem
Compute, version, and store ML features in Redis (real-time) and PostgreSQL (historical)
Connect a feature store to both a training pipeline and a live inference API
Verify consistency between features used in training and features used in serving
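One way to check training/serving consistency is to route both paths through a single feature function and fingerprint the results. The `compute_features` fields below are illustrative, and plain dicts stand in for the Redis (online) and PostgreSQL (offline) stores.

```python
import hashlib
import json

def compute_features(user):
    """The ONE shared feature definition used by both training and serving.
    Keeping a single code path is the main defense against skew."""
    txns = user["recent_txns"]
    return {
        "txn_count_7d": len(txns),
        "avg_txn_amount": round(sum(txns) / max(len(txns), 1), 2),
    }

def fingerprint(features):
    # Stable hash of a feature dict, for cheap consistency checks.
    payload = json.dumps(features, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

user = {"recent_txns": [12.0, 30.0, 8.5]}

# "Offline" path (training pipeline) and "online" path (inference API)
# both call the same function; a skew check compares fingerprints.
training_row = compute_features(user)
serving_row = compute_features(user)
```

In the full project the fingerprints would be logged on both sides and diffed, so a drifting feature definition surfaces immediately.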
Train an LSTM on real product reviews. Run the same data through a zero-shot LLM classifier. Compare accuracy, latency, and cost — understand where each belongs.
Build, train, and evaluate an LSTM text classifier in PyTorch
Implement a zero-shot LLM classifier and measure its performance
Compare trained models vs LLMs on accuracy, latency, and cost per prediction
Understand the trade-off space: when to use fine-tuned models vs zero-shot LLMs
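The comparison boils down to reducing each run to per-prediction numbers. The scores, latencies, and costs below are made-up illustrative values, not benchmark results.

```python
def summarize(name, preds, labels, total_latency_s, total_cost_usd):
    """Fold one model's run into a comparable row: accuracy plus
    per-prediction latency and cost."""
    n = len(labels)
    correct = sum(p == y for p, y in zip(preds, labels))
    return {
        "model": name,
        "accuracy": correct / n,
        "latency_ms_per_pred": 1000 * total_latency_s / n,
        "cost_usd_per_pred": total_cost_usd / n,
    }

labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Illustrative only: a small trained model is typically faster and
# cheaper per call; a zero-shot LLM needs no labeled training data.
rows = [
    summarize("lstm", [1, 0, 1, 1, 0, 1, 0, 1, 0, 1], labels, 0.05, 0.0),
    summarize("zero_shot_llm", [1, 0, 1, 0, 0, 1, 0, 0, 1, 1], labels, 8.0, 0.02),
]
```

The resulting rows make the trade-off concrete: accuracy alone never decides the question once latency and cost per prediction sit next to it.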
Generate a benchmark dataset by prompting an LLM across many scenarios, score outputs on multiple criteria, and produce a structured eval report with failure analysis.
Design a multi-dimensional scoring rubric for evaluating LLM outputs
Generate structured evaluation datasets using async LLM API calls
Build an LLM-as-judge pipeline that scores model responses automatically
Calculate inter-rater agreement between human and automated scoring
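Inter-rater agreement between human and automated scores is commonly reported as Cohen's kappa, which corrects raw agreement for agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two raters' label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if both raters labeled independently
    # at their own marginal rates.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

human = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail"]
judge = ["pass", "pass", "fail", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(human, judge)
```

Here the raters agree on 6 of 8 labels (0.75 raw), but with balanced marginals chance agreement is 0.5, so kappa lands at 0.5.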
Define test cases and score LLM outputs on accuracy, faithfulness, and tone. Build a regression tracker that alerts you when a prompt change breaks a previously passing test.
Design a rigorous evaluation test suite with multiple scoring dimensions
Use an LLM-as-judge to automatically score other LLM outputs
Track performance across prompt versions and detect regressions
Calculate inter-rater agreement between automated and human evaluators
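The regression check itself can be as simple as diffing per-test scores between prompt versions; the test ids and threshold below are illustrative.

```python
def detect_regressions(baseline, candidate, threshold=0.0):
    """Flag every test whose score dropped by more than `threshold`
    between the baseline prompt version and the candidate."""
    return [test for test in baseline
            if candidate.get(test, 0.0) < baseline[test] - threshold]

# Per-test scores (e.g. from an LLM-as-judge), keyed by test id.
v1_scores = {"accuracy_1": 0.9, "faithfulness_1": 0.8, "tone_1": 1.0}
v2_scores = {"accuracy_1": 0.9, "faithfulness_1": 0.6, "tone_1": 1.0}

regressions = detect_regressions(v1_scores, v2_scores, threshold=0.05)
```

Wiring this into CI so a non-empty `regressions` list fails the build is what turns scoring into a tracker.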
Search across multiple document collections, rerank chunks by relevance using a cross-encoder, and generate a cited, structured answer from multiple sources.
Index and query documents from multiple distinct sources in a single vector store
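The rerank stage can be sketched with a word-overlap stub standing in for the cross-encoder; a real system would score each (query, chunk) pair with a model such as a sentence-transformers cross-encoder instead.

```python
def overlap_score(query, chunk):
    """Stand-in for a cross-encoder score: fraction of query words
    that appear in the chunk."""
    q = set(query.lower().split())
    c = set(chunk["text"].lower().split())
    return len(q & c) / len(q)

def rerank(query, chunks, top_k=2):
    # First-stage retrieval (vector search, not shown) returns candidates
    # from several collections; reranking reorders them by pairwise score.
    return sorted(chunks, key=lambda ch: overlap_score(query, ch),
                  reverse=True)[:top_k]

chunks = [
    {"source": "handbook", "text": "refund policy allows returns in 30 days"},
    {"source": "faq", "text": "shipping times vary by region"},
    {"source": "blog", "text": "our refund policy explained for customers"},
]
top = rerank("what is the refund policy", chunks)
```

The surviving chunks, each tagged with its `source`, are what the generation prompt cites in the final structured answer.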
Build a RAG system that dynamically assembles context based on query type — metadata filters, recency weighting, source diversity rules, and dynamic prompt templates.
Build a rigorous evaluation framework with a fixed test set and baseline score
Add document metadata and use it to filter retrieved chunks before answer generation
Classify query intent and apply different prompt templates for each type
Measure the impact of each RAG improvement quantitatively against a baseline
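The intent-routing idea can be sketched with a keyword heuristic standing in for an LLM intent classifier; the intent names and templates below are illustrative.

```python
TEMPLATES = {
    "comparison": "Compare the options mentioned.\nQuestion: {q}\nContext: {ctx}",
    "factual": "Answer concisely from the context only.\nQuestion: {q}\nContext: {ctx}",
}

def classify_intent(query):
    """Keyword heuristic standing in for an LLM intent classifier."""
    if any(w in query.lower() for w in ("versus", " vs ", "compare", "difference")):
        return "comparison"
    return "factual"

def build_prompt(query, context):
    # Each intent gets its own prompt template; metadata filters and
    # recency weighting would hook into retrieval at this same point.
    intent = classify_intent(query)
    return intent, TEMPLATES[intent].format(q=query, ctx=context)

intent, prompt = build_prompt("Postgres vs Redis for caching?", "retrieved chunks")
```

Scoring each routed variant against the fixed test set from the evaluation framework is what makes the improvement measurable.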
Build a queue where users submit long AI tasks — document analysis, batch summarization — and poll for results. Handle failures, retries, dead letters, and status webhooks.
Understand the job queue pattern and when to use async processing
Set up and connect Celery with Redis as a message broker
Build APIs for job submission, status polling, and result retrieval
Implement automatic retries with exponential backoff for failed tasks
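Celery tasks can get this behavior from built-in options; the underlying mechanism looks like the sketch below, with an injectable `sleep` so the backoff schedule is testable.

```python
import time

def retry(fn, attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call fn, retrying on exception with exponentially growing delays
    (base, 2*base, 4*base, ...). Re-raises after the final attempt,
    at which point a real queue would route the job to a dead letter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky():
    # Fails twice, then succeeds, simulating a transient API error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

delays = []
result = retry(flaky, sleep=delays.append)
```

In real code, adding jitter to each delay avoids retry stampedes when many jobs fail at once.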
Expose web search, code execution, and calculator as standardized tools via a REST API. Connect it to an agent and watch it call your tools autonomously.
Design a standardized tool API that exposes capabilities to AI agents
Implement real tool functions: web search, calculation, and datetime
Understand the JSON Schema format for describing tool inputs
Connect a LangChain agent to external tools via a REST API
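A tool registry keyed by name, with JSON Schema under `parameters`, mirrors what the REST layer would serve to the agent. The `eval` call is for brevity only and is unsafe for untrusted input; a real calculator tool would use a proper expression parser.

```python
TOOLS = {
    "calculator": {
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {  # JSON Schema describing the tool's input
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
        # eval with builtins stripped, for brevity; NOT safe in production.
        "fn": lambda args: {"result": eval(args["expression"], {"__builtins__": {}})},
    },
}

def call_tool(name, args):
    """Dispatch a tool call the way a POST /tools/{name} endpoint would,
    validating required fields against the tool's schema first."""
    tool = TOOLS[name]
    for field in tool["parameters"]["required"]:
        if field not in args:
            return {"error": f"missing required field: {field}"}
    return tool["fn"](args)

out = call_tool("calculator", {"expression": "6 * 7"})
```

The agent sees only the names, descriptions, and schemas; the dispatch logic stays behind the API.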
Build a gateway that sits in front of any LLM API and enforces per-user token-bucket rate limits. Essential infrastructure for every production AI product.
Understand the token bucket algorithm and when to use it over other rate limiting approaches
Implement per-user token bucket rate limiting using Redis atomic operations
Rate limit by LLM token consumption, not just request count
Write load tests using Locust to verify rate limiting under concurrent traffic
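An in-memory sketch of the token bucket follows; in production the per-user state would live in Redis with the refill-and-charge step run atomically (for example as a Lua script), and `cost` would be the LLM token count of each request.

```python
class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled continuously
    at `rate` tokens per second. A request is allowed only if the bucket
    currently holds at least its cost."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = now

    def allow(self, cost, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost  # charge by LLM tokens, not request count
            return True
        return False

bucket = TokenBucket(capacity=100, rate=10)   # 100 tokens, +10/sec
first = bucket.allow(cost=80, now=0.0)        # allowed, 20 left
second = bucket.allow(cost=50, now=0.0)       # denied, only 20 left
third = bucket.allow(cost=50, now=5.0)        # refill +50 -> 70, allowed
```

Time is passed in explicitly so the algorithm is deterministic under test; the gateway would use a monotonic clock and Redis timestamps instead.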
Every challenge includes detailed documentation, technical constraints, and automated evaluation scripts to ensure you have everything you need to succeed.