Describe a startup idea, and the app runs a full AI pipeline: market sizing, a live competitor scan using web search tools, a SWOT analysis, and a landing page draft, all streamed to the client.
Design and implement a multi-stage LLM pipeline with distinct agent responsibilities
Use LangChain agents with Tavily web search to do live competitor research
Stream multiple independent pipeline stages to the frontend simultaneously over SSE
Build a pipeline progress tracker that visualizes each stage's completion status
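The streaming objective above can be sketched with plain asyncio: several independent stage coroutines write to a shared queue, and a single generator drains it into SSE wire format. The stage names and payload shape here are illustrative stand-ins for real LLM streaming calls, not a definitive implementation.

```python
import asyncio
import json

async def run_stage(name: str, queue: asyncio.Queue) -> None:
    """Hypothetical pipeline stage: emits chunks, then a completion event."""
    for i in range(3):
        await asyncio.sleep(0)  # stand-in for awaiting an LLM stream
        await queue.put({"stage": name, "chunk": f"{name}-part-{i}"})
    await queue.put({"stage": name, "done": True})

async def sse_events(stages: list):
    """Merge independent stage streams into one SSE event sequence."""
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(run_stage(s, queue)) for s in stages]
    finished = 0
    while finished < len(stages):
        event = await queue.get()
        if event.get("done"):
            finished += 1
        # SSE wire format: "data: <payload>\n\n"
        yield f"data: {json.dumps(event)}\n\n"
    for t in tasks:
        await t

async def main() -> list:
    return [e async for e in sse_events(["market_sizing", "swot"])]

events = asyncio.run(main())
```

In a real app this generator would be handed to an SSE response (e.g. a streaming endpoint), and the frontend's progress tracker would key off the per-stage `done` events.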
An end-to-end interview coach: upload a job description, generate role-specific questions, conduct a mock interview with voice, and get scored feedback with improvement suggestions.
Design a complex multi-service architecture before writing any code
Generate role-specific interview questions from a job description using an LLM
Transcribe voice answers using Whisper and deliver questions via TTS
Evaluate free-text interview answers across multiple quality dimensions using LangGraph
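The multi-dimension evaluation step can be sketched as a rubric loop. The dimension names, dataclass, and the `judge` heuristic below are hypothetical placeholders; in the real challenge, `judge` would be an LLM call inside a LangGraph node that prompts a model with the question, the answer, and a per-dimension rubric.

```python
from dataclasses import dataclass

DIMENSIONS = ("relevance", "structure", "specificity", "communication")

@dataclass
class Evaluation:
    scores: dict    # dimension -> score from 1 to 5
    feedback: list  # improvement suggestions for weak dimensions

def judge(answer: str, dimension: str) -> int:
    """Stand-in for an LLM judge call. Toy heuristic: longer,
    example-bearing answers score higher on every dimension."""
    score = 2
    if len(answer.split()) > 30:
        score += 1
    if "for example" in answer.lower():
        score += 1
    return min(score, 5)

def evaluate(answer: str) -> Evaluation:
    scores = {d: judge(answer, d) for d in DIMENSIONS}
    feedback = [f"Improve {d}: scored {s}/5" for d, s in scores.items() if s < 4]
    return Evaluation(scores=scores, feedback=feedback)

ev = evaluate("I once fixed a bug.")
```

Keeping each dimension as its own judged score (rather than one global grade) is what makes the feedback actionable.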
Implement a full Transformer encoder (multi-head attention, positional encoding, layer norm) in PyTorch from scratch and train it on a classification task. No Hugging Face shortcuts.
Implement scaled dot-product attention and multi-head attention from scratch in PyTorch
Build sinusoidal positional encoding and understand why position matters in Transformers
Assemble a complete Transformer encoder block with residual connections and layer norm
Train an encoder classifier end-to-end on a real text classification dataset
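The positional-encoding objective follows the sinusoidal scheme from "Attention Is All You Need": PE[pos, 2i] = sin(pos / 10000^(2i/d_model)) and PE[pos, 2i+1] = cos of the same angle. A minimal sketch in pure Python (stdlib only, so it runs without PyTorch installed; the real challenge version would build a tensor instead):

```python
import math

def positional_encoding(seq_len: int, d_model: int) -> list:
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000 ** (2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000 ** (2i / d_model))
    Each position gets a unique pattern of phases, which lets
    attention (otherwise order-blind) recover token order.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
```

Position 0 comes out as alternating 0s and 1s (sin 0, cos 0), and each later position rotates those phases at frequencies that fall off across the embedding dimensions.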
Scrape content, clean it, auto-generate instruction-response pairs using an LLM, score quality with an evaluator model, and output a production-ready JSONL dataset.
Build an async web scraping pipeline using httpx and BeautifulSoup
Clean, deduplicate, and validate raw text content at scale
Auto-generate instruction-response training pairs using an LLM
Score dataset quality using an LLM judge and apply rule-based filters
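The cleaning and deduplication steps can be sketched as a small pass over raw documents: normalize, apply rule-based filters, and drop exact duplicates by content hash. The `min_words` threshold and the normalization rules here are illustrative choices, not the challenge's required spec.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different
    copies of the same content hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()

def clean_and_dedupe(docs: list, min_words: int = 5) -> list:
    """Drop too-short documents and exact duplicates (post-normalization)."""
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # rule-based filter: too short to be useful
        digest = hashlib.sha256(norm.encode()).hexdigest()
        if digest in seen:
            continue  # duplicate content already kept
        seen.add(digest)
        kept.append(doc.strip())
    return kept

kept = clean_and_dedupe([
    "The quick brown fox jumps over the lazy dog.",
    "the  quick brown fox jumps over the lazy dog.",  # near-duplicate
    "too short",
])
```

At scale you would swap the exact hash for near-duplicate detection (e.g. MinHash), but the filter-then-dedupe shape stays the same.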
Prepare an instruction-tuning dataset, fine-tune Phi-2 or Mistral 7B using LoRA/QLoRA on a free Colab GPU, and rigorously evaluate the fine-tuned model against the base.
Curate and format a high-quality instruction-tuning dataset in JSONL format
Understand LoRA and QLoRA — how parameter-efficient fine-tuning works
Fine-tune a small open-source LLM (Phi-2 or Mistral 7B) on Google Colab for free
Evaluate a fine-tuned model rigorously against its base model using an LLM judge
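The core of LoRA is a low-rank update to a frozen weight: the layer computes y = x(W + (α/r)·A·B), where only the small matrices A and B are trained. A minimal sketch in pure Python with toy matrices (real code would use `peft` and tensors; shapes here use the x-on-the-left convention: W is d_in×d_out, A is d_in×r, B is r×d_out):

```python
def matmul(a, b):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, w, lora_a, lora_b, alpha: float, r: int):
    """y = x @ W + (alpha / r) * x @ A @ B.
    W is frozen; only lora_a and lora_b would receive gradients."""
    scale = alpha / r
    base = matmul(x, w)                          # frozen path
    update = matmul(matmul(x, lora_a), lora_b)   # rank-r trained path
    return [[base[i][j] + scale * update[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]

x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]     # frozen 2x2 identity weight
a = [[1.0], [1.0]]               # d_in x r, r = 1
b_zero = [[0.0, 0.0]]            # B starts at zero in LoRA
y_init = lora_forward(x, w, a, b_zero, alpha=2.0, r=1)
y_trained = lora_forward(x, w, a, [[1.0, 1.0]], alpha=2.0, r=1)
```

Initializing B to zero means the adapted layer starts out exactly equal to the base layer, which is why LoRA training begins from the pretrained model's behavior.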
Log every LLM call with latency, token usage, and model output. Build a query layer surfacing slow calls, expensive prompts, error rates, and cost trends over time.
Instrument every LLM call with latency, token usage, and cost tracking
Group related LLM calls into traces for end-to-end session visibility
Write analytical SQL queries for performance monitoring (slow calls, cost-by-model)
Build a live observability dashboard using Chart.js
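The logging schema and monitoring queries can be sketched with stdlib `sqlite3`. The table columns and sample rows below are illustrative assumptions about what a call log might capture, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_calls (
        id INTEGER PRIMARY KEY,
        model TEXT,
        latency_ms REAL,
        prompt_tokens INTEGER,
        completion_tokens INTEGER,
        cost_usd REAL,
        error INTEGER DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO llm_calls (model, latency_ms, prompt_tokens, "
    "completion_tokens, cost_usd, error) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("gpt-4o", 1200.0, 900, 300, 0.012, 0),
        ("gpt-4o", 4800.0, 2000, 800, 0.040, 0),
        ("gpt-4o-mini", 300.0, 500, 200, 0.001, 1),
    ],
)

# Slow calls: everything above a latency threshold, slowest first.
slow = conn.execute(
    "SELECT model, latency_ms FROM llm_calls "
    "WHERE latency_ms > 1000 ORDER BY latency_ms DESC").fetchall()

# Total cost and error rate grouped by model, most expensive first.
by_model = conn.execute(
    "SELECT model, SUM(cost_usd), AVG(error) FROM llm_calls "
    "GROUP BY model ORDER BY SUM(cost_usd) DESC").fetchall()
```

`AVG(error)` over a 0/1 column gives the error rate directly; adding a timestamp column and grouping by day would give the cost-trend query the dashboard charts.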
Coordinate multiple specialized AI agents — planner, researcher, writer — passing context and managing state between them. Return a streamed unified result to the client.
Design a multi-agent architecture with clearly defined agent roles
Implement a stateful agent graph using LangGraph
Stream intermediate agent progress to clients using WebSockets
Implement per-agent timeouts and graceful fallback strategies
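The timeout-and-fallback objective can be sketched with `asyncio.wait_for`: each agent gets its own time budget, and a timeout degrades to a fallback result instead of failing the whole run. The agent names and delays are stand-ins; a real agent would be an LLM call.

```python
import asyncio

async def agent(name: str, delay: float) -> str:
    """Hypothetical agent: the real version would call an LLM."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_with_timeout(name: str, delay: float, timeout: float) -> str:
    """Per-agent timeout with a graceful fallback instead of a crash."""
    try:
        return await asyncio.wait_for(agent(name, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return f"{name}: fallback (timed out after {timeout}s)"

async def main() -> list:
    # planner and writer finish within budget; researcher exceeds its own
    return await asyncio.gather(
        run_with_timeout("planner", 0.01, 0.5),
        run_with_timeout("researcher", 0.5, 0.05),
        run_with_timeout("writer", 0.01, 0.5),
    )

results = asyncio.run(main())
```

Because each agent owns its timeout, one slow researcher degrades only its own contribution; the planner and writer results still reach the client.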
Every challenge includes detailed documentation, technical constraints, and automated evaluation scripts to ensure you have everything you need to succeed.