Note: The backend is hosted in an Azure Container App with a low idle replica count,
so the first request after a period of inactivity may take 5-30 seconds while a replica spins up.
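If that first request times out during a cold start, a short client-side retry with a generous timeout is usually enough. A minimal sketch, assuming a hypothetical `/ask` endpoint and JSON response shape (the URL below is a placeholder, not the real deployment):

```python
import time
import requests

# Placeholder URL; substitute the actual Container App endpoint.
API_URL = "https://<your-container-app>.azurecontainerapps.io/ask"

def ask_tutor(question: str, retries: int = 3, timeout: int = 60) -> str:
    """Send a question, retrying to ride out the 5-30 second cold start."""
    for attempt in range(retries):
        try:
            resp = requests.post(API_URL, json={"question": question}, timeout=timeout)
            resp.raise_for_status()
            return resp.json()["answer"]
        except requests.exceptions.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(5 * (attempt + 1))  # back off while the replica spins up

print(ask_tutor("What is a Python list comprehension?"))
```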
This demo showcases the difference between the unmodified TinyLlama-1.1B model and a custom
LoRA-fine-tuned variant designed specifically for tutoring. The base model shows how a small LLM
behaves with no guidance: answers may drift, vary in quality, or be overly long. The fine-tuned
tutor model adds structure, clarity, and a consistent teaching style aimed at beginners.
Because this is a 1.1B-parameter model, even with careful prompting and fine-tuning it can still behave
like a "tiny" model: occasional inaccuracies, repetition, or uneven depth are expected. This demo is meant
to show how much LoRA can improve consistency, not to claim perfect results.
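For context, this is roughly how a LoRA adapter is attached to a TinyLlama base model with Hugging Face peft; the base checkpoint shown and the adapter path are placeholders, not necessarily the exact setup used for the tutor variant:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # illustrative base checkpoint
ADAPTER = "path/to/tutor-lora-adapter"        # placeholder; not the real adapter

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach the LoRA weights on top of the frozen base model.
tutor_model = PeftModel.from_pretrained(base_model, ADAPTER)

# Optionally fold the adapter into the base weights for simpler inference,
# e.g. as a step before exporting a single quantized GGUF file.
merged = tutor_model.merge_and_unload()
```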
Everything here represents Phase 1 of the project: a minimal, end-to-end AI tutor
running inside a single Docker container deployed to Azure Container Apps using a quantized GGUF
build for efficient CPU inference.
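As an illustration of that serving path, a quantized GGUF file can be loaded for CPU inference with llama-cpp-python; the file name, quantization level, and generation settings below are assumptions, not the deployed configuration:

```python
from llama_cpp import Llama

# Placeholder file name; the actual quantization level may differ.
llm = Llama(model_path="tinyllama-tutor.Q4_K_M.gguf", n_ctx=2048, n_threads=4)

prompt = "Explain what a variable is to a complete beginner."
output = llm(prompt, max_tokens=256, temperature=0.7)
print(output["choices"][0]["text"])
```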
Coming in Phase 2: A more advanced LangGraph-powered tutoring pipeline with
multi-step reasoning, Retrieval-Augmented Generation (RAG) using curated course materials, a
feedback-driven refinement loop, and expanded evaluation tools. The goal is a more interactive,
context-aware tutoring system that goes beyond single-shot answers and supports real learning
progression.
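As a rough sketch of what that Phase 2 pipeline could look like with LangGraph (the state schema, node names, and node bodies here are hypothetical placeholders, not the final design):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class TutorState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: TutorState) -> dict:
    # Hypothetical RAG step: look up curated course material for the question.
    return {"context": "...retrieved course snippets..."}

def answer(state: TutorState) -> dict:
    # Hypothetical generation step: prompt the tutor model with question + context.
    return {"answer": f"Draft answer using: {state['context']}"}

def refine(state: TutorState) -> dict:
    # Hypothetical feedback-driven refinement pass over the draft answer.
    return {"answer": state["answer"] + " (refined)"}

graph = StateGraph(TutorState)
graph.add_node("retrieve", retrieve)
graph.add_node("answer", answer)
graph.add_node("refine", refine)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", "refine")
graph.add_edge("refine", END)

app = graph.compile()
result = app.invoke({"question": "What is recursion?", "context": "", "answer": ""})
print(result["answer"])
```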