Note: The backend is hosted in an Azure Container App configured with a low idle replica count, so it may take 5–30 seconds to spin up after a period of inactivity.

This demo showcases the difference between the unmodified TinyLlama-1.1B model and a custom LoRA-finetuned variant designed specifically for tutoring. The base model shows how a small LLM behaves with no guidance: answers may drift, vary in quality, or be overly long. The fine-tuned tutor model adds structure, clarity, and a consistent teaching style aimed at beginners.
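The core idea behind the LoRA adapter can be sketched numerically: instead of retraining the full weight matrix W, two small low-rank matrices A and B are trained, and W + (alpha / r) * B @ A is used at inference. The toy dimensions and values below are purely illustrative and have nothing to do with the actual TinyLlama weights:

```python
# Toy illustration of the LoRA merge W_eff = W + (alpha / r) * B @ A.
# All matrices and values here are made up for illustration; the real
# model applies this update to large attention/MLP weight matrices.

def matmul(X, Y):
    """Naive matrix multiply for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_merge(W, A, B, r, alpha):
    """Merge a rank-r LoRA update into the frozen weight matrix W."""
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, same shape as W
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0],
     [0.0, 1.0]]          # frozen base weights (2x2)
B = [[0.5], [0.25]]       # trained down-projection (2x1), rank r = 1
A = [[0.2, 0.4]]          # trained up-projection (1x2)

W_eff = lora_merge(W, A, B, r=1, alpha=2)
# W_eff is approximately [[1.2, 0.4], [0.1, 1.2]]
```

Because only A and B are trained (a few million parameters instead of 1.1 billion), the fine-tune is cheap, and the adapter can be merged into the base weights before quantizing to GGUF.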

Because this is a 1.1B-parameter model, even with careful prompting and fine-tuning it can still behave like a "tiny" model: occasional inaccuracies, repetition, or uneven depth are expected. This demo is meant to show how much LoRA can improve consistency, not to claim perfect results.

Everything here represents Phase 1 of the project: a minimal, end-to-end AI tutor running inside a single Docker container deployed to Azure Container Apps using a quantized GGUF build for efficient CPU inference.
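A single-container deployment like this can be sketched as one Dockerfile. The filenames, model name, and server entrypoint below are illustrative assumptions, not the project's actual files:

```dockerfile
# Hypothetical sketch: bundle a quantized GGUF model and a small HTTP
# server into one image for CPU inference on Azure Container Apps.
FROM python:3.11-slim

WORKDIR /app

# llama-cpp-python runs GGUF models on CPU; fastapi/uvicorn serve HTTP.
RUN pip install --no-cache-dir llama-cpp-python fastapi uvicorn

# Illustrative filenames -- the real model and app files may differ.
COPY models/tinyllama-1.1b-tutor.Q4_K_M.gguf /app/models/
COPY app.py /app/

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Baking the quantized weights into the image keeps the container self-contained, at the cost of a larger image and slower cold starts, which is consistent with the spin-up delay noted above.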

Coming in Phase 2: A more advanced LangGraph-powered tutoring pipeline with multi-step reasoning, Retrieval-Augmented Generation (RAG) using curated course materials, a feedback-driven refinement loop, and expanded evaluation tools. The goal is a more interactive, context-aware tutoring system that goes beyond single-shot answers and supports real learning progression.

AI Tutor – TinyLlama + LoRA

Compare base TinyLlama vs fine-tuned model.
Backend: Azure Container Apps
Model: TinyLlama-1.1B (GGUF)
Adapter: LoRA (tutor style)
Tip: Try the same question with both modes to see the difference.