Building an AI That Turns Any Textbook into an Interactive Course (Using Open Source LLMs)

Oct 25, 2025

I’ve been developing something quite innovative lately: an agent that can transform a PDF or textbook into a completely interactive course. Like, not just a bunch of text dumped into a webpage. I’m talking Q&A modules, drag-and-match games, flowcharts, info bubbles, all auto-generated from your material. I believe this is one of the most interesting things I’ve created so far, and it’s all open source.

State-of-the-art large reasoning models show impressive problem-solving abilities, but often struggle to follow straightforward instructions during reasoning.

The basic idea

The agent works like a course designer who never sleeps. Here’s how it goes:

1. You drop in some content (like a textbook chapter or lecture notes).
2. The agent reads and figures out what’s inside.
3. It comes up with a lesson structure: what goes first, what becomes a quiz, what fits better as a visual.
4. Then it actually calls tools that build each part: questions, charts, etc.
5. A second agent reviews the whole thing to make sure it’s true to the source.
6. You get a neat, interactive lesson that you can share.

And the best part: it’s built on Together AI’s open-source models, so there are no paywalls or black-box dependencies.

How it works

When you upload a document, the agent initially determines how to process it. If it’s short, it retains everything in context. If it’s long, it creates embeddings so it can “remember” sections as needed.

text = load_pdf("chapter1.pdf")

if len(text) < 10000:
    context = text
else:
    context = embed_and_store(text)
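
For reference, load_pdf and embed_and_store aren’t shown anywhere in the post, so here’s a minimal sketch of how they might look, assuming pypdf for text extraction and sentence-transformers for embeddings with naive fixed-size chunking. The chunk size, model name, and the retrieve helper are my own placeholders; the actual project may well use Together AI’s embedding models and smarter splitting.

# Hypothetical helpers, not the project's actual implementation
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import numpy as np

_embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def load_pdf(path):
    # Concatenate the extracted text of every page in the PDF
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def embed_and_store(text, chunk_size=1000):
    # Split into fixed-size chunks and embed each one for later retrieval
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    vectors = _embedder.encode(chunks)
    return {"chunks": chunks, "vectors": np.asarray(vectors)}

def retrieve(store, query, k=3):
    # Return the k chunks most similar to the query (cosine similarity)
    q = _embedder.encode([query])[0]
    norms = np.linalg.norm(store["vectors"], axis=1) * np.linalg.norm(q) + 1e-9
    sims = store["vectors"] @ q / norms
    return [store["chunks"][i] for i in np.argsort(sims)[::-1][:k]]

With something like this in place, context for a long document is the store returned by embed_and_store, and the agent can call retrieve(context, topic) whenever it needs the sections relevant to the lesson block it’s building.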

Once it understands the material, it sketches out a lesson plan. Something like:

lesson = [
    {"type": "info_bubble", "topic": "Newton’s First Law"},
    {"type": "qa", "count": 3},
    {"type": "drag_match", "pairs": ["Force", "Mass × Acceleration"]},
    {"type": "flowchart", "topic": "How motion changes under force"}
]

That structure then becomes a set of tool calls that automatically build each piece of the lesson.

call_tool("qa_generator", {
    "topic": "Newton’s First Law",
    "num_questions": 3
})

Everything’s modular, so I can keep adding new “learning blocks” over time.

The reviewer agent

After the first agent creates the lesson, another one (a “Lesson Reviewer”) reviews everything to ensure accuracy against the original material. If it detects issues, it proposes edits, similar to a quality check for AI-generated lessons.

Why open source?

I’m choosing Together AI models because I want this to remain open and free. If you’re a student, teacher, or independent creator, you should be able to create learning content without spending a lot or sharing your data with proprietary APIs.

ReasonIF: why LLMs still struggle to follow instructions

While building this, I came across a new research paper called ReasonIF (short for Reasoning Instruction Following). It basically says even the best large reasoning models (like GPT-OSS-120B, Qwen3-235B, DeepSeek-R1) fail to follow reasoning instructions about 75% of the time. Not the final answer, but during the actual thinking process. That’s… a lot.

They built a benchmark for this and tried a new training trick called Reasoning Instruction Finetuning (RIF). It helped a bit (the score went from 0.11 to 0.27), but yeah, there’s still a ton of room for improvement.

For anyone building multi-step agents like mine, this part is especially important. You need your model not only to “get” your instructions but also to follow them consistently as it reasons through tasks.
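
To make that concrete, here’s a toy sketch of the kind of check the benchmark formalizes: given a model’s reasoning trace, test whether it respected a formatting-style instruction such as uppercase-only text or JSON formatting (two of the instruction types called out in the figures below). This is my own illustration, not ReasonIF’s actual evaluation code; the real benchmark covers more instruction types with a proper scoring pipeline.

import json

def follows_uppercase(trace):
    # True if every alphabetic character in the reasoning trace is uppercase
    letters = [c for c in trace if c.isalpha()]
    return all(c.isupper() for c in letters) if letters else True

def follows_json(trace):
    # True if the reasoning trace parses as valid JSON
    try:
        json.loads(trace)
        return True
    except json.JSONDecodeError:
        return False

CHECKS = {"uppercase_only": follows_uppercase, "json_formatting": follows_json}

def instruction_following_score(trace, instructions):
    # Fraction of the requested reasoning instructions the trace actually satisfies
    results = [CHECKS[name](trace) for name in instructions]
    return sum(results) / len(results)

# A trace that quietly ignores the uppercase-only instruction scores 0.0
print(instruction_following_score(
    "The ball keeps moving because no net force acts on it.",
    ["uppercase_only"],
))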


Analysis specific to instructions (left) within the reasoning trace and (right) in the main response. Failures are particularly evident for formatting-sensitive tasks like JSON formatting and uppercase-only text.

Model accuracy versus reasoning IFS across benchmark difficulty levels. All LRMs exhibit a positive slope, indicating that harder tasks can negatively impact reasoning IF performance.

For the curious: running ReasonIF

If you like testing this kind of stuff, here’s how they ran it:

1. Install dependencies

uv sync

2. Activate environment

source .venv/bin/activate

3. Run inference

python -m src.main --model_name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

4. Evaluate

python -m src.eval_core --model_name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

It saves metrics like instruction-following accuracy in the outputs/ folder.

What’s next

Right now, the app can already generate small lessons from short PDFs. Next step: scaling it up to whole chapters, with better visuals and smoother editing tools. Once it’s stable, I’ll make a complete write-up or a video guide on how I built it.

Final thoughts

AI shouldn’t just generate text; it should create experiences. This project is my small step toward that goal. The fact that it’s open and reproducible makes it even better.