Building Mavia

Our new AI-powered personal tutor, Mavia.ai, builds upon a unique combination of the latest advancements in artificial intelligence and years of academic research, ranging from studies of online instruction during Covid-19 and massive open online courses (MOOCs) to more recent efforts to integrate AI into education. In my earlier posts, I've written at length about the learning process and the pitfalls of Generative AI, which became the underpinnings of Mavia. Here I'd like to dig a little deeper into how these manifested themselves within the product, with a special focus on user experience and the additional research behind it.
Who is Mavia?
One of our key goals was creating a realistic virtual tutor persona, whom we named after the fourth-century warrior queen, Mavia. We reimagined the modern-day Mavia as a young, successful woman who would captivate audiences in lecture halls and on keynote stages. Our firm belief has been that, to be effective, Mavia needed not only to look, sound and act like a person, but also to remember previous discussions, students' interests and similar details, and to incorporate those into the tutoring experience to create a personal connection. Doing this right is no small feat even with the latest off-the-shelf Large Language Models (LLMs). But why did we go to such great lengths?
Importance of realism for tutor personas
The presence of real or virtual teachers in teaching content has been found to enhance learning outcomes and experiences (Colliot and Jamet, 2018; Wang and Antonenko, 2017; Kizilcec et al., 2015; Wilson et al., 2018). When a virtual teacher has a human-like appearance and body and displays human-like movements and expressions, students report higher levels of trust and social presence (e.g., Kim et al., 2018; Veletsianos et al., 2009). Zhang and Wu (2024) found that the video quality and expressivity of virtual avatars enhance users' learning, emotional connection, and engagement beyond the learning session, leading to improved learning outcomes. Their work provides a recipe for engaging content.
For K-12 students, synthesized voices have even been observed to enhance retention and transfer (the ability to apply what was learned in one situation to a new, different situation) more than human voices (Deng et al., 2022), due to better intonation and cadence, clearer pronunciation, and accent.
In the absence of these qualities, virtual teachers have been shown to negatively impact learning. According to Vallis et al. (2023), the most frequently voiced negative aspect of AI-generated instructors is that they cause distraction. Similarly, Kocadere and Ozhan (2024) observed that AI-generated instructors distracted students and made them uncomfortable due to the "uncanny valley" effect: thrown off by unnatural voices and facial expressions, students were unable to pay full attention to the subject.
This is why we set a really high bar when building our virtual tutor, Mavia. I am sure you have all sat through video conferences or pre-recorded training sessions where you felt that the presenter didn't even want to be there. How did that impact your willingness to pay attention and stay engaged? Now, imagine you were a teenager. We wanted to provide students with a dynamic and engaging tutoring experience where they can feel Mavia's excitement and enthusiasm about topics, and after many iterations with different designs, we believe we have reached our goal. You can find a sample video below that showcases the virtual tutor in different environments, lighting conditions, moods and outfits, displaying a range of gestures and facial expressions.
We've relied on existing academic research to optimize for high engagement and retention along many axes: the chunking of video content, the ratio of tutor video to content, the rate of speech, and the use of conversational language that creates a sense of partnership and prompts students to try different ways of looking at problems. We continue to optimize based on usage patterns. Because Mavia is an online service, it gives us a much larger pool to learn from and faster iteration cycles than most academic studies.
Content quality and placement
Kestin and Miller (2022) found that educational videos are most effective when they combine visuals that enhance conceptual understanding with questions embedded throughout the video to drive continued, active engagement. Haerawan et al. (2024) reached a similar conclusion in their studies, where making videos more interactive by incorporating quizzes and branching scenarios resulted in 45% higher interaction rates, 30% longer viewing times and a 25% improvement in learning outcomes as measured by post-test scores. Signaling, or highlighting the most relevant parts of an explanation, has also been shown to promote effective instruction, both within the cognitive theory of multimedia learning and within more recent frameworks for the design of instructional videos (Mayer et al., 2003; Roelle et al., 2014; Zhang and Wu, 2024). Mavia's conversational experience incorporates all of these best practices: it interleaves questions into the natural flow of learning for sustained engagement and highlights key points to make the material easier for students to follow.
Paivio's dual-coding theory (1969) suggests that the maximal cognitive learning benefits occur when complementary information is presented simultaneously to both visual and verbal systems, as occurs in well-designed video teaching sessions (Mayer, 2008).
Mavia's multimedia capabilities allow it to deliver engaging, interactive content that targets both the visual and verbal systems, far exceeding what textbooks or existing eLearning solutions could offer. Another benefit of video lectures is that, unlike live sessions, they give students control over their learning: the ability to pause and rewatch content allows students to manage their cognitive load (Mayer, 2014; Van Merrienboer and Sweller, 2005).
Mavia takes these benefits up a notch and even allows students to raise their hands and interrupt the video to ask questions, just as in in-person lectures. Through advanced personalization capabilities, these questions can even change the rest of the tutoring session, making every student's journey unique and the overall experience very interactive and deeply engaging.
Memory and teaching style
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their difficulty retaining and retrieving relevant information from long-term interactions limits their effectiveness in applications that require sustained personalization, like tutoring. Mavia takes an approach very similar to Tan et al.'s (2025) work on Reflective Memory Management (RMM): it dynamically summarizes interactions into a personalized memory store for fast and effective future retrieval. This makes it possible to draw on an arbitrarily long interaction history without hitting LLM context-window limits or experiencing quality degradation. It also allows complex learning paths: a student can simply explain to Mavia what they learned at school in order to skip certain topics, and the difficulty is adjusted automatically.
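To make the summarize-and-retrieve idea above more concrete, here is a minimal Python sketch. It is not Mavia's implementation and not the RMM algorithm from Tan et al.; every name in it (MemoryStore, summarize, recall, build_prompt) is hypothetical. The summarizer is a simple truncation placeholder standing in for what would presumably be an LLM call, and retrieval uses plain word overlap rather than a learned retriever.

```python
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def summarize(turns: list[str], max_words: int = 30) -> str:
    """Placeholder summarizer (an assumption): keep the first max_words words.

    A production system would presumably ask an LLM for the summary instead.
    """
    words = " ".join(turns).split()
    return " ".join(words[:max_words])


@dataclass
class MemoryStore:
    """Hypothetical per-student store of session summaries."""

    summaries: list[str] = field(default_factory=list)

    def remember(self, turns: list[str]) -> None:
        """Condense a finished session into one summary and store it."""
        self.summaries.append(summarize(turns))

    def recall(self, query: str, top_k: int = 2) -> list[str]:
        """Return up to top_k summaries that share words with the query."""
        query_words = tokenize(query)
        scored = [(len(query_words & tokenize(s)), s) for s in self.summaries]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [summary for _, summary in relevant[:top_k]]

    def build_prompt(self, question: str) -> str:
        """Prepend only the recalled summaries, not the full history."""
        notes = self.recall(question)
        context = "\n".join(f"- {n}" for n in notes) or "- (no relevant memory)"
        return f"Known about this student:\n{context}\n\nStudent asks: {question}"


# Usage: two past sessions, then a new question about one of them.
store = MemoryStore()
store.remember(["Student covered fractions at school and found them easy."])
store.remember(["Student struggles with photosynthesis and likes soccer examples."])
print(store.build_prompt("Can we skip fractions today?"))
```

The point of the sketch is the shape of the design: the prompt grows with the number of relevant summaries rather than with the length of the full conversation history, which is what keeps long-running tutoring relationships within a model's context limits.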
This is a vast departure from state-of-the-art eLearning tools, which force students through a linear curriculum that assumes isolation from the outside world. Mavia also addresses an issue that has recently been troubling educators. In three separate studies published in the past few months, usage of ChatGPT has been found to lead to cognitive laziness and to erode critical thinking skills (Chow, 2025; Fan et al., 2025; Lee et al., 2025).
Mavia addresses these concerns by helping students walk through problems and asking questions that point them in the right direction. With advanced personalization, Mavia ensures the content is just challenging enough to promote growth and provides supportive feedback along the way, so students stay engaged, learn by practicing and retain the information long term.
Try Mavia.ai today for free and experience the new era of learning firsthand.