Building Mavia

Our new AI-powered personal tutor, Mavia.ai, builds on a combination of the latest advances in artificial intelligence and years of academic research, ranging from studies of online instruction during Covid-19 and massive open online courses (MOOCs) to more recent efforts to integrate AI into education. In earlier posts, I wrote at length about the learning process and the pitfalls of generative AI, which became the underpinnings of Mavia. Here I'd like to dig deeper into how these ideas manifested in the product, with a special focus on user experience and the additional research behind it.

Who is Mavia?

One of our key goals was to create a realistic virtual tutor persona, whom we named after the fourth-century warrior queen Mavia. We reimagined the modern-day Mavia as a young, successful woman who would captivate audiences in lecture halls and on keynote stages. Our firm belief has been that, to be effective, Mavia needed not only to look, sound and act like a person, but also to remember previous discussions, students' interests and similar details, and to incorporate them into the tutoring experience to create a personal connection. Doing this right is no small feat, even with the latest off-the-shelf Large Language Models (LLMs). But why did we go to such great lengths?

Importance of realism for tutor personas

The presence of real or virtual teachers in teaching content has been found to enhance learning outcomes and experiences (Colliot and Jamet, 2018; Wang and Antonenko, 2017; Kizilcec et al., 2015; Wilson et al., 2018). When a virtual teacher has a human-like appearance and body and displays human-like movements and expressions, students report higher levels of trust and social presence (e.g., Kim et al., 2018; Veletsianos et al., 2009).


Zhang and Wu (2024) found that the video quality and expressivity of virtual avatars enhance users' learning, emotional connection, and engagement beyond the learning session itself, leading to improved learning outcomes. Their work provides a recipe for engaging content, which includes:

  • Visually appealing avatars with personality and delicate, natural movements that can express feelings.
  • Video with clear images and smooth audio, sensible scene and screen layouts, and sparing use of special effects.
  • Depth of knowledge and information tailored to the target user's proficiency, with content presented in a clear, organized structure.
  • For K-12 students, synthesized voices, which have been observed to enhance retention and transfer (the ability to apply what was learned in one situation to a new, different one) even more than human voices (Deng et al., 2022), owing to better intonation, cadence, clearer pronunciation and accent.

In the absence of these qualities, virtual teachers have been shown to negatively impact learning. According to Vallis et al. (2023), the most frequently voiced negative aspect of AI-generated instructors is that they are distracting. Similarly, Kocadere and Ozhan (2024) observed that AI-generated instructors distracted students and made them uncomfortable because of the "uncanny valley" effect: thrown off by unnatural voices and facial expressions, students were unable to pay full attention to the subject.

This is why we set a really high bar when building Mavia, our virtual tutor. I am sure you have all sat through video conferences or pre-recorded training sessions where you felt that the presenter didn't even want to be there. How did that affect your willingness to pay attention and stay engaged? Now imagine you were a teenager. We wanted to give students a dynamic and engaging tutoring experience in which they can feel Mavia's excitement and enthusiasm about topics, and after many iterations with different designs, we believe we have reached our goal. The sample video below showcases the virtual tutor in different environments, lighting conditions, moods and outfits, displaying a range of gestures and facial expressions.

[Video: Mavia virtual tutor]

We've relied on existing academic research to optimize for high engagement and retention along many axes: chunking of video content, the ratio of tutor video to content, rate of speech, and the use of conversational language that creates a sense of partnership and prompts students to look at problems in different ways. We continue to optimize based on usage patterns. Because Mavia is an online service, it gives us a much larger pool to learn from and faster iteration cycles than most academic studies.

Content quality and placement

Kestin and Miller (2022) found that educational videos are most effective when they combine visuals that enhance conceptual understanding with questions embedded throughout the video to drive continued, active engagement. Haerawan et al. (2024) reached a similar conclusion: making videos more interactive by incorporating quizzes and branching scenarios resulted in 45% higher interaction rates, 30% longer viewing times and a 25% improvement in learning outcomes as measured by post-test scores. Signaling, or highlighting the most relevant parts of an explanation, has also been shown to promote effective instruction, both within the cognitive theory of multimedia learning and within more recent frameworks for the design of instructional videos (Mayer et al., 2003; Roelle et al., 2014; Zhang and Wu, 2024). Mavia's conversational experience incorporates all of these best practices: it interleaves questions into the natural flow of learning for sustained engagement and highlights key points to make the material easier for students to follow.

[Figure: Mavia interactive prompts and signaling]

Paivio's dual-coding theory (1969) suggests that the greatest cognitive learning benefits occur when complementary information is presented simultaneously to both the visual and verbal systems, as occurs in well-designed video teaching sessions (Mayer, 2008).

[Figure: Mavia multimedia diagrams]

Mavia's multimedia capabilities allow it to deliver engaging, interactive content that targets both the visual and verbal systems, going well beyond what textbooks or existing eLearning solutions can offer. Another benefit of video lectures is that, unlike live sessions, they give students control over their learning: the ability to pause and rewatch content lets students manage their cognitive load (Mayer, 2014; Van Merrienboer and Sweller, 2005).

Mavia takes these benefits up a notch: students can raise their hands and interrupt the video to ask questions, just as in an in-person lecture. Through advanced personalization capabilities, a question can even change the rest of the tutoring session, making every student's journey unique and the overall experience interactive and deeply engaging.

[Figure: Mavia interactive prompts and signaling]

Memory and teaching style

Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet difficulty retaining and retrieving relevant information from long-term interactions limits their effectiveness in applications that require sustained personalization, such as tutoring. Mavia takes an approach very similar to Reflective Memory Management (RMM), proposed by Tan et al. (2025): it dynamically summarizes interactions into a personalized memory store for fast and effective future retrieval. This makes it possible to maintain an arbitrarily long interaction history without hitting LLM context-window limits or experiencing quality degradation. It also enables complex learning paths: a student can simply tell Mavia what they learned at school in order to skip certain topics, and the difficulty is adjusted automatically.
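
To make the idea of summarizing into a memory store and retrieving from it more concrete, here is a minimal Python sketch. It is not Mavia's implementation and not the RMM algorithm from Tan et al. (2025): the names MemoryEntry, MemoryStore, summarize and tokenize are hypothetical, the summarizer is a simple truncation that stands in for an LLM call, and retrieval uses plain keyword overlap where a real system would more likely use embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    topic: str
    summary: str


def summarize(turns: list[str], max_words: int = 12) -> str:
    """Stand-in for an LLM summarizer (assumption): keep the first words of each turn."""
    return " | ".join(" ".join(turn.split()[:max_words]) for turn in turns)


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped; very short words are dropped."""
    words = (w.strip(".,!?|").lower() for w in text.split())
    return {w for w in words if len(w) > 2}


@dataclass
class MemoryStore:
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, topic: str, turns: list[str]) -> None:
        """Store a compact summary of a session instead of the full transcript."""
        self.entries.append(MemoryEntry(topic, summarize(turns)))

    def recall(self, query: str, k: int = 2) -> list[MemoryEntry]:
        """Return up to k entries that share the most keywords with the query."""
        query_words = tokenize(query)
        scored = [
            (len(query_words & tokenize(e.topic + " " + e.summary)), e)
            for e in self.entries
        ]
        matches = [(score, e) for score, e in scored if score > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in matches[:k]]
```

Only the recalled summaries would be placed in the prompt for the next session, which is what keeps the prompt size bounded no matter how long the student's history grows.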


[Figure: Mavia managing student context]

This is a significant departure from state-of-the-art eLearning tools, which force students through a linear curriculum that assumes isolation from the outside world. Mavia also addresses an issue that has recently been troubling educators. In three separate studies published in the past few months, use of generative AI tools such as ChatGPT was found to lead to cognitive laziness and to erode critical thinking skills (Chow, 2025; Fan et al., 2025; Lee et al., 2025).

Mavia addresses these concerns by walking students through problems and asking questions that point them in the right direction. With advanced personalization, Mavia ensures that the content is just challenging enough to promote growth and provides supportive feedback along the way, so students stay engaged, learn by practicing and retain the information long term.
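
One simple way to keep content "just challenging enough" is a staircase rule that steps difficulty up after consecutive correct answers and down after a mistake. The sketch below is a generic illustration under that assumption, not a description of how Mavia adjusts difficulty; the function name next_difficulty and the 1 to 10 scale are hypothetical.

```python
def next_difficulty(level: int, recent: list[bool], lo: int = 1, hi: int = 10) -> int:
    """Pick the next difficulty level from recent answers (True means correct)."""
    if len(recent) >= 2 and all(recent[-2:]):
        level += 1  # two correct answers in a row: step up
    elif recent and not recent[-1]:
        level -= 1  # latest answer was wrong: step down
    return max(lo, min(hi, level))  # clamp to the allowed range
```

Requiring two correct answers to move up but only one mistake to move down biases the student toward problems they can solve most of the time while still being stretched.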

[Figure: Mavia study session]

Try Mavia.ai today for free and experience the new era of learning firsthand.



References

  • Zhang R., Wu Q. (2024). Impact of using virtual avatars in educational videos on user experience. Nature
  • Haerawan H., Woolnough C., Uwe B. (2024). The Effectiveness of Interactive Videos in Increasing Student Engagement in Online Learning. Journal of Computer Science Advancements
  • Kestin G., Miller K. (2022). Harnessing active engagement in educational videos: Enhanced visuals and embedded questions. Physics Education Research
  • Chow A. R. (2025). ChatGPT May Be Eroding Critical Thinking Skills, According to a New MIT Study. Time Magazine
  • Tan Z., Yan J., Hsu I.H., Han R., Wang Z., Long T.L., Song Y., Chen Y., Palangi H., Lee G., Iyer A., Chen T., Liu H., Lee C.Y., Pfister T. (2025). In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
  • Fan Y., Tang L., Le H., Shen K., Tan S., Zhao Y., Shen Y., Li X., Gasevic D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology
  • Lee H., Sarkar A., Tankelevitch L., Drosos I., Rintel S., Banks R., Wilson N. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. Conference on Human Factors in Computing Systems
  • Napolitano J. (2025). AI Makes Quick Gains in Math, But Errors Still Worry Some Eyeing Reliability. 74million.org
  • Kocadere S., Ozhan S. (2024). Video Lectures With AI-Generated Instructors. International Review of Research in Open and Distributed Learning
  • Vallis, C., Wilson, S., Gozman, D., Buchanan, J. (2023). Student perceptions of AI-generated avatars in teaching business ethics: We might not be impressed. Postdigital Science and Education, 1-19
  • Deng L., Zhou Y., Cheng T., Liu X., Xu T., Wang X. (2022). My English Teachers Are Not Human but I Like Them: Research on Virtual Teacher Self-study Learning System in K12. International Conference on Human-Computer Interaction
  • Colliot, T., Jamet, E. (2018). Understanding the effects of a teacher video on learning from a multimedia document: An eye-tracking study. Educational Technology Research and Development, 66(6), 1415-1433
  • Wang, J., Antonenko, P. D. (2017). Instructor presence in instructional video: Effects on visual attention, recall, and perceived learning. Computers in Human Behavior, 71, 79-89
  • Kizilcec, R., Bailenson, J., Gomez, C. (2015). The instructor's face in video instruction: Evidence from two large-scale field studies. Journal of Educational Psychology, 107, 724-739
  • Wilson, K. E., Martinez, M., Mills, C., D'Mello, S., Smilek, D., Risko, E. F. (2018). Instructor presence effect: Liking does not always lead to learning. Computers & Education, 122, 205-220
  • Kim, K., Boelling, L., Haesler, S., Bailenson, J., Bruder, G., Welch, G. F. (2018). Does a digital assistant need a body? The influence of visual embodiment and social behavior on the perception of intelligent virtual agents in AR. In IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany (pp. 105-114)
  • Roelle J., Berthold K., Renkl A. (2014) Two instructional aids to optimise processing and learning from instructional explanations, Instr. Sci. 42, 207
  • Mayer R.E. (2014) Cognitive Theory of Multimedia Learning, 2nd Ed. The Cambridge Handbook of Multimedia Learning (Mayer R.E., Ed.), Cambridge University Press, 43-71. 10.1017/CBO9781139547369.005
  • Veletsianos, G., Miller, C., Doering, A. (2009). EnALI: A research and design framework for virtual characters and pedagogical agents. Journal of Educational Computing Research, 41(2), 171-194
  • Mayer R.E. (2008) Applying the science of learning: evidence-based principles for the design of multimedia instruction. Am. Psychol. 63, 760 10.1037/0003-066X.63.8.760
  • Mayer R.E., Dow G.T., Mayer S. (2003) Multimedia learning in an interactive self-explaining environment: What works in the design of agent-based microworlds?, J. Educ. Psychol. 95, 806
  • Van Merrienboer J.J., Sweller J. (2005) Cognitive load theory and complex learning: recent developments and future directions. Educ. Psychol. Rev. 17, 147-177
  • Paivio A. (1969) Mental imagery in associative learning and memory. Psychol. Rev. 76, 241 10.1037/h0027272