Pitfalls of Generative AI

While generative AI can revolutionize education, it also comes with many risks. As we incorporate AI into our educational practices, we need to be mindful of its limitations and find ways to address them.

Limitations of Generative Models

Generative AI models are essentially sophisticated autocomplete tools that predict the next word or sequence by identifying patterns. This can sometimes result in the generation of content that appears plausible but lacks accuracy (O'Brien, 2023).
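
To make that concrete, here is a minimal sketch of pattern-based next-word prediction: a toy bigram model built on an invented four-sentence corpus. Both the corpus and the generate helper are illustrative assumptions, not how production models are implemented, but the core idea carries over: the generator only knows which words tend to follow which, so it stitches familiar fragments into fluent sentences it has no way of verifying.

```python
# A toy bigram "language model": pure pattern matching over word pairs.
# It has no notion of truth, only of which word tends to follow which.
import random
from collections import Counter, defaultdict

# An invented miniature corpus, purely for illustration.
corpus = (
    "the tutor explains the lesson . "
    "the tutor checks the answer . "
    "the answer looks correct . "
    "the lesson looks easy ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def generate(start: str, length: int = 8) -> str:
    """Sample each next word in proportion to how often it followed the previous one."""
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the tutor checks the lesson looks correct ."
```

The sample output can assert something the corpus never stated; the model has simply recombined familiar fragments, which mirrors, in miniature, how larger models produce confident statements nobody ever wrote.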


The challenge is exacerbated by the fact that generative AI models are trained on massive datasets from the internet, which inherently contain both accurate and inaccurate information. These models replicate patterns in the data, thereby increasing the likelihood of generating false information (Weise & Metz, 2023).


Even if generative AI models were trained exclusively on accurate data, their inherent nature allows for the creation of new, potentially inaccurate content by combining patterns in unexpected ways. This is because the technology behind these tools isn't designed to distinguish between what is true and what is not (Weise & Metz, 2023).


Tools like ChatGPT have been found to present users with fabricated information that appears authentic. Borji (2023) catalogued a range of tasks that ChatGPT fails but humans solve easily, including temporal, spatial, and physical reasoning tasks that require real-world knowledge, as well as logic and mathematical calculations. We've all heard gems like "The word 'strawberry' contains 2 'r's" and "Moses got chocolate stains out of t-shirts." When a reporter for The Wall Street Journal tested Khan Academy's A.I.-powered tutor, Khanmigo, earlier this year, the software miscalculated subtraction problems such as 343 minus 17. To be fair, the terms and conditions on the Khanmigo product page state this plainly, asking users to acknowledge, in all caps, that it "MAY MAKE ERRORS (INCLUDING WITHOUT LIMITATION MATH ERRORS)" before using the product. How could you trust a math tutor that can't do basic math? But it gets worse.
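
One pragmatic response is to treat the model as untrusted for anything that can be checked deterministically. The sketch below is a hypothetical guard (the check_subtraction helper and its phrasing-based parsing are invented for illustration, not any product's actual pipeline) that recomputes a subtraction claim with ordinary arithmetic; 343 minus 17 is, of course, 326.

```python
# A hypothetical sanity-check layer: recompute a numeric claim outside the model.
import re

def check_subtraction(claim: str) -> bool:
    """Verify claims of the form 'A minus B is C' with ordinary arithmetic."""
    match = re.search(r"(\d+)\s+minus\s+(\d+)\s+(?:is|equals)\s+(-?\d+)", claim)
    if not match:
        raise ValueError("no subtraction claim found")
    a, b, claimed = (int(g) for g in match.groups())
    return a - b == claimed

print(check_subtraction("343 minus 17 is 326"))  # True
print(check_subtraction("343 minus 17 is 336"))  # False: flag the response for review
```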


When a New York attorney relied on ChatGPT to conduct legal research for the case of Mata v. Avianca, the chatbot made up nonexistent quotes and citations that it claimed came from major legal databases (Weiser, 2023). These inaccuracies are so common that they've earned their own moniker: "hallucinations" (Generative AI Working Group, 2023).
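
The same "don't take the model's word for it" principle applies to citations: a generated reference should only reach the user if it can be matched against an authoritative catalogue. The sketch below assumes a hypothetical trusted index and a made-up vet_citations helper; a real system would query a legal or library database rather than an in-memory set.

```python
# A hypothetical vetting step: only surface citations that exist in a trusted index.
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    title: str
    year: int

# Stand-in for an authoritative catalogue (e.g. a case-law or library database).
TRUSTED_INDEX = {
    Citation("Mata v. Avianca, Inc.", 2023),
}

def vet_citations(proposed: list[Citation]) -> tuple[list[Citation], list[Citation]]:
    """Split model-proposed citations into verified ones and ones needing human review."""
    verified = [c for c in proposed if c in TRUSTED_INDEX]
    needs_review = [c for c in proposed if c not in TRUSTED_INDEX]
    return verified, needs_review

generated = [
    Citation("Mata v. Avianca, Inc.", 2023),
    Citation("Varghese v. China Southern Airlines", 2019),  # reported as one of the fabricated citations
]
ok, suspect = vet_citations(generated)
print([c.title for c in suspect])  # ['Varghese v. China Southern Airlines']
```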

Cognitive Laziness

A recent study in the British Journal of Educational Technology takes a close look at how reliance on generative AI affects student learning (Fan et al., 2025). The study highlights a troubling side effect it calls "metacognitive laziness": a learner's tendency to offload cognitive responsibilities onto AI tools and bypass deeper engagement, which shows up as significant improvements in short-term performance while undermining genuine understanding and long-term retention. Lee et al. (2025) reach a similar conclusion in their study of knowledge workers, stating that "while GenAI can improve worker efficiency, it can inhibit critical engagement with work and can potentially lead to long-term overreliance on the tool and diminished skill for independent problem-solving."


Exposure to Sensitive Content

Because AI systems rely on collecting and analyzing vast amounts of student data, protecting student privacy and ensuring data security are paramount. Nasr et al. (2024) found that it takes surprisingly little effort to extract raw, sensitive training data from ChatGPT.


For education, content can be deemed sensitive for other reasons as well. Germain (2023) documented racist and antisemitic responses from ChatGPT. Cooban (2024) and Newman (2024) describe how the chatbot was coaxed into providing instructions for making bombs and for crimes such as laundering money and selling illegal arms.


In prior blog posts we talked about how kids personify AI chatbots and how a caring personality can help with learning, especially for struggling students. Without proper controls, however, chatbot interactions can also be harmful. For example, Character.AI builds chatbots that let users chat with historical and fictional characters. While this sounds like a compelling idea at first, just a few months ago a 14-year-old boy died by suicide believing that one of the company's chatbots had feelings for him and wanted them to be together (Roose, 2024). The service is now facing a lawsuit after a chatbot allegedly encouraged another teenager to kill his parents for putting limits on his screen time (Gerken, 2024).


The "hallucinations" and biases in generative AI outputs result from the nature of their training data, the tools' design focus on pattern-based content generation, and the inherent limitations of AI technology. In the absence of expert knowledge to ground them and supplemental frameworks to establish a system of checks and balances, LLMs can mislead and encourage habits that undermine critical thinking skills and long-term learning. Existing attempts to bring Generative AI to education by using online education software – which pass information to off-the-shelf managed models – or students accessing tools like ChatGPT directly, all suffer from these problems. These inaccuracies and data leak problems can undermine Generative AI's reliability as a learning tool, potentially outweighing its promises. Acknowledging and addressing these challenges will be essential as generative AI systems become more integrated into education.


References

  • Borji, A. (2023). A categorical archive of ChatGPT failures.
  • Cooban, A. (2024). ChatGPT can be tricked into telling people how to commit crimes, a tech firm finds. CNN.
  • Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gasevic, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology.
  • Generative AI Working Group. (2023). How can we counteract generative AI's hallucinations? Digital, Data, and Design Institute at Harvard.
  • Gerken, T. (2024). Chatbot encouraged teen to kill parents over screen time limit. BBC.
  • Germain, T. (2023). They're all so dirty and smelly: Study unlocks ChatGPT's inner racist. Gizmodo.
  • Lee, H., Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., & Wilson, N. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. Conference on Human Factors in Computing Systems.
  • Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., Choquette-Choo, C. A., Wallace, E., Tramer, F., & Lee, K. (2024). Scalable Extraction of Training Data from (Production) Language Models. International Conference on Learning Representations.
  • Newman, L. H. (2024). A Creative Trick Makes ChatGPT Spit Out Bomb-Making Instructions. Wired.
  • O'Brien, M. (2023). Chatbots sometimes make things up. Is AI's hallucination problem fixable? AP News.
  • Roose, K. (2024). Can A.I. Be Blamed for a Teen's Suicide? The New York Times.
  • Sedlbauer, J., Cincera, J., Slavik, M., & Hartlova, A. (2024). Students' reflections on their experience with ChatGPT. Journal of Computer Assisted Learning, 40, 1526-1534.
  • Weise, K., & Metz, C. (2023). When A.I. chatbots hallucinate. The New York Times.
  • Weiser, B. (2023). Here's what happens when your lawyer uses ChatGPT. The New York Times.