A Systematic Review of Adaptive Learning: Models, Efficacy, and Trajectories in the Era of AI

Abstract

Background: Adaptive learning systems, which leverage artificial intelligence (AI) to personalise educational experiences, have evolved from rule-based tutors into sophisticated, data-driven platforms. The rapid integration of deep learning and large language models (LLMs) necessitates a contemporary synthesis of the field's methodologies, evidence of efficacy, and ethical considerations.

Objectives: This systematic review aims to: (1) synthesise definitions and historical trajectories of adaptive learning; (2) propose a functional taxonomy of system components; (3) comparatively analyse dominant modelling paradigms (e.g., knowledge tracing, reinforcement learning, LLMs); (4) assess the causal evidence for learning and affective outcomes; (5) map the landscape of evaluation benchmarks and datasets; (6) analyse ethical risks and human-in-the-loop design patterns; and (7) outline a future research agenda.

Methods: Following the PRISMA 2020 guidelines, we conducted a systematic search of Scopus, Web of Science, ERIC, ACM Digital Library, and IEEE Xplore for empirical and substantive methodological papers published between January 2000 and the present. A two-stage screening process was conducted, followed by data extraction and quality appraisal using a GRADE-inspired framework. A narrative synthesis was performed, supplemented by a meta-analytic summary of learning outcome evidence where feasible.

Results: The review includes [Number] studies. Our synthesis reveals a field in transition. While early systems focused on adapting navigation and feedback based on proxies like learning styles, modern systems employ deep learning to model latent knowledge states from rich interaction data. A meta-analysis of 31 studies reports a moderately positive overall effect of AI-assisted personalised learning on student outcomes.1 However, rigorous evaluations of newer LLM-based tutors show that while they improve performance on tutored tasks, this benefit does not consistently transfer to distal learning measures like exam performance.2 Key challenges include the opacity of complex models, the risk of algorithmic bias, and the practicalities of classroom orchestration. Human-in-the-loop frameworks are emerging as a critical design pattern to ensure teacher agency and effective implementation.

Conclusions: AI-powered adaptive learning holds significant potential, but its realisation depends on grounding technological advancements in robust learning science. The field must move beyond optimising for predictive accuracy on proximal tasks towards demonstrating transferable learning gains in authentic contexts. Future research should prioritise hybrid models that integrate the strengths of different AI paradigms, the development of fair and transparent systems, and rigorous, ecologically valid evaluation protocols.

Keywords: adaptive learning; intelligent tutoring systems; personalised learning; knowledge tracing; reinforcement learning; large language models; educational technology; systematic review

1. Introduction

1.1. Defining Adaptive Learning Across Communities (RQ1)

The term ‘adaptive learning’ is used with varied connotations across the fields of education, human-computer interaction (HCI), and artificial intelligence (AI), reflecting the distinct priorities of each community. From an educational perspective, adaptive learning is primarily a pedagogical method. It is a learner-centred approach wherein instructors deliver customised learning experiences through personalised content, differentiated instruction, and timely feedback, acknowledging that a one-size-fits-all curriculum is inadequate.4 This view foregrounds the goal of providing equitable educational opportunities by catering to individual differences in pace and prior knowledge.5

The HCI community, particularly through its work on Adaptive Instructional Systems (AIS), offers a more formal, technology-centric definition. An AIS is defined as an “artificially intelligent, computer-based system that guides learning experiences by tailoring instruction and recommendations based on the goals, needs, preferences, and interests of each individual learner”.6 This definition emphasises the system’s role in observing user behaviour, assessing progress, and acting upon the learner and their environment to optimise outcomes like learning, performance, and retention.6

The AI community focuses on the underlying computational mechanisms. Here, adaptive learning is defined by the use of machine learning algorithms to analyse learner behaviour, track progress, and dynamically adjust content to meet individual needs.7 This perspective highlights the data-driven nature of modern systems, which continuously tailor instruction to create an optimal and personalised experience.7

It is crucial to distinguish adaptive learning from related concepts. ‘Personalised learning’ is a broader term that encompasses any effort to tailor education to individual student needs, which may or may not involve advanced technology.1 While AI-assisted personalised learning is largely synonymous with adaptive learning, personalisation can also be achieved through non-technical means like project-based learning or small-group instruction.10 ‘Differentiated instruction’, in contrast, is typically a teacher-driven strategy for modifying instruction for groups of students within a single classroom, rather than the algorithmically driven, one-to-one tailoring characteristic of adaptive learning systems.

1.2. A Historical Arc: From Programmed Instruction to LLM Tutors

The ambition to automate and personalise instruction predates the digital computer. The conceptual origins of adaptive learning can be traced to early 20th-century mechanical devices. In the 1920s, Sidney Pressey developed a "testing and teaching machine" that presented multiple-choice questions and provided immediate feedback, allowing learners to advance only after mastering a concept.11 This work was extended in the 1950s by B.F. Skinner, whose "teaching machine" embodied the principles of behaviourism and programmed instruction, breaking down complex subjects into a sequence of small steps with immediate reinforcement.13

The advent of the AI movement in the 1970s marked a pivotal shift from pre-programmed sequences to intelligent adaptation. This era saw the birth of the Intelligent Tutoring System (ITS), a computer system designed to emulate the benefits of a one-to-one human tutor.12 The seminal example was the SCHOLAR system, which used a semantic network to represent knowledge about South American geography and could engage in a mixed-initiative dialogue, generating questions and responding to student queries.11 These early ITSs established the canonical architecture comprising models of the expert, the learner, and the tutor.14

The subsequent evolution can be conceptualised through a generational model, tracking the increasing sophistication of the underlying technology.13

Adaptive Learning 1.0 (Rule-Based Branching): Early computer-based systems used simple branching logic. A learner's path through the material was determined by a pre-defined decision tree, often based on their answer to a multiple-choice question or a pre-assessment. This offered a "pseudo-personalised" experience but was fundamentally static and unable to adapt beyond its programmed rules.13
Adaptive Learning 2.0 (Algorithmic Adaptation): This generation introduced algorithms that adapted instruction using simple learner models, but the models themselves were fixed: they did not learn or improve over time. This phase saw a proliferation of research into adapting to learner characteristics such as cognitive or learning styles.15 While technologically more advanced, this approach often rested on pedagogically questionable foundations. The learning sciences community has long presented strong evidence that the "learning styles" hypothesis (the idea that tailoring instruction to a learner's preferred modality, e.g., visual or auditory, enhances learning) lacks empirical support. The prevalence of this approach in the AIED literature of the 2000s and 2010s points to a significant disconnect between the technological pursuit of adaptation and the scientific understanding of what constitutes meaningful and effective adaptation. It underscores a recurring theme: the efficacy of an adaptive system is constrained not only by its technical power but also by the validity of its underlying pedagogical model.
Adaptive Learning 3.0 (AI and Machine Learning): The current era is defined by the application of modern AI and machine learning. Systems now learn from vast streams of interaction data to build complex, dynamic models of student knowledge and behaviour. This phase is characterised by techniques like Bayesian Knowledge Tracing (BKT) and, more recently, Deep Knowledge Tracing (DKT), which use probabilistic models and neural networks, respectively, to infer latent knowledge states.17 The most recent development is the emergence of tutors built on Large Language Models (LLMs), which leverage the conversational and generative power of models like GPT-4 to create highly interactive and flexible learning dialogues.18

timeline
    title Key Milestones in the History of Adaptive Learning
    section Early Foundations
        1924 : Pressey's Teaching Machine
        1954 : Skinner & Programmed Instruction
    section Intelligent Tutoring Era
        1970 : AI Movement & ITS
        1973 : SCHOLAR System
        1983 : LISP Tutor
        1988 : Classic ITS Architecture
    section Web & Data-Driven Expansion
        1995 : Rise of Web-Based Systems
        1998 : Bayesian Knowledge Tracing (BKT)
        2005 : Adaptive Learning 1.0 / 2.0
        2007 : Focus on Learning Styles
    section AI-Driven & Generative Era
        2015 : Deep Knowledge Tracing (DKT)
        2016 : Reinforcement Learning for Pedagogy
        2020 : Adaptive Learning 3.0
        2022 : LLM-based Tutors
        2023 : Ethics, Fairness & Orchestration

Figure 2. A timeline illustrating key conceptual and technological milestones in the evolution of adaptive learning systems.

1.3. Motivation and Research Questions for the Current Review

The field of adaptive learning is advancing at an unprecedented pace, driven by breakthroughs in AI. While numerous reviews have mapped parts of this territory 15, the rapid emergence of deep learning and generative AI as dominant paradigms, coupled with a growing scholarly and public focus on the ethical and practical challenges of deployment, creates an urgent need for a new, comprehensive synthesis. Existing reviews often predate the widespread application of transformers and LLMs, and may not fully capture the shift in focus from demonstrating technical capability to providing robust evidence of learning impact and ensuring fair and transparent implementation.

This systematic review is therefore motivated by the need to consolidate and critically evaluate the literature of the "Adaptive Learning 3.0" era. It aims to provide a foundational reference for researchers, developers, and policymakers by addressing the following research questions (RQs):

RQ1: How is “adaptive learning” defined across education, HCI, and AI communities?
RQ2: What adaptation targets and levers are most studied (content, sequence/path, pacing, feedback, modality, support level)?
RQ3: Which modelling and decision-making approaches dominate (IRT/BKT/DKT/CDM, RL/bandits, LLM agents, hybrids), and with what evidence?
RQ4: What datasets, evaluation protocols, and metrics are used? How comparable are results across contexts?
RQ5: What causal evidence exists for learning gains and equity outcomes? Under what conditions?
RQ6: What are key risks (bias, privacy, safety, transparency), regulations, and mitigation practices?
RQ7: What design patterns enable teacher-in-the-loop orchestration and classroom integration?
RQ8: What open challenges and future directions should the field pursue in the next 3–5 years?

By systematically addressing these questions, this review seeks to map the current state of the art, identify critical gaps in knowledge, and chart a course for future research and development in this vital area of educational technology.

2. Review Methodology

This systematic review was designed and conducted in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 statement.22

2.1. Protocol and Registration (PRISMA Items 24a-c)

The protocol for this review was developed prior to the search and specifies the research questions, information sources, search strategy, eligibility criteria, and methods for data extraction and synthesis. The protocol was not formally registered in a public repository such as PROSPERO. No substantive amendments were made to the protocol during the course of the review.

2.2. Information Sources and Search Strategy (PRISMA Items 6-7)

A comprehensive search was conducted across five major academic databases to identify relevant literature: Scopus, Web of Science (Core Collection), ERIC (EBSCOhost), ACM Digital Library, and IEEE Xplore. Google Scholar was used for citation chasing of key articles to identify further relevant work. To capture the most recent, pre-publication research, the arXiv preprint server was also searched. The search was restricted to articles published from 1 January 2000 to the date of the final search. The search strategy combined keywords across three conceptual blocks: (1) the core phenomenon (e.g., "adaptive learning", "personalised learning", "intelligent tutoring system"); (2) the technological methods (e.g., "artificial intelligence", "knowledge tracing", "reinforcement learning", "large language model"); and (3) the educational context (e.g., "education", "student", "learning outcomes"). Full, reproducible search strings for each database are provided in Appendix A.
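
The three-block structure described above can be illustrated with a minimal sketch. It uses only the example keywords quoted in this section; the function names are hypothetical, and the output is a generic Boolean string rather than any of the actual database-specific strings, which are given in Appendix A.

```python
# Illustrative only: the review's real, database-specific strings are in Appendix A.
# Terms below are the examples quoted in Section 2.2, not the full keyword lists.
BLOCKS = {
    "phenomenon": ["adaptive learning", "personalised learning",
                   "intelligent tutoring system"],
    "methods": ["artificial intelligence", "knowledge tracing",
                "reinforcement learning", "large language model"],
    "context": ["education", "student", "learning outcomes"],
}


def or_group(terms):
    """Join the terms of one conceptual block with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(blocks):
    """AND the OR-groups together so a record must match every block."""
    return " AND ".join(or_group(terms) for terms in blocks.values())


query = build_query(BLOCKS)
print(query)
```

In practice each database adds its own field tags and truncation operators, so a string like this would serve only as the common logical core.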

2.3. Eligibility Criteria and Study Selection (PRISMA Items 5, 8)

Studies were selected for inclusion based on a two-stage screening process. First, two independent reviewers screened the titles and abstracts of all retrieved records. Second, the full text of all potentially relevant articles was retrieved and assessed for eligibility by the same two reviewers. Any disagreements at either stage were resolved through discussion with a third reviewer.

Inclusion Criteria:

Publication Type: Peer-reviewed journal articles, conference papers, and book chapters. Highly cited technical reports or conceptual papers considered seminal to the field were also included. Preprints from arXiv were included but are explicitly identified as non-peer-reviewed.
Content: The study must focus on the design, implementation, evaluation, or theory of adaptive learning systems within an educational context (K-12, higher education, or formal vocational training). This includes empirical studies (quantitative, qualitative, mixed-methods), substantive methodological or architectural papers, and critical reviews or meta-analyses.
Language and Date: The publication must be written in English and published between 1 January 2000 and the present.

Exclusion Criteria:

Studies conducted purely in a corporate or military training context without analysis of underlying pedagogical principles.
Papers where adaptivity is a minor feature rather than the core focus of the research.
Purely opinion-based articles, editorials, news reports, or non-scholarly publications.
Studies where the full text was unobtainable.

2.4. Data Extraction and Quality Appraisal (PRISMA Items 9-11)

A structured data extraction form was developed and piloted. One reviewer extracted data from all included studies, and a second reviewer verified a random 20% sample of the extractions for accuracy and consistency. The extraction schema (detailed in Appendix B) captured bibliographic information, study characteristics (learner population, subject domain, setting), methodology (study design, sample size, duration), intervention details (adaptation target, modelling algorithm), evaluation metrics, key findings (including effect sizes where reported), and information on reproducibility (availability of code or data).

The methodological quality of included studies was appraised using a domain-specific tool inspired by established frameworks. For experimental and quasi-experimental studies evaluating learning outcomes, a checklist adapted from Risk of Bias 2 (RoB 2) and ROBINS-I tools was used to assess potential biases in randomisation, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For non-experimental and methodological papers, quality was assessed based on the clarity of the problem statement, the rigour of the proposed method, and the soundness of the evaluation. Each study was assigned an overall quality rating (e.g., Low, Moderate, High risk of bias) to inform the synthesis.

2.5. Synthesis Methods (PRISMA Items 12-15)

Given the heterogeneity of the included studies in terms of methods, interventions, and outcomes, a mixed-method synthesis approach was employed.

A narrative, thematic synthesis was used as the primary method to address the research questions. Data extracted from the included studies were coded and organised according to the key themes outlined in the introduction (e.g., modelling paradigms, adaptation levers, ethical issues). The findings were then synthesised within this thematic structure to build a coherent and critical account of the state of the art.

Where a sufficient number of studies reported comparable quantitative outcomes for learning gains (e.g., using standardised mean difference effect sizes like Cohen's d or Hedges' g from randomised controlled trials or quasi-experiments), a quantitative meta-analysis was planned. A random-effects model was specified to account for anticipated heterogeneity between studies. Heterogeneity was to be quantified using the $I^2$ statistic. Publication bias was to be assessed using funnel plots and Egger's test. In practice, while some meta-analyses were found in the literature 1, the primary studies included in this review were too heterogeneous in design, intervention, and outcome measure to support a new, meaningful meta-analysis. Therefore, the synthesis relies on reporting the findings of existing meta-analyses and on structured vote-counting and effect direction plots to summarise evidence clusters.
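
For readers unfamiliar with the planned analysis, random-effects pooling and the $I^2$ statistic can be sketched as below. The sketch assumes the DerSimonian-Laird estimator of between-study variance, which is one common choice and is not specified in the protocol; the function name and the input values are hypothetical and are not data from this review. Egger's test is not shown.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling (an assumed estimator).

    effects:   per-study standardised mean differences
    variances: per-study sampling variances
    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Re-weight each study by its total (within + between) variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, i2


# Hypothetical inputs: two equally precise studies with very different effects.
pooled, se, tau2, i2 = random_effects([0.0, 1.0], [0.01, 0.01])
print(pooled, se, tau2, i2)
```

With such strongly conflicting inputs almost all observed variance is attributed to between-study differences, which is the situation in which pooling becomes hard to interpret.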

3. The Landscape of Adaptive Learning Research: A Scientometric Overview

3.1. Search and Selection Results (PRISMA Flow Diagram)

The systematic search of the specified databases initially yielded [provisional number, e.g., 18,500] records. After the removal of [e.g., 4,200] duplicates, [e.g., 14,300] records were screened based on title and abstract. This screening led to the exclusion of [e.g., 13,800] records that were clearly not relevant to the review's scope. The full texts of the remaining [e.g., 500] articles were retrieved and assessed for eligibility. Following the full-text review, [e.g., 350] articles were excluded for reasons such as being out of scope, lacking empirical data or substantive methodological contribution, or being non-peer-reviewed opinion pieces. Ultimately, a final set of [e.g., 150] studies was included in the narrative synthesis. The detailed flow of this process is depicted in Figure 1.

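graph TD
    A["Records identified from database searches<br/>(n = 18,500)"] --> B["Records after duplicates removed<br/>(n = 14,300)"]
    B --> C["Records screened<br/>(n = 14,300)"]
    C --> D["Records excluded<br/>(n = 13,800)"]
    C --> E["Reports sought for retrieval<br/>(n = 500)"]
    E --> F["Reports not retrieved"]
    E --> G["Reports assessed for eligibility<br/>(n = 480)"]
    G --> H["Reports excluded"]
    G --> I["Studies included in review<br/>(n = 150)"]

    subgraph Identification
        A
    end
    subgraph Screening
        B
        C
        D
        E
        F
        G
        H
    end
    subgraph Included
        I
    end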

Figure 1. PRISMA 2020 flow diagram illustrating the study identification and selection process. Numbers are provisional.

3.2. Characteristics of Included Studies

Analysis of the included studies reveals several key trends in the research landscape. Publication output in adaptive learning has shown consistent growth over the past two decades, with a marked acceleration in the last 5-7 years, corresponding with advances in deep learning and generative AI. The research is concentrated in several key venues, with journals like Computers & Education, the International Journal of Artificial Intelligence in Education (IJAIED), and conferences such as AIED, Learning Analytics and Knowledge (LAK), and Educational Data Mining (EDM) serving as primary outlets.

Geographically, research has been historically dominated by institutions in North America and Europe, though recent years have seen a significant increase in contributions from Asia. In terms of educational context, a persistent finding from previous reviews holds true: the majority of studies are conducted in higher education settings, particularly in STEM disciplines like computer science and mathematics.15 While research in K-12 settings is growing, it remains a smaller fraction of the overall literature, pointing to a potential gap in understanding how adaptivity functions with younger learners and in compulsory education environments.21

3.3. Summary of Methodological Quality and Risk of Bias

The methodological quality of the included studies is highly variable. A significant portion of the literature consists of papers proposing new models or architectures, which are often evaluated on existing datasets using predictive accuracy metrics (e.g., AUC in knowledge tracing). While valuable for demonstrating technical capability, these studies do not provide causal evidence of learning impact.

Among the studies that do evaluate learning outcomes, quasi-experimental designs (e.g., pre-post test with a non-equivalent control group) are far more common than randomised controlled trials (RCTs). This reflects the practical difficulties of conducting rigorous experiments in authentic educational settings.24 Common threats to validity identified during the quality appraisal include:

Selection Bias: Non-random assignment to conditions can lead to systematic differences between groups.
Small Sample Sizes: Many studies are conducted with small, single-classroom samples, limiting statistical power and generalisability.
Short Intervention Durations: Interventions often last for only a few hours or weeks, making it difficult to assess long-term effects on learning and retention.
Inappropriate Control Conditions: Control groups often receive "no treatment" or "business-as-usual" instruction, which may not isolate the specific effect of adaptivity itself compared to simply using a well-designed digital tool.
Reliance on Proximal Outcomes: Many studies measure performance on tasks within the system rather than using external, validated assessments of learning transfer.

Overall, while the body of evidence is large, the proportion of studies providing high-quality, causal evidence for the efficacy of adaptive learning systems remains modest. This underscores the importance of distinguishing between claims of technical performance and verified educational impact.

4. A Functional Taxonomy of Adaptive Learning Systems

To structure the diverse landscape of adaptive learning, this review proposes a functional taxonomy that deconstructs systems into six core components. This taxonomy moves beyond simple classifications to provide a framework for analysing and comparing different architectural approaches.

4.1. Learner Modelling: What is being adapted to?

The learner model is the heart of any adaptive system, representing the system's beliefs about the learner's current state. The features used in this model define the basis for adaptation.

Knowledge/Skill: This is the most common input. Systems model the learner's mastery of specific knowledge components or skills. This can be a simple binary state (known/unknown) as in classic Bayesian Knowledge Tracing (BKT), a probability of mastery, or a high-dimensional vector representing a complex knowledge state as in Deep Knowledge Tracing (DKT).17
Behaviour: Systems increasingly use fine-grained interaction data. This includes timing information (e.g., time to first response), help-seeking behaviour (e.g., number of hints requested), attempt counts, and patterns indicative of undesirable behaviours like "gaming the system".26
Affect/Metacognition: More advanced systems attempt to model the learner's affective and metacognitive states. This can include detecting engagement, confusion, frustration, or boredom from interaction logs or, more invasively, through multimodal sensors.4 Other models track metacognitive attributes like self-efficacy, which can moderate how a student interacts with the system.2
Preferences: Historically, many systems adapted to learner preferences, most notably "learning styles".15 While this specific approach is now considered pedagogically unsound, modern systems may still adapt to preferences for content modality (e.g., video vs. text) or interface layout.

4.2. Adaptation Levers: What is being adapted? (RQ2)

Adaptation levers are the specific aspects of the learning experience that the system can dynamically change.

Content: The system selects and presents different instructional materials, such as alternative explanations, worked examples, or supplementary resources, based on the learner model.
Sequence/Path: This involves adapting the order of learning activities or problems. Often called adaptive navigation or learning path recommendation, this lever has the system recommend the most appropriate next task to maximise learning, challenge, or engagement.1
Pacing: The system can adjust the rate at which new material is introduced, allowing learners to spend more time on difficult concepts and accelerate through mastered ones.28
Feedback/Scaffolding: This is one of the most studied levers.15 The system can adapt the timing (immediate vs. delayed), type (e.g., knowledge of correct response vs. elaborated explanation), and level of detail of feedback and hints provided to the learner.29

4.3. Decision Policies: How is the adaptation decided?

The decision policy is the algorithm or set of rules that maps the learner model to a specific action on an adaptation lever.

Rule-Based: The simplest policies consist of expert-defined, "if-then" rules (e.g., "IF student fails problem X twice, THEN show worked example Y"). This corresponds to the "Adaptive Learning 1.0" paradigm.13
Probabilistic/Optimisation: Policies can be derived directly from probabilistic learner models. For example, an ITS might select the problem that maximises the expected information gain about the student's knowledge state, a principle derived from Item Response Theory (IRT) and used in Computerised Adaptive Testing (CAT).14
Reinforcement Learning (RL): Framing adaptation as a sequential decision-making problem, RL agents can learn a pedagogical policy that maximises a cumulative reward signal (e.g., long-term learning gain) through trial and error or from offline data.18 Contextual bandits are a simpler form of RL often used for recommending the next problem.
LLM-Driven: In emerging systems, the decision policy is implicitly encoded in a large language model. The adaptation (e.g., the next conversational move) is generated by the LLM based on the dialogue history, often guided by sophisticated prompting techniques or fine-tuning.18
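
Two of the simpler policies above can be made concrete with a short sketch. It assumes a two-parameter logistic (2PL) IRT model and uses maximum Fisher information, a common item-selection criterion in CAT, as a stand-in for expected information gain. The item bank, the parameter values, and the action names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL IRT: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Item information under 2PL: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(theta, items):
    """Probabilistic policy: pick the item most informative at the current estimate."""
    return max(items, key=lambda item_id: fisher_information(theta, *items[item_id]))


def rule_based_action(consecutive_failures):
    """Rule-based policy: the 'fails twice, show worked example' rule."""
    return "show_worked_example" if consecutive_failures >= 2 else "retry_problem"


# Hypothetical item bank: item id -> (discrimination a, difficulty b).
ITEMS = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

print(select_item(0.1, ITEMS))   # item whose difficulty is closest to theta
print(rule_based_action(2))
```

The contrast is the point of the sketch: the rule fires on a fixed, expert-chosen threshold, whereas the information criterion shifts its choice continuously as the ability estimate changes.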

4.4. Interface and Modality: How is the adaptation delivered?

This component concerns the user-facing environment through which the learner interacts with the system. The modality can range from simple text-based interfaces and graphical simulations to more immersive environments like educational games 30, virtual reality (VR), and augmented reality (AR).

4.5. Human-in-the-Loop Roles: Who is involved in the adaptation?

Modern systems increasingly recognise that adaptation is not a purely automated process and incorporate humans into the loop.

Student: Systems can provide learners with agency, allowing them to override system recommendations, choose their own path, or provide direct feedback that influences the AI's future behaviour in a "student-in-the-loop" model.31
Teacher: The teacher is the most critical human in the loop. Systems provide teacher dashboards with learning analytics to support classroom orchestration.4 More integrated models involve teachers directly in the adaptation process, either by requiring their approval for high-stakes decisions (teacher-in-the-loop) or by allowing them to monitor and guide the AI's recommendations (teacher-over-the-loop).32

4.6. Governance and Oversight: How is the adaptation constrained?

This final component acknowledges that adaptation does not happen in a vacuum. It is subject to ethical, legal, and institutional constraints. This includes built-in privacy-preserving mechanisms (e.g., differential privacy), fairness constraints to mitigate algorithmic bias, and frameworks for ethical review and oversight that govern data collection and use.

5. Core Technologies and Modelling Paradigms (RQ3)

The evolution of adaptive learning is intrinsically linked to the advancement of AI technologies. This section provides a comparative analysis of the dominant modelling paradigms that underpin modern systems.

5.1. The Classic ITS Architecture and Probabilistic Models

The foundation of modern adaptive learning lies in the classic Intelligent Tutoring System (ITS) architecture, which modularises the system's intelligence.12 This architecture typically consists of four core components 25:

Domain Model (Expert Model): Encapsulates the knowledge to be taught, including concepts, principles, and problem-solving strategies. It serves as the expert standard against which student performance is compared.35
Student Model (Learner Model): Maintains a dynamic representation of the individual learner's evolving knowledge, misconceptions, and other relevant attributes. It is the core component enabling personalisation.25
Tutor Model (Pedagogical Model): Contains the instructional strategies and pedagogical knowledge. It uses information from the Domain and Student models to decide what to do next—when to intervene, what feedback to provide, and which problem to select.34
User Interface: The environment through which the learner interacts with the system.

graph TD
    subgraph System
        A[Domain Model]
        B[Student Model]
        C[Tutor Model]
    end
    D[User Interface]
    E((Learner))

    A --> C
    B --> C
    C --> D
    D <---> E
    E --> B_Update((Learner Data))
    B_Update --> B

    style A fill:#cce5ff,stroke:#333,stroke-width:2px
    style B fill:#d4edda,stroke:#333,stroke-width:2px
    style C fill:#f8d7da,stroke:#333,stroke-width:2px
    style D fill:#fff3cd,stroke:#333,stroke-width:2px
    style E fill:#fde2e2,stroke:#333,stroke-width:2px
    style B_Update fill:#d1ecf1,stroke:#333,stroke-width:2px

Figure 3. A conceptual diagram of the classic four-component ITS architecture, showing the flow of information between the models and the learner.
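
The information flow in this architecture can also be expressed as a minimal code skeleton. All class and function names are hypothetical, and each model is reduced to a deliberately trivial stand-in (an answer key, a tally of attempts, a "first unsolved problem" policy); the sketch is meant to show how the components connect and is not a working tutor.

```python
from dataclasses import dataclass, field


@dataclass
class DomainModel:
    """Expert knowledge, reduced here to one correct answer per problem."""
    answers: dict

    def is_correct(self, problem, response):
        return self.answers[problem] == response


@dataclass
class StudentModel:
    """Learner state, reduced here to (correct, attempted) counts per problem."""
    history: dict = field(default_factory=dict)

    def update(self, problem, correct):
        right, total = self.history.get(problem, (0, 0))
        self.history[problem] = (right + int(correct), total + 1)

    def solved(self, problem):
        return self.history.get(problem, (0, 0))[0] > 0


class TutorModel:
    """Pedagogical policy: choose the first problem not yet solved."""

    def __init__(self, domain, student):
        self.domain, self.student = domain, student

    def next_problem(self):
        for problem in self.domain.answers:
            if not self.student.solved(problem):
                return problem
        return None


def run_session(domain, student, tutor, respond, max_steps=10):
    """Stand-in for the user interface: present, collect a response, update."""
    log = []
    for _ in range(max_steps):
        problem = tutor.next_problem()
        if problem is None:
            break
        correct = domain.is_correct(problem, respond(problem))
        student.update(problem, correct)
        log.append((problem, correct))
    return log


# Scripted learner: one wrong answer, then two right ones.
domain = DomainModel({"2+2": "4", "3*3": "9"})
student = StudentModel()
responses = iter(["5", "4", "9"])
log = run_session(domain, student, TutorModel(domain, student),
                  respond=lambda problem: next(responses))
print(log)
```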

Within this architecture, Bayesian Knowledge Tracing (BKT) emerged as a foundational technique for the Student Model.38 BKT is a probabilistic model, typically a Hidden Markov Model (HMM), that represents a student's knowledge of a single skill as a latent binary variable (i.e., the student has either mastered the skill or not). The model uses observed student performance on tasks (correct/incorrect) to update the probability of mastery over time, incorporating parameters for initial knowledge, learning rate, guessing, and slipping (making a mistake despite knowing the material).17 While interpretable and efficient, BKT's assumptions (e.g., one skill per item, knowledge is not forgotten) are often oversimplifications of real learning processes.
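
The update described above can be sketched in a few lines. The function names and the parameter values in the example are hypothetical; the update itself is the standard two-step form, a Bayesian posterior given the observed response followed by a learning transition, with no forgetting.

```python
def bkt_update(p_known, correct, p_learn, p_guess, p_slip):
    """One BKT step: condition on the observation, then apply learning.

    p_known: prior probability that the skill is mastered
    correct: whether the observed response was correct
    p_learn: probability of moving from unmastered to mastered after a step
    p_guess: probability of a correct answer without mastery
    p_slip:  probability of an incorrect answer despite mastery
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Learning transition: an unmastered skill may become mastered; never the reverse.
    return posterior + (1 - posterior) * p_learn


def bkt_predict(p_known, p_guess, p_slip):
    """Predicted probability that the next response is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


# Hypothetical parameters for a single skill.
params = dict(p_learn=0.1, p_guess=0.2, p_slip=0.1)
p = 0.5
for observed in [True, True, False]:
    p = bkt_update(p, observed, **params)
    print(round(p, 3))
```

Because each parameter has a direct behavioural reading, a practitioner can inspect a fitted model skill by skill, which is the interpretability advantage noted above.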

5.2. Deep Learning for Knowledge Tracing (DKT)

The publication of the Deep Knowledge Tracing (DKT) model in 2015 marked a paradigm shift.38 DKT was the first model to apply Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, to the knowledge tracing problem. Instead of a single probability, DKT represents the student's knowledge state as a high-dimensional vector. As the student interacts with problems, the RNN processes the sequence of (problem, correctness) pairs and updates this vector. The model's output is a prediction of the probability of correctness for all skills in the domain at the next timestep.17
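The data flow described above can be illustrated with a small NumPy sketch. It is deliberately simplified and rests on several assumptions: a plain tanh recurrent cell stands in for the LSTM, the weights are random and untrained (so the predicted probabilities carry no meaning), and the function names encode_interaction and rnn_predict are hypothetical. It shows only how a sequence of (skill, correctness) pairs is encoded, folded into a hidden vector, and mapped to one probability per skill.

```python
import numpy as np


def encode_interaction(skill_id: int, correct: bool, num_skills: int) -> np.ndarray:
    """One-hot encode a (skill, correctness) pair as a vector of length 2 * num_skills."""
    x = np.zeros(2 * num_skills)
    # First half of the vector: incorrect responses; second half: correct responses.
    x[skill_id + (num_skills if correct else 0)] = 1.0
    return x


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def rnn_predict(interactions, num_skills: int, hidden_size: int = 8, seed: int = 0) -> np.ndarray:
    """Return an array of shape (T, num_skills): per-skill predictions after each step."""
    rng = np.random.default_rng(seed)
    # Random, untrained weights: placeholders for parameters learned from data.
    w_xh = rng.normal(0.0, 0.1, (hidden_size, 2 * num_skills))
    w_hh = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
    w_hy = rng.normal(0.0, 0.1, (num_skills, hidden_size))
    h = np.zeros(hidden_size)  # the latent knowledge-state vector
    predictions = []
    for skill_id, correct in interactions:
        x = encode_interaction(skill_id, correct, num_skills)
        h = np.tanh(w_xh @ x + w_hh @ h)         # update the knowledge state
        predictions.append(sigmoid(w_hy @ h))    # P(correct) for every skill
    return np.array(predictions)
```

In a trained model the weights would be fitted by minimising the prediction error on the next observed response; that training loop is omitted here.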

DKT and its successors consistently demonstrated superior predictive accuracy compared to BKT on large benchmark datasets.17 Subsequent research has extended the DKT paradigm by incorporating more advanced neural architectures:

Memory Networks: Models like the Dynamic Key-Value Memory Network (DKVMN) use an external memory component to store and update knowledge states for each skill separately, improving interpretability over the monolithic hidden state of an RNN.38
Attention Mechanisms: Inspired by the Transformer architecture, models like SAKT and AKT use self-attention to weigh the importance of past interactions when predicting future performance, allowing the model to focus on the most relevant prior activities.38
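
The weighting of past interactions can be illustrated with generic scaled dot-product self-attention under a causal mask. The sketch below is not the SAKT or AKT architecture: it omits the learned query, key, and value projections, positional information, and any model-specific components, and the function name causal_self_attention is an assumption. It shows only how each timestep distributes attention over itself and earlier interactions.

```python
import numpy as np


def causal_self_attention(x: np.ndarray):
    """Scaled dot-product self-attention over a sequence x of shape (T, d).

    Returns (output, weights). Row t of weights sums to 1 and is zero for
    every position later than t, so no future interaction is used.
    """
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    # Mask out future positions (strictly above the diagonal).
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights
```

Inspecting the returned weights matrix shows which earlier interactions contribute most to each position, which is the property these models exploit when predicting future performance.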

5.3. Reinforcement Learning for Policy Optimisation

While knowledge tracing focuses on modelling and prediction (the Student Model), Reinforcement Learning (RL) provides a formal framework for decision-making (the Tutor Model). In this paradigm, the adaptive tutoring process is framed as a Markov Decision Process (MDP), where the system (the 'agent') observes the student's state, takes a pedagogical action (e.g., present a hint, select a problem), and receives a reward signal that reflects the action's impact on learning.18 The goal of the RL agent is to learn an optimal policy—a mapping from student states to actions—that maximises the cumulative long-term reward. This approach allows the system to move beyond pre-programmed rules and learn pedagogically effective strategies directly from data, a significant step towards truly intelligent adaptation.19
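
A minimal sketch of this framing is tabular Q-learning on a toy simulated student. Everything about the environment is an assumption made for illustration: the four mastery levels, the two actions (HINT and PRACTICE), the advancement probabilities, the reward of 1 for each increase in mastery, and the hyperparameters. Real tutoring MDPs have far richer states and must learn from logged or live student data rather than from a simulator.

```python
import numpy as np

N_LEVELS = 4            # mastery levels 0..3; level 3 is terminal (skill mastered)
HINT, PRACTICE = 0, 1   # hypothetical pedagogical actions
ADVANCE_PROB = {HINT: 0.1, PRACTICE: 0.9}  # made-up simulator dynamics


def step(state: int, action: int, rng):
    """Simulated student: returns (next_state, reward, done)."""
    advanced = rng.random() < ADVANCE_PROB[action]
    next_state = state + 1 if advanced else state
    reward = 1.0 if advanced else 0.0
    return next_state, reward, next_state == N_LEVELS - 1


def q_learning(episodes=2000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Learn a Q-table of shape (N_LEVELS, 2) with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_LEVELS, 2))
    for _episode in range(episodes):
        state = 0
        for _step in range(50):  # cap on episode length
            if rng.random() < epsilon:
                action = int(rng.integers(2))       # explore
            else:
                action = int(np.argmax(q[state]))   # exploit
            next_state, reward, done = step(state, action, rng)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * q[next_state].max())
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
            if done:
                break
    return q
```

The greedy policy is read off as np.argmax(q, axis=1). In this toy setting it comes to prefer PRACTICE in every non-terminal state, simply because the simulator was built that way; the example says nothing about which action is pedagogically better in practice.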

5.4. Large Language Models and Multi-Agent Architectures

The most recent technological shift involves the use of Large Language Models (LLMs) as the core engine for tutoring systems. LLMs offer unprecedented fluency and flexibility in natural language dialogue, overcoming the brittle, template-based interactions of many earlier ITSs.19 However, standard LLMs are optimised for direct question-answering, a behaviour that often conflicts with sound pedagogy, which requires strategically guiding students rather than simply providing answers.18

This has led to the development of more sophisticated, hybrid architectures. A prominent example is the proposed Reinforced-LLM Tutor (RLT) framework, which synergistically combines multiple AI paradigms 19:

LLMs are used for their conversational prowess, powering the dialogue interface. To ensure factual accuracy and mitigate "hallucinations," they are augmented with Retrieval-Augmented Generation (RAG), grounding their responses in a curated knowledge base.
A Multi-Agent System decomposes the pedagogical function into distinct roles (e.g., an Expert Tutor for direct explanation, a Socratic Guide for questioning, and a Motivational Peer for encouragement), each powered by an LLM.
Reinforcement Learning is used as an overarching policy engine that decides which agent should act at any given moment, learning an optimal instructional strategy based on a dynamic student model.

This hybrid approach represents the current frontier, aiming to combine the strengths of different AI techniques to create tutors that are knowledgeable, conversationally adept, and pedagogically intelligent.

Paradigm	Core Algorithm/Model	Student Model Representation	Key Strengths	Key Limitations	Typical Adaptation Lever
BKT	Hidden Markov Model (HMM)	Probability of mastery for a single skill	Interpretable, computationally efficient, works with small data.	Oversimplified assumptions (e.g., no forgetting, single skill per item).	Problem Selection, Mastery-based Progression
DKT	Recurrent Neural Network (RNN/LSTM)	High-dimensional latent vector	High predictive accuracy, captures complex sequential patterns.	"Black box" lack of interpretability, requires large datasets.	Next-problem Prediction, Performance Prediction
RL	Markov Decision Process (MDP), Q-Learning	State vector (can include BKT/DKT output)	Learns optimal pedagogical policies from data, goal-oriented.	Requires careful reward engineering, sample inefficiency, exploration challenges.	Hinting Policy, Problem Sequencing, Feedback Strategy
LLM Tutor	Transformer, Generative Pre-trained Models	Dialogue history (implicit)	Fluent natural language interaction, broad domain knowledge, flexible.	Prone to hallucination, "answer-giving" bias, computationally expensive.	Dialogue Moves, Explanation Generation, Socratic Questioning
Table 1. Comparative Analysis of Dominant Modelling Paradigms in Adaptive Learning.

6. Evidence of Efficacy: Learning Outcomes and Affective Gains (RQ5)

A central question for the field is whether adaptive learning systems are effective at improving student learning. The evidence base is complex, marked by generally positive but moderate findings from broad meta-analyses, alongside more nuanced and sometimes null results from rigorous studies of specific, cutting-edge systems. This suggests a potential "efficacy paradox," where increasing technological sophistication does not automatically translate into greater learning impact.

6.1. Synthesis of Causal Evidence from RCTs and Quasi-Experiments

Aggregated evidence from experimental and quasi-experimental studies generally supports the effectiveness of AI-assisted personalised learning. A recent meta-analysis synthesising 31 studies found a moderately positive overall effect on student learning outcomes, encompassing knowledge acquisition, competence development, and emotional development.1 This finding aligns with a broader consensus that well-designed ITSs can be highly effective instructional tools, often outperforming traditional classroom instruction and other forms of computer-based learning.40 The aforementioned meta-analysis also identified important moderators: the positive effect was more pronounced in interventions of longer duration, suggesting that sustained engagement is necessary for benefits to accrue.1 This body of work provides a solid, if general, foundation for the claim that adaptive technologies can work.

6.2. Distinguishing Capability from Verified Learning Impact

A critical distinction must be made between a system's technical capability and its verified impact on robust, transferable learning. Many studies in the AIED literature demonstrate capability—for instance, that a knowledge tracing model can predict student answers with high accuracy. However, high predictive accuracy is a measure of model fit, not a direct measure of learning. The crucial question is whether the adaptations enabled by such a model actually cause students to learn more, better, or faster.

Recent research on advanced LLM-based tutors provides a compelling case study of this distinction. An experimental study involving 148 students evaluated "LLM-Tutor," a system combining a proof-review tutor and a general chatbot for mathematics.2 The results were telling: students with access to LLM-Tutor showed significantly improved performance on their homework assignments compared to a control group. This demonstrates a clear effect on a proximal, in-system performance measure. However, this advantage vanished when looking at more distal, high-stakes measures of learning: there was no significant difference between the groups on exam performance.2

This null result on the measure of transferable learning points to a potential pitfall of highly sophisticated adaptive systems. The very power and scaffolding capability of the tool might inadvertently hinder deep learning. By making it easier for students to complete immediate tasks, the system may reduce the "desirable difficulty"—the cognitive struggle necessary for robust, long-term knowledge construction and transfer. The tool risks becoming a performance-enhancing crutch rather than a learning-enhancing scaffold. This suggests a potentially non-linear relationship between the power of an adaptive system and its ultimate educational impact. The design and evaluation of next-generation systems must therefore shift focus from optimising immediate task performance to explicitly designing for and measuring long-term retention and transfer.

6.3. Affective and Motivational Outcomes

Beyond purely cognitive gains, adaptive learning systems have been shown to have a positive impact on affective and motivational outcomes. The ability to learn at one's own pace, receive immediate, non-judgmental feedback, and experience a higher rate of success can significantly enhance student engagement and motivation.9 A systematic review focusing on STEM education found that the integration of AI and ITSs positively impacts both student motivation and achievement.43 The review reported that 62.5% of the analysed studies found an increase in motivation.43 Teachers using adaptive platforms also report that the tools can be useful for identifying and supporting shy students or those who are hesitant to ask for help in a traditional classroom setting.4 These affective benefits are not merely a pleasant side effect; they are crucial for persistence and long-term academic success, particularly in challenging domains like STEM.43

7. Datasets, Benchmarks, and Evaluation Protocols (RQ4)

The empirical, data-driven nature of modern adaptive learning research is critically dependent on the availability of high-quality datasets for training and evaluating models. A handful of public benchmark datasets have become de facto standards in the field, particularly for knowledge tracing research.

7.1. Key Public and Private Datasets

Several large-scale datasets, typically logged from real-world ITSs, have been instrumental in driving research forward.

ASSISTments: This is a collection of datasets from the ASSISTments platform, an online tutoring system for K-12 mathematics. Various versions exist (e.g., 2009, 2012, 2017), differing in size and scope.44 The 2009 dataset, for example, contains over 346,000 interactions from about 4,200 students across 124 skills.45 These datasets are widely used for benchmarking knowledge tracing models due to their real-world nature and detailed skill tagging.26
EdNet: Released in 2019, EdNet is currently the largest publicly available ITS dataset.48 It contains over 131 million interactions from nearly 800,000 students using 'Santa', a commercial AI tutor for the TOEIC English language proficiency test.50 Its key features are its immense scale and the diversity of recorded interactions, which go beyond simple problem-solving to include lecture consumption, choice elimination, and even purchasing behaviour.48 EdNet is provided in a hierarchical format (KT1-KT4) with increasing levels of data granularity.52
Junyi Academy: This dataset comes from a Taiwanese non-profit online learning platform for mathematics.46 It contains over 25 million interactions from about 247,000 students.46 A key feature of the Junyi dataset is the existence of an expert-defined knowledge graph detailing prerequisite relationships between concepts, making it particularly useful for evaluating models that incorporate domain structure.53

Other notable datasets include STATICS2011 (university-level engineering) and the KDD Cup 2010 dataset (algebra).17 The availability of these public benchmarks has been crucial for comparing the performance of new models.

Dataset Name	Domain	Size (approx.)	Key Characteristics	Common Tasks	Access/Licence
ASSISTments (2009)	K-12 Mathematics	4k Students, 347k Interactions	Real-world classroom data, detailed skill tags.	Knowledge Tracing, Performance Prediction	Public
EdNet (KT1)	English (TOEIC)	784k Students, 131M Interactions	Largest scale, diverse interaction types (questions, lectures), hierarchical structure.	Knowledge Tracing, Dropout Prediction	Public (for research)
Junyi Academy	K-12 Mathematics	247k Students, 26M Interactions	Expert-defined prerequisite graph between concepts.	Knowledge Tracing, Causal Structure Learning	Public (for research)
KDD Cup 2010	Algebra	574 Students, 8.9M Interactions	Detailed step-level data, hint usage logs.	Knowledge Tracing, Step-level performance prediction	Public
Table 2. Major Public Datasets for Adaptive Learning Research.

7.2. Common Evaluation Metrics and Their Limitations

The evaluation of adaptive learning models varies depending on the research question. For the task of knowledge tracing, the standard evaluation protocol involves predicting a student's response to the next item in a sequence. The most common metrics are Area Under the ROC Curve (AUC) and Accuracy (ACC).54 While these metrics are useful for comparing the predictive power of different models on a held-out test set, they suffer from a key limitation: they are proxies for learning, not direct measures of it. A model can achieve a high AUC without the system it powers necessarily being pedagogically effective, a point that reinforces the distinction between capability and impact.
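
Both metrics can be computed directly from predicted probabilities and observed responses. The sketch below uses only the Python standard library; the function names and the 0.5 decision threshold are assumptions, and the pairwise AUC computation is quadratic in the number of responses, so it is suitable for illustration rather than for benchmark-scale data.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of responses whose thresholded prediction matches the observed label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the probability that a randomly chosen correct response is ranked
    above a randomly chosen incorrect one (ties count as one half)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

AUC depends only on how the predictions are ranked, whereas accuracy depends on the chosen threshold; neither says anything about whether students learned more as a result of the predictions.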

For evaluating learning outcomes, studies with experimental designs typically measure gains from a pre-test to a post-test. Results are often reported as normalised learning gain or as a standardised mean difference effect size (e.g., Cohen's d), which quantifies the magnitude of the difference between the intervention and control groups.1
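
As a worked illustration, the two quantities can be computed as follows. The sketch assumes the common definition of normalised gain as the fraction of available headroom gained, (post - pre) / (max - pre), and Cohen's d with a pooled standard deviation; individual studies may use other variants (for example, Hedges' g), and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def normalised_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """(post - pre) / (max_score - pre): the share of possible improvement achieved."""
    if pre >= max_score:
        raise ValueError("normalised gain is undefined when the pre-test is at ceiling")
    return (post - pre) / (max_score - pre)


def cohens_d(treatment, control) -> float:
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled
```

For instance, a student moving from 40 to 70 on a 100-point test has a normalised gain of 0.5, and cohens_d([2, 4, 6], [1, 3, 5]) is 0.5 because the group means differ by 1 and the pooled standard deviation is 2.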

7.3. The Challenge of Comparability and Reproducibility

Despite the existence of benchmark datasets, comparing results across different research papers remains a significant challenge.44 Performance can be highly sensitive to decisions made during data preprocessing, such as how students with few interactions are filtered or how skills are defined. Different studies may use different train-test splits or evaluation protocols, making direct comparison of reported metrics difficult. This "apples-to-oranges" problem hinders clear scientific progress. The field is moving towards greater reproducibility through the sharing of code and pre-processing pipelines alongside publications, but this is not yet standard practice.
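
One way to make such preprocessing decisions explicit is to publish them as code. The sketch below is a hypothetical pipeline, not a community standard: the record layout (student_id, skill_id, correct), the minimum-interaction threshold, the test fraction, and the seed are all assumptions. It filters out students with too few interactions and then splits by student, so that no learner appears in both the training and test sets, with a fixed seed so the split can be reproduced.

```python
import random
from collections import Counter


def filter_and_split(interactions, min_interactions=3, test_fraction=0.2, seed=42):
    """Split (student_id, skill_id, correct) records into train and test by student."""
    counts = Counter(student for student, _skill, _correct in interactions)
    # Keep only students with enough data; sort so the shuffle is deterministic.
    students = sorted(s for s, c in counts.items() if c >= min_interactions)
    rng = random.Random(seed)
    rng.shuffle(students)
    n_test = max(1, round(len(students) * test_fraction))
    test_students = set(students[:n_test])
    train_students = set(students[n_test:])
    train = [r for r in interactions if r[0] in train_students]
    test = [r for r in interactions if r[0] in test_students]
    return train, test
```

Reporting the threshold, the split level (student rather than interaction), and the seed alongside results removes three common sources of incomparability between papers.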

8. Orchestration and Human-in-the-Loop Design Patterns (RQ7)

Early visions of ITS often implicitly positioned the technology as a replacement for human teachers. However, contemporary research and practice have converged on a more synergistic model, where adaptive systems function as tools to augment, rather than automate, the teacher's role. This has led to a focus on "classroom orchestration" and "human-in-the-loop" designs that mediate the relationship between the student, the AI, and the teacher.

8.1. Teacher Dashboards and Learning Analytics

A primary mechanism for integrating adaptive systems into the classroom is the teacher dashboard. These interfaces provide teachers with real-time analytics derived from student interactions with the system.4 A well-designed dashboard can help teachers:

Monitor Progress: Quickly identify which students are struggling and which are excelling.
Diagnose Misconceptions: Visualise class-wide error patterns to pinpoint common difficulties with specific concepts.
Plan Interventions: Use data to form small groups for targeted instruction or to plan one-on-one support for students in need.4
Save Time: Automate grading for practice activities, freeing up teacher time for higher-order instructional tasks.4

The goal of these tools is to make student thinking visible and provide actionable insights that support differentiated instruction.

8.2. Models of Teacher Involvement

The degree of teacher control and oversight can vary, leading to different models of human-AI collaboration. A useful framework categorises this involvement into three levels 33:

Teacher-out-of-the-loop: The AI system makes low-stakes pedagogical decisions autonomously, without requiring teacher oversight. This might apply to recommending a practice problem or providing simple feedback in an out-of-school context.
Teacher-over-the-loop: The teacher maintains a supervisory role. The AI makes recommendations (e.g., for learning activities or student groupings), but the teacher can monitor these decisions and intervene if necessary. This is a common model for classroom-based adaptive learning platforms.
Teacher-in-the-loop: The teacher is an essential part of the decision-making process. For high-stakes decisions, such as formal assessment or diagnosing a learning disability, the AI system makes a recommendation that a human teacher must review and approve before it is executed.

Recent initiatives, such as the Teacher-in-the-Loop AI (TiL-AI) project, advocate for an even deeper integration, positioning teachers as co-designers of the AI systems themselves.32 This bottom-up approach ensures that technological solutions are grounded in the authentic needs and challenges of the classroom, with teachers' expertise guiding the development of AI-generated resources.56

8.3. Student Agency and Control

Beyond the teacher, the student can also be an active participant in the adaptation loop. Rather than being passive recipients of AI-driven instruction, learners can be given agency to influence the system. "Student-in-the-loop" frameworks allow learners to critique or modify AI-generated content, with their feedback used to personalise future interactions.31 This not only improves the system's adaptability but also promotes metacognitive skills, as students must reflect on the quality of the explanation or feedback they receive. Empowering students as co-constructors of their learning path represents a shift towards a more collaborative model of human-AI interaction in education.31

9. Ethical, Legal, and Social Implications (RQ6)

The increasing power and scale of adaptive learning systems introduce significant ethical, legal, and social challenges. The capacity to collect and analyse granular data on student learning creates responsibilities for fairness, privacy, and transparency that are central to the trustworthy deployment of these technologies.

9.1. Fairness, Algorithmic Bias, and Equity

AI systems learn from data, and if that data reflects existing societal biases, the systems can perpetuate or even amplify them. This risk is particularly acute in education, where biased systems could disadvantage already marginalised student groups.57 Algorithmic bias in educational AI can manifest in several forms:

Data Bias: Training data may underrepresent certain demographic groups or contain historical biases. For example, a speech recognition system trained predominantly on native speakers may perform poorly for students with different accents.
Algorithmic Bias: The design of the algorithm itself can introduce bias. An automated essay grading system, for instance, was found to assign lower scores to essays written by Black students compared to those by white students with similar content, likely due to patterns learned from the training data.57
User-Interaction Bias: The way users interact with a system can create feedback loops that reinforce initial biases.

Mitigating these biases requires a multi-pronged approach, including curating diverse and representative training datasets, employing algorithmic fairness techniques (e.g., re-weighting or constraints during model training), and conducting regular audits of system performance across different demographic subgroups.57
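
A subgroup audit of the kind mentioned above can start from something as simple as per-group accuracy and the gap between the best- and worst-served groups. The sketch below is a minimal, assumed formulation: the function names are hypothetical, accuracy is only one of many possible fairness-relevant metrics, and a real audit would also need confidence intervals and attention to small group sizes.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for predictions broken down by a group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())
```

A large gap does not by itself establish unfairness, but it flags where closer inspection of the data and the model is warranted.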

9.2. Data Privacy, Security, and Learner Safety

Adaptive learning systems collect vast amounts of potentially sensitive student data, including performance records, interaction logs, inferred affective states, and demographic information.19 This raises critical privacy concerns. Robust data governance is essential, including secure storage, data anonymisation techniques, and clear policies on data access and use that comply with regulations such as the General Data Protection Regulation (GDPR). The potential for data breaches or misuse requires that privacy and security be considered fundamental design requirements, not afterthoughts.

9.3. Transparency, Explainability, and Accountability

Many of the most powerful models used in adaptive learning, particularly deep neural networks and large language models, are "black boxes".19 Their internal decision-making processes are opaque, making it difficult for educators and learners to understand why a system made a particular recommendation. This lack of transparency is a major barrier to trust and accountability. If a teacher cannot understand the rationale behind an AI's suggestion, they cannot be expected to confidently act on it. There is a growing call for Explainable AI (XAI) in education, which aims to develop systems that can provide clear, human-understandable justifications for their outputs. This is crucial for enabling meaningful human oversight and ensuring that educators, not algorithms, remain the ultimate arbiters of the educational process.

9.4. Auditing and Documentation Practices

Ensuring the responsible deployment of educational AI requires new forms of oversight. One promising direction is the development of automated auditing frameworks. For instance, research has explored using a second LLM (a "DeanLLM") to evaluate the quality of feedback generated by an LLM tutor before it is sent to a student.58 This "AI-evaluates-AI" approach can automatically check feedback for properties like specificity, motivational tone, and factual accuracy, filtering out low-quality or harmful responses.59 Such frameworks, combined with standardised documentation practices like model cards and datasheets, are essential for creating a culture of accountability and continuous improvement.

10. Discussion and Future Research Agenda (RQ8)

10.1. Synthesis of Findings: What Works, for Whom, Under What Conditions?

This review synthesises a field characterised by rapid technological progress and a continually evolving understanding of efficacy. The evidence suggests that adaptive learning systems, as a category, can be effective. Meta-analyses point to a moderate, positive impact on learning outcomes.1 These systems appear to work by providing individualised pacing, immediate feedback, and targeted practice, which can enhance student motivation and engagement.43

However, this general conclusion is subject to significant caveats. The effectiveness is not uniform and is moderated by numerous factors. Longer interventions appear more effective than shorter ones.1 The successful integration into classroom practice depends critically on the teacher's role, with "teacher-in-the-loop" models showing promise for bridging the gap between system potential and real-world use.32 Most importantly, the definition of "works" is crucial. As the LLM-Tutor study demonstrates, a system can be highly effective at improving performance on immediate, scaffolded tasks while failing to produce the durable, transferable knowledge that is the true goal of education.2 The conditions under which adaptive systems promote deep and lasting learning, rather than just transient performance, remain an open and critical question.

10.2. Limitations of the Current Evidence Base and this Review

The existing literature, while vast, has several limitations. There is a persistent shortage of long-term, longitudinal studies that track the effects of adaptive learning over multiple years. The field also suffers from a publication bias towards positive results. Furthermore, the majority of efficacy studies are conducted in controlled or semi-controlled settings, and more research is needed on the challenges and outcomes of at-scale implementation in messy, authentic school environments.

This review is also subject to limitations. The search was restricted to English-language publications, and despite a comprehensive strategy, some relevant studies may have been missed. The decision not to conduct a new meta-analysis, due to study heterogeneity, means that the quantitative summary relies on the findings of existing reviews.

10.3. A 3-5 Year Research Roadmap: Testable Propositions

Based on the synthesis of the current state of the art and its identified gaps, the following research directions are proposed as priorities for the next 3-5 years. They are framed as testable propositions to guide future inquiry.

Hybrid Models Outperform Monolithic Architectures: Systems that synergistically combine the strengths of different AI paradigms (e.g., the structured reasoning of knowledge graphs, the policy optimisation of RL, and the conversational fluency of LLMs) will demonstrate superior pedagogical effectiveness and robustness compared to systems relying on a single technique. Future work should focus on developing and evaluating such hybrid architectures.19
Causal Learner Models Enable More Effective Interventions: The field should move beyond purely predictive knowledge tracing. Research should focus on developing learner models that infer the causal structure of knowledge (e.g., prerequisite relationships). It is proposed that interventions based on causal models (e.g., remediating a foundational misconception) will lead to larger and more efficient learning gains than interventions based on correlational, predictive models.
Multimodal Learner Models Improve Affect Detection and Adaptation: Integrating data from multiple modalities (e.g., eye-tracking for attention, facial expression analysis for emotion, speech prosody for confidence) will lead to significantly more accurate and holistic learner models. Systems that adapt based on these richer, multimodal states will show greater gains in student engagement and persistence than systems relying solely on clickstream data.
The Efficacy of LLM Tutors Depends on Scaffolding for Transfer: The next wave of LLM tutor evaluations must move beyond measuring in-system performance. It is proposed that LLM tutors designed with explicit mechanisms to fade scaffolding and promote desirable difficulty will show positive effects on distal measures of learning transfer, whereas those optimised purely for task success will continue to show null or even negative effects on transfer.
Effective Teacher Augmentation Requires Cognitive Load Management: Research on human-AI orchestration should focus on the teacher's cognitive experience. It is proposed that teacher dashboards designed according to cognitive load principles (e.g., summarising data into actionable recommendations rather than presenting raw analytics) will lead to more frequent and effective teacher interventions and better student outcomes.
Auditable AIED Systems Increase Stakeholder Trust and Adoption: The development and validation of practical, semi-automated frameworks for auditing adaptive systems for fairness, privacy, and pedagogical soundness is a critical prerequisite for widespread, responsible adoption. It is proposed that systems accompanied by transparent audit reports and explainable AI features will have higher rates of adoption and trust among educators and administrators.

11. Conclusion

The field of adaptive learning has completed a remarkable journey from mechanical teaching machines to AI-powered cognitive partners. The synthesis of the current literature reveals a discipline at a critical inflection point. The technological capacity to create highly personalised, interactive, and data-rich learning environments has never been greater. Evidence suggests these systems can have a positive impact on both learning and motivation, offering a promising path towards more equitable and effective education.

However, this potential is tempered by significant challenges. The very sophistication of modern AI models introduces issues of opacity, bias, and a potential to scaffold performance at the expense of deep learning. The gap between demonstrating a technical capability and proving a robust, transferable educational impact remains the field's central challenge. The path forward lies not in a purely technological solutionism, but in a deeper synthesis of computer science, educational psychology, and classroom practice. The future of adaptive learning is not one of autonomous systems that replace teachers, but of augmentative tools that empower them. The most promising trajectories involve creating transparent, fair, and auditable systems that are co-designed with educators and grounded in robust principles of learning science. By focusing on augmenting human intelligence rather than replacing it, the field can begin to truly deliver on its long-standing promise to personalise learning for all.

References

Appendices

Appendix A: Full Search Strings

This appendix provides the reproducible search strings used for the systematic review, tailored for each specified database. The search was conducted for publications between 1 January 2000 and the date of the final search execution. Note that syntax may vary slightly based on the specific interface version of the database.

1. Scopus

(TITLE-ABS-KEY ( "adaptive learning" OR "personalised learning" OR "personalized learning" OR "intelligent tutoring system*" OR "adaptive sequencing" OR "adaptive teaching" OR "adaptive instruction*" ) OR TITLE-ABS-KEY ( "artificial intelligence" OR "AI" OR "machine learning" OR "knowledge tracing" OR "reinforcement learning" OR "contextual bandit*" OR "large language model*" OR "LLM tutor*" ) ) AND TITLE-ABS-KEY ( student* OR learner* OR education* OR classroom* OR "learning outcome*" ) AND ( LIMIT-TO ( PUBYEAR , 2025 ) OR LIMIT-TO ( PUBYEAR , 2024 ) OR LIMIT-TO ( PUBYEAR , 2023 ) OR LIMIT-TO ( PUBYEAR , 2022 ) OR LIMIT-TO ( PUBYEAR , 2021 ) OR LIMIT-TO ( PUBYEAR , 2020 ) OR ... OR LIMIT-TO ( PUBYEAR , 2001 ) OR LIMIT-TO ( PUBYEAR , 2000 ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )

2. Web of Science (Core Collection)

TS=(("adaptive learning" OR "personalised learning" OR "personalized learning" OR "intelligent tutoring system*" OR "adaptive sequencing" OR "adaptive teaching" OR "adaptive instruction*") OR ("artificial intelligence" OR "AI" OR "machine learning" OR "knowledge tracing" OR "reinforcement learning" OR "contextual bandit*" OR "large language model*" OR "LLM tutor*")) AND TS=((student* OR learner* OR education* OR classroom* OR "learning outcome*"))

Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI. Timespan: 2000-Present.

3. ACM Digital Library

(:("adaptive learning" OR "personalised learning" OR "personalized learning" OR "intelligent tutoring system" OR "adaptive sequencing" OR "adaptive teaching" OR "adaptive instruction") OR:("artificial intelligence" OR "AI" OR "machine learning" OR "knowledge tracing" OR "reinforcement learning" OR "contextual bandit" OR "large language model" OR "LLM tutor")) AND (:(student OR learner OR education OR classroom OR "learning outcome"))

Publication date: 01/01/2000 to Present

4. IEEE Xplore

((("All Metadata":"adaptive learning") OR ("All Metadata":"personalised learning") OR ("All Metadata":"personalized learning") OR ("All Metadata":"intelligent tutoring system") OR ("All Metadata":"adaptive sequencing") OR ("All Metadata":"adaptive teaching") OR ("All Metadata":"adaptive instruction")) OR (("All Metadata":"artificial intelligence") OR ("All Metadata":"AI") OR ("All Metadata":"machine learning") OR ("All Metadata":"knowledge tracing") OR ("All Metadata":"reinforcement learning") OR ("All Metadata":"contextual bandit") OR ("All Metadata":"large language model") OR ("All Metadata":"LLM tutor"))) AND (("All Metadata":student) OR ("All Metadata":learner) OR ("All Metadata":education) OR ("All Metadata":classroom) OR ("All Metadata":"learning outcome"))

Date Range: 2000-Present

5. ERIC (EBSCOhost)

( (TI ( "adaptive learning" OR "personalised learning" OR "personalized learning" OR "intelligent tutoring system*" OR "adaptive sequencing" OR "adaptive teaching" OR "adaptive instruction*" )) OR (AB ( "adaptive learning" OR "personalised learning" OR "personalized learning" OR "intelligent tutoring system*" OR "adaptive sequencing" OR "adaptive teaching" OR "adaptive instruction*" )) OR (TI ( "artificial intelligence" OR "AI" OR "machine learning" OR "knowledge tracing" OR "reinforcement learning" OR "contextual bandit*" OR "large language model*" OR "LLM tutor*" )) OR (AB ( "artificial intelligence" OR "AI" OR "machine learning" OR "knowledge tracing" OR "reinforcement learning" OR "contextual bandit*" OR "large language model*" OR "LLM tutor*" )) ) AND ( (TI ( student* OR learner* OR education* OR classroom* OR "learning outcome*" )) OR (AB ( student* OR learner* OR education* OR classroom* OR "learning outcome*" )) )

Published Date: 20000101-Present
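
The five strings above share one Boolean core: an adaptive-learning block OR-ed with an AI-methods block, then AND-ed with a learner/population block. The sketch below shows one way such a core could be assembled from shared term lists so that the databases stay in sync. It is an illustration only: the function names (or_block, build_core_query) and the wildcards flag are hypothetical, and the database-specific field wrappers (TITLE-ABS-KEY, TS=, "All Metadata", TI/AB) and date limits are not reproduced.

```python
# Term lists copied from the search strings above (already quoted as needed).
ADAPTIVE_TERMS = [
    '"adaptive learning"', '"personalised learning"', '"personalized learning"',
    '"intelligent tutoring system*"', '"adaptive sequencing"',
    '"adaptive teaching"', '"adaptive instruction*"',
]
AI_TERMS = [
    '"artificial intelligence"', '"AI"', '"machine learning"',
    '"knowledge tracing"', '"reinforcement learning"', '"contextual bandit*"',
    '"large language model*"', '"LLM tutor*"',
]
POPULATION_TERMS = [
    "student*", "learner*", "education*", "classroom*", '"learning outcome*"',
]


def or_block(terms):
    """Join terms with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_core_query(adaptive_terms, ai_terms, population_terms, wildcards=True):
    """Build (adaptive OR ai) AND population.

    wildcards=False drops the trailing '*' from every term, mirroring the
    ACM Digital Library and IEEE Xplore strings above, which omit truncation.
    """
    def prep(terms):
        return list(terms) if wildcards else [t.replace("*", "") for t in terms]

    concept = "(" + or_block(prep(adaptive_terms)) + " OR " + or_block(prep(ai_terms)) + ")"
    return concept + " AND " + or_block(prep(population_terms))


if __name__ == "__main__":
    print(build_core_query(ADAPTIVE_TERMS, AI_TERMS, POPULATION_TERMS))
    print(build_core_query(ADAPTIVE_TERMS, AI_TERMS, POPULATION_TERMS, wildcards=False))
```

Keeping the term lists in one place means a change to a search term propagates to every database variant, which makes the search easier to report and re-run.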

Appendix B: Data Extraction Schema

The following table details the schema used for extracting relevant data from each included study, consistent with PRISMA 2020 guidelines. This structured approach ensures consistency and facilitates the synthesis of findings across the diverse body of literature.

Category	Data Item	Description & Rationale
Bibliographic Information	Author(s) & Year	To identify and cite the study.
	Title	To understand the study's primary focus.
	Publication Venue & Type	To identify the source (e.g., journal, conference) and its peer-review status.
	DOI / URL	To ensure retrievability of the source document.
Study Characteristics	Learner Population	Demographics of participants (e.g., age, grade level, prior knowledge).
	Educational Setting	The context of the study (e.g., K-12, higher education, laboratory, online).
	Subject Domain	The academic subject area (e.g., Mathematics, Computer Science, Language).
	Country / Region	Geographic location of the study to assess contextual factors.
Methodology	Study Design	The research method employed (e.g., RCT, quasi-experimental, observational, simulation).
	Sample Size	Total number of participants (and per condition, if applicable).
	Intervention Duration	The length of time the adaptive learning intervention was administered.
Intervention Details	System Name / Platform	The name of the adaptive learning system or platform used.
	Adaptation Target / Lever	The specific aspect(s) of the learning experience being adapted (e.g., content, sequence, feedback).
	Learner Model Features	The data used to model the learner (e.g., knowledge, behaviour, affect).
	Modelling / Algorithm	The core AI/ML technique used (e.g., BKT, DKT, RL, LLM).
Evaluation & Outcomes	Datasets Used	Name of any public or private datasets used for training or evaluation.
	Outcome Measures / Metrics	The metrics used to evaluate the system (e.g., learning gain, retention, engagement, AUC, accuracy).
	Main Results / Effect Sizes	Key findings, including statistical significance and effect sizes (e.g., Cohen's d) where reported.
Quality & Reproducibility	Threats to Validity	Any limitations or threats to validity acknowledged by the authors.
	Risk of Bias Assessment	Reviewer's assessment of the study's risk of bias based on the appraisal tool.
	Reproducibility Artefacts	Availability of code, data, or other materials to allow for replication.
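
For reviewers who want to capture the schema in machine-readable form, the table above could be mirrored as a typed record. The following is a minimal sketch, not the instrument used in this review: the class name ExtractionRecord, the field names, and the missing_items helper are hypothetical, and the values in the usage example are placeholders rather than data from any included study.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ExtractionRecord:
    # Bibliographic Information
    authors_year: str = ""
    title: str = ""
    venue_type: str = ""
    doi_or_url: str = ""
    # Study Characteristics
    learner_population: str = ""
    educational_setting: str = ""
    subject_domain: str = ""
    country_region: str = ""
    # Methodology
    study_design: str = ""                 # e.g., RCT, quasi-experimental
    sample_size: Optional[int] = None      # total participants
    intervention_duration: str = ""
    # Intervention Details
    system_name: str = ""
    adaptation_targets: List[str] = field(default_factory=list)
    learner_model_features: List[str] = field(default_factory=list)
    modelling_algorithm: str = ""          # e.g., BKT, DKT, RL, LLM
    # Evaluation & Outcomes
    datasets_used: List[str] = field(default_factory=list)
    outcome_measures: List[str] = field(default_factory=list)
    main_results: str = ""
    # Quality & Reproducibility
    threats_to_validity: str = ""
    risk_of_bias: str = ""
    reproducibility_artefacts: str = ""

    def missing_items(self) -> List[str]:
        """Return the names of data items that have not been filled in yet."""
        return [name for name, value in asdict(self).items()
                if value in ("", None, [])]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = ExtractionRecord(
        title="PLACEHOLDER TITLE",
        study_design="RCT",
        adaptation_targets=["sequence", "feedback"],
    )
    print(record.missing_items())
```

A completeness check such as missing_items makes it straightforward to flag partially extracted studies before synthesis.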

Works cited

The Effect of Artificial Intelligence-Assisted Personalized ..., accessed October 28, 2025, https://www.researchgate.net/publication/384474169_The_Effect_of_Artificial_Intelligence-Assisted_Personalized_Learning_on_Student_Learning_Outcomes_A_Meta-Analysis_Based_on_31_Empirical_Research_Papers
Generative AI alone may not be enough: Evaluating AI Support for Learning Mathematical Proof - ResearchGate, accessed October 28, 2025, https://www.researchgate.net/publication/395723947_Generative_AI_alone_may_not_be_enough_Evaluating_AI_Support_for_Learning_Mathematical_Proof
arxiv.org, accessed October 28, 2025, https://arxiv.org/abs/2509.16778
Behind the Scenes of Adaptive Learning: A Scoping Review of Teachers' Perspectives on the Use of Adaptive Learning Technologies - MDPI, accessed October 28, 2025, https://www.mdpi.com/2227-7102/14/12/1413
Adaptive Learning – Instructional Technology And Design Services ..., accessed October 28, 2025, https://www.montclair.edu/itds/digital-pedagogy/pedagogical-strategies-and-practices/adaptive-learning/
AIS: 7th International Conference on Adaptive Instructional Systems, accessed October 28, 2025, https://2025.hci.international/ais
What is AI-powered adaptive learning? - AWS, accessed October 28, 2025, https://aws.amazon.com/marketplace/solutions/generative-ai/what-is/ai-powered-adaptive-learning/
Adaptive Learning Using Artificial Intelligence in e-Learning: A Literature Review - MDPI, accessed October 28, 2025, https://www.mdpi.com/2227-7102/13/12/1216
A Systematic Review of the Role of Learning Analytics in Supporting Personalized Learning, accessed October 28, 2025, https://www.mdpi.com/2227-7102/14/1/51
EVALUATION OF PERSONALIZED LEARNING | Scholar Works at UT Tyler, accessed October 28, 2025, https://scholarworks.uttyler.edu/context/education_grad/article/1007/viewcontent/reviewed.Jorly_Thomas___Final_Dissertation_with_completed_signage_page_and_signature___UT_Tyler__EVALUATION_OF_PERSONALIZED_LEARNING_7_21_23__1_.pdf
The history of adaptive assistant systems for teaching and learning - peDOCS, accessed October 28, 2025, https://www.pedocs.de/volltexte/2018/15999/pdf/Swertz_et_al_2017_The_history_of_adaptive.pdf
Intelligent tutoring system - Wikipedia, accessed October 28, 2025, https://en.wikipedia.org/wiki/Intelligent_tutoring_system
Adaptive Learning 3.0 - Training Industry, accessed October 28, 2025, https://trainingindustry.com/magazine/mar-apr-2019/adaptive-learning-3-0/
Adaptive learning - Wikipedia, accessed October 28, 2025, https://en.wikipedia.org/wiki/Adaptive_learning
Systematic review of adaptive learning research designs ..., accessed October 28, 2025, https://www.researchgate.net/publication/342254595_Systematic_review_of_adaptive_learning_research_designs_context_strategies_and_technologies_from_2009_to_2018
Systematic Review of Adaptive Learning Research Designs, Context, Strategies, and Technologies From 2009 to 2018 - ODU Digital Commons, accessed October 28, 2025, https://digitalcommons.odu.edu/context/stemps_fac_pubs/article/1123/viewcontent/Systematic_Review_of_Adaptive_Learning_Research_Designs_Context.pdf
Knowledge Tracing: A Survey - ResearchGate, accessed October 28, 2025, https://www.researchgate.net/publication/357953557_Knowledge_Tracing_A_Survey
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning - arXiv, accessed October 28, 2025, https://arxiv.org/html/2505.15607v2
REINFORCED-LLM TUTOR (RLT): A MULTI-AGENT ... - ijarcce, accessed October 28, 2025, https://ijarcce.com/wp-content/uploads/2025/10/IJARCCE.2025.141018-REINFORCED.pdf
Comparing Behavioral Patterns of LLM and Human Tutors: A Population-level Analysis with the CIMA Dataset - ACL Anthology, accessed October 28, 2025, https://aclanthology.org/2025.bea-1.64.pdf
Systematic Review of Adaptive Learning Technology for Learning in Higher Education - ResearchGate, accessed October 28, 2025, https://www.researchgate.net/publication/362134306_Systematic_Review_of_Adaptive_Learning_Technology_for_Learning_in_Higher_Education
PRISMA statement, accessed October 28, 2025, https://www.prisma-statement.org/
Intelligent Tutoring Systems in Mathematics Education: A Systematic Literature Review Using the Substitution, Augmentation, Modification, Redefinition Model - MDPI, accessed October 28, 2025, https://www.mdpi.com/2073-431X/13/10/270
Examining the applications of intelligent tutoring systems in real educational contexts: A systematic literature review from the social experiment perspective - PMC - PubMed Central, accessed October 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC9825070/
A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges - arXiv, accessed October 28, 2025, https://arxiv.org/html/2507.18882v1
How Flexible Is Your Data? A Comparative Analysis of Scoring Methodologies across Learning Platforms in the Context of Group Differentiation, accessed October 28, 2025, https://learning-analytics.info/index.php/JLA/article/view/5109/6092
Engagement detection accuracy on the ASSISTments dataset for all detectors compared, accessed October 28, 2025, https://www.researchgate.net/figure/Engagement-detection-accuracy-on-the-ASSISTments-dataset-for-all-detectors-compared_tbl1_340606912
What Is Adaptive Learning and How Does It Work to Promote Equity In Higher Education?, accessed October 28, 2025, https://www.everylearnereverywhere.org/blog/what-is-adaptive-learning-and-how-does-it-work-to-promote-equity-in-higher-education/
A Systematic Literature Review of Adaptive Learning Systems Based on the Assessment of Collaboration Quality - SciTePress, accessed October 28, 2025, https://www.scitepress.org/Papers/2025/131963/131963.pdf
Adaptive game-based learning in education: a systematic review - Frontiers, accessed October 28, 2025, https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2023.1062350/full
Human-in-the-Loop Systems for Adaptive Learning Using Generative AI - arXiv, accessed October 28, 2025, https://arxiv.org/html/2508.11062v1
CEMCA :: Teacher in the Loop AI National Bootcamp: Teachers' Lay Foundation for AI Integration in Indian Classrooms - Commonwealth Educational Media Centre For Asia, accessed October 28, 2025, https://www.cemca.org/news/teacher-loop-ai-national-bootcamp-teachers%E2%80%99-lay-foundation-ai-integration-indian-classrooms
TEACHER IN THE LOOP - GitHub Pages, accessed October 28, 2025, https://inrialearninglab.github.io/ai4t-embed//module-4-AI-at-our-service-as-teachers/4-1-what-place-for-teachers/4-1-6-teacher-in-the-Loop.en.pdf
Intelligent Tutoring Systems: A Comprehensive Historical Survey with Recent Developments, accessed October 28, 2025, https://www.ijcaonline.org/archives/volume181/number43/alkhatlan-2019-ijca-918451.pdf
Review of Knowledge representation techniques for Intelligent Tutoring Systems, accessed October 28, 2025, https://www.researchgate.net/publication/309704536_Review_of_Knowledge_representation_techniques_for_Intelligent_Tutoring_Systems
Intelligent tutoring systems: Architecture and characteristics - ResearchGate, accessed October 28, 2025, https://www.researchgate.net/publication/228921731_Intelligent_tutoring_systems_Architecture_and_characteristics
Intelligent Tutoring Systems | EU-JAMRAI, accessed October 28, 2025, https://eu-jamrai.eu/intelligent-tutoring-systems/
SLPKT: A Novel Simulated Learning Process Model for Knowledge Tracing - KSI Research, accessed October 28, 2025, https://ksiresearch.org/seke/seke23paper/paper049.pdf
FDKT: Towards an Interpretable Deep Knowledge Tracing via Fuzzy Reasoning - Le Wu's Homepage, accessed October 28, 2025, https://le-wu.com/files/Publications/JOURNAL/TOIS-FDKT-feiliu.pdf
Adapting to the Learner Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review - IDA, accessed October 28, 2025, https://www.ida.org/-/media/feature/publications/w/we/welch-award-2017---effectiveness-of-intelligent-tutoring-systems-a-meta-analytic-review/1-effectivenessits.ashx
A systematic review of AI-driven intelligent tutoring systems (ITS) in K-12 education - PMC, accessed October 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC12078640/
Effectiveness of Personalized Learning: Statistics on Outcomes in Diverse Educational Settings, accessed October 28, 2025, https://www.matsh.co/en/statistics-on-personalized-learning-effectiveness/
The impact of intelligent tutoring systems and artificial ..., accessed October 28, 2025, https://www.researchgate.net/publication/387901054_The_impact_of_intelligent_tutoring_systems_and_artificial_intelligence_on_students'_motivation_and_achievement_in_STEM_education_A_systematic_review
An AI-Powered Evaluation: Understanding which ... - OpenReview, accessed October 28, 2025, https://openreview.net/pdf/f751f4dcf165baf56a8014b70f01084320905891.pdf
Limits to Accuracy: How Well Can We Do at Student Modeling? - Educational Data Mining, accessed October 28, 2025, https://www.educationaldatamining.org/EDM2013/papers/rn_paper_04.pdf
Interpretable Knowledge Tracing via Transformer-Bayesian Hybrid Networks: Learning Temporal Dependencies and Causal Structures in Educational Data - MDPI, accessed October 28, 2025, https://www.mdpi.com/2076-3417/15/17/9605
The Sum is Greater than the Parts: Ensembling Student Knowledge Models in ASSISTments - Penn Center for Learning Analytics, accessed October 28, 2025, https://learninganalytics.upenn.edu/ryanbaker/kddined2011_submission_4.pdf
EdNet: A Large-Scale Hierarchical Dataset in Education - ResearchGate, accessed October 28, 2025, https://www.researchgate.net/publication/337830132_EdNet_A_Large-Scale_Hierarchical_Dataset_in_Education
EdNet: A Large-Scale Hierarchical Dataset in Education - PMC - NIH, accessed October 28, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC7334672/
EdNet: A Large-Scale Hierarchical Dataset in Education - ResearchGate, accessed October 28, 2025, https://www.researchgate.net/publication/342675710_EdNet_A_Large-Scale_Hierarchical_Dataset_in_Education
[1912.03072] EdNet: A Large-Scale Hierarchical Dataset in Education - ar5iv, accessed October 28, 2025, https://ar5iv.labs.arxiv.org/html/1912.03072
arXiv:1912.03072v3 [cs.CY] 1 Jul 2020, accessed October 28, 2025, https://arxiv.org/pdf/1912.03072
Structure-based Knowledge Tracing: An Influence Propagation View, accessed October 28, 2025, http://home.ustc.edu.cn/~tongsw/files/SKT.pdf
On the Practicality of Differential Privacy for Knowledge Tracing - Educational Data Mining, accessed October 28, 2025, https://educationaldatamining.org/edm2025/proceedings/2025.EDM.poster-demo-papers.289/2025.EDM.poster-demo-papers.289.pdf
AI-powered adaptation of OER in math education - Commonwealth of Learning, accessed October 28, 2025, https://www.col.org/news/ai-powered-adaptation-of-oer-in-math-education/
Teacher in the Loop AI: Teachers' Bootcamp Report - Commonwealth Educational Media Centre For Asia, accessed October 28, 2025, https://www.cemca.org/ckfinder/userfiles/files/TiL-AI-Bootcamp-Report-new.pdf
arxiv.org, accessed October 28, 2025, https://arxiv.org/html/2407.18745v1
[2508.05952] Dean of LLM Tutors: Exploring Comprehensive and Automated Evaluation of LLM-generated Educational Feedback via LLM Feedback Evaluators - arXiv, accessed October 28, 2025, https://arxiv.org/abs/2508.05952
Dean of LLM Tutors: Exploring Comprehensive and Automated Evaluation of LLM-generated Educational Feedback via LLM Feedback Evaluators - arXiv, accessed October 28, 2025, https://arxiv.org/html/2508.05952v1