
offering additional insights into its overall quality and
utility.
The main contributions of this work are encapsulated in the
following points:
• We propose SKYRAG, a learning path generation
system that integrates RAG with LLMs. SKYRAG
mitigates hallucination issues in LLMs by grounding
their output in data retrieved from multiple Massive
Open Online Courses (MOOCs). By leveraging diverse
courses, SKYRAG generates personalized learning
paths tailored to individual learner profiles.
• We provide insights into the adoption potential of
SKYRAG by analyzing user attitudes and behaviors
through structured assessments. To gain a deeper
understanding of user adoption, we extend the TAM
by incorporating two additional variables that influence
user attitudes and behaviors.
II. RELATED WORK
A. LANGUAGE MODELS WITH RETRIEVAL-AUGMENTED
GENERATION
Large Language Models (LLMs), like Generative Pre-trained
Transformer (GPT), excel at generating coherent text from
vast datasets [24]. However, despite advancements in their
capabilities, a persistent issue is "hallucination", where the
models produce factually incorrect or misleading informa-
tion. These hallucinations arise from the probabilistic nature
of LLMs, as they rely on statistical associations within their
training data rather than possessing a semantic or contextual
understanding of the content they generate [16], [17], [25].
In essence, the model predicts the next token or sequence
based on learned patterns, which can result in convincing
but erroneous information when relevant context or factual
accuracy is not sufficiently captured during training.
Addressing hallucinations is crucial as LLMs are increasingly
applied in sensitive domains such as education, healthcare,
and law. Factors contributing to hallucinations include biases
in training data and limitations in the model architecture [9],
[10], [26]. Mitigation strategies involve improving training
data quality, refining prompt engineering, and incorporating
external validation mechanisms. Despite these efforts, hallu-
cinations remain a significant challenge, especially in high-
accuracy applications. A promising approach is integrating
retrieval mechanisms, where the model retrieves information
from trusted sources before generating responses, reducing
hallucinations and improving accuracy [27], [28].
The integration of LLMs with Retrieval-Augmented Gen-
eration (RAG) represents a major advancement in addressing
the limitations of traditional LLMs, particularly in reducing
hallucinations. RAG enhances LLMs by incorporating a
retrieval system, allowing the model to pull accurate,
up-to-date information from external sources during text
generation [29], [30], [31], [32], [33]. This not only improves
factual accuracy but also ensures that the generated content is
more contextually relevant to the query. In its standard form,
RAG operates by integrating two components: a retriever,
typically implemented as a Dense Passage Retriever (DPR), and
a generator [34]. The retriever matches the input query
with relevant documents, which are then used by the LLM
to generate the final output. RAG has two standard
configurations: RAG-Sequence, where the same retrieved
document conditions the entire output sequence, and
RAG-Token, where a different retrieved document can
condition each generated token [35].
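To make the retrieve-then-generate pattern concrete, the sketch below pairs a simple lexical retriever (TF-IDF with cosine similarity, standing in for a dense retriever such as DPR) with the grounded prompt that a generator LLM would consume. The course snippets, function names, and query are illustrative assumptions, not components of SKYRAG.

# Minimal retrieve-then-generate sketch (illustrative; names and data are assumed).
# A TF-IDF retriever stands in for a dense retriever; the generator step is
# represented by the grounded prompt an LLM would receive.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical MOOC snippets acting as the external knowledge base.
documents = [
    "Linear algebra course covering vectors, matrices, and eigenvalues.",
    "Introductory Python programming with data structures and functions.",
    "Machine learning fundamentals: regression, classification, evaluation.",
]

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    top = scores.argsort()[::-1][:k]
    return [docs[i] for i in top]

query = "What should I study before machine learning?"
context = "\n".join(retrieve(query, documents))

# In a full RAG pipeline this grounded prompt is passed to the generator LLM,
# which conditions its answer on the retrieved context rather than on
# parametric memory alone.
prompt = ("Answer using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {query}")
print(prompt)

Because the generator is conditioned on retrieved passages, its output can be checked against, and attributed to, the source documents, which is the mechanism by which RAG curbs hallucination.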
Advancements in RAG techniques have incorporated
multi-hop retrieval [36], [37], [38] and cross-attention
mechanisms [39], [40], enabling the model to retrieve and
synthesize information from multiple sources for more
complex queries. Multi-hop retrieval allows RAG to chain
together reasoning across documents, while cross-attention
ensures the most relevant information is prioritized in
the final output. These innovations enhance the accuracy
and depth of responses, making RAG more effective in
fields requiring detailed and nuanced information, such as
education and research [41].
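As a rough illustration of the multi-hop idea (a generic iterative loop, not the specific methods of [36], [37], [38]), the sketch below reuses the toy retrieve() function and documents list from the previous example and feeds each hop's retrieved text back into the query, so later hops can reach documents that only connect indirectly to the original question.

# Illustrative multi-hop retrieval loop; retrieve() and documents come from
# the previous sketch. Real systems use dense retrievers and learned query
# reformulation instead of plain string concatenation.
def multi_hop_retrieve(query, docs, hops=2, k=1):
    evidence = []
    current_query = query
    for _ in range(hops):
        for hit in retrieve(current_query, docs, k=k):
            if hit not in evidence:
                evidence.append(hit)
        # Expand the query with the evidence gathered so far, so the next hop
        # can follow a chain of reasoning across documents.
        current_query = query + " " + " ".join(evidence)
    return evidence

print(multi_hop_retrieve("prerequisites for machine learning", documents))

Cross-attention, by contrast, operates inside the generator itself, weighting the retrieved passages during decoding, and is therefore not visible at this pipeline level.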
Integrating RAG with LLMs presents challenges, partic-
ularly in ensuring the retrieval process is both efficient and
accurate when dealing with large-scale knowledge bases or
databases [27], [28], [42]. The model must also balance
retrieved information with its generative capabilities to main-
tain natural and fluent output. Despite these hurdles, RAG
offers a significant advantage in improving the accuracy and
reliability of LLMs, making them more suitable for domains
where factual correctness is crucial, such as education.
When comparing RAG to traditional LLM fine-tuning,
each has distinct benefits. Fine-tuning adjusts an LLM’s
parameters on specific datasets, improving task-specific
accuracy but requiring significant computational resources
and time [16], [17], [25]. Additionally, fine-tuning may
not fully resolve hallucinations, as the model can still
produce incorrect information based on learned patterns.
In contrast, RAG combines generative capabilities with
real-time retrieval of external data, offering more accurate
and contextually relevant outputs without extensive fine-
tuning. This dynamic approach is particularly beneficial in
scenarios with rapidly changing information or a broad range
of topics, ensuring the generated outputs are accurate and
up to date [17], [27]. This makes RAG a powerful tool for
applications that demand both accuracy and adaptability.
B. LANGUAGE MODELS IN EDUCATION
LLMs are revolutionizing education by providing valuable
tools to assist teachers in creating educational content
and enhancing their professional development. Teachers
can leverage LLMs to generate lesson plans, quizzes, and
curriculum modules tailored to course objectives. This
saves time on routine tasks and allows teachers to focus
on personalized instruction and student engagement [5],
[43]. LLMs also assist with grading and feedback, enabling
teachers to manage large classes more efficiently while
ensuring students receive timely, constructive responses. Fur-
thermore, LLMs support teachers’ professional development