CLEER GenAI PRINCIPLES & CURATION
The Clearinghouse of Engineering Education Resources (CLEER) is a curated online repository for high-quality engineering education resources.
CLEER resources are curated and embedded using generative AI (GenAI) when needed/allowed.
GLOSSARY
Curation
To ensure the quality of the resources included in the database, we run resources through a GenAI-powered curation system. This means that we check each resource against several criteria that experts have defined as crucial. You can find out more about how experts defined these criteria in the part on the Delphi study.
Generative AI
Generative AI (GenAI), based on large language models (LLMs), is a type of artificial intelligence that creates content by learning patterns from vast amounts of data on the internet. At CLEER, we use GenAI both for resource curation and for semantic embedding, which enables smarter search capabilities. All GenAI models used in CLEER are hosted locally at EPFL and do not send data to the general corpus.
Semantic embedding
Semantic embedding is a mathematical technique in natural language processing (NLP) that converts words, sentences, and entire documents into dense numerical vector representations. By converting both the resources and the queries into vectors, we can measure how similar the two are by comparing their vectors. The closer the meanings of the two are, the closer their vectors will be (i.e., if they were exactly the same, the semantic search would return a similarity of 1). All semantic embeddings are computed using a local GenAI model at EPFL and do not send data to the general corpus.
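The vector comparison described above can be sketched in a few lines. This is a minimal illustration that assumes cosine similarity as the comparison measure (the page does not name the measure CLEER uses) and uses hypothetical three-dimensional toy vectors in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real embeddings have far more dimensions.
query = [0.2, 0.7, 0.1]
same_meaning = [0.2, 0.7, 0.1]
unrelated = [0.9, -0.1, 0.3]

print(cosine_similarity(query, same_meaning))  # identical vectors score (approximately) 1
print(cosine_similarity(query, unrelated))     # a lower score for a dissimilar vector
```

Identical vectors score 1 up to floating-point rounding, which matches the "value 1" case mentioned above; less similar vectors score lower.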
RESOURCE CURATION
For resource curation, we run Qwen/Qwen3-235B-A22B locally on EPFL servers.
The curation criteria (see: curation criteria website) are embedded into the GenAI's system prompt to guide decisions consistently.
All decisions are aligned with expert-authored guidelines and reviewed through quality checks.
SEMANTIC EMBEDDING
For semantic embeddings, we run Qwen/Qwen3-Embedding-8B-bfloat16 locally on EPFL servers.
For the CLEER search tool, only abstracts are embedded.
Hybrid search blends semantic similarity with precise keyword logic for both recall and precision.
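One common way to blend the two signals is a weighted sum, sketched below. Everything here is an assumption made for illustration: the weight `alpha`, the helper names `keyword_score`, `hybrid_score` and `rank`, the simple whole-word keyword match, and the precomputed similarity values. The actual CLEER blending logic is not described on this page.

```python
def keyword_score(query_terms, abstract):
    """Fraction of query terms that appear as whole words in the abstract."""
    if not query_terms:
        return 0.0
    words = set(abstract.lower().split())
    hits = sum(1 for term in query_terms if term.lower() in words)
    return hits / len(query_terms)


def hybrid_score(semantic, keyword, alpha=0.5):
    """Weighted blend: alpha weights the semantic part, 1 - alpha the keyword part."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword


def rank(query_terms, candidates, alpha=0.5):
    """Return resource ids ordered from best to worst hybrid score."""
    scored = [
        (hybrid_score(semantic, keyword_score(query_terms, abstract), alpha), rid)
        for rid, (abstract, semantic) in candidates.items()
    ]
    return [rid for _, rid in sorted(scored, reverse=True)]


# Hypothetical abstracts with made-up semantic similarity values.
candidates = {
    "A": ("peer instruction improves confidence in engineering courses", 0.62),
    "B": ("students gain self-assurance when teaching each other", 0.81),
}

print(rank(["peer", "instruction"], candidates))             # keyword match lifts A
print(rank(["peer", "instruction"], candidates, alpha=1.0))  # semantic only favours B
```

The keyword part rewards exact term matches (precision), while the semantic part can surface abstracts that use different wording for the same idea (recall).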
RESOURCE CURATION (EXTENDED INFORMATION)
The GenAI resource curation is run on all resources that do not come from greenlit journals (see Database Overview). Every resource is judged against all curation criteria; an example of the code for one criterion is shown below.
To learn more about the code, please contact one of our researchers.
literature_review_critical: Optional[bool] = Field(
    default=None,
    description="""
    Does the literature review include a **critical assessment** of **previous empirical studies** (studies based on data)?

    -----------------------------
    DECISION RULES
    -----------------------------

    STEP 1 – Identify if a literature review exists:
    - Look for an "Introduction", "Background", or "Literature Review" section.
    - If no such section exists OR no previous research is cited → return False.

    STEP 2 – Are empirical studies cited?
    - Empirical studies include surveys, experiments, interviews, observations, or other data-based research.
    - If only conceptual papers, frameworks, policy reports, or definitions are cited (with no reference to empirical data studies) → return False.

    STEP 3 – Look for explicit **evaluative commentary** linked to empirical studies:
    - Trigger phrases include: "however", "although", "few studies", "contradictory", "inconsistent", "limited evidence", "mixed results", "scarce research", "gap in research", "future research needed", "weakness of prior work", "methodological limitation".
    - The trigger phrase must clearly refer to an **empirical study or group of studies**, for example:
        - "Previous studies relied only on small samples"
        - "Findings have been inconsistent across contexts"
        - "Earlier research used only self-reported data"

    STEP 4 – Distinguish conceptual context vs empirical critique:
    - If the evaluative language is applied only to conceptual arguments, definitions, or general context (e.g., "There are many ways to define diversity", "sustainability is complex") → return False.
    - Only return True if there is at least one explicit judgment of **empirical research methods, findings, or gaps**.

    STEP 5 – Final decision:
    - Return True → if there is at least one explicit evaluative statement about empirical studies.
    - Return False → if prior work is only described or contextualized, or only conceptual/theoretical sources are reviewed, or no literature review is present.

    Do NOT infer critique based on tone or general importance statements.
    Only return True when evaluative language is explicitly present **and clearly linked to empirical research**.
    """,
    json_schema_extra={
        "examples": [
            (
                # Purely descriptive (False)
                """
                Active learning strategies have been widely used in engineering education.
                Prince (2004) defines active learning as instructional methods that engage students in the learning process.
                Freeman et al. (2014) conducted a meta-analysis showing that active learning improves student performance in STEM fields.
                Studies by Froyd et al. (2013) and Michael (2006) have also documented benefits such as increased engagement and deeper understanding.
                Several models have been proposed to implement active learning effectively in large classes (Borrego et al., 2018).
                """,
                False
            ),
            (
                # Explicit critique linked to empirical research (True)
                """
                While active learning has been shown to improve outcomes in engineering education (Freeman et al., 2014), much of the evidence relies on quasi-experimental designs, which limit causal claims.
                For example, Froyd et al. (2013) primarily report student self-reports, raising questions about measurement validity.
                Although Borrego et al. (2018) propose scalable models, their generalizability across different institutional contexts remains untested.
                In contrast to these optimistic accounts, Prince (2004) cautions that active learning requires significant instructor preparation and may lead to uneven student participation.
                Overall, the literature demonstrates benefits but also highlights gaps in understanding how contextual factors influence effectiveness.
                """,
                True
            ),
            (
                # Evaluative but only about concepts (False)
                """
                There is no single agreed definition of diversity in engineering education, and approaches vary widely between institutions.
                Diversity is inherently complex and multidimensional.
                """,
                False
            ),
            (
                # Trigger words but conceptual (False)
                """
                Although sustainability is often considered essential, there is little consensus on how it should be incorporated into curriculum models.
                """,
                False
            ),
            (
                # Explicit empirical gap (True)
                """
                Few studies have examined how peer instruction affects confidence in minority-serving institutions, and those that exist use small sample sizes (Smith et al., 2018; Lee, 2019).
                Findings are inconsistent across contexts, suggesting that more diverse settings need to be investigated.
                """,
                True
            ),
        ],
        "disclaimer_message": "Literature review might be biased"
    }
)
CATEGORY DEFINITIONS
- Quantitative:
  - Has original numeric data.
  - Explicitly states a research question or hypothesis.
  - Uses formal statistical analysis (e.g., t-tests, ANOVA, regression, correlation, chi-square).
  - Typically based on surveys, tests, experiments, or numerical datasets.
  - Descriptive statistics alone (e.g., percentages, averages) do NOT qualify unless linked to hypothesis testing.
- Qualitative:
  - Has original non-numeric data (e.g., interviews, focus groups, observations).
  - Uses qualitative analysis methods (e.g., coding, thematic analysis, grounded theory).
  - Explicitly states a research question or purpose.
  - No statistical hypothesis testing.
On average, the curation results are stable at around 89%, which means that when the same resources are curated several times in a row, 89% of the results stay the same.
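A stability figure of this kind can be computed as the share of decisions that come out identical in every repeated run. The sketch below is one plausible reading, not the documented CLEER procedure: the function name `stability`, the `(resource_id, criterion)` keys, and the sample data are all hypothetical.

```python
def stability(runs):
    """Share of decisions that are identical across every run.

    `runs` is a list of dicts, one per curation run, each mapping a
    (resource_id, criterion) pair to the decision returned in that run.
    """
    if not runs:
        raise ValueError("at least one run is required")
    keys = set(runs[0])
    if not keys:
        raise ValueError("runs must contain at least one decision")
    for run in runs[1:]:
        if set(run) != keys:
            raise ValueError("every run must cover the same decisions")
    stable = sum(1 for key in keys if len({run[key] for run in runs}) == 1)
    return stable / len(keys)


# Hypothetical decisions from three repeated runs over the same two resources.
run_1 = {("r1", "limitations"): True, ("r1", "research_question"): True,
         ("r2", "limitations"): False, ("r2", "research_question"): True}
run_2 = {("r1", "limitations"): True, ("r1", "research_question"): True,
         ("r2", "limitations"): True, ("r2", "research_question"): True}
run_3 = dict(run_1)

print(stability([run_1, run_2, run_3]))  # 3 of 4 decisions never change -> 0.75
```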
CURATION MATRIX
The curation results are then compared to the curation matrix (see below), which defines whether a resource is included, included with disclaimers, or excluded from the CLEER database.
| Criterion | Quantitative | Qualitative | Mixed methods | Case study | Design-based research | Systematic review | Narrative review | Conceptual paper | Practice paper | Workshop | Editorial |
|---|---|---|---|---|---|---|---|---|---|---|---|
| engineering_education_context_present | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude |
| ethics_consideration_present | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | No action | No action | Exclude |
| theoretical_framework_present | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Exclude |
| literature_review_section_present | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Exclude |
| literature_review_critical | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Exclude |
| references_number: <5 references | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | No action | No action | Exclude |
| references_number: <10 references | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| references_newest: publication year - 10 | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Exclude |
| references_oldest: >publication year - 10 | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Exclude |
| research_question | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | Exclude |
| methodology_section_present | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | Exclude |
| methodology_research_design | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| methodology_context | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| methodology_instruments | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| methodology_matches_research_question | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| methodological_consistency | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | No action | No action | Exclude |
| limitations | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| results_quantitative_present | Exclude | No action | Exclude | Disclaimer | Disclaimer | No action | No action | No action | No action | Exclude |
| results_qualitative_present | No action | Exclude | Disclaimer | Disclaimer | Disclaimer | No action | No action | No action | No action | Exclude |
| results_transparent_reporting | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | No action | No action | Exclude |
| discussion_present | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | Exclude |
| discussion_connected_to_literature | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
| discussion_RQ | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | Exclude | No action | No action | Exclude |
| discussion_overgeneralization | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | Disclaimer | No action | No action | Exclude |
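The way the matrix turns per-criterion results into a final decision can be sketched as a simple lookup. This is an illustration under stated assumptions, not the CLEER implementation: `MATRIX_EXCERPT` copies only four rows and two columns from the table above (reading the first value column as Quantitative and the ninth as Practice paper), the function name `decide` is hypothetical, and each cell is assumed to give the action taken when the criterion is not met.

```python
# Excerpt of the curation matrix: criterion -> resource type -> action
# applied when the criterion is NOT met (assumed semantics).
MATRIX_EXCERPT = {
    "engineering_education_context_present": {"Quantitative": "Exclude", "Practice paper": "Exclude"},
    "theoretical_framework_present": {"Quantitative": "Disclaimer", "Practice paper": "Disclaimer"},
    "research_question": {"Quantitative": "Exclude", "Practice paper": "No action"},
    "limitations": {"Quantitative": "Disclaimer", "Practice paper": "No action"},
}


def decide(resource_type, criteria_met):
    """Return (decision, disclaimers) for one resource.

    `criteria_met` maps each criterion name to True (met) or False (not met).
    """
    disclaimers = []
    for criterion, met in criteria_met.items():
        if met:
            continue  # a satisfied criterion never triggers an action
        action = MATRIX_EXCERPT[criterion][resource_type]
        if action == "Exclude":
            return "excluded", []  # a single "Exclude" removes the resource
        if action == "Disclaimer":
            disclaimers.append(criterion)
    if disclaimers:
        return "included with disclaimers", disclaimers
    return "included", []


results = {
    "engineering_education_context_present": True,
    "theoretical_framework_present": False,
    "research_question": False,
    "limitations": False,
}

print(decide("Quantitative", results))    # missing research question excludes it
print(decide("Practice paper", results))  # kept, with one disclaimer
```

The same curation results lead to different outcomes depending on the resource type, which is the point of having one column per type.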
DISCLAIMERS
Resources are only included in the CLEER database if none of the criteria returns "exclude". Different criteria are valid for different types of resources (e.g., case study, qualitative research study, practice paper...). Editorials and workshop proposals are, by default, not included in the database. All disclaimers are noted in the returned result (hover over "Disclaimers"). The disclaimers are grouped as follows:
Limited theoretical framework:
limited_theoretical_framework: The theoretical framework is either not mentioned or not well described.
Literature review might be biased:
literature_review_section_present: There is no literature review section.
literature_review_critical: The literature review is not critical; it only lists references.
references_number (<10 references): There are fewer than 10 references.
references_newest (less than 10 years old at the date of the resource publication): There are no recent references.
references_oldest (older than 10 years at the date of the resource publication): There are no old seminal references.
Limited methodology description:
methodology_research_design: The research design is vaguely/not described.
methodology_context: The context of the study is missing or incomplete (e.g., population description, course description).
methodology_instruments: The instruments are either not described or vaguely described.
Ill-fitting methodology:
methodology_matches_research_question: The methodology is not suitable to answer the research question.
Limitations not considered:
limitations: The resource does not discuss potential limitations.
Limited data representation:
results_qualitative_present: The resource does not present qualitative data (only valid for qualitative/mixed methods).
results_quantitative_present: The resource does not present quantitative data (only valid for quantitative/mixed methods).
Limited discussion:
discussion_connected_to_literature: The discussion is not connected to literature.
discussion_connected_to_rq: The discussion does not connect results to research questions.
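The grouping above amounts to a mapping from criterion names to disclaimer messages, similar in spirit to the "disclaimer_message" entry in the code excerpt earlier on this page. The sketch below shows one way to collect flagged criteria under their group headings; the names `DISCLAIMER_GROUPS` and `group_disclaimers` are hypothetical, and the mapping simply transcribes the list above.

```python
# Criterion -> disclaimer group, transcribed from the list above.
DISCLAIMER_GROUPS = {
    "limited_theoretical_framework": "Limited theoretical framework",
    "literature_review_section_present": "Literature review might be biased",
    "literature_review_critical": "Literature review might be biased",
    "references_number": "Literature review might be biased",
    "references_newest": "Literature review might be biased",
    "references_oldest": "Literature review might be biased",
    "methodology_research_design": "Limited methodology description",
    "methodology_context": "Limited methodology description",
    "methodology_instruments": "Limited methodology description",
    "methodology_matches_research_question": "Ill-fitting methodology",
    "limitations": "Limitations not considered",
    "results_qualitative_present": "Limited data representation",
    "results_quantitative_present": "Limited data representation",
    "discussion_connected_to_literature": "Limited discussion",
    "discussion_connected_to_rq": "Limited discussion",
}


def group_disclaimers(flagged_criteria):
    """Collect flagged criteria under their disclaimer group, keeping input order."""
    groups = {}
    for criterion in flagged_criteria:
        group = DISCLAIMER_GROUPS[criterion]
        groups.setdefault(group, []).append(criterion)
    return groups


flagged = ["references_number", "limitations", "literature_review_critical"]
for group, criteria in group_disclaimers(flagged).items():
    print(f"{group}: {', '.join(criteria)}")
```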