An augmentation framework for efficiently extracting open educational resources from slideshows

Open education is a way of carrying out education, often using digital technologies. This paper gives a brief background on open education and its constraints. To make courses more accessible, teachers need to have an effective way to preserve and share their materials. We derive the requirement of keeping the choice of tools completely open while we pave the way to a teaching transport format based on open standards allowing for extending existing lecture material with interactive content. A simple learning system that helps to detect and save changes of a slide deck was developed. The linkage of relevant knowledge and interactive context can also be added through this implementation which outputs a final transport format. This provides a good starting point for future work on removing learning barriers and widening access to education.


Introduction
Open education is widely acknowledged as important for successful learning for university students. Research across disciplines has demonstrated that well-designed online learning can enhance students' motivation and improve their learning (Hegeman, 2015; Zheng et al., 2017). Online learning also played a crucial role in recent years during the COVID-19 pandemic and its lockdowns, when in-person lectures were severely constrained for teachers and students.
From a more general perspective, open education is a modern movement that aims to ease access to education and thereby to support broader political and societal goals, including lifelong learning (United Nations, 2015; Zawacki-Richter et al., 2020). In the current debate, open education is often reduced to open-access learning over the internet. We are convinced that this perspective is only partly valid: open education is not only about openly accessible education over the internet but is supposed to transform existing learning paradigms, including frontal teaching and teaching in presence, in all educational formats from primary school to high school and university.
One consequence of this perspective is that traditional teaching formats such as frontal teaching using a blackboard, slideshow presentations, conversations in class, group work, and project work are to be part of open education. We want to make them more visible, as they have been the backbone of education for a long time. Most professors in universities and teachers in schools are expected to give classes in presence. For traditional methods of teaching, it is evident that slideshows, despite their known limitations, find wide adoption. Unfortunately, transforming such slideshows into interactive online teaching material turns out to be difficult and laborious.
Furthermore, playing out such educational material remains challenging. While software like PowerPoint is routinely used to generate slideshows that often serve their educational purpose quite well, their shareability is limited. Very often, slideshows are converted to PDF to share them online and to prevent editing, a format in which animated slides are hard to read. This paper aims to describe a novel and simple procedure that enables teachers using traditional formats such as a slideshow to progress in opening and enhancing their education towards more aspects of good open education while avoiding a radical change such as replacing the underlying toolkit. We remark that a similar framework can be applied to any temporal-visual system. For example, a blackboard lecture could be held once on a digital blackboard and augmented very much along the same lines. Furthermore, we note that the procedure is future-proof in the sense that future updates to the teaching material do not break the augmentation and that it supports advanced visual aids such as animations out of the box.

Background
In response to COVID-19 pandemic lockdown measures, higher education institutions such as universities closed their premises. Although such institutions were quick to replace face-to-face lectures with online learning, these closures affected learning and examinations and raised questions about the value offered by higher education, which includes networking and social opportunities. To remain relevant, universities will need to reinvent their learning environments so that digitalization expands and complements student-teacher and other relationships (Schleicher, 2020).
One obvious measure to stay relevant is to adopt state-of-the-art good learning practices throughout the curriculum. Learning resources are mostly part of a curriculum designed to fulfil certain learning goals and are meant to satisfy certain learning objectives. They are therefore essential for a successful learning experience for the students. In the following, we would like to highlight some aspects of good learning, not to provide a theoretical discussion, but to link them to our work.
One aspect is the shift from teacher-centred learning to student-centred learning. While the input slide deck might originate from the former, we see the potential of our tool to help transition the material to the latter in the process by deconstructing, enriching, and augmenting it. Another aspect is competencies-based learning, which can be enhanced to the point of an active learning experience by augmenting an existing slide deck with interactive content. With respect to another aspect, constructive alignment, one could for example augment a slide deck with a self-test.
The main goal is to facilitate the transition to an (inter-)active open online learning experience for students, as it has been shown that a student-centred approach, as well as online learning and active learning, can be beneficial to the learning outcomes of students (U.S. Department of Education, 2009; Hegeman, 2015).

A Simple Learning System Concept
Many learning systems have been designed and implemented, and they share a large set of features; unfortunately, using all of these features comes with high complexity. For example, the online teaching system Moodle has found wide adoption and provides a decent set of features, most of which are, however, difficult to apply, especially in hybrid courses that are taught synchronously in presence and over the internet. Excellent teaching material is currently often non-open and available as slide decks, while open education would require a modularized presentation based on open standards that supports interactive content and the linking of open resources. In simple words, an experienced teacher might ask: "How do I get my slide deck into your system?" We propose to maximize the simplicity of turning existing teaching material into a form in which it can grow towards open education by replacement of problematic sections and augmentation within the teacher's tools. While teaching material for teaching in presence has to respect the temporal constraints of the teaching situation (e.g., lecture time), open online education should provide a more flexible temporal layout in which the learner can progress in a self-paced manner. To build a bridge between existing, high-quality teaching material with a strict temporal organization used in class and flexible open education, we need to define a system that supports segmentation into smaller pieces ("units that can be skipped or replaced with an imported other unit") and augmentation with additional material (background information, additional pieces to consider, challenges, quizzes, tasks, etc.).
With many teachers relying on slide deck presentations, an online teaching system for a diverse body of teachers and a diverse audience needs to provide a few key features:
• Feature 1: The framework shall enable teachers to convert their existing material into open education material with zero or low complexity in terms of work (time) and learning.
• Feature 2: The framework shall enable the sustainable extension of existing material with online-only features such as quizzes and background information linkage.
• Feature 3: The augmentation scheme shall not be limited to content, but also include curricular logic by, for example, linking or proposing other (micro-)modules to learn or linking with a knowledge base.
We realize a research prototype of such a system and aim at using it in large-scale research studies within the national research data infrastructure NFDI4Earth for our education. Based on the existing teaching material of the partners involved, we want to pave the way to accessible and high-quality online teaching, preferably reusing existing material. Note that this is perfectly in line with the principles of findability, accessibility, interoperability, and reusability (FAIR), which have recently been applied to open education as well. Findability is improved by fragmenting temporal teaching material and augmenting it with metadata. Accessibility is provided by open internet access. For interoperability, we use only industry-standard data types (e.g., MPEG video) and community-driven standardized representations (e.g., H5P). Reusability is provided by the editable and extensible nature of our output H5P containers and the explicit integration of all tools that can map the temporal teaching material to a video. While we demonstrate the system on slide decks, note that it can be easily extended and applied even to handwritten text on a digital blackboard.
• Decision 1: For the prototype, we decided to first concentrate on slide decks and rely on the video export features ubiquitously available in slideshow software to bridge existing slideshow technologies.
• Decision 2: The linkage of background information must be implemented within the teacher's toolkit (e.g., PowerPoint), as only in this way future structural modifications like inserting slides do not break the linkage and the improved version can still be edited with the already established software knowledge of the teacher.
The transformation procedure is then structured as follows: Starting from slides, we first use the teacher's software to convert the slides to video. In contrast to exporting individual slides, this preserves all animations and videos that might be embedded in the presentation. Furthermore, presentations on a digital blackboard with a pen can be recorded as a video stream as well, whereas most digital blackboards do not have a suitable concept of a slide that we could use. Especially for handwritten lecture notes, the genesis of the material as a video is more valuable than the final page. We then implemented a simple program that decodes the video frame by frame and tracks significant visual changes. For each longer time period without significant visual change, we emit an image slide. For longer periods of change, we emit a slide showing a video segment (e.g., for animations). These sequences are then automatically assembled into an H5P container using image slides and video slides.
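The emission rule described above (image slides for stable periods, video slides for longer periods of change) can be illustrated with a minimal Python sketch. It assumes that a per-frame change flag has already been computed; the function name `segment_frames` and the minimum run lengths `min_still` and `min_video` are hypothetical choices for this example and are not taken from the prototype.

```python
from itertools import groupby


def segment_frames(changed, min_still=3, min_video=3):
    """Group per-frame change flags into image and video slides.

    changed[i] is True when frame i differs significantly from frame i - 1.
    Returns a list of (kind, first_frame, last_frame) tuples.
    The minimum run lengths are illustrative assumptions.
    """
    segments = []
    index = 0
    for is_change, run in groupby(changed):
        length = len(list(run))
        first, last = index, index + length - 1
        index += length
        if not is_change and length >= min_still:
            # Long stable period: one still image slide is enough.
            segments.append(("image", first, last))
        elif is_change and length >= min_video:
            # Long period of change: keep it as a video slide.
            segments.append(("video", first, last))
        # Shorter runs (e.g. a one-frame slide cut) are treated as transitions.
    return segments


# Five stable frames, a cut, four stable frames, a six-frame animation,
# and three more stable frames.
flags = [False] * 5 + [True] + [False] * 4 + [True] * 6 + [False] * 3
print(segment_frames(flags))
```

In this sketch, a single changed frame between two stable periods is interpreted as a plain slide change and does not produce a video slide of its own.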

Basic Augmentation through Pictograms
The simple procedure described so far works well in principle but has some pitfalls. For example, in animations it should be possible to define key frames, that is, we need a mechanism by which the teacher can mark a certain visual state within an animation to be used as a slide for a longer time. Furthermore, some parts of a presentation are suitable for teaching in presence but not ideal in online scenarios. For example, a presentation in a school could contain instructions such as "We need your math book today", or university lectures could contain information directed specifically at online participants, such as when exactly the lecture starts. These functions are implemented in our framework with a visual language of pictograms that trigger the required behavior during the processing of the video slideshow. Table 1 contains the visual language used for our current version of the prototype. This visual language is sufficiently simple to be implemented in any slideshow system that is able to show images. In addition, good usability can be provided by extending the toolbar with buttons that insert exactly these images or with a small tool that efficiently provides these images through the clipboard of the operating system.

Icon | Condition, Semantics, Action
[camera icon] | When this pictogram is visible, emit a snapshot of the middle frame of its visibility period as a still image slide. Note that this can be used in animations: showing and then removing this pictogram generates a still image slide within an already existing animation.
[icon] | While this pictogram is visible, no output frames are emitted; alternatively (based on a prerequisite), an error message is shown or the output is blurred significantly and overlaid with a message.
[icon] | This marks the start of an autoplay section. That is, from the moment this pictogram becomes visible, the plain video segment is emitted as a video slide. This allows for supporting pseudo-animations, for example, when multiple slides are used for something that should be an animation. This has often been done to facilitate reasonable PDF export and printing.
[icon] | This marks the end of an autoplay section. Note that this does not emit a still frame; hence, the video might loop or behave according to whatever preset is configured. If you want to have the result as a slide, you can combine this with a camera icon.
[icon] | This icon denotes a chapter start. In this case, optical character recognition (OCR) can be used to extract the text depicted in the largest font as the title of the chapter and to build a table of contents from this information. Such a structure can also be used for skipping forward and backward in long presentations.
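To illustrate how the visual language could drive the processing, the following minimal Python sketch turns per-frame pictogram detections into actions. The pictogram names (`camera`, `block`, `autoplay_start`, `autoplay_end`, `chapter`), the action labels, and both function names are hypothetical and chosen only for this example. The sketch also assumes that autoplay start and end pictograms appear properly paired and in order.

```python
def visibility_runs(visible_per_frame, name):
    """Return (first, last) frame index pairs during which `name` is visible."""
    runs, start = [], None
    for index, visible in enumerate(visible_per_frame):
        if name in visible and start is None:
            start = index
        elif name not in visible and start is not None:
            runs.append((start, index - 1))
            start = None
    if start is not None:
        runs.append((start, len(visible_per_frame) - 1))
    return runs


def interpret_pictograms(visible_per_frame):
    """Map pictogram visibility to (frame, action, end_frame) events."""
    events = []
    # Camera: snapshot of the middle frame of each visibility period.
    for first, last in visibility_runs(visible_per_frame, "camera"):
        events.append(((first + last) // 2, "still", None))
    # Block: suppress output for the whole visibility period.
    for first, last in visibility_runs(visible_per_frame, "block"):
        events.append((first, "suppress", last))
    # Autoplay: video slide from the start marker to the end marker.
    starts = [f for f, _ in visibility_runs(visible_per_frame, "autoplay_start")]
    ends = [f for f, _ in visibility_runs(visible_per_frame, "autoplay_end")]
    for first, last in zip(starts, ends):
        events.append((first, "video", last))
    # Chapter: one chapter mark at the first frame of each appearance.
    for first, _ in visibility_runs(visible_per_frame, "chapter"):
        events.append((first, "chapter", None))
    return sorted(events, key=lambda event: event[0])


detections = [
    {"chapter"}, {"chapter"}, set(), {"autoplay_start"}, set(),
    {"camera"}, {"camera"}, {"camera"}, {"autoplay_end"},
    {"block"}, {"block"}, set(),
]
for event in interpret_pictograms(detections):
    print(event)
```

The chapter title extraction via OCR is deliberately left out of this sketch; only the position of the chapter start is recorded.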

Demonstrative Implementation
The proposed system is implemented as a computer vision system based on a rather simple change detection heuristic. As a first step, we decode the video frame by frame. To save computation time and to increase stability with respect to video coding artifacts, we resize each frame to 256 x 256 pixels, convert it to grayscale, and extract edges using the Canny detector with thresholds of 50 and 200. With this representation, we scan through the video and compare each frame with the next frame using the normalized correlation coefficient $R(I_1, I_2)$, given by the following formula for two grayscale images $I_1$ and $I_2$:

$$R(I_1, I_2) = \frac{\sum_{x,y} \left(I_1(x,y) - \bar{I}_1\right)\left(I_2(x,y) - \bar{I}_2\right)}{\sqrt{\sum_{x,y} \left(I_1(x,y) - \bar{I}_1\right)^2 \, \sum_{x,y} \left(I_2(x,y) - \bar{I}_2\right)^2}}$$

Here, $\bar{I}_1$ and $\bar{I}_2$ denote the mean intensities of the two images, and the sums run over all pixel positions $(x, y)$.
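As a small worked example, the normalized correlation coefficient and the resulting slide change test can be sketched in plain Python. The sketch operates on tiny hand-made edge maps instead of real Canny output, and the function names as well as the handling of constant images are assumptions made for this illustration. In practice, a library routine would be the natural choice; for instance, OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` should yield the same quantity for two images of equal size.

```python
import math


def normalized_correlation(a, b):
    """Normalized correlation coefficient of two equally sized images.

    Images are lists of rows of numbers. The result lies in [-1, 1].
    Convention chosen for this sketch: two constant images count as
    perfectly correlated (1.0), one constant image yields 0.0.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    var_x = sum(p * p for p in dx)
    var_y = sum(q * q for q in dy)
    if var_x == 0 or var_y == 0:
        return 1.0 if var_x == var_y else 0.0
    return sum(p * q for p, q in zip(dx, dy)) / math.sqrt(var_x * var_y)


def is_slide_change(frame, next_frame, threshold=0.5):
    """Report a slide change when the correlation drops below the threshold."""
    return normalized_correlation(frame, next_frame) < threshold


# Two toy edge maps: a vertical line and a horizontal line.
vertical = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
horizontal = [[0, 0, 0], [255, 255, 255], [0, 0, 0]]

print(is_slide_change(vertical, vertical))
print(is_slide_change(vertical, horizontal))
```

Identical frames correlate perfectly and are not reported as a change, while the two different line patterns are uncorrelated and fall below the threshold.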
For an example slide deck, this leads to a correlation profile between slides as depicted in Figure 1. In this figure, one sees strong peaks away from almost perfect correlation (i.e., R close to 1) for each slide change. Hence, for detecting a slide change, we propose to use a threshold of 0.5 on the correlation coefficient. As a second observation, small regular peaks are visible, which are related to keyframes of the MPEG codec and the coding artifacts they introduce. However, the third slide also shows some non-trivial correlation. This slide contains a small animation (e.g., a small circle moves around). This leads to an overall high correlation, as most of each pair of frames is the same, but we need to detect this movement. Therefore, we employ a trick: as long as we do not detect a slide change, we incrementally build a sum image of all edge detections. As soon as we detect a slide change, the correlation of this sum image with the last frame of the preceding segment is computed to detect whether the segment contains an animation. Figure 2 depicts this situation for our animation: while almost all frames are very similar to their successor, the sum of all frames clearly depicts the movement of the ball and does not correlate well with any of the frames of the animation. Hence, if this correlation is low enough, we mark the previous range of frames as an animated region that needs to be kept in a video format as opposed to creating a still image representation in H5P.
During the scan, we also employ template matching with the library of defined symbols. For this template matching, we assume that the aspect ratio of the symbols is preserved and that their scale is similar to that of the template. Under these assumptions, we can use the same correlation technique (edge detection followed by computing the normalized correlation coefficient), but this time sliding the template over the frame. Varying the frame resolution introduces the required partial scale invariance, so that the symbols can be detected.
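The sum image trick can be illustrated with a toy example in plain Python. The sketch interprets "the last image" as the last frame of the segment that has just ended. The threshold of 0.9 and the function names are hypothetical, since the text only requires the correlation to be "low enough". The frames are tiny binary edge maps with two static edge pixels and one moving dot.

```python
import math


def correlation(a, b):
    """Normalized correlation coefficient of two equally sized images."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # Degenerate (constant) images are treated as unchanged in this sketch.
    return num / den if den else 1.0


def is_animated(edge_frames, threshold=0.9):
    """Decide whether a segment without slide change contains an animation.

    Accumulates all edge maps of the segment into a sum image and compares
    it with the last frame. The threshold is an illustrative assumption.
    """
    height, width = len(edge_frames[0]), len(edge_frames[0][0])
    total = [[0] * width for _ in range(height)]
    for frame in edge_frames:
        for r in range(height):
            for c in range(width):
                total[r][c] += frame[r][c]
    return correlation(total, edge_frames[-1]) < threshold


def make_frame(dot_col):
    """Toy 4 x 6 edge map: two static edge pixels plus one dot in row 2."""
    frame = [[0] * 6 for _ in range(4)]
    frame[0][0] = frame[0][1] = 1
    frame[2][dot_col] = 1
    return frame


moving = [make_frame(col) for col in range(6)]  # the dot moves left to right
still = [make_frame(0) for _ in range(6)]       # nothing moves

# Consecutive frames of the animation stay above the slide change threshold.
print(all(correlation(a, b) > 0.5 for a, b in zip(moving, moving[1:])))
print(is_animated(moving), is_animated(still))
```

For a static segment, the sum image is a scaled copy of every frame, so the correlation is essentially 1. For the animated segment, the trace of the moving dot lowers the correlation even though each pair of consecutive frames looks similar.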
This processing stage then provides the following information: the decomposition of the video into slices related to each slide, the information on whether a video slice contains significant animation, and, for each frame, the presence of any of the symbols from the symbol library. From this information, an H5P container can be generated which contains all the required features.
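The per-frame symbol presence mentioned above relies on sliding a template over the frame. A minimal single-scale sketch in plain Python is given below. It works on toy binary edge maps, and the function names and the detection threshold of 0.8 are assumptions for this illustration. The variation of the frame resolution for partial scale invariance is omitted.

```python
import math


def _ncc(xs, ys):
    """Normalized correlation coefficient of two flat pixel lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant window has no structure that could match a symbol.
    return num / den if den else 0.0


def match_template(frame, template):
    """Slide the template over the frame; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    flat_template = [v for row in template for v in row]
    best_score, best_pos = -1.0, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = [frame[r + i][c + j] for i in range(th) for j in range(tw)]
            score = _ncc(window, flat_template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


def symbol_present(frame, template, threshold=0.8):
    """Report presence when the best match exceeds an illustrative threshold."""
    return match_template(frame, template)[0] >= threshold


# Toy symbol: a plus sign. It is pasted into an otherwise empty 6 x 8 frame.
plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
frame = [[0] * 8 for _ in range(6)]
for i in range(3):
    for j in range(3):
        frame[2 + i][4 + j] = plus[i][j]

score, position = match_template(frame, plus)
print(position, symbol_present(frame, plus))
```

A real implementation would run this on edge images of the decoded frames, once per symbol in the library and once per tested resolution.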

Conclusion
With the proposed simple method of content augmentation, which requires no additional or external toolsets and works within the environment the teacher is used to, we hope to enable most university teachers to share high-quality interactive content and to improve step by step towards open education.
The framework is implemented and demonstrated for PowerPoint, but the augmentation scheme is general and completely independent of any presentation software or framework, as the functionality is driven by visible elements.
For future work, the complete set of features, including PowerPoint animation, voice explanation, and interactive materials (reference links and quizzes), should all be embedded into one file. Teachers can then easily share it online within and beyond their own universities for open education.