A study on assessment results in a large scale Flipped Teaching Experience

Universitat Politècnica de Valencia (UPV) has developed a large-scale experience in flipped teaching (FT), with 64 different courses and 3083 students (2512 unique). Teachers decided on their own whether to participate, and in a number of courses there are groups with FT and groups without it. Students were assessed using classical systems (mostly written exams). The experience was evaluated in several ways: first we ran a qualitative survey of teachers and students, and then we carried out an analytical study of the assessment results, comparing between years, between FT and classical courses, and within courses that have both FT and classical groups. The analysis shows that students like the FT system and that they obtained statistically significantly better results in the classical assessments, with a gain of at least 5%. We also found no correlation between the results and either perceived teacher quality or student group size. This study therefore supports the capabilities of the FT approach in higher education institutions.


Introduction
Flipped teaching, or flipped classroom, is a teaching model defined by a change in the use of class time and out-of-class time, as described in Abeysekera & Dawson (2015) and Lage & Platt (2000). The model is "inverted" in the sense that the activities that used to be homework are now done in class in the form of active learning, peer learning and problem solving. To keep class time the same, lectures are delivered through an LMS (Learning Management System), usually as videos for out-of-class viewing. Students view the videos (and other content) corresponding to the classical lecture before class time.
As the teacher spends less time repeating information, he or she can provide students with more exercises and activities, which ultimately makes active learning possible with a reasonable amount of resources.
Reported benefits of the flipped learning model include increased student satisfaction, improved communication skills and, consequently, an enhanced learning experience, as can be seen in O'Flaherty & Phillips (2015). We have also studied the qualitative effects of FT in our university in Turró et al. (2016), and the short answer is that both students and teachers become more involved in, and more satisfied by, this teaching style.
However, teachers who develop a flipped teaching experience make big changes in the way they behave and assess students, with a focus on more personalized evaluation. While we strongly support that, a criticism from the classical teaching side concerns how the same students would perform in a classical exam. We found only a little work in that area, in the survey by Bishop & Verleger (2013), so to fill this research gap, this study focuses on the performance of FT students in classical examinations. More specifically, the analysis addresses the following questions:
-Q1. Is there an effect on students' performance in classical assessments when flipped classroom is adopted as a teaching model?
-Q2. Do students perceive that the best teachers choose FT?
This paper is structured as follows: Section 2 briefly presents the Flipped Teaching initiative at UPV and the data available for this study. Section 3 elaborates on the data and provides insights into the results. Finally, Section 4 draws conclusions on the proposed questions.

Context and Methods
Since 2006, Universitat Politècnica de Valencia (UPV) has run an initiative called Networked Teaching, aimed at encouraging the production of high-quality e-learning materials as a companion to standard lectures. The idea behind the plan is to coordinate, and produce useful results from, all the small-scale initiatives that teachers and staff had developed in previous years. A key concept in the plan is the integration of the different units of the University in the process. For instance, creating a video learning object involves the IT department, but the Institute of Education, the Library and the Legal department also take part in defining the process. In the end, all these interactions should be hidden from teachers, so that they find a clear and easy path to produce the content.
While this initiative was remarkably successful, as reported in Turró et al. (2014), most of that content was used as a side product in blended learning schemes, so in 2013 UPV decided to aim for more active or newer methodologies, such as Flipped Teaching and MOOCs.
First we ran a pilot test to learn what the challenges and results of deploying Flipped Learning across a wide range of courses would be. For the first semester of the 2014-2015 academic year, a group of students in two faculties (Computer Science and Business) received all their courses with Flipped Learning.
The results of that experience were very good in terms of the satisfaction of both students and teachers, although there was no significant improvement in the assessment results. Those results were considered good enough to continue the project.
For the 2015-2016 academic year, UPV took a step forward in applying flipped learning to its courses by planning a large-scale deployment of more than 100 courses with around 200 teachers involved. Teaching is organized in two semesters, and 45 courses were flipped in the first semester. The experience then continued during 2016-2017 and 2017-2018.
In our case we define the flipped classroom as an educational technique that consists of two parts: computer-based individual instruction before the lecture session, and interactive group learning activities inside the classroom during the time that is set aside for lecturing in standard courses. It is worth noting that we do not restrict this definition to the use of videos as the out-of-classroom activity.

Implementing flipped teaching at UPV
Teachers who apply for the flipped teaching project attend learning sessions in which they receive guidance on applying FT in their courses. Although they are encouraged to use videos, they are allowed not to do so and to rely on more conventional techniques, such as HTML content on the University's LMS platform or even PDF files.
After those learning sessions, some teachers decided not to implement FT for a variety of reasons (required time or effort, unclear results, others). We selected them as the control group for the experience because, having shown interest, they are more similar to the teachers participating in the FT project.
The courses cover a variety of topics across UPV degrees, including Engineering, Computer Science, Business and Arts. They also belong to different years of the curricula, as we placed no restrictions on the teachers who applied.
While UPV encourages a change in teaching style, the assessment of students still relies heavily on classical (written) examinations, because it is compulsory that a student be able to pass through one or several written exercises. The assessment in FT courses is therefore very similar to that in classical ones.
Students cannot choose whether they receive FT: they are assigned to a group and a teacher, and they follow the FT methodology if, and only if, their teacher applies it.
Both the written examinations and the student assignment process allow us to compare FT and classical teaching. Moreover, they make it possible to have mixed-teaching courses, in which some groups of students use FT and others do not, and all of them share a common assessment via written exams. Those courses ("mixed groups") are very valuable for comparing performance.

Research data
The research data come from the official assessment results for the academic year. Assessment results follow Spanish standards: marks range from 0 to 10, with 0 the lowest and 10 the highest. Usually a mark of 5.0 is required to pass.
To populate the dataset properly, we excluded courses with fewer than 10 students and courses that applied FT only partially, e.g. only for some months of the semester. We also dropped the courses that had been part of the pilot in the previous academic year.
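The three exclusion rules can be sketched as a simple filter. This is only an illustration: the `Course` record and its field names are hypothetical, since the layout of the university database is not described here.

```python
from dataclasses import dataclass
from typing import List

MIN_STUDENTS = 10  # courses with fewer students are excluded


@dataclass
class Course:
    # All field names are hypothetical, chosen for this sketch only.
    code: str
    n_students: int
    partially_flipped: bool   # FT used for only part of the semester
    piloted_last_year: bool   # course took part in the previous year's pilot


def keep_course(course: Course) -> bool:
    """Return True if the course passes all three exclusion rules."""
    return (
        course.n_students >= MIN_STUDENTS
        and not course.partially_flipped
        and not course.piloted_last_year
    )


def select_courses(courses: List[Course]) -> List[Course]:
    """Keep only the courses that enter the dataset."""
    return [c for c in courses if keep_course(c)]
```

Keeping the rules in one predicate makes it easy to report how many courses each rule removes.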
The resulting dataset contains data from 64 courses with 7818 non-unique students (4915 unique), including both students with FT and students without it. Other data, such as the number of groups and the number of students per group, also come from the university's databases.
UPV also runs an official, compulsory and anonymous survey about teaching, in which students rate different aspects. One of its questions is commonly used as a proxy for teacher performance: "With all the restrictions in mind, I think that he/she is a good teacher". We use these data, on a 1-5 Likert scale, to investigate question 2.

Results
In this section we review the results obtained by analyzing the dataset.

Students with Flipped Teaching get better results in the assessments
Figure 1 shows a boxplot and a density plot of the students' grades. Grades range from 0 to 10, with 0 the lowest.
We can clearly see a positive effect for the FT students. A t-test on the data shows that the difference is statistically significant (t = 12.308, df = 7183.5, p-value < 2.2e-16).
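The fractional degrees of freedom (df = 7183.5) are what Welch's unequal-variances t-test produces. The text does not name the variant, so treating it as Welch's test is an assumption. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom, using only the standard library (the function name `welch_t` is ours):

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's t-test of mean(a) - mean(b).

    Assumes each sample has at least two values and non-zero variance.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df
```

The p-value is omitted because it needs the CDF of Student's t distribution, which the standard library does not provide. A statistics package would normally supply it, for instance SciPy's `ttest_ind` with `equal_var=False`.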
The difference between the two groups is 0.27 standard deviations, which is around 0.5 points, or 5% of the 0-10 grading scale.
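The text does not state how the standardized difference was computed. One common definition is Cohen's d with a pooled standard deviation, sketched below as an assumption, together with the conversion of a difference in points into a share of the grading scale. Both function names are ours.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference (mean(a) - mean(b)) / pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * variance(a) + (nb - 1) * variance(b)
    ) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def share_of_scale(diff_points: float,
                   scale_min: float = 0.0,
                   scale_max: float = 10.0) -> float:
    """Express a difference in grade points as a fraction of the scale."""
    return diff_points / (scale_max - scale_min)
```

With the reported figures, `share_of_scale(0.5)` gives 0.05, i.e. 5% of the 0-10 scale. Taken together, 0.5 points and 0.27 standard deviations imply a grade standard deviation of roughly 0.5 / 0.27 ≈ 1.85 points.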

In courses with both Classical and Flipped Teaching groups (with the same exam), students with Flipped Teaching perform better
In figure 2 we restrict the results of figure 1 to the courses with mixed groups. The results are very similar to those of figure 1, which is quite remarkable.

Figure 2. Grades for students from mixed courses in Classical and Flipped Teaching
Here the t-test is also significant (t = 7.5595, df = 1126.7, p-value = 8.361e-14), and the effect is 0.29 standard deviations, which is again around 5%.
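Restricting the analysis to mixed courses amounts to keeping only the courses in which both teaching styles appear. A minimal sketch, assuming a hypothetical record layout of (course code, teaching style, grade) and the labels "FT" and "classical":

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (course_code, teaching_style, grade); this layout is hypothetical.
Record = Tuple[str, str, float]


def mixed_course_codes(records: List[Record]) -> Set[str]:
    """Return the courses that have both FT and classical students."""
    styles_by_course: Dict[str, Set[str]] = defaultdict(set)
    for course, style, _grade in records:
        styles_by_course[course].add(style)
    return {
        course
        for course, styles in styles_by_course.items()
        if {"FT", "classical"} <= styles
    }


def grades_by_style(records: List[Record],
                    courses: Set[str]) -> Dict[str, List[float]]:
    """Collect grades per teaching style, restricted to the given courses."""
    grades: Dict[str, List[float]] = defaultdict(list)
    for course, style, grade in records:
        if course in courses:
            grades[style].append(grade)
    return dict(grades)
```

The two grade lists returned by `grades_by_style` are the samples that a two-group comparison on mixed courses would use.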

Figure 3. Comparison of assessment results between years
The results of the previous sections could be influenced by this year being in some sense "special". To rule this out, we took the results of the same courses from the previous year, in which all of them were taught in classical format.
The results show that, for classical teaching, both years are very similar and follow the same density pattern, which supports the hypothesis that FT is associated with an increase in grades. Differences between these two years are not statistically significant.

Teachers choosing Classical teaching are of equivalent quality to those that chose FT
This is quite an interesting topic, in which our results may be counterintuitive. As shown in figure 4, students do not perceive either group of teachers as better. An ANOVA test gives an F-value of 0.92 (p = 0.341), which means that the null hypothesis of equal ratings for both groups of teachers cannot be rejected with the data.
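For reference, the F-value of a one-way ANOVA is the ratio of between-group to within-group mean squares. The sketch below is a generic standard-library implementation, not necessarily the exact procedure used for the figure above. The function name is ours, and it assumes at least two groups with some within-group variation.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(
    groups: Sequence[Sequence[float]],
) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

With only two groups, as here, F equals the square of the pooled-variance t statistic. The p-value requires the F distribution, which a statistics package would supply (for instance SciPy's `f_oneway`).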

There is no correlation between group size and mean grades for either classical or FT groups
Figure 5 shows a scatterplot of the mean grade for all the groups in the dataset, classified by teaching style. Both visually and through an ANOVA test, it is clear that the null hypothesis cannot be rejected.
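A complementary way to quantify the relationship in such a scatterplot is Pearson's correlation coefficient between group size and mean grade, computed separately for each teaching style. This technique is swapped in for illustration and is not the test reported above. The function name is ours.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold the group sizes and `y` the mean grades of the groups taught in one style. A value of r close to 0 is consistent with the absence of a linear relationship.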

Conclusions
We have presented the results of an analysis carried out on a large-scale flipped teaching experience at Universitat Politècnica de Valencia. The analysis was aimed at answering these questions:
-Q1. Is there an effect on students' performance in classical assessments when flipped classroom is adopted as a teaching model?
-Q2. Do students perceive that the best teachers choose FT?
The answer to question one is a clear yes: FT students perform better than their classical colleagues. This is a great result for FT, because most of the perceived value of the methodology lies in skills that do not necessarily show up in a written exam. In our results, FT students clearly came out ahead.
Question 2 was intended to investigate a common criticism of this kind of experience: intuitively we might think that the "best" teachers choose "more advanced" methodologies, where "best" and "more advanced" are not clearly defined.
The results do not support that idea. There may be several reasons for this, including the students' ability to assess teacher quality. In any case, the result is also good in terms of evaluating the FT scheme: FT does not need the "best" teachers to perform significantly better than our old classical teaching style.