Student Evaluation of Teaching (SET): Clues on how to interpret written feedback

In this paper we present the results of a study covering 218 written comments submitted in the formal university SET questionnaire of two undergraduate physics lectures for engineering students. Concerning the SET-metrics, one of the lectures was rated as critical, while the other lecture had good results. The analysis is based on the praise and criticism framework elaborated by Hyland & Hyland (2001) for written feedback. Our findings, which also relate written feedback to quantitative variables and contrast the results between critical and good evaluations, provide deeper insight for both teachers and educational developers on how to interpret written comments in a quality management process.


Introduction
The feedback from student evaluation of teaching (SET) is a major instrument to measure the degree of faculty achievement. SET is carried out with rating forms, online or on paper, that students fill in at the end of the term. Typical items of the questionnaire include Likert-scale questions on the teacher's performance, the provided material and the class organization (Marsh, 2007). Based on a defined metric, those items are statistically compiled to measure the teacher's teaching effectiveness. The results of SET are often used to decide on tenure or promotion of the teacher (Kember et al., 2002). Another feature of SET, however, consists of providing feedback to the teacher (Yao & Grady, 2005). To support this formative purpose of the SET, the questionnaire often includes free-text questions, where students can comment on their personal experience in more detail.
Whereas the analysis of SET data mainly focuses on quantitative ratings, little is known about the impact of written feedback from free-text questions. Among the few studies dealing with written comments, Alhija & Fresko (2009) and Brockx et al. (2012) offer some valuable insight into quantitative aspects of free-text comments. Moreover, open-ended comments have been subject to linguistic analysis (Stewart, 2015) and were used for exploratory considerations (Hodges & Stanton, 2006; Stupans et al., 2016).
In this study we combine quantitative results with lexical evidence in order to provide some interpretative hints on how to link written comments to the overall questionnaire results.

Data and Coding
The data cover 218 written comments submitted in our formal university SET questionnaires. They result from two independent undergraduate physics lectures (table 1). According to the SET-metrics (based on Likert-scale questions) defined by the university, lecture A is regarded as a good lecture, whereas lecture B was identified as critical.
Written comments are open-ended answers elicited at the end of the questionnaire by the item "Imagine that you are the lecturer teaching this course unit. What would you improve? What would you keep unchanged?". Their sole purpose is to provide feedback to the teacher, and at ETH Zurich they are not relevant for the SET-metrics. We based our coding scheme (figure 1) on the "Praise and Criticism" feedback points introduced by Hyland & Hyland (2001). Each feedback point is related to one of the 7 predefined content categories and identified either as critique, praise or suggestion. The categories have been selected according to the main themes of the preceding Likert-scale questions of the questionnaire. With an average of 55 words, the comments turn out to be rather extensive, and almost all comments include a set of different feedback points (table 2).

In addition, we distinguished praise and criticism according to whether it addresses the teacher as a person or the activity of teaching. In order to specify the degree of politeness, we also codified mitigation strategies. Pairing occurs when praise and criticism are used in combination. Hedges refer to the lexical mitigation of any feedback, and details were recorded when concrete examples or further details are mentioned to underpin the feedback.
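The coding scheme described above can be sketched as a small data model. This is an illustrative reconstruction only: the class names, field names and the category codes used below are our own assumptions, not the labels actually used in the study.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Valence(Enum):
    PRAISE = "+"
    CRITICISM = "-"
    SUGGESTION = "s"

class Target(Enum):
    TEACHER = "t"   # feedback addressing the teacher as a person
    ACTIVITY = "a"  # feedback addressing the activity of teaching

@dataclass
class FeedbackPoint:
    category: str                     # one of the 7 content categories (codes hypothetical)
    valence: Valence
    target: Optional[Target] = None   # recorded for praise and criticism
    pairing: bool = False             # praise and criticism used in combination
    hedges: bool = False              # lexical mitigation of the feedback
    details: bool = False             # concrete examples underpin the feedback

@dataclass
class Comment:
    text: str
    points: List[FeedbackPoint] = field(default_factory=list)

# Hypothetical coding of a comment containing two feedback points:
comment = Comment("Many errors in the script! Clicker questions are helpful.")
comment.points.append(FeedbackPoint("M", Valence.CRITICISM, Target.ACTIVITY))
comment.points.append(FeedbackPoint("A", Valence.PRAISE, Target.ACTIVITY, hedges=True))
print(len(comment.points))  # 2
```

A structure like this makes the unit of analysis explicit: comments are containers, while the feedback point (category, valence, target, mitigation) is what gets counted.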
Most of the comments are written in German, but English is used as well. Furthermore, the comments range from single keywords to complex sentences, some using emoticons and special characters. Relying on automated or semi-automated analysis tools, as used in other studies (Stupans et al., 2016; Zaitseva et al., 2013), turned out to be inapplicable. All comments were hand-coded and double-checked for reliability.
Example comment: "Many errors in the script! Clicker questions are helpful and make the somehow dry lecture less monotonous."
Coding: A, M, L; pairing; hedges.

Table 2. Example coding of a comment (3 feedback points).

Critical lectures tend to entail more written feedback.
A total of 423 feedback points could be identified, 158 for lecture A and 265 for lecture B. This results in an average of 1.8 feedback points per comment for lecture A and an average of 2.0 feedback points for lecture B. The overall feedback for the critical lecture B was thus significantly more extensive.

Overt criticism is less frequent in good lectures.
In lecture A, overt praise (t+, a+) (n=70) occurred three times as often as overt criticism (t-, a-) (n=23). In contrast, for lecture B the occurrences of overt praise and overt criticism were identical (n=72 each).

Written feedback addressing the teacher as a person is primarily positive.
Overt praise referencing the teacher (t+) could be identified in both lectures (A: n=31, B: n=26). Overt criticism (t-) occurred only 4 times, all in lecture B. Those instances, however, were heavily mitigated; otherwise they would have qualified as offensive comments. Stewart (2015) showed evidence that praise is generally directed to the teacher's person (t+) and criticism to the product of the teacher's actions (a-), e.g. "The teacher was highly motivated" vs. "The lecture was boring". We could only support these findings for the category L.

Conclusion
Lecturers often feel confused and disappointed when reading students' comments, especially negative ones (Hodges & Stanton, 2007). With our study we offer a framework to interpret comments in the broader context of the evaluation results. Comparing comments from critical and good evaluations turned out to be extremely helpful. The fact that even good evaluations show a considerable number of critical feedback points was surprising. Identifying a possible bias in critical evaluations was another revealing finding: even when the same problem is addressed in several independent comments, this does not a priori point to a major deficiency. Further data will be needed to support our results, and we are planning to pursue the study with additional evaluation data sets.

Figure 1. Coding scheme. A single comment mostly includes several feedback points.

Figure 2. Distribution of the SET-value for lecture B according to the three subpopulations (SPSS boxplots).

Table 1. Student evaluations and written comments included in the study. Lectures A and B are independent introductory physics courses for undergraduate engineers.
Schiltz, G.