Education Science Students’ Statistics Anxiety: Developing and Analyzing a Scale for Measuring their Worry, Avoidance, and Emotionality Cognitions

Current instruments for assessing university students’ statistics anxiety prevailingly emphasize the affective construct component. In order to unfold the construct in a more exhaustive and differentiated manner, a scale for measuring university students’ worry, avoidance, and emotionality cognitions was developed. In two samples of education science majors the present pilot study aimed at analyzing the scale’s psychometric properties and at gaining preliminary validation results. Principal component analyses led to the formation of a unidimensional scale which appeared to be sufficiently reliable. Its relations to domain-specific self-belief and background variables turned out as theoretically expected – thus, for the time being the scale should claim criterion validity. 


Faber, Drexler, Stappert & Eichhorn -Education science students' statistics anxiety
Quite a number of undergraduate and graduate students of the social sciences, education, psychology and business appear to struggle with statistics (Onwuegbuzie & Wilson, 2003; Zeidner, 1991). When dealing with the requirements of quantitative method courses, which are commonly compulsory for earning their degree, these students mostly suffer from strong failure expectations and frequently experience feelings of apprehension and personal threat. As a result, they are at risk of developing and maintaining a heightened level of anxiety in the face of statistical analyses, in particular when confronted with statistical tasks of data gathering, processing, and interpreting (Cruise, Cash, & Bolton, 1985).
Structurally, students' emerging statistics anxiety has to be considered a multidimensional construct reflecting the complex interplay of several cognitive, motivational, and physiological components (Rost & Schermer, 1989). Based on empirical findings of test anxiety research it can be defined as a domain-specific form of performance or evaluation anxiety which manifests as repeatedly occurring worry cognitions, task-irrelevant and interfering thoughts, marked states of emotional tension and physiological arousal (Zeidner, 1991). The worry component of test anxiety refers to the students' mental anticipation of failure and its negative consequences, whereas the emotionality component refers to their feelings of tenseness, nervousness or distress, and the physiological component refers to their perceptions of bodily symptoms. Over the past decades, there has been ample evidence for these components being distinguishable but mutually reinforcing (Deffenbacher, 1980; Hodapp & Benson, 1997; Kieffer & Reese, 2009; Sarason, 1984). In most cases they could be demonstrated to negatively affect the students' learning process and achievement outcomes. However, the worry component generally turned out to most strongly predict academic performance or test results (Cassady & Johnson, 2002; von der Embse, Jester, Roy, & Post, 2018). This debilitating effect of worry cognitions appeared to be mainly caused by their strongly biased and task-irrelevant mode of information processing (Schwarzer, 1996; Zeidner, 1998).
Based on this conclusive body of evidence a theoretically and methodologically sound framework for the statistics anxiety construct should implicitly consider its cognitive, emotional, and physiological components. In particular, it is essential that it addresses the issue of relevant worry cognitions as they are closely linked to the debilitating effects of statistics anxiety on statistical learning and performance. To date, however, most empirical analyses in the field have emphasized the emotional and physiological components of statistics anxiety (Cruise et al., 1985). They define the construct as feelings of anxiety or as habitual anxiety in the face of statistically loaded situations (Onwuegbuzie & Wilson, 2003). Admittedly, the measurement items used in these empirical analyses cover a wide range of statistical tasks that students typically encounter in everyday study or the context of their course. These items can be empirically assigned to various situation- or task-specific dimensions. Thus, for instance, factor analyses of the task- and course-related items of Zeidner's (1991) Statistics Anxiety Inventory led to the development of a content- and a test-specific subscale. Likewise, factor analyses of the widely used Statistics Anxiety Rating Scale (Cruise et al., 1985) and the conceptually related Statistical Anxiety Scale (Vigil-Colet, Lorenzo-Seva, & Condon, 2008) provided separate subcomponents concerning the students' interpretation and test anxiety, their fear of asking for help and fear of statistics teachers. Similarly, the Statistics Anxiety Measure (Earp, 2007) and the Statistics Comprehensive Anxiety Response Evaluation (Griffith et al., 2014) revealed some distinct task- or situation-specific subcomponents referring to statistically relevant course requirements and situations.
In summary, these approaches to measuring the construct of statistics anxiety undoubtedly represent typically anxiety-evoking task features and test situations in a most elaborate way. However, with the exception of the Statistics Anxiety Measure (Earp, 2007) and the short research scale Hong and Karstensson (2002) used, both of which include single worry items, it is notable that all other instruments fail to integrate students' cognitive, emotional, and physiological anxiety reactions, in particular with regard to the critical worry component.
In contrast, a concurrently operating research line in the test anxiety field had already decomposed the statistics anxiety construct and assessed the students' worry and emotionality responses separately. However, in most cases a composite score including both components was used because of their high interrelation (Benson, Bandalos, & Hutchinson, 1994; Finney & Schraw, 2003; González, Rodríguez, Faílde, & Carrera, 2016; Hong & Karstensson, 2002). That way, albeit merely having total anxiety scores available, the interpretation of students' responses explicitly allowed for a traceable cognitive perspective. While this research line essentially contributes to refining the statistics anxiety construct with respect to its motivationally operating components, its task- or situation-specific references appear less differentiated. That is, in all studies the items for assessing statistics anxiety referred exclusively to the taking of a statistical test or exam. This contextual limitation, hence, should challenge the representativity or content validity of the worry and emotionality measures, because their scores account only for a particular part of the relevant learning setting (Haynes, Richard, & Kubany, 1995).
Moreover, another issue crucial to the conceptualization of the statistics anxiety construct refers to the role of the students' avoidance tendencies. From the very beginning of empirical test anxiety research, Mandler and Sarason (1952) posited anxiety responses to manifest as "implicit attempts at leaving the test situation" (p. 166). Subsequently, empirical findings lent support to this assumption and yielded sound evidence for students' avoidance cognitions being substantially related to their anxiety responses (Blankenstein, Flett, & Watson, 1992; Galassi, Frierson, & Sharer, 1981; Hagtvet & Benson, 1997; Skaalvik, 1997). In particular, the analyses of Elliot and McGregor (1999), Pekrun, Elliot and Maier (2009), and Putwain and Symes (2012) demonstrated clearly that students with heightened avoidance orientations reported a higher extent of worry cognitions and lower scores on subsequent exam performance. Heretofore, only few conceptualizations of the test anxiety construct had addressed this issue and claimed the students' escape or avoidance cognitions to constitute an essential part of their worries and to represent an important factor in eliciting interfering, task-irrelevant thoughts (Pekrun, Goetz, Perry, Kramer, Hochstadt, & Molfenter, 2004; Schwarzer & Quast, 1985). Accordingly, further research should develop appropriate measurements designed not only to assess the students' worries about threatening failure outcomes, but also to inquire into their thoughts of preferably avoiding involvement with threatening tasks or situations. Currently available questionnaires for measuring students' statistics anxiety either do not consider their avoidance cognitions at all (Griffith et al., 2014; Onwuegbuzie & Wilson, 2003; Vigil-Colet et al., 2008) or include just a single item assessing avoidance behavior (Earp, 2007).
Research in the field should refine its conceptualization of students' worry responses and develop instruments that explicitly capture avoidance cognitions with respect to statistically loaded tasks and situations. That way, an important step towards refining the substantive and structural stage of construct validation would be taken (Benson, 1998).

Approaching refined measurement
As a first attempt to overcome the conceptual limitations of current instruments, a new scale to assess university students' statistics anxiety was developed. At the same time, it was meant to adopt the particular strengths of existing instruments. Thus, it was designed to approach a refined measure of the construct by meeting the following criteria: Its items should (1) specifically take into account the students' anxiety responses in a most differentiable way and, hence, consider their worry and avoidance cognitions as well as their emotional reactions. Thereby, its items should embed the various anxiety reactions (2) into a representative range of statistically loaded task features and course situations the students would typically encounter.
The construction of this scale for measuring university students' "Worry, Avoidance, and Emotionality Cognitions Encountering Statistical Demands" (WAESTA) largely followed the procedure of facet theory using a mapping sentence (Guttman & Greenbaum, 1998). This mapping sentence served as a heuristic device to cover all major facets of the construct and, thus, to achieve sufficient content validity (Edmundson, Koch, & Silverman, 1993). A pool of eligible items was drawn up using this constitutive mapping sentence which included conceptually relevant anxiety components, situational references, and intended response categories (Zeidner, 1998). In particular, each item to represent the statistics anxiety domain was specified with respect to four key facets: a relevant reaction facet referring to the worry, avoidance, and emotionality component, and three contextual facets referring to the (1) outcome in a statistics exam, the (2) individual learning of statistical procedures and handling of statistical demands, and the (3) public mastering of statistical content. Furthermore, an additional range facet defined the response categories to assess the students' perceived magnitude of individual anxiety reactions. As a seemingly appropriate response range, a four-point format was decided on in order to avoid artificial complexities in the respondents' decision making and instead to ensure a cognitive-motivationally realistic as well as just manageable number of rating references. This four-facet mapping sentence was used as a cross-classification template to systematically operationalize the statistics anxiety construct, as it allowed the various elements of the facets to be operationalized in a most differentiated manner (Hox, 1997). That way, a final scale version with 17 four-point Likert-type rating items was built (Table 1).
Sample item and response range: I would hardly be able to present a report on statistical research findings adequately.
Does not apply 1 2 3 4 Applies in full

All items concerned a mentally imaginable situation the students should easily manage to anticipate. Eight items referred to the students' worries about their potentially expected failure to master the course exam and to cope with several statistical requirements. Four items concerned their cognitions to preferably avoid the statistics course and particular statistical demands. Five items are related to their emotional tension when being confronted with a certain statistical task (Appendix).
As statistically indicated task requirements for the understanding of course contents, the interpretation of quantitative research results as well as the application of statistical formulas and procedures were considered. Likewise, the oral presentation and explanation of statistical content in the public course situation was included. To warrant conceptual clarity, the avoidance items should neither tap the students' avoidance reactions by suppressing or substituting individually occurring threat cognitions (Williams, 2015), nor should they refer to the students' actual avoiding strategies to cope with disliked or threatening academic events (Onwuegbuzie, 2004). Rather they should operationalize the students' mentally processed avoidance thoughts or even escape illusions before or during statistical task completion and, thus, might indicate a specific subcomponent of worry cognitions. Moreover, the component of physiological symptoms or bodily tensions was not explicitly addressed. Instead it was thought to be indirectly inferred from the emotionality items. Conceptually, this restriction seemed to be justifiable, since the students' perceived affective state should always reflect their actual physiological arousal (Zeidner, 1998).
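
The cross-classification logic of the mapping sentence described in this section can be illustrated with a minimal sketch. It assumes that each item is located in one cell formed by a reaction element and a contextual element; the facet labels are paraphrased from the text, and the function name `mapping_cells` is hypothetical. The sketch does not reproduce the authors' actual item pool.

```python
from itertools import product

# Facet elements paraphrased from the text; the labels are illustrative only.
REACTIONS = ("worry", "avoidance", "emotionality")
CONTEXTS = (
    "outcome in a statistics exam",
    "individual learning and handling of statistical demands",
    "public mastering of statistical content",
)
RESPONSE_RANGE = (1, 2, 3, 4)  # 1 = does not apply ... 4 = applies in full


def mapping_cells(reactions=REACTIONS, contexts=CONTEXTS):
    """Return every reaction-by-context cell of the cross-classification."""
    return list(product(reactions, contexts))


if __name__ == "__main__":
    # Each cell is a template slot for which one or more items can be written.
    for reaction, context in mapping_cells():
        print(f"{reaction:<12} | {context}")
```

The nine cells only define the template. The final scale is not a balanced design, since it comprises eight worry, four avoidance, and five emotionality items.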

Validation framework and objectives
As test anxiety is assumed to be a multifaceted construct (Zeidner, 1998), first of all, the factor structure of the WAESTA scale should be analyzed. As the WAESTA items were theoretically designated to represent the worry, avoidance, and emotionality components of the statistics anxiety construct, a clear three-factor solution appeared to be expectable, at best. However, relevant research findings in the test anxiety field had demonstrated these components being substantially correlated (Deffenbacher, 1980; Cassady & Johnson, 2002; Hodapp & Benson, 1997; Hong & Karstensson, 2002; Sarason, 1984). Furthermore, in certain research contexts dealing with school students' domain- or subject-specific test anxieties, all worry, avoidance, and emotionality items repeatedly loaded on one common anxiety factor (Faber, 1995, 2012b). Therefore, an accurate prediction of the scale's ultimate factor structure seemed difficult. Rather the present study should explore the scale's underlying structure in a most tentative way and should, thus, take into account three alternatives: a three-factor solution separating the worry, avoidance, and emotionality components, a two-factor solution with the worry and avoidance items loading on a first factor and the emotionality items loading on a second factor, and a one-factor solution subsuming all items. Presuming the avoidance component to represent a specific worry element, the two-factor solution in particular could reveal a reasonable perspective. With the reservation of this initially performed analysis, the final version of the WAESTA scale should be determined and its psychometric properties examined. As relevant cognitive-motivational constructs, academic competence and control beliefs were assessed (Schunk & Zimmerman, 2006), which are well proven to regulate students' engagement and learning approach in the long term. In particular, they essentially affect the students' anxiety experience.
As unfavorable competence beliefs usually come along with increased expectancies of failure, they will provoke a strong sense of personal threat and, thus, lead to an individually heightened level of anxiety. As the students, likewise, are not (or not anymore) able to realize individually feasible perspectives to prevent a certain failure outcome, they will develop reduced control beliefs which all the more strengthen their feelings of threat and anxiety.
There is sound evidence that domain-specific academic competence beliefs or self-concepts substantially predict the individually existing magnitude of test anxiety (Ahmed, Minnaert, Kuyper, & van der Werf, 2012; Goetz, Pekrun, Hall, & Haag, 2006). Correspondingly, in the statistics domain the crucial role of self-concepts had been well established. In most cases, high-anxious students reported a lowered self-concept of own mathematics or statistical competencies (Bandalos et al., 1995; Benson, 1989; González et al., 2016; Macher et al., 2012; Williams, 2014; Zeidner, 1991). Therefore, the WAESTA scale scores should be assumed to correlate negatively and substantially with the students' mathematics self-concept. As well, to sufficiently clarify the domain-specificity of the WAESTA scale, its relation to the students' verbal self-concept should be concurrently analyzed. According to the multidimensional feature of academic self-beliefs (Green, Martin, & Marsh, 2007), research in the field could consistently demonstrate the students' mathematics anxiety being substantially related to their performance and motivation in the mathematics but not in the verbal domain (Goetz, Frenzel, Pekrun, Hall, & Lüdtke, 2007; Gogol, Brunner, Preckel, Goetz, & Martin, 2016). From this validation perspective, the WAESTA scale should claim preliminary subject-specificity if its correlation with the verbal self-concept variable turned out to be distinctly weaker than with the mathematics self-concept variable.
Besides, the students' control beliefs largely manifest as implicit theories or mindsets which may stress an entity view of more or less fixed and unchangeable abilities or an incremental view of more or less modifiable and changeable abilities (Dweck & Leggett, 1988). They could be demonstrated to significantly affect the students' motivational orientations, learning strategies, and, eventually, their task performance (Blackwell, Trzesniewski, & Dweck, 2007; Burnette, O'Boyle, VanEpps, Pollack, & Finkel, 2013; Cury, DaFonseca, Zahn, & Elliot, 2008). These implicit theories principally might not only concern an individual's cognitive ability but also might emerge in a domain-specific manner and refer to the perceived malleability of certain skills or competencies (Dweck & Molden, 2005). Consequently, they should also play a motivationally crucial role in the students' learning of statistics. With respect to the statistics domain, an entity view of own competencies would diminish or even suspend any control perspective. Unfortunately, previous studies in the field had seldom analyzed the role of implicit theories. If at all, they had referred to the students' general ability beliefs, but not to specific beliefs about statistical competencies (Zonnefeld, 2015) or had only considered the students' beliefs to master statistical demands through strategy use and effortful behavior (Schutz, Drogosz, White, & Distefano, 1998), thus reflecting an incremental view of learning approach. However, these learning control beliefs could be demonstrated to significantly predict course grades. Accordingly, against the background of research findings the WAESTA scale scores should be reasonably assumed to correlate positively and substantially with the students' entity view of less or not malleable statistical competence.
Furthermore, as another motivational criterion variable, the students' task values were considered. Conceptually, from an expectancy-value perspective on achievement motivation task values concern the students' perceived importance or adequacy of a certain activity to fulfill their personal needs and to attain their personal goals (Eccles & Wigfield, 2002). In particular, these task values had been demonstrated to regulate the students' motivational orientations, learning strategies, and academic choices (Wigfield, Hoa, & Klauda, 2009). Task values evidently emerge in a task- or at least domain-specific manner (Gaspard, Häfner, Parrisius, Trautwein, & Nagengast, 2017; Selkirk, Bouchey, & Eccles, 2011). Thus, they should characteristically affect the students' learning and performance in the statistics domain as well. Previous research in the field had primarily analyzed the students' perceived utility or worth of statistical knowledge and competencies as an attitudinal construct (Nolan, Beran, & Hecker, 2012) focusing on the usefulness of statistics in personal and professional contexts (Cruise et al., 1985; Dauphinee, Schau, & Stevens, 1997). The students' ratings of the worth of statistics appeared to positively correlate with their statistical achievement as well as with their learning strategies to a slight extent only (Emmioǧlu & Capa-Aydin, 2012). In comparison, the relations of utility perceptions with domain-specific measures of academic self-beliefs were stronger. Students with low competence beliefs valued statistics as less important (Baloğlu, 2002; Chiesi & Primi, 2009; Dauphinee et al., 1997; Vanhoof, Kuppens, Sotos, Verschaffel, & Onghena, 2011).
Similarly, students' statistics anxiety was also moderately correlated with their utility ratings, indicating that students suffering from a heightened level of statistics anxiety tended to perceive statistics as less useful (Baloğlu, 2002; Chew & Dillon, 2014; Nasser, 2004; Papanastasiou & Zembylas, 2008; Papousek et al., 2012). Accordingly, the WAESTA scale scores should be assumed to correlate inversely and substantially with the students' perceived value of statistical competence.
The students' prior mathematical learning, in particular their latest school grade, had been well proven as a relevant background variable to explain their statistical self-beliefs and competencies. From the perspective of self-concept development (Marsh & O'Mara, 2008), previous failure experience in mathematics will evidently lead to low competence beliefs in the statistics domain and, eventually, contribute to strengthening the emergence of domain-specific anxiety responses. In various studies students with poor school grades in mathematics reported a heightened level of statistics anxiety (Beurze, Donders, Zielhuis, de Vegt, & Verbeek, 2013; Birenbaum & Eylath, 1994; Chiesi & Primi, 2010; Lalonde & Gardner, 1993). Accordingly, the WAESTA scale scores should be assumed to correlate negatively, but low in magnitude with the students' mathematics grade they had last earned at school.

Participants and procedure
In the present study the data of both a construction sample and a validation sample were analyzed. The construction sample consisted of 113 graduate students (n = 94 females, n = 19 males) from a German university Master's course in educational sciences (n = 80) and special education (n = 33). They all were enrolled in a compulsory course on empirical research methods. Therefore, the participation rate was sufficiently high at 82 per cent. Seventy-four of the students had already acquired elementary statistical knowledge during their first degree, whereas 39 were required to attend a course in basic descriptive and inferential statistics. The subgroups with and without prior statistical knowledge did not differ significantly with respect to gender (chi-square test, p > .05) or age (Mann-Whitney U-test, p > .05). Also, there was no significant difference of gender ratio within each subgroup (binomial test, p > .05).
The validation sample was thought to scrutinize the findings from the construction sample one year later. It consisted of 87 graduate students from the same Master's courses: educational sciences (n = 59) and special education (n = 28). The sample was predominantly female (n = 74). As with the construction sample all the students were enrolled on a compulsory course on empirical research methods. The participation rate was rather high at 89 per cent. Fifty of the students had acquired statistical knowledge during their first degree whereas 37 had to attend an introductory statistics course. Once again there were no subgroup differences in gender, gender ratio, or age (Stappert, 2017).
In both samples all relevant data concerning the self-belief and background variables under consideration were gathered on the course's first term. For that purpose, a questionnaire including all items to measure the students' self-concept, statistics anxiety, implicit theories, task values, and relevant background information was administered. To prevent a priming effect of the self-concept items, they were presented at the end of the questionnaire.
Both samples had missing data (5.7% and 7.3%). As they did not produce any systematic pattern in the construction (MCAR test p = .182) and in the validation sample (MCAR test p = .178), they were treated as "missing completely at random" (Little, 1988). The missing values were estimated by means of the two-step iterative expectation-maximization algorithm (Graham, 2012).

Measures
Students' academic self-concepts in mathematics and language (German) were assessed using nine six-point rating items for each subject. These items referred to the students' most recent learning experiences at school and addressed their competence beliefs with regard to meeting subject-specific demands. The wording of the items was strictly parallel. For the most part, the items originate from well proven instruments (Faber, 2012a; Möller, Streblow, Pohlmann, & Köller, 2006; Rost, Sparfeldt, & Schilling, 2007). For the purpose of this study they were adapted and phrased retrospectively. Sample item: "I tried hard in mathematics/German, but I did not perform very well." Principal component analysis (with varimax rotation) revealed a two-factor solution allowing for a clear distinction between the subject-specific self-concept facets. Hence, it was possible to build two scales for measuring the subject-specific academic self-concepts. Their reliability was most appropriate for both the mathematics and the language self-concept scale (Table 2). High scale scores indicated positive competence beliefs. In line with the multifaceted feature of the construct, the self-concept variables were only weakly correlated in the construction sample (r = .14, p > .05) and in the validation sample (r = -.05, p > .05).
To assess students' implicit theory of statistical competencies, a short scale with five four-point rating items was administered in the construction sample. As current instruments in the field only allowed for measuring the students' implicit intelligence theory (İlhan & Cetin, 2013; Kooken, Welsh, McCoach, Johnston-Wilder, & Lee, 2016), a new scale was created. Following the recommendations of Hong, Chiu, Dweck, Lin and Wan (1999), all items tapped an entity view of personal statistical competence. Unfortunately, due to their insufficient item-test correlation (rit < .22) two items had to be deleted. With an average item intercorrelation of Fisher's z' = .44 and an average item-test correlation of Fisher's z' = .52 the final scale's reliability appeared to be just acceptable. Sample item: "To work with statistics, you need a talent that I simply do not have." High scale scores indicated that the students perceived their statistical ability as fixed, hence as less malleable in nature. In the validation sample, a slightly revised scale with four four-point Likert items was used (Stappert, 2017). In view of sample size and item number its reliability appeared to be sufficient (Table 2).
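
Averaging correlations by way of Fisher's z' transformation, as reported for the item intercorrelations and item-test correlations, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `mean_correlation` and the example correlations are hypothetical, and the text does not say whether the reported averages were back-transformed to the r metric, so the sketch returns both values.

```python
import math


def mean_correlation(rs):
    """Average correlations via Fisher's z'.

    Returns a tuple (mean z', mean z' back-transformed to r).
    """
    zs = [math.atanh(r) for r in rs]  # Fisher's z' = artanh(r)
    mean_z = sum(zs) / len(zs)
    return mean_z, math.tanh(mean_z)


if __name__ == "__main__":
    # Hypothetical inter-item correlations of a three-item scale.
    example_rs = [0.35, 0.42, 0.48]
    mean_z, mean_r = mean_correlation(example_rs)
    print(f"mean z' = {mean_z:.3f}, back-transformed r = {mean_r:.3f}")
```

Averaging in the z' metric rather than averaging raw r values reduces the bias that arises because the sampling distribution of r is skewed for larger correlations.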

Descriptive statistics and reliabilities of the scales for measuring validation variables
Significance: *p ≤ .05, ***p ≤ .001
AM = arithmetic mean, SD = standard deviation, zS = z-standardized skewness, zK = z-standardized kurtosis, α = internal consistency (Cronbach's coefficient alpha)

The utility value students attributed to statistical competence was measured by means of a short scale. In the case of the construction sample it consisted of five four-point rating items dealing with the perceived utility of statistics for the students' current studies and intended career. Sample item: "Statistics will not play an important role in my future professional life". With an average item intercorrelation of Fisher's z' = .40 and an average item-test correlation of Fisher's z' = .49 the scale's reliability was just acceptable (Table 2). High scale scores indicated the students to consider statistics being less important. In the validation sample, an extended version of the scale was used. It consisted of eight four-point Likert items. Its reliability was once more just acceptable (Table 2). Here again, high sum scores indicated students to perceive statistics as being less useful for their current studies and later professional development (Eichhorn, 2018).
Finally, as a relevant background variable the students' most recent school grade in mathematics was recorded in both samples.

Scale formation
For determining the final version of the WAESTA scale in the construction sample, first of all, descriptive item statistics were calculated. The avoidance item 04 as well as the worry item 11 showed a significant negative skew, indicating that most students agreed with the statements: they would prefer to give a presentation without any statistical content, and during a presentation they would strongly hope not to be asked statistical questions. Furthermore, as the analysis revealed a significant negative kurtosis score for the items 03, 06, 14, and 15, their distribution appeared to be platykurtic. Accordingly, the students' responses to these items showed a heightened variance (Table 3).
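
The z-standardized skewness (zS) and kurtosis (zK) values used for the item statistics can be sketched as follows. The text does not state which estimators were used. The sketch therefore assumes the common bias-adjusted sample skewness and excess kurtosis, each divided by its standard error, with |z| > 1.96 read as significant at the .05 level. The function name and the example responses are hypothetical.

```python
import math


def z_skewness_kurtosis(values):
    """Return (zS, zK): adjusted skewness and excess kurtosis over their SEs.

    Assumes the bias-adjusted estimators G1 and G2; requires n > 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5            # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0        # moment-based excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


if __name__ == "__main__":
    # Hypothetical four-point responses piling up at the "applies" end.
    responses = [4, 4, 4, 4, 4, 4, 3, 3, 2, 1]
    z_s, z_k = z_skewness_kurtosis(responses)
    print(f"zS = {z_s:.2f}, zK = {z_k:.2f}")
```

A negative zS reflects responses concentrated at the upper end of the rating range, as described for items 04 and 11, while a negative zK corresponds to the flat, platykurtic distributions described for items 03, 06, 14, and 15.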
In the construction sample, the Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett's test of sphericity demonstrated the inter-item correlations being appropriately strong (KMO = .878, BTS p < .001). Therefore, a principal component analysis (PCA) was conducted in order to clarify the latent scale structure. However, it could not statistically separate the three anxiety components. Neither a varimax nor an oblique rotation procedure could yield any loading pattern to separate the worry, avoidance, and emotionality items in a conceptually proper way. Rather all analyses led to a unidimensional structure (Table 3). This solution revealed sufficiently high factor loadings and could explain 43.59 per cent of extracted variance. For further clarification, however, three provisional subscales representing the students' worry, emotionality, and avoidance cognitions were formed and the relations among their sum scores examined. In line with the PCA result, the subscales were strongly correlated: the worry scale with the emotionality scale (r = .80) and with the avoidance scale (r = .72), and the emotionality scale with the avoidance scale (r = .69; all p < .001). Consequently, all WAESTA items could be used to build the scale's final version. For its total score, neither the z-standardized scores of skewness and kurtosis nor the Shapiro-Wilk W-test (W = .988, df = 113, p = .443) could evince any significant deviation from the normal distribution assumption. High total scores indicated that the students reported stronger worry, avoidance, and emotionality cognitions. The scale's reliability was estimated in various ways and turned out to be adequate: Its internal consistency (Cronbach's coefficient alpha) amounted to α = .92, its split-half reliability (odd-even method using Spearman-Brown correction) to r12 = .89, and its standard error (based on coefficient alpha) was se = 2.67.
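
The three reliability estimates named above can be sketched in a few lines. The sketch assumes the textbook definitions: coefficient alpha from item and total-score variances, an odd-even split-half correlation stepped up with the Spearman-Brown formula, and a standard error of measurement computed as SD × √(1 − α). Whether the authors computed the standard error in exactly this way is an assumption; all function names and the response matrix are hypothetical.

```python
import math
from statistics import stdev, variance


def cronbach_alpha(rows):
    """Coefficient alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_odd_even(rows):
    """Odd-even split-half reliability with Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


def standard_error_of_measurement(rows, reliability):
    """SD of the total score times sqrt(1 - reliability)."""
    sd = stdev([sum(row) for row in rows])
    return sd * math.sqrt(max(0.0, 1.0 - reliability))


if __name__ == "__main__":
    # Hypothetical four-point responses of six students to four items.
    data = [[1, 2, 1, 2], [2, 2, 3, 2], [3, 3, 3, 4],
            [4, 3, 4, 4], [2, 1, 2, 2], [3, 4, 3, 3]]
    alpha = cronbach_alpha(data)
    print(f"alpha = {alpha:.2f}")
    print(f"split-half = {split_half_odd_even(data):.2f}")
    print(f"se = {standard_error_of_measurement(data, alpha):.2f}")
```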
Significance: *p ≤ .05, **p ≤ .01
AM = arithmetic mean, SD = standard deviation, zS = z-standardized skewness, zK = z-standardized kurtosis, a = factor loading, rit = corrected item-test correlation

In spite of the small number of participants in the validation sample, a principal component analysis (PCA) of the WAESTA items was conducted. As their communalities ranged from h = .412 to h = .660 and the Kaiser-Meyer-Olkin measure revealed an appropriately high score (KMO = .901), this procedure appeared to be most reasonable (de Winter, Dodou, & Wieringa, 2009; MacCallum, Widaman, Zhang, & Hong, 1999). The results revealed one common factor with considerably high loadings (ranging from a = .532 to a = .804). Accordingly, the unidimensional scale feature could be fully replicated and explained 50.66 per cent of extracted variance. For its total sum score, z-standardized skewness (zS = -0.330) and kurtosis values (zK = -0.190) did not indicate any significant deviation from the normal distribution assumption. Here again, the scale's reliability was estimated in various ways and turned out to be adequate: its internal consistency (Cronbach's coefficient alpha) amounted to α = .94, its split-half reliability (odd-even method using Spearman-Brown correction) to r12 = .91, and its standard error (based on coefficient alpha) was se = 2.45.
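
How the share of variance explained by a single principal component and the corresponding loadings follow from the inter-item correlation matrix can be sketched as follows. This is an illustration rather than the procedure actually used: it extracts only the first component by power iteration, assumes all items are keyed in the same direction, and uses hypothetical function names and data. In practice a statistics package or linear-algebra library would be used instead.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the item columns in rows."""
    n, k = len(rows), len(rows[0])
    cols = [[row[j] for row in rows] for j in range(k)]
    devs = [[v - sum(c) / n for v in c] for c in cols]
    norms = [math.sqrt(sum(d * d for d in dev)) for dev in devs]
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def first_component(R, iterations=500, tol=1e-12):
    """Largest eigenvalue of R, component loadings, and explained variance share.

    Power iteration started from a vector of equal weights; this assumes
    predominantly positive inter-item correlations (items keyed alike).
    """
    k = len(R)
    v = [1.0 / math.sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient of the normalized vector
        new_eigenvalue = sum(w[i] * sum(R[i][j] * w[j] for j in range(k))
                             for i in range(k))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        v, eigenvalue = w, new_eigenvalue
        if converged:
            break
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings, eigenvalue / k


if __name__ == "__main__":
    # Hypothetical responses of six students to four items.
    data = [[1, 2, 1, 2], [2, 2, 3, 2], [3, 3, 3, 4],
            [4, 3, 4, 4], [2, 1, 2, 2], [3, 4, 3, 3]]
    eigenvalue, loadings, share = first_component(correlation_matrix(data))
    print(f"first eigenvalue = {eigenvalue:.2f}, explained share = {share:.1%}")
    print("loadings:", [round(a, 2) for a in loadings])
```

The explained share is the first eigenvalue divided by the number of items, which is how a one-component solution over the 17 WAESTA items would translate into a percentage of variance.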

Validation results
As an initial approach to analyzing the external validity of the WAESTA scale, its zero-order correlations with the criterion variables under consideration were examined. As the results demonstrate (Table 4), the WAESTA scores were closely and significantly associated with the students' mathematics self-concept but not with their language self-concept. This finding might be considered preliminary evidence that the WAESTA scale measures a domain-specific rather than a general facet of the students' test anxiety experience. Furthermore, the scale scores were most strongly correlated with the students' entity views of their own statistical competence. A heightened level of statistics anxiety was accompanied by a firmly held view of one's own statistical competencies as being hardly or not at all malleable. With the negative instrumental value of statistics, the WAESTA sum score showed a moderate positive correlation. Students reporting a higher level of statistics anxiety tended to perceive statistical competencies as less important. Finally, the relation between the WAESTA score and the most recent school grade in mathematics was positive and significant, though low in magnitude. Hence, students with a heightened level of statistics anxiety had been less successful in mastering mathematical demands at school.
To obtain more differentiated validation results, a series of regression analyses with the WAESTA scale as the dependent variable was computed for both samples. As this procedure allowed for controlling the covariations among all predictor variables with respect to their empirical overlap and multicollinearity, it should help to unravel the complexity of construct relations. In particular, a sequence of regression models including an increasing number of predictor variables was tested consecutively (Table 5). In both samples, the standardized residuals of the WAESTA sum scores did not violate the normal distribution assumption: in each case, the Shapiro-Wilk test indicated normally distributed standardized residuals (construction sample: W = .986, df = 113, p = .308; validation sample: W = .979, df = 87, p = .182).
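The logic of testing consecutive models with a growing set of predictors can be sketched as follows. This is a simplified illustration under stated assumptions, not a reproduction of the models in Table 5: the data are simulated, the variable names (`self_concept`, `entity_belief`, `anxiety`) and the helper `standardized_betas` are hypothetical, and only two predictors are used. The simulated anxiety score depends on the entity belief alone, while the self-concept is merely correlated with that belief, so the sketch shows how a predictor's standardized weight can change once a correlated predictor enters the equation.

```python
import numpy as np


def standardized_betas(X, y):
    """Standardized regression weights via least squares on z-scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # All variables are centered, so no intercept column is needed.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas


# Simulated data only (hypothetical): anxiety is driven by the entity
# belief; self-concept is negatively correlated with that belief.
rng = np.random.default_rng(0)
n = 2000
entity_belief = rng.normal(size=n)
self_concept = -0.5 * entity_belief + rng.normal(scale=0.85, size=n)
anxiety = 0.6 * entity_belief + rng.normal(scale=0.8, size=n)

# Model A: self-concept only. Model B: self-concept plus entity belief.
models = {
    "A": [self_concept],
    "B": [self_concept, entity_belief],
}
for name, predictors in models.items():
    X = np.column_stack(predictors)
    print(name, np.round(standardized_betas(X, anxiety), 2))
```

With a single predictor, the standardized weight equals the zero-order correlation; with several correlated predictors, each weight reflects only the unique contribution of that predictor.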
The results for regression model A clearly showed that the mathematics self-concept explained the largest share of anxiety variance. However, when the entity beliefs were added to the regression equation in models B and C, the predictive power of the students' mathematics self-concept dropped to a minimal and nonsignificant level. Instead, the students' entity beliefs largely contributed to the WAESTA sum score. As the mathematics self-concept and the entity belief variable were substantially correlated in both samples (r = -.51 in the construction sample and r = -.45 in the validation sample), but the entity belief variable was more strongly related to the anxiety variable in both samples (r = .63 in the construction sample and r = .80 in the validation sample), the massive decline in the self-concept's beta weight must be seen as a result of multicollinearity. This predictive pattern occurred in both samples. Moreover, only in the construction sample did the students' negative value of statistics substantially and independently explain additional variance in the WAESTA sum score. The difference between samples might be due to the fact that the methods for assessing the value variable were not comparably formatted. Apart from this, all regression analyses showed the students' statistics anxiety to be most closely predicted by their control beliefs, which reflect their perceived malleability of individual competencies in the statistics domain. Finally, for further clarification of the WAESTA scores, the differences between the educational science and the special education students were analyzed. As the comparison groups were small and unequal in size, the nonparametric Mann-Whitney U-test for independent samples was used. In the construction sample, a significantly higher level of statistics anxiety was found in the special education subgroup (Z = -2.314, p = .021, effect size r = 0.22). In the validation sample, the level of statistics anxiety did not differ significantly between educational science and special education students (Z = -0.374, p > .05). However, given the small size of the validation sample, this finding should be interpreted cautiously.
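The effect size reported for the construction sample is consistent with the common conversion r = |Z| / √N. The text does not state which formula was used, so this conversion is an assumption; N = 113 is likewise inferred from the degrees of freedom reported for the Shapiro-Wilk test in that sample. The function name below is hypothetical.

```python
import math


def effect_size_r(z, n):
    """Convert a Mann-Whitney Z statistic into the effect size r = |Z| / sqrt(N)."""
    return abs(z) / math.sqrt(n)


# Z as reported for the construction sample; N = 113 is an assumption
# inferred from the df of the normality test.
r_construction = effect_size_r(-2.314, 113)
print(round(r_construction, 2))  # 0.22
```

By the usual rules of thumb, a value of this size would be read as a small to moderate effect.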

Discussion
The present study aimed to examine the internal and external validity as well as the psychometric properties of the newly developed WAESTA scale for measuring educational science students' worry, avoidance, and emotionality cognitions in the domain of statistics learning. Conceptually, this measurement approach was intended to integrate the strengths of both a more situation-focused and a more reaction-focused research line in the field. As a substantive result, the scale was shown to represent the construct in a unidimensional manner. The final scale version included all items as initially administered in both samples. Its internal consistency was fully sufficient. Furthermore, its relations with self-belief and attainment variables largely turned out as theoretically predicted. Specifically, the total WAESTA score was more strongly correlated with the students' mathematics self-concept than with their language self-concept; thus, the scale could claim domain-specific validity. These findings held for both the construction and the validation sample. For the time being, the WAESTA scale can be considered internally and externally valid and as having adequate psychometric properties. Nevertheless, some results definitely deserve further attention.
In particular, the scale's underlying structure consistently appeared to be unidimensional. This finding indicates the strong empirical overlap among the worry, avoidance, and emotionality responses and, thus, the cognitive-motivational interplay of anxiety components. Similarly close relations have already been found elsewhere (Deffenbacher, 1980; Hodapp & Benson, 1997; Cassady & Johnson, 2002; Chin, Williams, Taylor, & Harvey, 2017; Hong & Karstensson, 2002; Sarason, 1984), especially with respect to domain- or task-specific facets of test anxiety (Faber, 1995, 2012b). By no means does this result challenge the need for a separate assessment of worry, avoidance, and emotionality cognitions. Rather, this approach should ensure a more differentiated measurement of statistics anxiety and thereby contribute to reducing the interpretive ambiguity of item responses. In that regard, it should certainly increase the scale's cognitive-motivational representativeness and content validity. Likewise, the students' avoidance cognitions were found to be most closely related to their worry cognitions, and only slightly less closely related to their emotionality cognitions. Therefore, in accordance with relevant findings (Galassi et al., 1981; Hagtvet & Benson, 1997), avoidance cognitions must be seen as an important feature of the students' anxiety experience and, thus, should have contributed to completing and refining the measurement of statistics anxiety (Putwain, 2008).
With respect to the validation results, both the correlation and regression analyses suggest, at first glance, that students' implicit entity beliefs are sufficient to explain their statistics anxiety. The entity beliefs obviously appear to play a crucial role in the prediction of statistics anxiety, as could be expected from the view of social-cognitive theories (Dweck & Leggett, 1988; Schunk & Zimmerman, 2006). However, in the students' cognitive-motivational processing, they are likely to operate in a more complex manner. According to relevant theoretical conceptions and empirical findings (Blackwell et al., 2007; Chiesi & Primi, 2010; Emmioǧlu, 2011; Onwuegbuzie, 2003; Sesé, Jiménez, Montano, & Palmer, 2015), it should be assumed that implicit beliefs actually mediate the effects of students' self-concept and their learning background on the dependent anxiety variable. The massive decline in the self-concept variable's beta weights when the entity belief variable was added to the regression equation apparently supports this assumption (Table 5). Admittedly, within the validation framework this indirect effect cannot be adequately substantiated with correlational or regression analyses, but only with multivariate modeling methods (Kline, 2011). Future research should make every effort to apply such modeling techniques in order to clarify the role of entity beliefs in the statistics domain.
Beyond the purpose of scale validation, the empirical findings concerning the students' entity beliefs might even extend previous research in two respects. At the level of construct specificity, the measure of entity beliefs did not refer to the perceived malleability of general cognitive abilities but to the perceived malleability of statistical competencies. As this entity belief variable was only very weakly correlated with language self-concept (construction sample r = .07; validation sample r = -.12), it should be considered domain-specific. Hence, for the domain of statistical learning, this particular finding appears to be in line with the recommendations of the implicit theories approach (Dweck & Molden, 2005). Accordingly, at the level of construct relations, the results allow for refining the nomological scope of the statistics anxiety framework, at least as far as the type and role of self-belief variables are concerned (Bandalos et al., 1995; González et al., 2016; Onwuegbuzie & Wilson, 2003; Zeidner, 1991).
The present study undeniably suffers from some conceptual and empirical limitations. First of all, the composition and size of both student samples do not allow the empirical findings to be generalized as would be required. Instead, the findings reported here might claim a sort of local validity, all the more so as the data were collected in one particular university setting. Further analyses need to remedy this problem and examine the WAESTA scale with other student samples from other educational science contexts.
Moreover, the validation framework is still lacking in several respects and should be further completed (Benson, 1998). The present study assessed students' mathematics self-concept retrospectively. As a proportion of both samples did not have any prior experience of statistics, using a measure of statistical self-concept would have been misguided. However, provided that further research can include participants who are largely comparable in their statistical background, their self-concept in the statistics domain should definitely be used to elaborate scale validation (González et al., 2016). Likewise, concurrent measures of the students' self-efficacy to master certain statistical tasks could help to further differentiate the scale's criterion validity (Finney & Schraw, 2003; Perepiczka, Chandler, & Becerra, 2011). Not least, an appropriate validation of the WAESTA scale will require an analysis of its relations with other instruments for measuring statistics anxiety, for instance, by comparing it with the German adaptation of the STARS questionnaire.
Another considerable shortcoming of the present study is the absence of a relevant performance measure. As only some of the students in both samples had already taken an exam in introductory statistics, sufficiently robust data were not available. Further validation studies should analyze the relation between the WAESTA scores and suitable measures of students' actual statistics performance. This relation should be particularly instructive inasmuch as relevant studies commonly reported low to moderate correlations (Bandalos et al., 1995; Finney & Schraw, 2003; Sesé et al., 2015; Tremblay, Gardner, & Heipel, 2000; Vigil-Colet et al., 2008; Zeidner, 1991). However, these results do not really indicate a general flaw in the measures' criterion validity. Rather, they reflect the motivational consequences of statistics anxiety within a strongly restricted setting (Pekrun, 1988). As passing the statistical requirements in the Master's degree is mandatory, the students' increasingly experienced worry, avoidance tendencies, and feelings of apprehension could dispose them to strengthen their learning effort in order to avoid an impending failure outcome (Macher, Papousek, Ruggeri, & Paechter, 2015; Martin & Marsh, 2003). Accordingly, for the WAESTA scale, too, a moderate relation with the students' statistical performance should be assumed. Finally, as both samples in this study were small and predominantly female, gender was not included in the validation analyses. Relevant findings in the field consistently showed that females report a higher level of statistics anxiety (Benson, 1989; Hong & Karstensson, 2002; Macher et al., 2012; Onwuegbuzie & Wilson, 2003). Interestingly, despite the apparently heightened anxiety level of female students, some studies could not substantiate any significant disadvantage in their exam performance (Bradley & Wygant, 1998; Macher, Paechter, Papousek, Ruggeri, Freudenthaler, & Arendasy, 2013). This finding needs further clarification with respect to the underlying motivational and behavioral processes. Female students might have overrated their own anxiety level (Zeidner, 1998), possibly due to a self-derogatory gender stereotyping effect (Bieg, Goetz, Wolter, & Hall, 2015; Pomerantz, Altermatt, & Saxon, 2002). Alternatively, pursuing a more adaptive coping strategy to avoid feared failure, they might have intensified their learning efforts (Martin & Marsh, 2003). Given a larger sample size with a more adequately balanced gender ratio, this issue should also be examined with respect to the WAESTA scale.
In summary, the present findings yield important information concerning the internal and external validity of the newly developed WAESTA scale. However, they must be seen as preliminary in nature and represent just a very first step in method development.