Cover Page

Education Set

coordinated by

Gérard Boudesseul and Angela Barthes

Volume 2

Quantitative and Statistical Data in Education

From Data Collection to Data Processing

Michel Larini

Angela Barthes


Introduction

This book outlines the main methods used for a simple analysis, and then a more elaborate one, of quantitative data obtained in a study or a work of research. It is aimed primarily at students, teachers and researchers working in the education sector, but may also be illuminating in the various domains of the human and social sciences.

The book may be viewed as a step-by-step course: it begins with an introduction to the various methods used to gather data and, one step at a time, it touches on the essential aspects of the quantitative analysis techniques used in the field of research in education to extract meaning from the data.

Essentially, the book is designed for readers who are new to these types of methods. Nevertheless, it could also be very useful for doctoral candidates, and even for experienced researchers, whose approach to the data is purely software-based and who wish to gain a better understanding of the fundamentals of these methods in order to make better use of them, take their analyses further and avoid certain pitfalls.

Unlike many other books on the subject, which can be rather difficult to read, or which examine one method and one method only, we elected to present a range of the most widespread approaches that can be used easily in the area of education. Readers who want a more detailed understanding of any one method are therefore advised to consult more specialized publications.

This book is not a mathematics book which presents all of the (often complex) theoretical bases of the methods employed. Nor, though, do we wish to limit it to a presentation of the formulae and the procedures for using these methods. At every stage, we have sought to offer a balanced presentation of the method, with the twofold objective of being comprehensible and enabling users to handle the data in full awareness of what they are doing. Thus, when we do go into some degree of mathematical detail, it is not absolutely essential to read these parts (though it may be helpful to some).

In today’s world, students and researchers are in the habit of using software packages where all they need to do is input the data and press a button. This approach carries a certain amount of risk if the people using the software have insufficient prior knowledge of the fundamentals of the methods the program employs. Throughout the presentations herein, we have of course used software tools, but we deliberately chose not to include a discussion of those tools. The ways in which they are used differ from one program to another, and they evolve very quickly over time. It is possible that, by the time you come to read this book, the programs used here will no longer be in circulation or will have evolved, and no doubt others will have been developed that perform better and are more user friendly. In any case, before processing any data, readers will need to invest time and effort in learning how to use a software tool properly; that prior investment is absolutely crucial. After all, before you can use a car, you have to learn to drive. Nevertheless, time and again, we present the calculations manually, because doing so helps readers to follow the theoretical process, step by step, from the raw data to the desired results, and this is a highly enlightening approach.

Without going into detail, we can state that it is indispensable to perform quantitative data analyses when faced with data taken from a large number of individuals (from a few dozen to thousands or more). The researcher or student collects the data they need; those data are entered into a table cross-referencing the individuals sampled with the various parameters (variables) measured: the Individuals/Variables [I/V] table. This table is the starting point for the data analysis: it tends not to be directly interpretable, yet we need to extract as much information from it as possible. In order to do so, the researcher takes a series of steps.
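To fix ideas before those steps are described, here is a minimal sketch, with invented individuals and variables purely for illustration, of how such an [I/V] table might be set up in Python using the pandas library:

import pandas as pd

# Hypothetical [I/V] table: one row per individual, one column per variable
iv_table = pd.DataFrame(
    {
        "age": [11, 12, 11, 13],
        "maths_score": [14.5, 9.0, 12.0, 16.5],
        "reading_score": [13.0, 10.5, 11.0, 15.0],
        "parental_occupation": ["employee", "manual worker", "manager", "employee"],
    },
    index=["pupil_1", "pupil_2", "pupil_3", "pupil_4"],
)

print(iv_table)

Each row represents one individual and each column one variable; all of the methods described below take a table of this kind as their point of departure.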

The first step is elementary descriptive statistics. It consists of constructing other, more explicit tables, extracted from the [I/V] table, and then generating graphical and cartographic representations of those data. In addition, in the case of numerical variables, it is possible to achieve a more accurate description by introducing mathematical indicators: mean, variance and standard deviation for each of the variables, and covariance and correlation coefficient for each pair of variables. After using the tools offered by descriptive statistics, researchers are able to begin to present the data, comment upon them, and compare them to the original working hypotheses.
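For a numerical variable $x$ observed on $n$ individuals, these indicators are given by the usual definitions (written here with a $1/n$ factor; many texts use $1/(n-1)$ for the sample variance):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad s_x = \sqrt{s_x^2}$$

and, for a pair of variables $(x, y)$,

$$\operatorname{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right), \qquad r_{xy} = \frac{\operatorname{cov}(x,y)}{s_x\, s_y}.$$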

The second step is confirmatory statistics, also known as statistical inference. At this stage in the process, the researcher is able to present the data in a legible form, and has been able to make observations about the dataset and draw conclusions. However, for obvious practical reasons, these data will have been collected only from a reduced sample, rather than from the entire population, and there is nothing to suggest that, had we looked at other samples from the same population, the same conclusions would have been reached. The researcher then needs to consider whether the results obtained on the sample or samples at hand can be generalized to apply to the whole population. This is the question addressed by confirmatory statistics, which is based on fundamental concepts of probability, on the laws of chance and on the law of large numbers. Confirmatory statistics gives us laws that predict the probability of a given event occurring in a population. With that in mind, it is possible to compile probability tables, which serve as the basis for statistical tests (tests on means, Student’s t-test, the χ2 test, ANOVA, correlation tests, etc.), which the researcher uses to find out whether the results obtained can be generalized to the entire population.
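As a foretaste, here is a minimal sketch, with invented scores purely for illustration and assuming the scipy library is available, of one such test: Student’s t-test comparing the means of two independent samples.

from scipy import stats

# Hypothetical test scores for two groups of pupils taught with different methods
group_a = [12.0, 14.5, 11.0, 13.5, 15.0, 12.5]
group_b = [10.0, 11.5, 9.5, 12.0, 10.5, 11.0]

# Independent two-sample t-test: can the observed difference in means
# be generalized beyond these two particular samples?
t_statistic, p_value = stats.ttest_ind(group_a, group_b)
print(t_statistic, p_value)

A small p-value (conventionally below 0.05) suggests that the observed difference is unlikely to be due to sampling fluctuations alone.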

The third step involves multivariate data analysis techniques, which offer an overall view of the links that may exist between more than two variables (3, 4, …, n). They are being used increasingly frequently. They complement elementary descriptive statistics in that they can reveal unexpected connections and, in that sense, go further in the analysis of the data. Principal component analysis (PCA), which applies to numerical variables, is at the root of multivariate methods. Factorial correspondence analysis (FCA) and factorial multiple correspondence analysis (FMCA), for their part, apply to qualitative data, and they are built on the theoretical foundations of PCA.
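By way of illustration, here is a minimal sketch of a principal component analysis, using invented data and assuming the scikit-learn library is available; the variable values are purely illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical [I/V] table restricted to numerical variables
# (rows = individuals, columns = variables)
X = np.array([
    [14.5, 13.0, 11],
    [9.0, 10.5, 12],
    [12.0, 11.0, 11],
    [16.5, 15.0, 13],
    [8.5, 9.0, 12],
])

# PCA is usually carried out on centered, standardized variables
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
coordinates = pca.fit_transform(X_std)   # coordinates of the individuals on the first two axes
print(pca.explained_variance_ratio_)     # share of the total variance carried by each axis
print(coordinates)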

The fourth step that may be used is statistical modeling. Having demonstrated the existence of links between the different variables, we can attempt to find any mathematical relations that might exist between one of the variables (known as the dependent variable) and one or more other variables (known as explanatory variables). In essence, the aim is to establish a predictive model that can illustrate how the dependent variable will be affected as the explanatory variables evolve; hence, the method is known as statistical modeling. For example, Pascal Bressoux sets out to build a model that determines the opinion that a teacher has of his pupils (the dependent variable) as a function of multiple explanatory variables, such as their academic performance, lateness to school, their parents’ socioprofessional status, etc. Statistical modeling can deal with widely varying situations, depending on the nature of the explanatory variables and of the dependent variable (linear or logistic regression), on whether the explanatory variables act directly or indirectly, and on context effects involving several nested levels (pupils, class, school, town, etc.).
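To make the idea concrete, here is a minimal sketch of a simple linear model fitted by ordinary least squares, assuming the scikit-learn library is available; the data and variable names are invented, loosely inspired by the example above, and are not Bressoux’s.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical explanatory variables: mean academic performance and number of late arrivals
X = np.array([
    [14.5, 1],
    [9.0, 6],
    [12.0, 3],
    [16.5, 0],
    [10.5, 4],
])
# Hypothetical dependent variable: teacher's opinion of each pupil (mark out of 10)
y = np.array([8.0, 4.5, 6.0, 9.0, 5.5])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # fitted constant and coefficients
print(model.predict([[13.0, 2]]))      # predicted opinion for a new pupil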

The book is organized in the same order as the steps that a researcher needs to take when carrying out a study. Hence, it consists of six chapters, each having between two and five sections. Chapter 1 deals with data collection in education. Chapter 2 looks at elementary descriptive statistics. Chapter 3 is given over to confirmatory statistics (statistical inference). Then, in Chapter 4, we examine multivariate approaches, followed by statistical modeling in Chapter 5. Finally, Chapter 6 is devoted to tools commonly used in education (and other disciplines) which gain robustness when they are made somewhat more quantitative than usual, a process it is helpful to describe formally. The two examples discussed relate to social representations in education and to studies of the relationship to knowledge. The basic idea is to show that many methods can be improved, or transformed, to become more quantitative in nature, lending greater reproducibility to the studies.