
Applied Statistics

Theory and Problem Solutions with R

Dieter Rasch
Rostock
Germany

 

Rob Verdooren
Wageningen
The Netherlands

 

Jürgen Pilz
Klagenfurt
Austria


Preface

We wrote this book for people who have to apply statistical methods in their research but whose main interest is not in theorems and proofs. Accordingly, our aim is not to provide the detailed theoretical background of statistical procedures. While mathematical statistics, as a branch of mathematics, includes definitions as well as theorems and their proofs, applied statistics gives guidance on applying the results of mathematical statistics.

Sometimes applied statistics uses simulation results in place of results from theorems. For example, the normality assumption needed for many theorems in mathematical statistics can be neglected in applications concerning location parameters such as the expectation; see Rasch and Tiku (1985). Nearly all statistical tests and confidence estimations for expectations have been shown by simulation to be very robust against violations of the normality assumption needed to prove the corresponding theorems.

We gave the present book an analogous structure to that of Rasch and Schott (2018) so that the reader can easily find the corresponding theoretical background there. Chapter 11 ‘Generalised Linear Models’ and Chapter 12 ‘Spatial Statistics’ of the present book have no prototype in Rasch and Schott (2018). Further, the present book contains no exercises; lecturers can either use the exercises (with solutions in the appendix) in Rasch and Schott (2018) or the exercises in the problems mentioned below.

Instead, our aim was to demonstrate the theory presented in Rasch and Schott (2018), and that underlying the new Chapters 11 and 12, using functions and procedures available in the statistical programming system R, which has become the gold standard for statistical computing.

Within the text, the reader will often find the sequence problem – solution – example, with problems numbered within the chapters. Readers interested only in special applications may in many cases find the corresponding procedure in the list of problems in Appendix A.

We thank Alison Oliver (Wiley, Oxford) and Mustaq Ahamed (Wiley) for their assistance in publishing this book.

We are very interested in the comments of readers. Please contact:

d_rasch@t-online.de, l.r.verdooren@hetnet.nl, juergen.pilz@aau.at.

Rostock, Wageningen, and Klagenfurt, June 2019, the authors.

References

  1. Rasch, D. and Tiku, M.L. (eds.) (1985). Robustness of statistical methods and nonparametric statistics. In: Proceedings of the Conference on Robustness of Statistical Methods and Nonparametric Statistics, held at Schwerin (DDR), May 29‐June 2, 1983. Boston, Lancaster, Tokyo: Reidel Publ. Co. Dordrecht.
  2. Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.

1
The R‐Package, Sampling Procedures, and Random Variables

1.1 Introduction

In this chapter we give an overview of the software package R and introduce basic knowledge about random variables and sampling procedures.

1.2 The Statistical Software Package R

In practical investigations, professional statistical software is used to design experiments or to analyse data already collected. Here we use the software package R. Anybody can extend the functionality of R without restriction using free software tools; it is also possible to implement special statistical methods and to incorporate certain procedures written in C and FORTRAN. Such extensions are offered on the internet in standardised archives. The most popular archive is probably CRAN (Comprehensive R Archive Network), a network of servers supervised by the R Development Core Team. It also offers the package OPDOE (optimal design of experiments), which is described in detail in Rasch et al. (2011), as well as the following packages used in this book: car, lme4, DunnettTests, VCA, lmerTest, mvtnorm, seqtest, faraway, MASS, glm2, geoR, gstat.
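As a minimal sketch of how the packages named above might be obtained, the following lines check which of them are already installed and leave the actual download as a commented-out step. The variable names `pkgs`, `installed`, and `missing_pkgs` are chosen here for illustration only, and whether every package is still offered on CRAN is an assumption that should be checked at installation time.

```r
# Packages named in the text (names copied from the paragraph above)
pkgs <- c("OPDOE", "car", "lme4", "DunnettTests", "VCA", "lmerTest",
          "mvtnorm", "seqtest", "faraway", "MASS", "glm2", "geoR", "gstat")

# Which of them are already present in the local R library?
installed <- pkgs %in% rownames(installed.packages())
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Install the missing ones from CRAN (requires internet access;
# assumption: the packages are still available there)
# install.packages(missing_pkgs)

# Load a package for use in the current session, e.g.
# library(MASS)
```

A package has to be installed only once, but `library()` must be called in every new R session in which the package is used.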

Apart from only a few exceptions, R contains implementations for all statistical methods concerning analysis, evaluation, and planning. We refer for details to Crawley (2013).

The software package R is available free of charge from http://cran.r-project.org for the operating systems Linux, MacOS X, and Windows. For Microsoft Windows, the installation starts from the ‘Windows’ link. Choosing ‘base’ leads to the installation page, where the setup file can be downloaded via ‘Download R 2.X.X for Windows’ (X stands for the required version number). Once this file is started, the setup assistant runs through the installation steps. In this book, all default settings are adopted. The interested reader will find more information about R at http://www.r-project.org or in Crawley (2013).

After starting R, the input window opens and shows the red input prompt ‘>’. Commands are typed here and executed by pressing the Enter key. The output is given directly below the command line. The user may also insert line breaks and indentation to improve readability; neither affects how a command is executed. For instance, a command to enter the data y = (1, 3, 8, 11) is as follows:

 > y <- c(1,3,8,11) 

The assignment operator in R is the two-character sequence ‘<-’; the single character ‘=’ can be used as an alternative.
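The two points made above, that both assignment forms give the same result and that line breaks and indentation do not change what a command does, can be checked with a short sketch. The object names `z` and `w` are introduced here only for illustration.

```r
# Assignment with '<-' and with '=' at the top level
y <- c(1, 3, 8, 11)
z = c(1, 3, 8, 11)
identical(y, z)   # TRUE

# The same command spread over two lines with indentation
w <- c(1, 3,
       8, 11)
identical(y, w)   # TRUE

# Typing the name of an object prints its content
y
```

When a command is incomplete at the end of a line, the R console shows ‘+’ instead of ‘>’ and waits for the rest of the command.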

The Workspace is a special working environment in R. There, certain objects can be stored that were obtained during the current work with R. Such objects contain the results of computations and data sets. A Workspace is loaded using the menu

 File – Load Workspace... 
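Objects can also be stored and restored from the command line instead of the menu. The following is a minimal sketch using the base functions `save()` and `load()`; the temporary file name `ws_file` is an assumption made for this example, and in practice a permanent path would be chosen.

```r
# An object created during the current session
y <- c(1, 3, 8, 11)

# Store it in a file (a temporary file is used here for illustration)
ws_file <- tempfile(fileext = ".RData")
save(y, file = ws_file)

# save.image(file = ...) would store all objects of the Workspace at once

# Remove the object and restore it from the file
rm(y)
load(ws_file)
print(y)
```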

In this book the R commands start with >. Readers who want to run the R commands need only type or copy the text after > into the R window.

An advantage of R is that, as with other statistical packages like SAS and IBM‐SPSS, we no longer need an appendix with tables in statistical books. Often tables of the density or distribution function of the standard normal distribution appear in such appendices. However, the values can be easily calculated using R.
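As a brief illustration of replacing printed tables, the base functions `dnorm()`, `pnorm()`, and `qnorm()` return the density, the distribution function, and the quantiles of the standard normal distribution. The arguments 0, 1.96, and 0.975 are chosen here as familiar examples; the object names are for illustration only.

```r
# Density of the standard normal distribution at 0
d0 <- dnorm(0)
d0            # approximately 0.3989

# Distribution function at 1.96
p196 <- pnorm(1.96)
p196          # approximately 0.9750

# 0.975-quantile, the inverse of the distribution function
q975 <- qnorm(0.975)
q975          # approximately 1.9600

# A small table of the distribution function for several arguments
x <- seq(0, 3, by = 0.5)
data.frame(x = x, Phi = round(pnorm(x), 4))
```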

The notation of this and the following chapters is just that of Rasch and Schott (2018).

1.3 Sampling Procedures and Random Variables

Although in this book we mainly discuss how to plan experiments and to analyse observed data, we still need basic knowledge about random variables because, without it, we could not explain unbiased estimators, the expected length of a confidence interval, or how to define the risks of a statistical test.
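To indicate why random variables are needed to speak of unbiased estimators, the following simulation sketch compares two estimators of a variance over many random samples. All settings (normal distribution, sample size 5, true variance 4, 10 000 repetitions, the seed) are assumptions made for this illustration and are not taken from the text.

```r
set.seed(1)                      # arbitrary seed for reproducibility
n      <- 5                      # sample size (assumed)
sigma2 <- 4                      # true variance (assumed)
reps   <- 10000                  # number of simulated samples

# Each column of 'samples' is one random sample of size n
samples <- matrix(rnorm(n * reps, mean = 0, sd = sqrt(sigma2)), nrow = n)

# Estimator with divisor n - 1 (this is what var() computes)
s2_unbiased <- apply(samples, 2, var)

# Estimator with divisor n
s2_biased <- s2_unbiased * (n - 1) / n

# Averages over all samples approximate the expectations of the estimators
mean(s2_unbiased)                # close to sigma2
mean(s2_biased)                  # close to sigma2 * (n - 1) / n
```

Any single sample gives just one number; the statement that an estimator is unbiased refers to its expectation, that is, to the estimator regarded as a random variable.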

References

  1. Crawley, M.J. (2013). The R Book, 2nd edition. Chichester: Wiley.
  2. Rasch, D. and Schott, D. (2018). Mathematical Statistics. Oxford: Wiley.
  3. Rasch, D., Herrendörfer, G., Bock, J., Victor, N., and Guiard, V. (2008). Verfahrensbibliothek Versuchsplanung und -auswertung, 2. verbesserte Auflage in einem Band mit CD. München, Wien: R. Oldenbourg Verlag.
  4. Rasch, D., Pilz, J., Verdooren, R., and Gebhardt, A. (2011). Optimal Experimental Design with R. Boca Raton: Chapman and Hall.