statistical analyses with calc

I don’t know why, but, occasionally, people may have to do minimal or even considerable statistical calculations using Calc, the‘s spreadsheet part of the Office Productivity Suite. Although, I wouldn’t recommend doing so since there are other, better software packages to complete simple and complex statistical analyses, such as R, a statistical programming language and environment with excellent analysis and graph capabilities. R can be found at one of its many mirrors.

However, for those who need or have to (or … whatever), here are some very very basic info on using statistical functions in Calc. Calc has a considerable variety of functions documented in the openoffice wiki. of statistical (and other types of) functions that you can use. There is also OOo_Stat, an external add-in, that you can include to assist you with your analyses. More on that later on.

MInd you, I am not going to provide a full “Statistics using Calc” guide here, I just want to suggest a few statistical functions for basic descriptive and inferential statistics.

A. Basic Descriptive Statistics
I. Central Tendency

  1. AVERAGE returns the mean average of its list of arguments
  2. MEDIAN returns the median value (the value at the 50th percentile) of its respective list of arguments
  3. MODE returns the most frequent value of its list of arguments

There are other functions, like GEOMEAN, HARMEAN, and TRIMMEAN which give the geometric mean, the harmonic mean and the trimmed mean (excluding a portion of highest and lowest values) of their arguments.

II. Variation

  1. KURT estimates the kurtosis of the distribution of values (ie, how flat or how tall the distribution is)
  2. SKEW estimates the asymmetry of the distribution (ie, how different from the symmetrical normal distribution it is)
  3. STDEVestimates the standard deviation of the distribution
  4. VAR estimates the variance (ie, the square of the standard deviation) of its list of arguments

Note, that there are functions STDEVP and VARPP for the standard deviation and variance of the population where the sample has come from.

B. Some (not all) inferential statistics
I. Simple tests

  1. CHITEST estimates the chisquare test of goodness-of-fit
  2. FTEST estimates the likelihood that two samples have the same variance.
  3. TTEST estimates Student’s t-test of one or two samples

II. Correlations, and simple regression

  1. CORREL estimates the Pearson product-moment correlation coefficient between two arrays of values
  2. PEARSON is identical to CORREL
  3. INTERCEPT estimates the intercept b_0 of the regression y = b0 + b1*x
  4. SLOPE estimates the slope b1 of the same regression equation
  5. STEYX estimates the residual standard error for the regression
  6. RSQ estimates the R^2, the square of the correlation coefficient between x and y>
  7. LINEST is a different kind of story. It provides a linear estimation for the data at hand. Basically it summarizes the regression analysis. It is an array function which means it must be called like this {=LINEST(arg1;arg2;arg3;arg4)}.
Linest Function Output

Linest Function Output

Data in the above example come from the R program and its cars dataset which “… give the speed of cars and the distances taken to stop” according to the help files. R help files cites Ezekiel’s 1930 Methods of Correlation Analysis published by Wiley as the source of the data and D. R. McNeil’s Interactive Data Analysis also published by Wiley in 1977) as a reference source for the data.

The output of the LINEST function consists of eight cells between F18 (upper left) and G22 (lower right) cells. Unfortunately no explanation is provided. I have tried to write in the adjacent cells some labels about each cell in the output of LINEST. Please note, that the third line of output (F21:G21) shows only the F-criterion for the fitting of data and the denominator degrees of freedom. In statistical parlance, F(1,48) = 89.57, p < 1.49e-012, which can easily be verified by using another Calc function FDIST.

There is additional output above the F18:G22 range of cells, basically the output of the CORREL, PEARSON, RSQ, INTERCEPT, SLOPE, STEYX commands from top to bottom, which all verify the output of the LINEST function.

An add-in macro for Calc, it comes with its own instructions on how to install it (you must enable macros) and then appears either as and additional top-level menu or as an item under Tools. OOo_STAT will not appear in Writer. It includes functions for graphics and advanced inferential statistics not available through other Calc functions. It is still in 0.5 version (which would make it a beta) but works pretty well.

After install, OOoStat gives three choices: Applications, Basic Stats, and Multivariate Statistics as shown below

OOo Stat macro

OOo Stat macro

The Applications menu allows for visualization of data via histograms and other manipulations.

OOoStat First Menu

OOoStat First Menu

The Basic Stats deals with one- and two- way ANOVA, Correlation Coefficients and Multiple Regression.

OOo_Stat Basic Stats

OOo_Stat Basic Stats

The Multivariate Analysis menu involves Principal Components Analysis, Correspondence Analysis, Multivariate Analysis of Variance (MANOVA), analyses which may be better completed using a dedicated statistical analysis package.

OOo_Stat Multivariate Statistics

OOo_Stat Multivariate Statistics

All in all, Calc provides a rather extensive set of tools (either through functions or through the OOo_Stat addin) which may assist users in their statistical analysis. However, beware, the user should always be in control of the analysis tools and not the other way round.



Enhanced by Zemanta