statistical analyses with openoffice.org calc

I don’t know why, but, occasionally, people may have to do minimal or even considerable statistical calculations using Calc, the OpenOffice.org‘s spreadsheet part of the Office Productivity Suite. Although, I wouldn’t recommend doing so since there are other, better software packages to complete simple and complex statistical analyses, such as R, a statistical programming language and environment with excellent analysis and graph capabilities. R can be found at one of its many mirrors.

However, for those who need or have to (or … whatever), here are some very very basic info on using statistical functions in Calc. Calc has a considerable variety of functions documented in the openoffice wiki. of statistical (and other types of) functions that you can use. There is also OOo_Stat, an external add-in, that you can include to assist you with your analyses. More on that later on.

MInd you, I am not going to provide a full “Statistics using Calc” guide here, I just want to suggest a few statistical functions for basic descriptive and inferential statistics.

A. Basic Descriptive Statistics
I. Central Tendency

  1. AVERAGE returns the mean average of its list of arguments
  2. MEDIAN returns the median value (the value at the 50th percentile) of its respective list of arguments
  3. MODE returns the most frequent value of its list of arguments

There are other functions, like GEOMEAN, HARMEAN, and TRIMMEAN which give the geometric mean, the harmonic mean and the trimmed mean (excluding a portion of highest and lowest values) of their arguments.

II. Variation

  1. KURT estimates the kurtosis of the distribution of values (ie, how flat or how tall the distribution is)
  2. SKEW estimates the asymmetry of the distribution (ie, how different from the symmetrical normal distribution it is)
  3. STDEVestimates the standard deviation of the distribution
  4. VAR estimates the variance (ie, the square of the standard deviation) of its list of arguments

Note, that there are functions STDEVP and VARPP for the standard deviation and variance of the population where the sample has come from.

B. Some (not all) inferential statistics
I. Simple tests

  1. CHITEST estimates the chisquare test of goodness-of-fit
  2. FTEST estimates the likelihood that two samples have the same variance.
  3. TTEST estimates Student’s t-test of one or two samples

II. Correlations, and simple regression

  1. CORREL estimates the Pearson product-moment correlation coefficient between two arrays of values
  2. PEARSON is identical to CORREL
  3. INTERCEPT estimates the intercept b_0 of the regression y = b0 + b1*x
  4. SLOPE estimates the slope b1 of the same regression equation
  5. STEYX estimates the residual standard error for the regression
  6. RSQ estimates the R^2, the square of the correlation coefficient between x and y>
  7. LINEST is a different kind of story. It provides a linear estimation for the data at hand. Basically it summarizes the regression analysis. It is an array function which means it must be called like this {=LINEST(arg1;arg2;arg3;arg4)}.
Linest Function Output

Linest Function Output

Data in the above example come from the R program and its cars dataset which “… give the speed of cars and the distances taken to stop” according to the help files. R help files cites Ezekiel’s 1930 Methods of Correlation Analysis published by Wiley as the source of the data and D. R. McNeil’s Interactive Data Analysis also published by Wiley in 1977) as a reference source for the data.

The output of the LINEST function consists of eight cells between F18 (upper left) and G22 (lower right) cells. Unfortunately no explanation is provided. I have tried to write in the adjacent cells some labels about each cell in the output of LINEST. Please note, that the third line of output (F21:G21) shows only the F-criterion for the fitting of data and the denominator degrees of freedom. In statistical parlance, F(1,48) = 89.57, p < 1.49e-012, which can easily be verified by using another Calc function FDIST.

There is additional output above the F18:G22 range of cells, basically the output of the CORREL, PEARSON, RSQ, INTERCEPT, SLOPE, STEYX commands from top to bottom, which all verify the output of the LINEST function.

C. OOo_STAT
An add-in macro for Calc, it comes with its own instructions on how to install it (you must enable macros) and then appears either as and additional top-level menu or as an item under Tools. OOo_STAT will not appear in Writer. It includes functions for graphics and advanced inferential statistics not available through other Calc functions. It is still in 0.5 version (which would make it a beta) but works pretty well.

After install, OOoStat gives three choices: Applications, Basic Stats, and Multivariate Statistics as shown below

OOo Stat macro

OOo Stat macro

The Applications menu allows for visualization of data via histograms and other manipulations.

OOoStat First Menu

OOoStat First Menu

The Basic Stats deals with one- and two- way ANOVA, Correlation Coefficients and Multiple Regression.

OOo_Stat Basic Stats

OOo_Stat Basic Stats

The Multivariate Analysis menu involves Principal Components Analysis, Correspondence Analysis, Multivariate Analysis of Variance (MANOVA), analyses which may be better completed using a dedicated statistical analysis package.

OOo_Stat Multivariate Statistics

OOo_Stat Multivariate Statistics

All in all, Calc provides a rather extensive set of tools (either through functions or through the OOo_Stat addin) which may assist users in their statistical analysis. However, beware, the user should always be in control of the analysis tools and not the other way round.

Enjoy,

I.

Enhanced by Zemanta

7 thoughts on “statistical analyses with openoffice.org calc

  1. While I found your information from “A. Basic Descriptive Statistics” reasonably informative and helpful, your opening statements and tone are at best condescending, and generally insulting. Since the article is about statistical analysis in OpenOffice Calc, suggesting that a person is stupid for using Calc is highly unprofessional and completely in appropriate. If you prefer another software package that is fine. However, such should not be your opening for this topic, nor should it be presented as treating those using Calc as inferior. Please take the time to re-work you opening so that your skill and knowledge shine cleanly instead of through the fog of resentment it currently generates.

  2. Mike,

    I’d like to know what you found “… at best condescending, and generally insulting”. Yes, Calc (as well as Excel or any other spreadsheet program) was not meant to be used for statistics work. There are better, more accurate, more analytical programs for such work.

    If you object to the phrase ” … for those who need or have to (or … whatever), “, then I am sorry that you felt in such a way. I never said that the people using Calc are inferior, only that there are other options for their statistical work.

    Thank you, however, for you kind words about my skill and knowledge. I appreciate it.

    I

  3. I agree with you that for professional statistical analysis or for research, more specific statistical packages such as R should be used. But for simple descriptive statistics or for educational purposes Calc may be appropriate.

    I have been written a set of macros for a class that I teach in descriptive statistics, I’ve called it ODStatistics and it can be downloaded from:

    http://sourceforge.net/projects/odstatistics/files/odstatistics-0.3.17.zip/download

  4. Walter,

    Amazing macro and very very detailed work. This should be recommended highly and yes, I have already download it and used it.

    I may say that it works perfectly with LibreOffice.

    I

  5. I use LibreOffice and OooStat to teach statistical functions in my course at a business school. Your page is very helpful.

    I’m glad to have followed through Walter’s link. Very helpful, once again!

    Shankar

  6. You post very interesting posts here. Your page deserves
    much bigger audience. It can go viral if you give it
    initial boost, i know useful service that can help you, simply search in google: svetsern traffic tips

Leave a comment