Professor Rudy A. Gideon
Retired July 2005
Retired from teaching part-time
May 2008
But still working on research
Department of Mathematical Sciences
University of Montana
Missoula, MT 59812
Phone: 406-243-4162
Phone Home: 406.728.3858
FAX: 406-243-2674
www.math.umt.edu/gideon
gideon@mso.umt.edu
Last books used for classes:
Probability: An Introduction with Statistical Applications by John J. Kinney.
Introduction to Modern Nonparametric Statistics by James J.
Higgins.
Biostatistical Analysis by J.H. Zar.
wea0203.sav: This is an SPSS data file containing temperatures of Missoula
for one year in weekly intervals from Monday through Friday. It is to be used
for regression, correlation, and time series examples.
From here on, only the research work on the use of correlation coefficients
as general statistical tools appears. It is called the CES, the Correlation
Estimation System.
Publication Links
Publication 1: The Correlation Coefficients; this
is a refined, combined version of papers 1 and 2 below. To appear in the November
2007 issue of JMASM, Journal of Modern Applied Statistical Methods, #951
Publication 2: Correlation in Simple Linear Regression;
this is a refined version of paper 5 with the same title, found below the list
of publications. It is currently in the publication review process at Communications
in Statistics, Theory and Methods, #A06-277. See below for the data sets that
are used.
Publication 3: Location and Scale Estimation with Correlation
Coefficients, Communications in Statistics, Theory and Methods, #A06-374
Publication 4: The Relationship between
a Correlation Coefficient and its Associated Slope Estimates in Multiple Linear
Regression, Sankhya #B 08011, currently being reviewed
Publication 5: Correlation and Regression without
Sums of Squares, College Mathematics Journal, #08-054. This was rejected as
too advanced. Need another mid-level journal to submit to.
Publication 6: Nonlinear Correlation Coefficients,
maybe send to The Canadian Journal of Statistics when revised,
#CJS 1370CT07
Data set number 1: Major League Baseball Data
from 1989 used in Publication 2
Data set number 2: Major
League Baseball Data from 1992, Atlanta Braves and Opponents hits and runs for
175 games, used in Publication 2
Research, a Billabong of Statistical Estimation using Correlation and Rank-Based
methods
All of the work below is part of a general system of estimation with correlation
coefficients. Whatever can be done with Least Squares can also be done with
the following methods. Because the work is so extensive it can only gradually
be posted. It has taken many years to develop and has been supported by five
Ph.D. students and numerous master's students. The students' names appear below,
as I am indebted to them.
It has been funded by a private grant from John Bryan and from the National
Security Agency. The work is an extension of the basic papers
- Gideon, R.A. and Hollister, R.A. (1987), "A Rank
Correlation Coefficient Resistant to Outliers", Journal of the American
Statistical Association, vol. 82, pp. 656-666
- Gideon, R.A., Prentice, M.J., and Pyke, R. (1989), "The Limiting Distribution
of the Rank Correlation Coefficient, GD", in Contributions to Probability
and Statistics (Essays in Honor of Ingram Olkin), edited by Gleser, L.J. et
al., Springer-Verlag, N.Y., pp. 217-226
Research Interests
- The use of correlation coefficients in statistical
estimation
- The use of the rank-based CC called the
Greatest Deviation in statistical estimation
- The use of the software package S-Plus in implementing
the above two topics
- Rank Based CC estimators are robust, so robustness is of interest
- Making computer packages more versatile by
incorporating correlational methods
Robust Study
Robust Comparison (Baseball game times regressed
on hits, runs, pitchers, LOB, BB, and Ks)
The USA Today newspaper article on baseball game times appeared on 19 March
2002 on Sports Page 3C.
This Robust link is here for comparison of 4 robust multiple regression methods
and ordinary Least Squares. The data was analyzed as it was generated, so
many partial analyses occur: after games number 9, 14, 20, 31, 39, 46, 53, 58,
65, and 82 (end of first half of season), with second-half analyses after games
44, 69, and 79. There is also a combined analysis of all 161 games, along with
some quotes about length of games from USA Today and a statistical comparison.
Paper #6 will contain the methodology.
Graphs of Various Baseball Data Variables, comparing
LS and GDCC
Online papers showing how to
use any Correlation Coefficient to estimate parameters in a wide variety of
settings
All details are illustrated with the Greatest Deviation Correlation Coefficient
(GD); the numbered links lead to the particular papers.
#1 A Generalized Interpretation of Pearson's
r
Contents:
- Pearson's CC is defined using the diagonals of a parallelogram
- This diagonal idea is used to explain the definition of other correlation
coefficients, e.g., Greatest Deviation and Gini's (Spearman's footrule)
- An absolute value correlation coefficient is defined, which should be used
anytime L-one methods are employed
- A median absolute deviation correlation is defined, the correlation extension
of MAD methods
Figures for #1
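The diagonal interpretation in #1 can be illustrated numerically. The sketch
below assumes the standard identity r = [sum (x*+y*)^2 - sum (x*-y*)^2] / (4(n-1))
for standardized x* and y*, where the two sums play the role of the squared
diagonals of a parallelogram with sides x* and y*. The function names are
illustrative only and are not taken from the paper.

```python
from statistics import mean, stdev


def standardize(v):
    """Center by the mean and scale by the sample standard deviation (n - 1)."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]


def pearson_from_diagonals(x, y):
    """Pearson's r as the scaled difference of the two squared 'diagonals'."""
    xs, ys = standardize(x), standardize(y)
    n = len(xs)
    long_diag = sum((a + b) ** 2 for a, b in zip(xs, ys))   # |x* + y*|^2
    short_diag = sum((a - b) ** 2 for a, b in zip(xs, ys))  # |x* - y*|^2
    return (long_diag - short_diag) / (4 * (n - 1))


def pearson_classic(x, y):
    """Usual product-moment formula, used here only as a cross-check."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Replacing the squared lengths by other measures of the size of x*+y* and x*-y*
is, under this reading, how the other coefficients in the list above arise.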
#2 The Correlation
Coefficients
- Continuous and rank absolute value correlations are defined
- All correlations are examined as the difference in measures from perfect
negative and positive correlation, e.g., Pearson, Spearman, Kendall, Greatest
Deviation, and the absolute value correlations
- A 0-1 graph-table is given that shows how to compute GD, Kendall, Spearman,
and the rank absolute value CC all on the same graph-table
- The asymptotic distributions are given for the above four rank CC's and
an example is worked out
- A small example suggests which CC's are most robust
- The general tied-value procedure, which allows complex calculations such
as those in regression problems, is reviewed
#3 The Geometrical Definition
of GDCC and its Uniqueness
Contents:
- The basic counting technique to compute the Greatest
Deviation Correlation Coefficient (GDCC)
- The exact null distribution up to n = 15
- The population and sample definitions of the GDCC are illustrated by geometry
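A minimal sketch of the counting technique in #3, assuming the definition from
Gideon and Hollister (1987) in the form
GD = (max_i d_i(reversed p) - max_i d_i(p)) / floor(n/2), where p holds the
y-ranks after sorting by x and d_i(p) counts how many of the first i y-ranks
exceed i. It assumes no ties; the function names are illustrative.

```python
def gd_from_ranks(p):
    """GD from y-ranks p (values 1..n), listed in increasing order of x."""
    n = len(p)

    def max_deviation(q):
        # For each i, count entries among the first i that are larger than i.
        return max(sum(1 for j in range(i) if q[j] > i) for i in range(1, n + 1))

    reversed_p = [n + 1 - r for r in p]  # ranks of -y
    return (max_deviation(reversed_p) - max_deviation(p)) / (n // 2)


def gd(x, y):
    """GD for paired data without ties in x or y."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]
    rank_of = {v: r for r, v in enumerate(sorted(y_by_x), start=1)}
    return gd_from_ranks([rank_of[v] for v in y_by_x])
```

Perfectly increasing data give 1 and perfectly decreasing data give -1 under
this definition.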
#4 Random Variables, Regression, and the
GDCC
Contents:
- A population discussion of GDCC and simple linear
regression
- The minimum sum of squares of probabilities (volumes)
for the bivariate normal and Cauchy distributions
- The population regression lines
- Asymptotic relationship between the CC and the slopes
in simple linear regression
- The correlated bivariate Cauchy Distribution is
defined with parameter rho
- The Bivariate Normal and Cauchy have the same GDCC,
(2/pi)*arcsin(rho)
- GDCC can do regression for all elliptical bivariate distributions, from
Cauchy, the Student t's, to Normal
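The relationship (2/pi)*arcsin(rho) stated in #4 can be written as a pair of
small functions. The inverse map is a straightforward algebraic inversion added
here for illustration; using it to turn a sample GD into an estimate of rho is
an assumption of this sketch, not a claim about the paper.

```python
import math


def gdcc_from_rho(rho):
    """Population GDCC for a bivariate normal or Cauchy with parameter rho."""
    return (2 / math.pi) * math.asin(rho)


def rho_from_gdcc(g):
    """Algebraic inverse of gdcc_from_rho, valid for -1 <= g <= 1."""
    return math.sin(math.pi * g / 2)
```

For example, rho = 0.5 corresponds to a population GDCC of 1/3.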
#5 Correlation in Simple Linear Regression
Contents:
- The correlation equation is shown to be a general way
to define an equation for simple linear regression.
- Least Squares is done via Pearson's Correlation and
then Kendall's tau and also the Greatest Deviation Correlation
- Several examples are given; the graphs and figures are
in the link just below
- Confidence intervals for the slopes are constructed via the asymptotic distribution.
Figures for #5
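A minimal sketch of the idea in #5, assuming that "the correlation equation"
means choosing the slope b so that the correlation between x and the residuals
y - b*x is zero. With Pearson's r this reproduces the least-squares slope; with
Kendall's tau it lands on the median of the pairwise slopes. The bisection
solver, its bracket, and the function names are illustrative choices, not taken
from the paper.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def kendall(x, y):
    """Kendall's tau without tie corrections."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def slope_by_correlation(x, y, corr, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve corr(x, y - b*x) = 0 for b by bisection.

    Assumes the correlation of x with the residuals is positive at lo,
    negative at hi, and does not increase as b increases.
    """
    def residual_corr(b):
        return corr(x, [yi - b * xi for xi, yi in zip(x, y)])

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual_corr(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Passing a different correlation function, for example the GD, changes the
estimator without changing the solver.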
#6 Gideon, R. A., and Rothan, A. M. (2004a),
"Location and Scale Estimation with Correlation Coefficients"
- Methods showing how to construct estimates of variation, standard deviation,
and location, median or mean, using nonparametric correlation coefficients
are given. (Actually, any correlation coefficient may be used.)
- The work is connected to the work of Downton (1966) and D'Agostino (1971, 1973)
- An example is given comparing the robust properties to examples given in
Iglewicz (1983) and Nemenyi, Dixon, White, and Hedstrom (1977)
- The estimate of the median through the GD is shown to be nearly as efficient
as the classical mean when the data are normally distributed.
- How to construct confidence intervals and perform hypothesis testing for
the standard deviation is explained.
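The location and scale idea in #6 can be sketched as fitting a line to the
ordered data against normal scores, so that the slope estimates the standard
deviation and the intercept estimates location. The version below uses
Pearson's r only because it gives a closed form; a rank-based coefficient such
as GD would need a numerical solve of the same zero-correlation equation. The
plotting positions (i - 0.375)/(n + 0.25) and the function names are
assumptions of this sketch and may differ from the paper's construction.

```python
from statistics import NormalDist, mean


def normal_scores(n):
    """Approximate expected standard normal order statistics (Blom positions)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def location_scale_pearson(data):
    """Slope and intercept of ordered data on normal scores, via Pearson's r.

    Returns (location, scale). The slope s solves r(q, y - s*q) = 0.
    """
    y = sorted(data)
    q = normal_scores(len(y))
    mq, my = mean(q), mean(y)
    s = sum((a - mq) * (b - my) for a, b in zip(q, y)) / sum((a - mq) ** 2 for a in q)
    return my - s * mq, s
```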
#7 Multiple Regression technique
with Asymptotics (student-Miller)
- The multiple Taylor series expansion for Pearson's
Correlation is used to connect correlation to classical distribution theory
- This result is expanded to allow inference with the
Greatest Deviation Correlation Coefficient
- A multiple regression example is given
- Partial and multiple correlation coefficients are
defined for GDCC and examples given
- This paper explains how the baseball game time example in the "robust
comparison" link is done
#8 This link will contain a paper delivered
at the IMS Annual Meeting in Banff, Canada, on Tuesday, July 30, 2002.
Open link #8 only with Internet Explorer (MS), not with Netscape, to see a
PowerPoint presentation
- The correlation Principle
- Measuring linearity with a nonparametric correlation
coefficient
- examples with good and bad data
- the order norm is used but not explained; order norm
is to be a later addition, already written, but not yet added here
- the same classical interpretation of regression and correlation can be used
with nonparametric correlations; i.e., fraction of regression explained
#9 A Robust Norm Using GDCC (Carol Ulsafer
was a co-author of this work)
- The paper "Location and Scale Estimation with Correlation Coefficients"
is used to develop a robust norm.
- The Norm is called an order norm because it is based on the ordered data.
- An order "inner product" is defined.
- A study is made on the zero of the order norm.
- The triangle inequality is shown not to hold.
- A landcover satellite example is given.
#10 Gideon, R. A., and Rothan, A. M. (2004b),
"Elementary Slopes in Simple Linear Regression"
- A weighted average of the elementary slopes is shown to give the least-squares
estimate when the regressor variable values are fixed and the error is independent
and normal.
- It is shown that the elementary slopes have a rescaled Cauchy distribution
for bivariate normal data.
- This Cauchy distribution is then used with correlation coefficients to estimate
the regression parameters.
- Simulations show that with outliers distributed symmetrically, both Kendall's
tau and GD operating on the original data and GD operating on the elementary
slopes of the bivariate data are robust in estimating the slope in
simple linear regression.
- An example of the process is given using bivariate normal data with some
contamination in the Y-variable, along with a scatterplot comparing the fits by
least squares, GD, and GD operating on the elementary slopes
- A Bivariate Cauchy distribution is analyzed via simple linear regression
and GDCC
- Inference is done with the asymptotic distribution of GDCC
- The distribution-free property over the class of bivariate t's is demonstrated
at the ends: the Cauchy and the Normal
- Robustness is shown by comparing the normal and Cauchy distributions
- A geometric method is used to estimate a ratio of scale parameters
- This paper again is general so that the method could be applied with
other correlation coefficients
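The weighted-average statement about elementary slopes in #10 can be checked
numerically. The sketch below assumes the weights are the squared differences
(x_j - x_i)^2, which is the standard identity linking pairwise slopes to the
least-squares slope; the paper's own notation may differ, and the function
names are illustrative.

```python
from itertools import combinations
from statistics import mean


def ls_slope(x, y):
    """Ordinary least-squares slope."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def weighted_elementary_slope(x, y):
    """Average of pairwise slopes, each weighted by its squared x-difference."""
    num = den = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx = xj - xi
        if dx == 0:
            continue  # the elementary slope is undefined for tied x values
        weight = dx ** 2
        num += weight * (yj - yi) / dx
        den += weight
    return num / den
```

Replacing the weighted mean by a robust summary of the same elementary slopes is
the step where a correlation coefficient such as GD can enter.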
#12
Sheng, HuaiQing (Tom), Ph.D. advisor Gideon, R.A. (2002), "Estimation in Generalized
Linear Models and Time Series Models with Nonparametric Correlation Coefficients"
- I: Linear Regression and Nonparametric Correlation Coefficients,
Simple Linear Regression and Multiple Linear Regression
- II: Generalized Linear Models and Estimation, Estimation using
the Greatest Deviation Correlation Coefficient, GLM with the Poisson, GLM
with the Logistic
- III: Nonlinear Models and Estimation, examples compared to
least squares and the steepest descent methods.
- IV: Time Series Models and Estimation, ARMA model, moving
averages, autoregressive processes, mixed models, forecasting, practical examples
- V: Bibliography, references include the papers above as well
as related work.
- This work shows the generality of estimation with Nonparametric Correlation
Coefficients on advanced techniques by utilizing the Greatest Deviation CC.
It is a very general estimation procedure. The Ph.D. dissertation is on file
at the University of Montana Library, Missoula, MT. It can be accessed at
http://wwwlib.umi.com/dissertations/fullcit/3041406
#13 A Two-Sample Experiment Analyzed by
the Correlation Method
- This is an independent two-sample problem, measuring the distance around
an Oval in the center of the University of Montana campus
- It was performed by students in an applied statistics class; one sample
used a step-counting method and the second sample was based on measuring time
- There were outliers present
- The GD method was compared to classical methods
- Bootstrapping and Permutation tests were employed
- Quantile plots are given to illustrate the results
- As always this method is shown to be very practical and it can be used to
avoid decision making about the inclusion of outliers
- The methods are explained in previous papers posted on the Web site, numbers
6 and 8
#14 General Definition of Correlation
Coefficients
- Presented in Minneapolis, August 2005, at the National Meeting, Poster
Session
- General outline of correlation methods for location, scale, regression
parameters
- A minimization technique using correlation, an alternative to least
squares
- A regression example with bivariate Cauchy data comparing GD and LS
#15 Two robust examples including the education
data
- This is real data
- The main emphasis, however, is to compare several correlation coefficients
on real data
- The Greatest Deviation Correlation is used as it is very robust
- Spearman, Kendall, Pearson correlations compared to GDCC
- The conclusion is that GDCC gives added insight to relationships obscured
by other correlations and widely variable data
- There are two sets of data but the education data is most revealing
- The data comes first in this pdf file and the write-up is in the middle
#16 Estimating the Parameters of the Pareto Distribution
- This is the master's thesis of Joseph Petersen
- The idea is to show that the correlation estimation method can be used to
estimate parameters in a wide variety of settings including particular distributions
- The GDCC was used to estimate the parameters and comparisons to existing
methods were made.
- Again it was demonstrated that the correlation method is useful and robust
using GDCC
Acknowledgments
- Ron Pyke, University of Washington, for being my
master's supervisor, and completing the asymptotic distribution derivation and
write-up
- John Gurland, University of Wisconsin, my Ph.D.
advisor at Madison Wisconsin
- Mike Prentice, University of Edinburgh, for asking the
question, "What is it estimating?" and allowing me a Sabbatical in Scotland
- Student names will appear here soon; immediately below are five Ph.D. students
- Dale Mueller, Spring 1978, "A Geometrical View of the Kolmogorov-Smirnov
Statistics with Multi-Sample Generalizations"
- Sister Adele M. Rothan, Summer 1982, "A Distribution-Free Scale Test of
the Kolmogorov-Smirnov Type"
- Robert Hollister, Summer 1984, "A Correlation Coefficient Based on Maximum
Deviation"
- Steve Rummel, Summer 1991, "A Procedure for Obtaining a Robust Regression
Employing the Greatest Deviation Correlation Coefficient"
- HuaiQing (Tom) Sheng, Spring 2002, "Estimation in Generalized Models and
Nonlinear Models with the Greatest Deviation Correlation Coefficient" (This
includes times series models)
- Brian Steele, Spring 1995 (with David Patterson as co-advisor), "Estimation
in Generalized Linear Mixed Models via EM Algorithm"
Below are the names of students or people who have helped keep my research
alive by either being a master's student, participating in a seminar, being
on a grant, or just being there for a discussion and helping with the research.
- Gerald Schumann (1978, initial development of ideas)
- Huey-Fen Shiue (1984, general understanding and development of nonparametric
methods)
- Jian-Jian Ren (1987, general development of nonparametric methods)
- Young Hoon Park (1989, general use of nonparametric methods)
- Don Gilmore (1991, graphical calculation of GD and three other rank correlations)
- Li-Chiou Lee (1991, location estimation with GD)
- John Bruder (1991, location estimation with GD)
- Mike Thiel (1991, location estimation with GD)
- Hongzhe Li (1992, a study of GD methods in multiple linear regression)
- Bill Stoner (1992, GD location estimator)
- Ming Yin (1993, development of C program for multiple regression)
- Wexin Zhou (1993, S-Plus programs for GD)
- Josef Crepeau (1993, small sample GD location estimator)
- HuaiQing Sheng or Tom (1994, GD in general linear models and non-linear
regression)
- Jacquelynn Miller (1995, multiple regression development with GD)
- Christopher Vahl (1995, Studying the robustness of GD or rank based CC statistics)
- David Goldsmith (1997, S-Plus work on the continuous absolute correlation
coefficient)
- Jeff Stratton (1999, testing for normality using a correlation type statistic)
- Yueju Li (1999, Comparison of tests of fit between Pearson's CC and two
Kolmogorov-Smirnov type tests, Lilliefors)
- Jiang Qun (1999, correlation methods on one-sample data to estimate location
and scale parameters)
- John Gee (2002, using correlation coefficients and order statistics to estimate
sigma)
- Joe Peterson (2004, Quantile estimation of the Pareto distribution with
correlation coefficients)
- Isaac Grenfell (2004, the use of GD and MAD correlations as estimators in spatial
statistics)
- Joyce Schlieter (20 years of support, preparation of workshop for ASA meeting)
- Carol Ulsafer (6 years of support, order norm development, editorial assistance)
- Merle Manis (lifetime listening to wacky ideas in statistics while being
in algebra)
- Charles Bryan (lifetime moral support, and small grant from his brother
John, $30,000)
Links