Children's Mercy Hospital

Category: Unusual data. These pages describe data analysis that does not fit easily into the more traditional categories of data analysis. If I get a sufficient number of pages on the same general topic, I will create a new category. Articles are arranged by date with the most recent entries at the top. You can find the theme and closely related categories and other resources at the bottom of this page.

Stats: Bootstrap estimates of the standard error (June 20, 2008). A regular correspondent (JU) on the MEDSTATS email discussion group asked about using the bootstrap to estimate the standard error of the mean in a simple case with 9 data values. He wanted to know why the bootstrap community commonly uses n instead of n-1 in the variance denominator. It seemed to him that n-1 would produce an unbiased estimate of the standard error, and he wanted to know whether that was true only in this special case or true in general. He quoted the book by Efron and Tibshirani, whose authors felt that either method would work well for most purposes.
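The n versus n-1 question can be explored with a short simulation. The sketch below is only an illustration: the nine data values are made up (the correspondent's actual values are not given here), and the function names are hypothetical. For the sample mean, the bootstrap standard error should settle near the formula that uses n in the variance denominator, while the usual textbook standard error uses n-1 and is larger by a factor of sqrt(n/(n-1)).

```python
import math
import random


def se_mean_divide_by_n(values):
    """Standard error of the mean using n in the variance denominator."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance / n)


def se_mean_divide_by_n_minus_1(values):
    """Usual standard error of the mean, using n-1 in the variance denominator."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


def bootstrap_se_mean(values, n_boot=20000, seed=1):
    """Monte Carlo bootstrap: resample with replacement, take the SD of the means."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(values) for _ in range(n)]
        means.append(sum(resample) / n)
    grand_mean = sum(means) / n_boot
    return math.sqrt(sum((m - grand_mean) ** 2 for m in means) / (n_boot - 1))


# Hypothetical data: nine made-up values, not the values from the email.
data = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 17.0, 13.0, 16.0]

print("divide by n    :", se_mean_divide_by_n(data))
print("divide by n-1  :", se_mean_divide_by_n_minus_1(data))
print("bootstrap      :", bootstrap_se_mean(data))
```

With only 9 values the two formulas differ by about 6 percent, which is small next to the uncertainty in any standard error estimated from so few observations.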

Stats: A brief overview of instrumental variables (April 14, 2008). People will often ask me questions that are outside my area of expertise. Yes, I know you're shocked to hear this, but there are lots of areas of statistics where I only have a vague understanding. One of these questions was about instrumental variables. I could only offer a vague explanation, but I hope that is better than no explanation at all.

Stats: Parametric tests for a ratio (October 27, 2006). Dear Professor Mean, I computed a variable, Y3, which is the ratio of two other variables, Y1 and Y2. Can I use a parametric test on this ratio?

Stats: The problem with ranking ordinal scales (June 29, 2006). When I was young and naive, I thought that anytime you encountered ordinal data, it would make the most sense to use a test statistic based on ranks, such as the Mann-Whitney-Wilcoxon test or the Kruskal-Wallis test. Unfortunately, the ranks can sometimes distort the true nature of an ordinal scale. I thought that I had provided an example of how ranks can distort things, but I could not find it this morning when someone asked a question relating to ordinal scales. So here is the example again.

Stats: Randomization tests for paired data (January 24, 2006). The randomization test offers a lot of flexibility for analyzing data in ways well beyond what traditional tests might offer. Here's a simple example from the Chance Data Sets web page.

Stats: Outcomes research (November 24, 2004). Someone asked me for a simple definition of outcomes research. I hemmed and hawed and could not come up with a good definition. It turns out that the Agency for Healthcare Research and Quality has a nice definition.

Stats: Report cards (August 27, 2004). I'm working on a project looking at some outcomes that might eventually become part of a report card or benchmarking system. This is an area fraught with controversy and it needs to be handled very carefully. Here are a few references that I have accumulated that address some of these issues.

Stats: Randomization test (July 14, 2004). I received some data from a project where the outcome measure was the degree of improvement after a treatment, with values of -1 (slight decline), 0 (no change), 1 (slight improvement), 2 (moderate improvement), and 3 (large improvement). The two treatments had quite different results. The old therapy had eight patients, three of whom showed a slight decline and five of whom showed no change. Among the eight patients in the new therapy, one showed no change, three showed a slight improvement, six showed moderate improvement, and two showed a large improvement. There are several approaches that you could try with this data. Even though I did not have a problem with computing averages, I was a bit nervous about the t-test. This data is clearly non-normal, and with the sample sizes as small as they are, I'd be worried about whether the t-test would be valid. An interesting alternative is the randomization test.
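Here is a minimal sketch of how a randomization test on the difference in means could be run for data like this. It is not the analysis from the original page. The scores are entered exactly as the counts are listed above, which gives twelve values for the new therapy rather than eight, so treat the data as an assumption. The function name and the number of shuffles are also illustrative choices.

```python
import random


def randomization_test(group_a, group_b, n_shuffles=10000, seed=1):
    """Two-sided Monte Carlo randomization test for a difference in means.

    Returns the observed difference (mean of b minus mean of a) and an
    approximate p-value based on randomly reshuffling the group labels.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(group_b) - mean(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        # Small tolerance guards against floating point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, p_value


# Old therapy: three slight declines (-1) and five with no change (0).
old_therapy = [-1] * 3 + [0] * 5
# New therapy, using the counts exactly as listed in the text (assumption).
new_therapy = [0] * 1 + [1] * 3 + [2] * 6 + [3] * 2

observed, p_value = randomization_test(old_therapy, new_therapy)
print("observed difference in means:", observed)
print("approximate two-sided p-value:", p_value)
```

The test only assumes that, if the treatments were equivalent, any assignment of the observed scores to the two groups would have been equally likely. It makes no normality assumption, which is the appeal with samples this small.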

Stats: McNemar's Test (June 17, 2004). I received an email asking how to test two correlated proportions to see if one proportion is significantly larger than another. This is a classic application of McNemar's test.
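McNemar's test uses only the discordant pairs, the pairs where the two correlated measurements disagree. The sketch below shows an exact (binomial) version alongside the chi-square statistic without continuity correction. The function names and the example counts are hypothetical and are not taken from the email.

```python
import math


def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar p-value.

    b = pairs positive on the first measurement only,
    c = pairs positive on the second measurement only.
    Under the null hypothesis, b follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi_square(b, c):
    """McNemar chi-square statistic (1 degree of freedom, no continuity correction)."""
    return (b - c) ** 2 / (b + c)


# Hypothetical discordant counts, for illustration only.
b, c = 15, 5
print("exact two-sided p-value:", mcnemar_exact_p(b, c))
print("chi-square statistic   :", mcnemar_chi_square(b, c))
```

The concordant pairs do not appear anywhere in the calculation, because they carry no information about which proportion is larger.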

Stats: Analyzing percentage data (May 24, 2004). I received one of those difficult-to-answer questions: how do I analyze my data when the outcome variable is a percentage? That depends a lot on the context of the problem. The first thing to look at is whether the percentage involves counts of some type, and if so, whether you know the numerator and denominator. Alternatively, the percentage might be the ratio of two continuous measurements.

Stats: Parametric versus nonparametric tests (July 30, 2001). Dear Professor Mean: When should I use a parametric test versus a non-parametric test?

Stats: Outliers (January 28, 2000). Dear Professor Mean: I have recently conducted a survey of attitudes toward research from a professional group. There are some outliers (+/- 3SD) that I would eliminate, but others conducting the research with me feel that this might be a minority view, and should not be eliminated from the dataset. Are there any views or references that I should read to confirm my view, or theirs?

Stats: Composite scores (January 27, 2000). Dear Professor Mean: I have developed a method to distinguish among several products that we need to buy so our company can make a good purchasing decision. I created a composite score which is a weighted average of several different indicators of quality. I want to use statistics to determine when two different products have significantly different composite scores.

Stats: Mixture models (January 27, 2000). Dear Professor Mean: I have read a journal article where the authors used a mixture model. What is this?

Stats: Physician Performance Data (January 27, 2000). Dear Professor Mean: Producing statistics of physician performance or group performance or whatever seems to be one of the great growth industries in medicine. Graphs of performance in just about anything seem to be produced - usually with something that looks at first glance like a normal distribution (and almost never with any statistical addenda). But I would like to know whether we can use them sensibly as anything other than pictures? In particular when I am one of the subjects of the analysis how do I interpret my own performance?

Stats: Splines (January 27, 2000). Dear Professor Mean: Can you send me a basic definition of splines?

Stats: Bootstrap (January 26, 2000). Dear Professor Mean: I've heard a lot about how the bootstrap is going to revolutionize statistics. How does the bootstrap work?

Stats: Injury index creation (September 23, 1999). Dear Professor Mean: I want to create an injury index that describes the severity of an injury to a child. This would include information about the type of injury, the location of the injury, the age of the child, etc. What's the best way to do this?

Stats: Chi-square (September 3, 1999). Dear Professor Mean: Can the Chi-squared test be used for anything besides categorical data?

Stats: Page's test (September 3, 1999). Dear Professor Mean: I have recently come across a statistical test (Page's L test), with which I am unfamiliar. Does anyone either have information about this test or know where I might find information about it?

Other resources:

  • 100 Statistical Tests Description: Gopal Kanji lists specific details of many statistical tests, some quite obscure. This book is for students who want more mathematical details.
  • Comparison of hospital episode statistics and central cardiac audit database in public reporting of congenital heart surgery mortality. Description: One of the more lively debates in medicine today is the use of report cards to summarize performance of hospitals and/or individual physicians. This paper takes individual statistics compiled by hospitals (hospital episode statistics) and compares them to a centralized database. There are large discrepancies between the two, and the authors suggest that individual hospitals should spend the effort to more rigorously collect and validate their data.
  • Correspondence Analysis Excerpt: This paper is an introduction to correspondence analysis, a statistical method allowing to analyze and describe graphically and synthetically big contingency tables, that is tables in which you find at the intersection of a row and a column the number of individuals who share the characteristic of the row and that of the column. Description: This website provides a good general overview of what correspondence analysis is and how to use it.
  • Ed Rigdon's SEM FAQ. Description: This is the first place you should look if you have questions about Structural Equation Models.
  • Instrumental variable Excerpt: In statistics and econometrics, an instrumental variable (IV, or instrument) can be used to produce a consistent estimator of a parameter when the explanatory variables (covariates) are correlated with the error terms. Such correlation can be caused by endogeneity, by omitted covariates, or by measurement errors in the covariates. In this situation, ordinary linear regression produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation, that is correlated with the suspect explanatory variable, and that is uncorrelated with the error term.
  • Instrumental Variable Estimation Excerpt: One way of identifying models that cannot be estimated by using multiple regression is through the use of instrumental variables. For path analysis, the disturbance must not be correlated with each causal variable. There are three reasons why such a correlation might exist: * Spuriousness (Third Variable Causation): A variable causes both the endogenous variable and one its causal variables and that variable is not included in the model. * Reverse Causation (Feedback Model): The endogenous variable causes, either directly or indirectly, one of its causes. * Measurement Error: There is measurement error in a causal variable.
  • Overview of Computer Intensive Statistical Inference Procedures. Description: This page provides a nice overview of the permutation test, randomization test, Monte Carlo estimation, bootstrapping, the jackknife, and Markov Chain Monte Carlo methods.
  • The use of bootstrap methods for analysing Health-Related Quality of Life outcomes (particularly the SF-36). Description: The article provides an illustrative example of how to use the bootstrap method.
  • The Zoo of Loglinear Analysis Excerpt: "Loglinear Analysis is a multivariate extension of Chi Square. You use Loglinear when you have more than two qualitative variables. Chi Square is insufficient when you have more than two qualitative variables because it only tests the independence of the variables. When you have more than two, it cannot detect the varying associations and interactions between the variables. Loglinear is a goodness-of-fit test that allows you to test all the effects (the main effects, the association effects and the interaction effects) at the same time."

This webpage was written by Steve Simon on 2007-06-20, edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu.