Measurable symmetry accounts for less than 1% of the variance in the attractiveness of women's faces and less than 3% of the variance in the attractiveness of men's faces. Before we went and checked, we had both believed the widely circulated story that symmetry was a big deal in attractiveness.
So how did it happen that so many of us believed that symmetry was a big deal in attractiveness judgments? There were studies that said so, obviously. But when we looked at the details, it turned out that the initial studies showing big effects typically involved samples of fewer than 20 faces each, which is irresponsibly small for correlational studies with open-ended variables. Once the bigger samples started showing up, the effect basically disappeared for women and was shown to be pretty low for men. But no one believed the later, bigger studies, not even most of their own authors -- pretty much everyone in my business still thinks that symmetry is a big deal in attractiveness.
So, the first lesson I learned: Small samples are dangerous. They're so dangerous that we need to force larger samples. How? My solution has been to ditch the old p<.05 significance standard. Right now, most social scientists allow themselves to call something a real finding if there is less than a 1 in 20 chance that random noise alone would produce a result that strong. It's a standard that arose before computers, in a day when scientists ran their numbers by hand and so just didn't run very many numbers. These days, all you have to do to make a significant finding likely is to measure 7 variables -- the resulting correlation table has 21 correlations, and, just at random, you'd expect about one of them to be p<.05. Look at my dissertation or any of my published work (two new ones are coming out soon), and you'll find that I'm using p<.005 -- a 1 in 200 standard. I don't think anyone has noticed -- I haven't had any reviewers comment on it at all. But what the .005 standard does is force larger samples, leading to more stable estimates, leading to more replicable results. So that was my first lesson from the health-attractiveness paper -- false positives are easy to get and powerfully resilient to correction, so we need better studies in the first place. That means bigger samples, and the only practical way to enforce bigger samples is a stricter significance criterion.
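For anyone who wants to check the 7-variable arithmetic, here is a minimal Python sketch. The function names are made up for illustration, and the last calculation assumes the 21 tests are independent, which correlations drawn from one table are not, so treat that number as a rough guide rather than an exact figure.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct correlations among k variables (k choose 2)."""
    return comb(k, 2)


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of chance 'hits' when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of one or more false positives.

    ASSUMPTION: the tests are independent. Correlations that share
    variables are not, so this is only an approximation.
    """
    return 1 - (1 - alpha) ** n_tests


n_tests = n_pairwise(7)  # 21 correlations from 7 variables
for alpha in (0.05, 0.005):
    print(
        f"alpha={alpha}: "
        f"expected false positives={expected_false_positives(n_tests, alpha):.3f}, "
        f"P(at least one)={prob_at_least_one(n_tests, alpha):.3f}"
    )
```

Under those assumptions, the .05 standard gives about 1.05 expected chance hits and roughly a two-in-three chance of at least one, while the .005 standard cuts that to about 0.1 expected hits and roughly a one-in-ten chance.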
HT: Robin Hanson at Overcoming Bias