
4/20/15

A Trick For Higher SAT scores? Unfortunately no.

Wouldn’t it be cool if there were a simple trick to score better on college entrance exams like the SAT and other tests?




There is a reputable claim that such a trick exists. Unfortunately, the trick does not appear to be real.


This is the story of an academic paper on which I am a co-author, with possible lessons for life both inside and outside the Academy.

[related post: Top 5 mistakes made by college students.]
[related post: How Technology will change education]


In the spring of 2012, I was reading Nobel Laureate Daniel Kahneman’s book, Thinking, Fast and Slow.




Professor Kahneman discussed an intriguing finding that people score higher on a test if the questions are hard to read. The particular test used in the study is the CRT or cognitive reflection task invented by Shane Frederick of Yale. The CRT itself is interesting, but what Professor Kahneman wrote was amazing to me,


“90% of the students who saw the CRT in normal font made at least one mistake in the test, but the proportion dropped to 35% when the font was barely legible. You read this correctly: performance was better with the bad font.”


I thought this was so cool. The idea is simple, powerful, and easy to grasp. An oyster makes a pearl by reacting to the irritation of a grain of sand. Body builders become huge by lifting more weight. Can we kick our brains into a higher gear, by making the problem harder?




Malcolm Gladwell also thought the result was cool. Here is his description in his book, David and Goliath:


The CRT is really hard. But here’s the strange thing. Do you know the easiest way to raise people’s scores on the test? Make it just a little bit harder. The psychologists Adam Alter and Daniel Oppenheimer tried this a few years ago with a group of undergraduates at Princeton University. First they gave the CRT the normal way, and the students averaged 1.9 correct answers out of three. That’s pretty good, though it is well short of the 2.18 that MIT students averaged. Then Alter and Oppenheimer printed out the test questions in a font that was really hard to read … The average score this time around? 2.45. Suddenly, the students were doing much better than their counterparts at MIT.




As I read Professor Kahneman’s description, I looked at the clock and realized I was teaching a class in about an hour, and the class topic for the day was related to this study. I immediately created two versions of the CRT and had my students take the test: half with an easy-to-read version and half with a hard-to-read version.


(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.
How much does the ball cost? _____ cents
(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.
How much does the ball cost? _____ cents (in my experiment, I used Haettenschweiler - I do not know how to get blogger to display Haettenschweiler).


Within 3 hours of reading about the idea in Professor Kahneman’s book, I had my own data in the form of the scores from 20 students. Unlike the study described by Professor Kahneman, however, my students did not perform any better statistically with the hard-to-read version. I emailed Shane Frederick at Yale with my story and data, and he responded that he was doing further research on the topic.




Roughly 3 years later, Andrew Meyer, Shane Frederick, and 8 other authors (including me) have published a paper that argues the hard-to-read presentation does not lead to higher performance.


The original paper reached its conclusions based on the test scores of 40 people. In our paper, we analyze a total of over 7,000 people by looking at the original study and 16 additional studies. Our summary:


Easy-to-read average score: 1.43/3  (17 studies, 3,657 people)
Hard-to-read average score: 1.42/3  (17 studies, 3,710 people)
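
For readers who like to check the arithmetic, here is a minimal sketch in Python that compares the gain quoted by Malcolm Gladwell with the pooled gain above, and shows roughly why a 40-person study is so much noisier than 7,367 people. The standard deviation of 1.0 and the even 20/20 split of the original 40 participants are my assumptions for illustration only; they are not figures from either paper.

```python
import math

# Figures quoted in this post.
original_easy, original_hard = 1.90, 2.45  # original study, as quoted by Gladwell
pooled_easy, pooled_hard = 1.43, 1.42      # pooled averages across 17 studies
n_easy, n_hard = 3657, 3710                # pooled sample sizes

original_gain = original_hard - original_easy  # gain from the hard-to-read font
pooled_gain = pooled_hard - pooled_easy        # same comparison, pooled data


def se_of_difference(sd, n1, n2):
    """Standard error of the difference between two independent group means."""
    return sd * math.sqrt(1 / n1 + 1 / n2)


# ASSUMPTION (hypothetical): scores on the 0-3 scale have a standard
# deviation of about 1.0 in both groups.
ASSUMED_SD = 1.0

# ASSUMPTION (hypothetical): the 40 people in the original study were
# split evenly, 20 per font condition.
se_small = se_of_difference(ASSUMED_SD, 20, 20)
se_pooled = se_of_difference(ASSUMED_SD, n_easy, n_hard)

print(f"Original gain: {original_gain:+.2f} points")
print(f"Pooled gain:   {pooled_gain:+.2f} points")
print(f"Total people in pooled data: {n_easy + n_hard}")
print(f"Standard error with 20 per group: {se_small:.3f}")
print(f"Standard error with pooled data:  {se_pooled:.3f}")
```

Under those assumptions, the uncertainty around the difference in a 40-person study is more than ten times larger than in the pooled data, so a large apparent gain in a small study is much easier to get by chance.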


Malcolm Gladwell wrote, “Do you know the easiest way to raise people’s scores on the test? Make it just a little bit harder.”


The data suggest that Malcolm Gladwell’s statement is false. Here is the key figure from our paper with my annotations in red:



I take three lessons from this story.


  1. Beware simple stories.


“The price of metaphor is eternal vigilance.”  Richard Lewontin attributes this quote to Arturo Rosenblueth and Norbert Wiener.


The story told by Professor Kahneman and by Malcolm Gladwell is very good. In most cases, however, reality is messier than the summary story.

  2. Ideas have considerable “Meme-mentum”


“And yet it moves.” This quote is attributed to Galileo when he was forced to retract his statement that the earth moves around the sun.


The message is that it takes a long time to change conventional wisdom. The earth stayed at the center of the universe for many people for decades and even centuries after Copernicus.




I expect that the false story as presented by Professor Kahneman and Malcolm Gladwell will persist for decades. Millions of people have read these false accounts. The message is simple, powerful, and important. Thus, even though the message is wrong, I expect it will have considerable momentum (or meme-mentum to paraphrase Richard Dawkins).


One of my favorite examples of meme-mentum concerns stomach ulcers. Barry Marshall and Robin Warren faced skepticism toward their view that many stomach ulcers are caused by bacteria (Helicobacter pylori). Professor Marshall describes the scientific response to his idea as ridicule; in response, he gave himself an ulcer by drinking the bacteria. Marshall gives a personal account of his self-infection in his Nobel Prize acceptance video (the self-infection portion starts at around 25:00).




  3. We can measure the rate of learning.


We can measure the rate of learning. Google Scholar counts the number of times a paper is cited by other papers. I believe that well-informed scholars who cite the original paper ought to cite the subsequent papers. We can watch in real time to see if that is true.



| Paper | Comment | Citations as of April 20, 2015 | Citations as of today |
| --- | --- | --- | --- |
| Alter et al. (2007). "Overcoming intuition: metacognitive difficulty activates analytic reasoning." Journal of Experimental Psychology: General 136(4): 569. | Original paper showing hard-to-read leads to higher scores | 344 | |
| Thompson et al. (2013). "The role of answer fluency and perceptual fluency as metacognitive cues for initiating analytic thinking." Cognition 128(2): 237-251. | Paper contradicts Alter et al. by reporting no hard-to-read effect. | 38 | |
| Meyer et al. (2015). "Disfluent fonts don’t help people solve math problems." Journal of Experimental Psychology: General 144(2): e16. | Our paper summarizing the original study and 16 others. | 0 (this “should” increase at least as fast as citations for Alter et al., 2007) | |

21 comments:

  1. I love this story -- thanks Terry. Here is another example of folklore that rests on half-truths (Yiddish saying: "A half truth is a whole lie."). Woody Allen is widely quoted as having said: "80% of life is just showing up." And indeed that is what he actually said. But if you key in the phrase "% of life is just showing up" into Google, you get the following percentages: 99%, 95%, 90%, 87%, 85%, 80%, 50%. I am not surprised that Gladwell would say something like that as he is just a popularizer, but Kahneman is usually better than that. Good for you for debunking that myth.

    1. Thank you for this comment. Professor Kahneman is blameless. The new data did not exist when he wrote the first edition. Terry

    2. I disagree with your statement holding Professor Kahneman blameless for two reasons. First, his sample size was too small; second, he's smart enough to know that you should never use a single evaluation for proof of anything.

    3. Yes, the sample was small. But when Kahneman wrote his book, there were three experiments showing a disfluent font benefit to reasoning: this one, one on syllogistic reasoning, and one on the "Moses" problem.

      All three had small samples. And all three have since failed to replicate. But the point is that Kahneman wasn't leaping from a SINGLE evaluation. He was leaping from three, and just citing one.

    4. I agree with AnonymousKahnemanDefender in the sense that Professor Kahneman's conclusions are based on a wide variety of studies.

    5. Could Kahneman be accused of Wanting To Believe because it conveniently fit his theories? Many of the examples he uses look vulnerable to this sort of examination. He's said himself that psychologists need to get their act together: http://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535

    6. A belated response to Jeremy Kahn. Yes, Kahneman may have been less critical because the studies confirmed his belief. I have definitely done this, and expect that it is common. We look more carefully at data that are inconsistent with our prior beliefs.

  2. Another example of the replication crisis in psychology---fantastic findings are proving difficult to replicate. To me this one was bound to be false because we have known overwhelmingly across many, many tasks that stimulus difficulty correlates with decremented performance. This more straightforward finding holds in psychophysics and psychometrics.

    1. But the new results show that stimulus difficulty has no effect on performance, not a decremental effect. In what sense, therefore, could you have "known" this to be false?

  3. Great article. Sadly, priority is rarely given to replication, and funding even less so. I like how you responded above as well. Rather than lambast someone, you point out the need for more data/replication. Do you not think this type of example is at least partly due to the infatuation with p<0.05? Let's say the original study's p-value was 0.05: it had a 5% chance of being due to chance, therefore it positively required replication IMHO. This exemplifies a problem with reporting science results via the current publishing paradigm, and the absence of p>0.05 results in the literature means many useful things go unreported.

    1. Hi Alan,

      Yes. There is nothing magic about 0.05 other than that it is generally used as the threshold for publication. Depending on the context, one might act on a result with a p-value > 0.05. For example, someone who is sick might not want to wait to try an experimental drug.

      Sincerely,

      Terry

  4. A deeper problem is highlighted in this story: We, as a society, are not skilled at generating statistically significant findings or interpreting experimental results appropriately. Results from a study population of 40 individuals, such as the CRT mentioned here, only tell us that an outcome **can** happen. They do not imply that the same result is **likely** to happen. This mistake is epidemic in learning and education research as well as the popular press. We experiment with a small number of students, find something that works for them and suddenly mandate that treatment for every school child in the English-speaking world. In education, we should be using small population studies to inform ourselves about how to personalize teaching, not how to generalize our practices. These small-scale results challenge us to investigate just who -- what kind of student -- will respond as our study subjects did, not to assume that our small sample was representative of the whole population. Even more troubling is to contemplate how this tendency toward flawed generalization from small studies impacts democratic decision-making. It is "too bad" if scholars fail to update their citations to reflect emerging revisions in psychological theory. It will be catastrophic if voters and legislators base the laws that control the many on effects found only in a few.

    1. Great insights into how easily we fool ourselves and then make bad rules based on our misperceptions. I can see it now--new legislation that all standardized tests be written in sloppy cursive!

  5. “And yet it moves.”

    Not the best example, since we now know from the modern theory of gravity (general relativity, whose centennial is occurring pretty much as we speak) that it's perfectly correct to speak of the earth lying motionless whilst the entire cosmos wheels about it.

    1. A fair point. Michael, what do you think about the ulcer story?

  6. I use this example in a talk about replication and open science. Just out of curiosity: Did you (or the authors of any of the 16 other studies) have trouble at first publishing your non-replication? Do you think the current replication debate made it easier to publish your results?

    1. Hello Felix,

      I did not attempt to publish my non-replication independently from the others. I do not know if others tried. I agree with you that it is harder to publish results that show no difference between control and treatment groups.

      Terry

  7. Thank you Terry. I find the Kahneman and Gladwell writings entertaining but too glib.

  8. Dear AmOK,

    Thank you for your comment. Popular writing is challenging, and I am sure I have failed as well to remain correct while trying to get out of the academic weeds.

    Terry

  9. I apologize for not digging into the research myself, but I am wondering if instead of manipulating font anyone tried giving the test where the treatment group received a statement at the beginning like, "Here are three questions that appear simple, but most people get them wrong--can you figure out the real answers?" If there is any truth to "System 1, System 2" thought processing, then surely something like that would activate System 2.

    1. That kind of warning does increase CRT scores. But only a little.
