Correlational and Causational Data
One of the most important lessons from Consumer Behavior is the difference between correlative and causative data. Understanding it is also handy when you are trying to determine whether someone is lying to you with statistics. Unfortunately, neither the blog nor the book can go into enough depth on this topic to substitute for a class, but some research on your own time into statistics and correlative data should prove very interesting.
Still, I can get into the basics with the blog and I am debating if this topic should (in slightly more depth) be covered in the book.
So: What is a correlation?
Correlation is a statistical way of showing that two variables are related to each other. Its strength is usually reported as a correlation coefficient (r) between -1 and 1, and the test that goes with it is judged against an alpha value (e.g., alpha = .05). Alpha is the risk you accept of calling a relationship real when it is only chance, so the lower the alpha your result can meet, the more confident you can be in it: alpha = .05 corresponds to 95% confidence. As a general rule, a result that only holds at an alpha above .10 (less than 90% confidence) is pretty much useless in statistics.
There are tests, such as the Pearson correlation, that can help determine whether two variables are statistically correlated, and I strongly urge anyone who doesn't remember or never learned about correlation tests to go read up on them.
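For the curious, here is a minimal sketch of what a Pearson correlation looks like in plain Python. The marketing and sales numbers are made up purely for illustration, and the shuffle-based p-value is just one simple way (a permutation test, my choice here rather than the textbook formula) to compare a result against an alpha such as .05.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled (unrelated) data correlates this strongly."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # The +1 keeps the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical data: hours spent marketing per week vs. copies sold that week.
marketing_hours = [2, 4, 5, 7, 8, 10, 12, 15]
weekly_sales = [11, 15, 14, 20, 24, 25, 31, 38]

r = pearson_r(marketing_hours, weekly_sales)
p = permutation_p_value(marketing_hours, weekly_sales)
print(f"r = {r:.3f}, p = {p:.4f}")
```

If p comes out below your chosen alpha, the correlation is unlikely to be a fluke. Note that this says nothing at all about which variable, if either, is causing the other.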
However, what correlation is and how to calculate it isn't my subject. What is important is understanding a fundamental rule of correlation so that a massive mistake isn't made.
Even if you don't do a statistical test, it may be useful to think about how things correlate in your life. Does increasing your time spent marketing your game correlate with an increase in sales? Does changing the price correlate? And so on.
So what am I getting at? It is best exemplified by two stories:
Anyone who has watched American football may have heard the following phrase:
"The team who runs the ball the most is almost always the winner." As a statement about correlation, this is TRUE. As a statement about what makes a team win, it is UNTRUE. The reason is that correlation DOES NOT imply CAUSATION.
In this case the speaker says "X causes Y": running more makes a team more likely to win. However, the reality of this situation is that Y causes X. Because a team is winning, they RUN MORE! For those who don't grasp the rules of football, a team that is ahead can afford to run the ball, which eats up more of the clock and gives their opponents less time to catch up.
The second story is this: According to a famous research paper, the following is true: "The shorter girls' skirts get, the better the stock market performs." This statement, according to correlative data, is TRUE. However, anyone who stops and thinks about it knows that it is FALSE as a claim about cause. The reason is, once again, that CORRELATION does not show what CAUSES the event.
In this case, the third possible scenario is true: a factor that has not been considered (Z) is impacting both X and Y. In the above statement the Z variable is TIME. As time has gone on, the stock market has gone up and girls' skirts have gotten shorter. X and Y do not affect each other, but both are highly related to Z.
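The hidden Z variable is easy to demonstrate with simulated data. The sketch below is only an illustration under assumptions of my own: x and y are made-up series that each depend on time plus independent random noise, and neither one ever looks at the other. The helper names (pearson_r, residuals) are hypothetical, not from any library.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(values, zs):
    """What is left of `values` after removing a straight-line trend in zs."""
    n = len(values)
    mean_v = sum(values) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (v - mean_v) for z, v in zip(zs, values))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(v - mean_v) - slope * (z - mean_z) for v, z in zip(values, zs)]


rng = random.Random(42)    # fixed seed so the run is repeatable
days = list(range(200))    # Z: time

# X and Y both drift upward with time; neither is computed from the other.
x = [2.0 * t + rng.gauss(0, 20) for t in days]
y = [3.0 * t + rng.gauss(0, 30) for t in days]

raw = pearson_r(x, y)
controlled = pearson_r(residuals(x, days), residuals(y, days))

print(f"correlation of X and Y:              {raw:.3f}")
print(f"after removing the time trend (Z):   {controlled:.3f}")
```

The raw correlation comes out very strong even though X and Y have nothing to do with each other. Once the shared time trend is subtracted from both, most of that apparent relationship disappears.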
So what does this mean? The odds are that, in games, you will rarely encounter a "Y causes X" scenario (though you should always keep it in mind, for those are the source of the largest mistakes!). However, whenever you say "Man, X caused Y" based on data that shows that as X changes Y changes (e.g., as price goes up, profits go up), be very aware of the use of the word "caused." It may be that a third variable (Z) is impacting both your price and your profit: a holiday, a review you didn't see, etc.
Assuming the price increase caused the profit increase could lead you to mistakenly leave your price high and hurt your profits until you realize your mistake!
Before this post gets any longer: very quickly, the only way to gain CAUSAL data is to design a series of questions that rule out possible Z variables. This is why people conduct surveys and data mining operations. In general, determining causal data is very hard, very time consuming, and way beyond the scope of indie game development. Instead, rather than worry about statistically PROVING that some event CAUSED some other event, just be aware of correlations and do your best NOT to assume that your correlation is 100% proof of CAUSATION.
Sorry for the ugly technical statistics post. However, it is important to me to know whether anyone would find this remotely interesting to read about in a clearer fashion, with actual examples (and maybe even an example of how to do a correlation test). If so, let me know; if not... we will assume the blog covers it well enough :)
Conclusion: Just because data is connected (correlated) does not mean that your assumed cause for the change is the true one.
2 Comments:
"with this, therefore because of this"
http://en.wikipedia.org/wiki/Correlation_implies_causation_%28logical_fallacy%29
Great link!
-Joe