Having become somewhat cynical about the headlong rush toward learning analytics, I’m commencing an exploration of the problems associated with big data, data science and some of the other areas which form the foundation for learning analytics. The following is an ad hoc collection of some initial resources I’ve found and need to engage with.
Feel free to suggest some more.
The Texas sharpshooter fallacy
This particular fallacy gets a guernsey mainly because of the impact of its metaphoric title. From the Wikipedia page:
The Texas sharpshooter fallacy often arises when a person has a large amount of data at their disposal, but only focuses on a small subset of that data. Random chance may give all the elements in that subset some kind of common property (or pair of common properties, when arguing for correlation). If the person fails to account for the likelihood of finding some subset in the large data with some common property strictly by chance alone, that person is likely committing a Texas Sharpshooter fallacy.
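To make the fallacy concrete for myself, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not something from the Wikipedia page: the "students" and "measures" are pure random noise, and the names `pearson` and `strongest_chance_correlation` are hypothetical helpers of my own. The idea is simply to generate many unrelated measures, compare each with an equally random outcome, and report only the strongest correlation found.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_students=30, n_measures=1000, seed=1):
    """Return the strongest correlation between a random outcome and any
    of n_measures random measures. All data is noise (an assumption of
    this sketch), so any strong correlation arises by chance alone."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_students)]
    best = 0.0
    for _ in range(n_measures):
        measure = [rng.gauss(0, 1) for _ in range(n_students)]
        r = pearson(measure, outcome)
        # Painting the target after shooting: keep only the best hit.
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print("one measure examined:  ", strongest_chance_correlation(n_measures=1))
    print("1000 measures examined:", strongest_chance_correlation(n_measures=1000))
```

With enough measures to choose from, the best of them will look like a respectable predictor even though nothing here is related to anything else. Reporting that one measure without mentioning the others that were examined is the sharpshooter drawing the target around the bullet hole.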
Critical questions for big data
Boyd, D., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15(5), 662–679.
The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article, we offer six provocations to spark conversations about the issues of Big Data: a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology that provokes extensive utopian and dystopian rhetoric.
The headings give a good idea of the provocations:
- Big data changes the definition of knowledge.
- Claims to objectivity and accuracy are misleading.
- Bigger data are not always better data.
- Taken out of context, Big data loses its meaning.
- Just because it is accessible does not make it ethical.
- Limited access to big data creates new digital divides.
Effects of big data analytics on organisations’ value creation
Mouthaan, N. (2012). Effects of big data analytics on organizations’ value creation. University of Amsterdam.
A Master’s thesis that, amongst other things, is
arguing that big data analytics might create value in two ways: by improving transaction efficiency and by supporting innovation, leading to new or improved products and services
this study also shows that big data analytics is indeed a hype created by both potential users and suppliers and that many organizations are still experimenting with its implications as it is a new and relatively unexplored topic, both in scientific and organizational fields.
The promise and peril of big data
Bollier, D., & Firestone, C. (2010). The promise and peril of big data. Washington DC: The Aspen Institute.
Some good discussion of issues reported by a rapporteur. Issues included:
- How to make sense of big data?
- Data correlation or scientific methods – Chris Anderson’s “Data deluge makes the scientific method obsolete” and responses, e.g. “My TiVo thinks I’m gay”, gaming, the advantage of theory/deduction etc.
- How should theories be crafted in an age of big data?
- Visualisation as a sense-making tool.
- Bias-free interpretation of big data.
Cleaning data requires decisions about what to ignore. The problem increases when data comes from different sources. Quote: “One man’s noise is another man’s data”.
- Is more actually less?
Does it yield new insights or create confusion and false confidence? “Big data is driven more by storage capabilities than by superior ways to ascertain useful knowledge”.
- Correlations, causality and strategic decision making.
- Business and social implications of big data.
- Social perils posed by big data.
- How should big data abuses be addressed?
Research ethics in emerging forms of online learning
Esposito, A. (2012). Research ethics in emerging forms of online learning: issues arising from a hypothetical study on a MOOC. Electronic Journal of e-Learning, 10(3), 315–325.
Will hopefully give some initial insights into the thorny issue of ethics.
Data science and prediction
Dhar, V. (2012). Data Science and Prediction. Available at SSRN. New York City.
Appears to be slightly more “boosterish” than some of the other papers.
The world’s data is growing more than 40% annually. Coupled with exponentially growing computing horsepower, this provides us with unprecedented basis for ‘learning’ useful things from the data through statistical induction without material human intervention and acting on them. Philosophers have long debated the merits and demerits of induction as a scientific method, the latter being that conclusions are not guaranteed to be certain and that multiple and numerous models can be conjured to explain the observed data. I propose that ‘big data’ brings a new and important perspective to these problems in that it greatly ameliorates historical concerns about induction, especially if our primary objective is prediction as opposed to causal model identification. Equally significantly, it propels us into an era of automated decision making, where computers will make the bulk of decisions because it is infeasible or more costly for humans to do so. In this paper, I describe how scale, integration and most importantly, prediction will be distinguishing hallmarks in this coming era of ‘Data Science.’ In this brief monograph, I define this newly emerging field from business and research perspectives.
Codes and codings in crisis: Signification, performativity and excess
Mackenzie, A., & Vurdubakis, T. (2011). Codes and Codings in Crisis: Signification, Performativity and Excess. Theory, Culture & Society, 28(6), 3–23.