The Texas sharpshooter fallacy and other issues for learning analytics

Becoming somewhat cynical about the headlong rush toward learning analytics, I’m commencing an exploration of the problems associated with big data, data science and some of the other areas which form the foundation for learning analytics. The following is an ad hoc collection of some initial resources I’ve found and need to engage with.

Feel free to suggest some more.

The Texas sharpshooter fallacy

This particular fallacy gets a guernsey mainly because of the impact of its metaphoric title. From the Wikipedia page:

The Texas sharpshooter fallacy often arises when a person has a large amount of data at their disposal, but only focuses on a small subset of that data. Random chance may give all the elements in that subset some kind of common property (or pair of common properties, when arguing for correlation). If the person fails to account for the likelihood of finding some subset in the large data with some common property strictly by chance alone, that person is likely committing a Texas Sharpshooter fallacy.
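A small simulation gives a feel for how easily this happens. What follows is a minimal sketch under stated assumptions, not anything taken from the Wikipedia page: the function names, the 30 students, the 200 measures and the 0.05 threshold are all hypothetical values chosen for illustration. Every "measure" is pure random noise, yet picking the best one after the fact still turns up a correlation that looks worth reporting.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_students=30, n_measures=200, seed=0):
    """Largest |r| between a random outcome and many unrelated random measures.

    All numbers here are illustrative assumptions. Nothing in the data is
    related to anything else, so any "pattern" found is chance alone.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_students)]
    best = 0.0
    for _ in range(n_measures):
        measure = [rng.gauss(0, 1) for _ in range(n_students)]
        # Painting the target after shooting: keep only the best-looking hit.
        best = max(best, abs(pearson(measure, outcome)))
    return best


def chance_of_false_hit(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests passes by chance."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    print("strongest chance correlation:", strongest_chance_correlation())
    print("chance of a false hit in 200 tests:", chance_of_false_hit(200))
```

With a single test the chance of a false hit is just the threshold itself, but with 200 independent tests `chance_of_false_hit(200)` is above 0.99. That is the likelihood the quote says must be accounted for before reading anything into the subset that happened to line up.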

Critical questions for big data

Boyd, D., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15(5), 662–679.

Abstract

The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article, we offer six provocations to spark conversations about the issues of Big Data: a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology that provokes extensive utopian and dystopian rhetoric.

The headings give a good idea of the provocations:

  • Big data changes the definition of knowledge.
  • Claims to objectivity and accuracy are misleading.
  • Bigger data are not always better data.
  • Taken out of context, Big data loses its meaning.
  • Just because it is accessible does not make it ethical.
  • Limited access to big data creates new digital divides.

Effects of big data analytics on organisations’ value creation

Mouthaan, N. (2012). Effects of big data analytics on organizations’ value creation. University of Amsterdam.

A Master’s thesis that, amongst other things, is

arguing that big data analytics might create value in two ways: by improving transaction efficiency and by supporting innovation, leading to new or improved products and services

and

this study also shows that big data analytics is indeed a hype created by both potential users and suppliers and that many organizations are still experimenting with its implications as it is a new and relatively unexplored topic, both in scientific and organizational fields.

The promise and peril of big data

Bollier, D., & Firestone, C. (2010). The promise and peril of big data. Washington DC: The Aspen Institute.

Some good discussion of issues, as reported by a rapporteur. Issues included:

  • How to make sense of big data?

    • Data correlation or scientific methods – Chris Anderson’s “Data deluge makes the scientific method obsolete” and responses, e.g. “My TiVo thinks I’m gay”, gaming, the advantage of theory/deduction etc.
    • How should theories be crafted in an age of big data?
    • Visualisation as a sense-making tool.
    • Bias-free interpretation of big data.

      Cleaning data requires decisions about what to ignore. The problem increases when data comes from different sources. Quote: “One man’s noise is another man’s data.”

    • Is more actually less?
      Does it yield new insights or create confusion and false confidence? “Big data is driven more by storage capabilities than by superior ways to ascertain useful knowledge”.
    • Correlations, causality and strategic decision making.
  • Business and social implications of big data
    • Social perils posed by big data
  • How should big data abuses be addressed?

Research ethics in emerging forms of online learning

Esposito, A. (2012). Research ethics in emerging forms of online learning: issues arising from a hypothetical study on a MOOC. Electronic Journal of e-Learning, 10(3), 315–325.

Will hopefully give some initial insights into the thorny issue of ethics.

Data science and prediction

Dhar, V. (2012). Data Science and Prediction. Available at SSRN. New York City.

Appears to be slightly more “boosterish” than some of the other papers.

Abstract

The world’s data is growing more than 40% annually. Coupled with exponentially growing computing horsepower, this provides us with unprecedented basis for ‘learning’ useful things from the data through statistical induction without material human intervention and acting on them. Philosophers have long debated the merits and demerits of induction as a scientific method, the latter being that conclusions are not guaranteed to be certain and that multiple and numerous models can be conjured to explain the observed data. I propose that ‘big data’ brings a new and important perspective to these problems in that it greatly ameliorates historical concerns about induction, especially if our primary objective is prediction as opposed to causal model identification. Equally significantly, it propels us into an era of automated decision making, where computers will make the bulk of decisions because it is infeasible or more costly for humans to do so. In this paper, I describe how scale, integration and most importantly, prediction will be distinguishing hallmarks in this coming era of ‘Data Science.’ In this brief monograph, I define this newly emerging field from business and research perspectives.

Codes and codings in crisis: Signification, performativity and excess

Mackenzie, A., & Vurdubakis, T. (2011). Codes and Codings in Crisis: Signification, Performativity and Excess. Theory, Culture & Society, 28(6), 3–23.

5 thoughts on “The Texas sharpshooter fallacy and other issues for learning analytics”

  1. Nicola

    I am predicting that ‘big data’ will decline as the stampede continues and the value of personal data and how it can be perceived as meaningful – currently highly marketable – goes down when people finally realise that we are not so different as humans, there are only so many ways we can be sliced and diced, including economically; also some aspects that cannot be sliced and diced because of their complexity. I don’t know how quickly though

    1. G’day Nicola, Thanks for the comments. Like you I think “big data” will decline after the stampede as with all fads. Though your point about “cannot be sliced and diced” is another reason I agree with. I’m not convinced learning is something that can be effectively “sliced and diced”. But I’m not sure about the other reason you give? Not sure we are that similar, especially if you look at what we do online. Any pointers that explain why I’m wrong? Really keen to see other perspectives. David.

  2. Nicola

    Hi David, thank you for your comment. Apologies I will try and clarify, I’m not sure about right/wrong. I guess if we can fragment ourselves into individual genetic maps and our potential for our genes can adapt – there are many possibilities in which case we are not that similar. From what I’ve understood or maybe misunderstood :-) about learning analytics is that they are looking at how humans are responding to the technologies they are using and making predictions, adapting based on the interpretation of what those responses might mean.

    I think I’m saying but as I’m writing this I’m not certain, is that there seems to be a limit to the amount of ways we can interpret the responses to the technologies we are using in learning. That limit being our understanding of the universe I guess, but I can’t see at the moment how having that understanding of either humans or what technologies can do, will cause a further increase in the value of data. I don’t know if that makes sense and need to think some more about it.

    1. Not sure if I’m going to connect exactly with your intent, but some of what you’ve said resonates.

      In terms of the limits of our ability to interpret the data, this is one of my worries about big data/learning analytics. LA doesn’t capture why people interacted the way they did; it only captures how they interacted. The end user of dashboards and other LA tools (or even the LA tool itself through its models of learners and learning) then makes an interpretation of what is meant by how the learner interacted with the technology.

      I’m skeptical of the ability to achieve a match between why the learner did what they did and why the LA tool’s interpretation says they did it, especially the further away from the context the LA tool and its user are.

      Apparently all the algorithms etc will add the value to the data to make this possible, but I kind of doubt it.

  3. Nicola

    Thank you for replying. I’m not sure what your interpretation of my intent is and don’t need to for that matter, but I will try and explain what I think my intent is – assuming I might be capable of knowing? I found your blogpost from a Google blog search – am occasionally looking at issues and your comments relating to big data resonated. I have started to look at some of the datasheets and videos of learning analytics packages being integrated with corporate learning systems and authoring tools and that doesn’t feel quite like what the SOLAR concept paper mentioned they are trying to do. It looks like all kinds of analytics are being called ‘learning analytics’ but I have not seen pricing models that would allow the granularity to pick and choose.

    I am concerned about the ethical issues not just of interpretation but also of the motives behind the hype at the moment. There are millions being invested into ‘big data’ initiatives and schemes to encourage data & analytical start-ups. These keep markets busy which is what investors need.

    The question that I continually come back to whenever looking at economic issues is ‘why now’ and trying to apply this to ‘big data’ – as technologies have changed people want to be able to do more – e.g. health data of any kind to help improve therapies and in the case of learning analytics – your points above about how people interacted. Why has this become a hype, why huge investment and huge marketing right now?

    Why is there so much focus on behaviour, brain, technologies and interpretations being spread throughout media – why are so many hedge fund owners so closely tied with psychology – e.g. the volume of books, articles, TED talks, blogs on behavioural economics and the seemingly cult-like worship that seems to be around it. How far are these messages infiltrating (infilterating?) people’s consciousness? Where is a balance of other perspectives? Why are conclusions being drawn and the messages going out in certain ways – right now? What is the significance of now?

    I think like you, I care about what happens in this area and there are some really exciting points – being able to understand more about how we learn and where that might meet with our state of physiological and psychological health is fantastic – who wouldn’t want to be more informed. The data about environmental issues – great to see collaborations and everything else. But why ‘big’ is important – is it just to keep the wheels of ‘big investment’ turning?

    There are other ethical issues but this comment is already too long.
