Learning analytics is being enabled/driven/sparked by the concept of “big data”, but for a while I’ve wondered just how big the data being gathered by universities actually is. Preparations for a workshop earlier in the week provided an opportunity to find out.
The most recent spark for this query was Clow (2013), who gave the example of the Large Hadron Collider at CERN producing 23 petabytes of data in 2011. Some other examples found via Google included:
- Google processing 7,300 petabytes in 2008.
- The new Square Kilometre Array radio telescope, which it is suggested will produce 1,376 petabytes of data a day when it goes live in 2024.
For a sense of scale, if you had 1 petabyte of MP3 music, you’d still be listening to that music after 2000 years of continuous play.
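That claim is easy to sanity-check with some back-of-the-envelope arithmetic. The sketch below assumes 128 kbps MP3 audio and a decimal petabyte (10^15 bytes); those assumptions are mine, not stated in the original claim.

```python
# Rough check of the "2000 years of MP3" claim.
# Assumptions (mine): 128 kbps audio, decimal petabyte (10**15 bytes).
PETABYTE = 10**15          # bytes
BYTES_PER_SEC = 128_000 / 8  # 128 kbps = 16,000 bytes per second

seconds = PETABYTE / BYTES_PER_SEC
years = seconds / (60 * 60 * 24 * 365.25)
print(f"{years:,.0f} years")  # roughly 2,000 years of continuous play
```

At a higher bitrate the figure shrinks, but it stays in the "many lifetimes" range either way.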
University “big” data?
In comparison, let's assume you have the following for a single Australian university:
- All the database tables and files uploaded to an LMS for the years 2004 through 2009.
- All the database tables and files uploaded to another LMS for the years 2009 to now.
- All the student demographic and age data for the same institution since around 2001.
How big is that “big” data?
I’ve been reliably informed that it is 665 gigabytes in size.
i.e. 0.000634193 petabytes.
Kind of not in the same league.
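The conversion is straightforward to verify; the figure above uses binary units, where 1 petabyte = 1024² gigabytes:

```python
# Convert 665 GB to petabytes using binary units (1 PB = 1024**2 GB).
gigabytes = 665
petabytes = gigabytes / 1024**2
print(f"{petabytes:.9f}")  # 0.000634193
```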
I wonder how this impacts the use of big data techniques to analyse university e-learning? How “big” does big data need to be to be useful?
The above is the combined data for the majority of online courses for a university since 2005 or so. What does that say about how “big” the data for an individual course is? Do big data techniques and assumptions break down at this size?
Clow, D. (2013). An overview of learning analytics. Teaching in Higher Education, (August), 1–13. doi:10.1080/13562517.2013.827653