Data mining with Weka – Class 2 – Evaluation

Now onto class 2 with the Data Mining with Weka MOOC.

Be a classifier

Interesting, Weka has a feature to visually build a decision tree by selecting regions of the instance space. Much nicer than the decision tree process I taught a couple of years ago in IPT.

The idea is that this tree still needs to be tested – which is coming up.

Training and testing

Basic machine learning situation.

The data mining process?

We’re developing a classifier on the basis of a machine learning algorithm.

The test and training data need to be different. But if there is only one data set, then you need to divide it – one suggestion is 2/3 of the data for training, 1/3 for testing. Weka has the ability to do this randomly.
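To make the 2/3 and 1/3 idea concrete for myself, here is a minimal sketch in plain Python (not Weka's own code). The function name `percentage_split`, the seed and the use of 768 instances are my assumptions for illustration only.

```python
import random

def percentage_split(instances, train_fraction=2 / 3, seed=1):
    """Shuffle a copy of the instances and cut it into (train, test)."""
    shuffled = list(instances)
    # A seeded generator makes the random split repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in data: 768 instance numbers rather than real records.
train, test = percentage_split(range(768))
print(len(train), len(test))
```

Changing the seed gives a different random split, which is what makes repeated train/test runs possible.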

Basic assumption is that both sets are produced from independent sampling from an infinite population.

More training and testing

Using Weka’s percentage split to train/test the classifier multiple times.

After doing this, calculate the sample mean, variance and standard deviation of the results across the runs.
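A rough sketch of that calculation using Python's standard `statistics` module. The ten accuracy figures are made up for illustration, they are not results from the course.

```python
import statistics

# Hypothetical percent-correct figures from ten runs with different random seeds.
accuracies = [75.0, 77.0, 76.0, 74.0, 78.0, 76.0, 75.0, 77.0, 76.0, 76.0]

mean = statistics.mean(accuracies)
variance = statistics.variance(accuracies)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(accuracies)      # square root of the sample variance

print(f"mean={mean:.2f} variance={variance:.2f} std_dev={std_dev:.2f}")
```

Note that `statistics.variance` is the sample variance. The population version (dividing by n) is `statistics.pvariance`.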

Interestingly the quiz questions assume some knowledge of statistics and actually require it to be used.

Baseline accuracy

Ahh, is it good to get 76% correct? A way to answer this is to look at the class (outcome) data and identify what you would get if you simply guessed. In this case, 500 of the 768 outcomes are positive (yes). So if you simply guessed yes, then you would be correct 65% of the time.

There is a classifier that does this: the ZeroR classifier. It establishes the baseline.
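The baseline arithmetic is simple enough to sketch. This is my own plain Python illustration of the "always guess the most common class" idea, not Weka's implementation. The function name `zero_r_accuracy` is hypothetical, and the class counts are the ones quoted above.

```python
from collections import Counter

def zero_r_accuracy(class_values):
    """Return the majority class and the accuracy of always predicting it."""
    majority_class, majority_count = Counter(class_values).most_common(1)[0]
    return majority_class, majority_count / len(class_values)

# 500 "yes" outcomes out of 768, as in the notes above.
outcomes = ["yes"] * 500 + ["no"] * 268
majority, baseline = zero_r_accuracy(outcomes)
print(majority, f"{baseline:.1%}")
```

Any classifier worth using on this data needs to beat that figure, so 76% is better than guessing but not by a huge margin.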

Cross validation

The standard way of testing the performance.

The data set is divided into x segments. Run x times, each time holding one segment out for testing and using the rest for training. Called x-fold (where x is a number) cross validation.
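A minimal sketch of how the segments might be formed, again in plain Python rather than Weka. The function name `k_fold_splits` and the seed are assumptions of mine.

```python
import random

def k_fold_splits(n_instances, k, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal segments.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test

for fold_number, (train, test) in enumerate(k_fold_splits(768, 10), start=1):
    print(fold_number, len(train), len(test))
```

Every instance ends up in the test segment exactly once, so all of the data gets used for both training and testing across the x runs.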

Stratified cross-validation – tries to hold the proportion of each class value.
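One simple way to keep the class proportions roughly equal is to deal each class out across the folds separately. The sketch below is my assumption of how that could look in plain Python. It is not necessarily how Weka does it, and `stratified_folds` is a hypothetical name.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign instance indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class's instances round-robin across the folds.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds

labels = ["yes"] * 500 + ["no"] * 268
folds = stratified_folds(labels, 10)
for fold in folds:
    yes_count = sum(1 for i in fold if labels[i] == "yes")
    print(len(fold), yes_count, len(fold) - yes_count)
```

Each fold then has about the same yes/no mix as the full data set, which is the point of stratifying.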

Lots of data – use percentage split. Otherwise stratified 10-fold cross-validation – rule of thumb.

How big is lots? Hard to say, it depends on the number of classes (a 2 class data set with 100-1000 samples is probably good enough). Meaning 1000 data points.

A data set with 100 classes needs more – you are looking for fair representation of each class.

Cross-validation results

Not much to write about there.

9/10 on the mid-course assessment. 10 MCQs. That’ll do.

Moving beyond a fashion: Likely paths and pitfalls for learning analytics

The following is a “webified” version of an invited presentation at the Blended Learning 2013 Summit. This version is largely an experiment with this medium/approach to sharing the talk. The slides from the talk are shown below with a written version of what I might have hoped to have said during the talk.

Click on the images if you wish to see them larger. The slides from the talk and also from a 2 hour workshop are also available.

Update – feedback on the presentation

The conference organisers have provided feedback on the presentation.

“Rating” out of a possible 5 as a speaker

                Excellent   Very good   Good   Fair   Poor   Responses   Average
Content         13          35          11     1      0      60          4
Presentation    12          37          12     2      0      63          3.94

And comments that were provided

  • At last a few interesting and thought provoking ideas
  • This session was very informative
  • Best of the conference…grounded, intelligent, insightful

The “presentation”


Being a talk about learning analytics at a conference on blended learning it appears appropriate to start with gathering some data. Attendees were asked to take out their smart phones and participate in a poll around the question “What percentage of your institution’s online courses are good?”. Where “good” was defined as “you wouldn’t be embarrassed when showing them to external folk”.

Using Poll Everywhere the actual slide showed live updates of the poll with the final data being used in a later slide. You can see related information in this image. In summary, the majority response was less than 20% of online courses were deemed good.


Welcome to the gold rush. That’s certainly what learning analytics feels like at the moment. In fact, our argument is that it’s almost certainly going to end up as yet another management fashion or fad with little significant impact on the quality of learning and teaching in Universities. The aim of this talk is to examine some of the paths institutions might take with harnessing learning analytics, to identify some of the pitfalls associated with those paths and make an argument that one of these paths is under-represented and essential if the aim is to avoid the fad dead end.


Let’s start with the obvious. Making decisions based on available data is obvious. There isn’t anyone out there saying, “No I will not make decisions based on data”, but then there are folk doing exactly this all of the time. In terms of e-learning a contributing factor is that most people aren’t getting good data about what is going on, making it hard to base decisions on absent data. Making it more likely that decisions are based on prejudices and other biases.


Which is where learning analytics enters the picture. It promises to offer access to data upon which to base decisions. It promises to help penetrate the complexity around learning and teaching within higher education. Which would obviously appear to be a good thing.


The trouble is that like “blended learning”, “e-learning” and any of a number of other “innovations” it appears that learning analytics is fast becoming a fad, a management fashion. The very obviousness of making decisions based on evidence or data and the absence of it from the practice of university e-learning has (along with a range of other factors such as commercial vendors, management consultants and researchers) created a normative pull around learning analytics. If you aren’t doing something around learning analytics at the moment as a university you are almost seen as strange. You have to be doing it.


This is leading learning analytics into becoming yet another fad. Many of you may well remember another recent information and information systems related fad to spread through higher education – Enterprise Resource Planning (ERP) systems. About 15 years ago Australian higher education underwent a similar normative pull to adopt ERP systems as they promised to reduce costs, enhance innovation, and provide the basis for data-based decision making for the organisation. Anyone who has lived through the last 15 years in Australian higher education knows how well that turned out for us.

Fad cycles in higher education aren’t new. In terms of fad or hype cycles I much prefer Birnbaum’s fad cycle (published about the same time as the rise of ERPs) to the more widely known Gartner hype cycle. I dislike the Gartner hype cycle because it is – as you would expect from a management consultancy firm – inherently optimistic about fads. It assumes that all fads have a plateau of productivity. On the other hand Birnbaum’s cycle suggests that fads end with the “resolution of dissonance” – which is typically the blaming of the teaching staff for not buying into the vision – before the next big thing starts the cycle all over again.


Enough preface, let’s get to the guts of our argument.


The argument is not that learning analytics is bad. As someone teaching a course with 300+ students I see learning analytics as providing a range of potential benefits to both myself and the folk taking my course.


The argument is that, almost without exception, the way universities will implement learning analytics will be bad. Bad in terms of not really improving the quality of learning and teaching within the institution. Instead it will suffer as yet another management fashion upon which (potentially) vast amounts of money and time are expended for little reward.


The aim of our talk is to avoid the peaks and troughs of the management fashion and instead.


Move directly to a productive state where learning analytics are actually making a difference to the quality of learning and teaching within universities.


And here’s how we plan to do it. We’re going to start by talking about what we (as in the authors of this talk) think we know about learning analytics. This is important because how you understand or represent learning analytics will influence how you go about designing how your institution will attempt to harness learning analytics.

We’re then going to describe the three paths that we think universities are likely to take with learning analytics. We’re going to argue that all three are necessary, but that the emphasis will be almost entirely on the first two paths and that this will lead to the faddish (i.e. unsuccessful) use of learning analytics. Our argument is that the “do it with” path is the essential path for something like learning analytics as it currently stands.

We’ll then offer some conclusions.

To repeat, we’re not suggesting that there is any one path to be taken. There needs to be a mixture of all three. However, there needs to be a much greater focus on the “do it with” path than most institutions will adopt.


So, what do we know about learning analytics?


Well there is strong evidence of it being a fad.

Back in 2010 the Horizon Report’s Technology Outlook for the Australian and New Zealand higher education sector made no mention of learning analytics. The closest it got was something called “visual data analysis” listed in the 4 to 5 year time frame.

In the next outlook in 2012, “learning analytics” had appeared from nowhere to be the second technology listed in the “1 year or less” time frame. The normative pull had commenced. This year’s outlook has “learning analytics” as number 1 in that same time frame. Every Australian university has felt the need to do something about learning analytics and is currently doing something about it.


But this stuff isn’t new. This is the header from a script that started our collaboration around learning analytics back in 2008. Working in a central learning and teaching division we were responsible for looking after the institutional Blackboard LMS. What amazed me was that when the semester started and course sites were made available to students, no-one in the institution had any idea about the quality of the courses or if they were even ready.

This was amazing to me because I was coming from a system that automatically produced an acceptable minimal standard course website without any input from the academics. This meant we were confident that at go live there was something minimally useful for the students. We’d been doing this since 1997/1998. The idea that 10 years later you would be releasing course sites with no idea of what was there was just plain dumbfounding.

Especially given that we would receive all the common complaints from students (e.g. “the site is pointing to last year’s information”) and have to deal with them after go live. That was dumb. So we began writing an “Indicators” script that would examine the Blackboard database and generate a set of traffic lights indicating how ready/how good a course site was with the plan that this would be shown to the respective Associate Deans Learning and Teaching who would then take action.

For various reasons they didn’t like this idea – we were seen to be over-reaching our responsibilities. Apparently, data-based decision making wasn’t that important. So we started the Indicators Project and wrote some papers that to this day have had little or no positive impact on the practice of learning and teaching at that institution.


Of course, we’d been doing “learning analytics” for some time. Back in the last century we had a bit of open source software that would generate a weekly report about requests on our e-learning website. Each Monday we’d go over that report looking for patterns – good and bad – and making decisions about what might need to be done. Gaining insights from that data to drive our decision making.


The script even generated lovely visualisations to help.


Let’s return to the present. This is George Siemens, President of the Society for Learning Analytics Research and well-known speaker on the topic (and others). Many of you may have heard George speak over recent weeks as he’s been touring Australia. Given his involvement in learning analytics if anyone would know of an institution that is successfully implementing learning analytics it would be him.

The quote on this slide is from an email George sent in the last week or so to a learning analytics mailing list asking for examples of such institutions. To his knowledge, much of what is being done is in specific research projects or small deployments. There aren’t successful institutional examples we can draw upon.


I know there will be some who will point to the data warehouse as a counter point. Many Australian Universities have had data warehouses for quite some time. I know of a few that were put in place because the multi-million dollar ERP system didn’t fulfil its promise. Only in turn for the data warehouse to also fail because hardly anyone was using the data produced. Beyond perhaps generating some data to hand over to the federal government every year.

This finding is backed up by research from the Information Systems field. Australian higher education isn’t alone in having failures around data warehouse systems.


In fact, there are indications that all the very fashionable “big data” and “business analytics” projects are failing as well.

Now, the information systems field has identified that the definition of failure (or success to take the positive perspective) is multi-faceted and subjective. But if the great obvious benefit of these systems is data-based decision making, then the systems not being used suggests an “absence of data-based decision making” which points fairly strongly to failure.


And of course this links to the broader, much written about (but nothing really successful done about) problem of information systems development failure. Large IS development projects fail. The larger they are, the more likely they are to fail.

Chances are that large scale, institution-wide adoption of learning analytics involving the multi-million dollar purchase of “Vendor X’s” business intelligence platform will fail.


Which brings us back to Birnbaum’s fad cycle as the most likely outcome. But actually, I think Birnbaum was being optimistic, or at least didn’t fully capture the complete story of fads and fashions.

The fad cycle didn’t stop with the resolution of dissonance. At least not for the management consultancy firms and the vendors enabling these fads. These organisations have a profit margin to think of so they did the smartest thing. Once a fad had failed in one sector.


They moved onto the next sector. It’s not that hard to track the movement of ERP systems from industrial manufacturing companies (where the ERP system arose) into the broader commercial business sector, then into government, then into higher education and then over the last couple of years into the K-12 education sector. It was interesting for me to observe the Queensland government’s Education department employing all the same promises and visions around their adoption of an ERP system 10+ years after the same messages were employed in the Australian higher education sector.


We do know that learning analytics is diverse. The list here shows all of the disciplines that had something to contribute to the Learning Analytics and Knowledge conference in 2012. It represents how learning analytics draws upon a wide array of disciplines and tries to bring those together. Clow’s description of the field has it picking and choosing tools and techniques from these disciplines.


We also know that higher education, technology, teaching and learning analytics all operate in an environment of change. The cliche, “the only thing certain is change” sums it up. Everything is change.

But beyond that, the field of Decision Support Systems (DSS) – a research field with a history going back 40+ years – has identified a range of fundamental principles for the design of information systems intended to support the decision making process (like learning analytics). One of those fundamental principles is evolutionary development. i.e. the development of these systems needs to be forever changing.


At a fundamental level, learning analytics examines the data you have gathered about what you currently do, applies a range of analysis methods to that data, and hopes to reveal insights about the future. Almost by definition it is “evaluating what exists in data”. The quality of this evaluation depends both on the analysis methods you use, but also the quality of the data and how effectively it captures what was being done.

This reliance on the past creates a tension between analytics and innovation, but it also creates other problems.


What value is there in learning analytics if the data you are capturing arises from poor practice? As the results of our quick survey reveal, what passes for institutional e-learning is seen by ourselves as not being all that good.

In the presentation this was meant to show the live responses via the Poll Everywhere app. It didn’t work as expected, but the results were captured.

In response to the question, “What percentage of your institution’s online courses are good?” there were 29 responses

  1. Less than 20% – 16 (55%)
  2. Between 20% and 50% – 10 (34%)
  3. Between 50% and 70% – 2 (7%)
  4. Greater than 70% – 1 (3%)

26 of the 29 respondents (about 90%) thought that at least 50% of their institution’s online courses were not good.
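For anyone wanting to check the arithmetic, the shares can be recomputed from the raw counts listed above. This is just a small Python sketch, the variable names are mine.

```python
# Poll counts as reported above (29 responses in total).
counts = {
    "Less than 20%": 16,
    "Between 20% and 50%": 10,
    "Between 50% and 70%": 2,
    "Greater than 70%": 1,
}

total = sum(counts.values())
shares = {band: 100 * n / total for band, n in counts.items()}

# Respondents who rated fewer than half of their courses as good.
below_half = counts["Less than 20%"] + counts["Between 20% and 50%"]

for band, share in shares.items():
    print(f"{band}: {share:.1f}%")
print(f"Below 50% good: {below_half} of {total} ({100 * below_half / total:.1f}%)")
```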

What does this mean about the quality of the insights that can be gained from learning analytics based on data arising from this practice? What does it mean for what the purposes of learning analytics should be?


As it happens, this isn’t a new observation. In fact, Professor Mark Brown (who was also presenting at this conference and was in the room during this presentation) was quoted in a New Zealand paper on the weekend as suggesting that e-learning is a bit like teenage sex. What is being done – much less than what is being talked about – is not very good.

As it happens, I shared Mark’s thoughts via Twitter and his metaphor was usefully extended by a range of folk.


Fumbling in the dark suggests that learning analytics might be able to shed some light on the practice. Perhaps reveal just how messy it is and perhaps identify that the e-learning environment in universities is far from ideal.


Imagine trying to analyse an act being done by people who have no idea and using data from that analysis to guide future practice.


Not only is our current practice of e-learning somewhat ignorant, but so also is our use of learning analytics to assist teachers in improving the practice of e-learning. We don’t really know yet how best to do this, no-one has really studied it.


In fact, as Dyckhoff and collaborators suggest even the bleeding edge research learning analytics tools aren’t providing answers to the types of questions that teachers ask. Let alone the types of tools currently available within our institutional e-learning environments.


Which leads us to the purpose of learning analytics. There are any number of definitions of learning analytics. Rather than try distinguish between academic analytics and learning analytics etc. the definition we’ll use is based on Clow’s (2013). The “analysis and representation of data about learners in order to improve learning”. The focus is on improving learning. If it isn’t being used to improve learning, then learning analytics is a failure.

We also like the reference in the Elias definition to tools and processes aimed at improving learning and teaching and the subsequent observation that to achieve this, those processes and tools need to be integrated into the practice of teaching and learning. Hopefully you’ll see the connection between this definition of learning analytics and the rest of this presentation.


Which brings us to the three paths and the first, “do it to”.


To illustrate the three paths we’re going to use this model of university teaching from Trigwell (2001). The idea is that student learning (the focus) is impacted by the strategies adopted by teachers, which in turn is influenced by the planning of the teacher, their thinking about learning and teaching and finally all are impacted by the context or environment in which learning and teaching takes place.


The “do it to” path usually starts with very smart and senior people getting together to think about strategic directions for the institution. i.e. they’ve noticed the management fashion and recognised that it’s important for the institution to react. They will engage typical techno-rational/strategic management techniques.


As a result they will make changes to the context in which learning and teaching takes place. Usually in the form of strategic and policy changes. Perhaps setting up of formal projects with visible senior management buy in, appropriate KPIs, user groups etc.


The assumption is that those changes will impact the thinking of teachers.


Which in turn will modify how they go about planning.


Which in turn results in new L&T strategies.


Which in turn results in changes (hopefully improvements) to student learning.

However, in reality this type of top down change is generally so far removed from the “zone of proximal development” of the teaching staff that it never has any chance to change their thinking. After all, changing thinking is about learning and sorry, but top-down strategic change has never done a real good job at engaging staff in learning for a whole range of reasons.

Especially when confronted with an activity like learning analytics where

  • No institution is actually doing it successfully.
  • Where our prior attempts in the area have failed.
  • Where the field itself is incredibly diverse.
  • And we’re planning to use a technology/approach that is under-going rapid change in an environment that is itself undergoing rapid change.
  • Especially when the aim is to improve learning and teaching – which seems to rely significantly on the strategies/planning/thinking of individual academics – and we don’t really know how to do this with learning analytics yet.


So, in response, the most likely outcome of the “do it to” path is that the thinking and planning of academics is bypassed. In some cases this is a conscious decision. For example, most of what is being done around learning analytics at the moment is about retention. Retention projects – usually run by central student administration or related areas – are avoiding engagement with academics altogether and going directly to the student. In many cases, teaching staff are completely ignorant of the work being done at the institutional level and are almost certainly ignorant of the interactions between their students and the institutional actors engaged in the retention activity.

In other cases, the teaching staff will recognise the gulf between the top-down changes and what they do and either reject it as nonsensical (at least as seen by them, but also perhaps because it is nonsensical) or work around it because it doesn’t fit with their thinking or with what is possible for them to plan and implement.


Beyond this outcome, there are a range of other pitfalls associated with the “do it to” path. These are but some of them and I’m not going into these in any great detail.

This presentation is based on a talk given at the Australian SolarFlare workshop last year at which I spent more time going into these pitfalls, leaving less time for the “do it with” path. Today I’d like to focus on the “do it with” path more so I’ll not cover many of these pitfalls.


I will instead focus on what I think is a fundamental problem for Australian higher education, the almost complete ignorance that there are two main ways to view process.

Over recent years Australian higher education has adopted a myopic view of process that is ignorant of the broader management literature. Australian higher education has become enamoured of and focused on top-down, strategic planning processes – essentially yet another management fashion. Australian higher education and its managers have become entirely focused on the planning school of process thought. Also known as the exploitation view of process or teleological design.

What they are ignorant of is that the management literature has had a long and on-going debate between two different schools of thought on process. Characterised here by the terms “planning” and “learning”. The idea isn’t that either school of thought is the one true view, neither is a silver bullet. Both have their weaknesses, both have their strengths. The idea is that depending on your context and aims you should be using different approaches to process.

When you are in a fixed environment where you can clearly know and articulate the goals of your projects. When you know there is one right answer that is better than all others and you can successfully engage all stakeholders in that one right answer, then you use the planning approach.

However, when you are in a complex environment with a wicked problem. A situation where there is never going to be one right answer and even worse there is no way for you to even identify what the right answer is. Not to mention that you can’t engage all the stakeholders successfully in a right answer (even if you could identify it). Then you should use a learning approach to process.

Which type of approach do you think is appropriate for learning analytics within Australian higher education?


This is not some crackpot understanding. This is not some new, ground-breaking insight. It is well established in the literature, here are just some of the folk who have written about it and the terms they have used for the “planning” and “learning” approach to process.


Just to reinforce the point, here’s a quote from James March (a big name in the field) arguing that an over-emphasis on either approach leads to an unproductive state. A state Australian higher education finds itself in today. A state I hope we don’t find ourselves in with learning analytics (who am I kidding).


So, onto the “do it for” path.


So again you have some smart people within the organisation. These folk probably aren’t as important as the “do it to” folk and these folk may even be getting together at the behest of the “do it to” folk. These are typically central learning and teaching folk, the Associate Deans Learning and Teaching, perhaps some innovative teaching staff and if they are really lucky the university Information Technology folk.


Knowing about learning analytics and how it can help the teaching staff they will make changes to the context for the academic staff. This might be in the form of running some seminars on learning analytics. They might invite George Siemens to give a talk to the staff. They might select and install some learning analytics software in the LMS or elsewhere. They might upskill themselves in learning analytics so they can help teaching staff.


The idea is that – like the “do it to” path – these changes in the context will lead to changes in the thinking, planning and strategies of the teaching staff and subsequently improvement in the student learning experience.


Of course, anyone who has worked for central learning and teaching – like both of us have – knows that this isn’t what happens. Instead, only a very small percentage of the teaching staff ever engage with the changes made to the L&T context. Almost certainly the teaching staff that attended those sessions were the ones that didn’t really need to attend. So your impact is lessened.

But then at each level through this model the impact reduces further. The greater rewards from research, the dissonance between what the LMS tool does and the pedagogy of the teacher and other reasons all contribute to a diminution of the impact of the changes made to the L&T context.


As with the “do it to” path, the “do it for” path has a range of pitfalls. We’re going to focus on three only.


The first is drawn from work by Geoghegan from 1994 around instructional technology in American universities. It has insights which I think still apply today. His general observation is that instructional technology never clears the chasm that exists between the early adopters and the majority. i.e. as identified above the “do it for” path really only impacts the early adopters.


Geoghegan identified four reasons for this.

The first is the assumption that all academics are the same. When they are demonstrably not. We’ll return to this.

The second is that the folk making many of these decisions form a self-reinforcing clique – the technologists alliance. i.e. the folk who make the decisions about what changes to make as part of the “do it for” path are mostly early adopters and boosters of technology (by the way the “IT” in “IT staff” above means instructional technology). They see the world differently.

This ends up in approaches that don’t match the experience and conceptualisations of the mainstream and because of this it fails to provide the majority with a compelling reason to adopt whatever is being decided for them.


Convery captures a large part of this in these quotes. The technologists alliance have a much larger say about what will happen and through this the connection with the lived experience of the majority gets ignored leading to a failure to cross the chasm.


To illustrate the distinction between the early adopters and the majority, Geoghegan draws on Moore’s original work which in turn was based on the diffusion of innovations work from Rogers. This work clearly demonstrates that there is a significant difference between the early adopters and the early majority. The “do it to” and the “do it for” paths fail to engage with this diversity and subsequently fail to make any large scale or long term impact.

But it’s even worse than this. There’s a lot to criticise about the diffusion of innovations work, but perhaps the most applicable criticism here is that it underplays the diversity between teachers. It’s not just two groups with significant differences. Within each of these groups there is significant difference.


To take another tack, let’s look at how the “do it for” approach typically works in terms of changing learning and teaching. This approach typically treats the actual teaching of a course as a black box (in this case yellow). What goes on inside that box is ignored by the “do it for” approach.

Instead any attempt to re-design a course is made before the course is taught. Once those changes are done, the course is taught with little additional focus on that experience, until at the end out pops a range of outputs, student results and satisfaction survey responses. These outputs are then used as inputs to the re-design phase before yet another black box offering of the course.

Central L&T is largely ignorant of the actual experience of teaching a course. The assumption is that the design is done outside the course offering. The assumption is that design can be, at some level, general and abstract.

This is why a lot of institutional learning analytics projects focus on measures like retention. It’s too complex to look inside a course and respond to the diversity of experience that is there.


Which brings us back to the observation about learning analytics, that there is a dearth of studies figuring out how to use learning analytics within that black box. We don’t know how to do it well yet.


Which brings us to the “do it with” path. As you have probably guessed this is the path which I think will be the most fruitful and interesting in identifying how to harness learning analytics to improve learning and teaching. However, I don’t think it’s the only path, it does need to be supported by and inform the other paths. My problem is that this path will largely be ignored and its absence will result in learning analytics becoming yet another management fashion.


This path starts with a group of people working with teaching staff inside the black box that is teaching the course. At the level of the teachers’ strategies. The idea is to develop an understanding of the lived experience – in all its diversity and difficulty – and figure out uses of learning analytics (and other tools) that can help make a difference.


That insight is used to make changes at the level of strategy in response to what is actually going on. Due to the diversity of what happens at this level there will be problems as to what you address. It will be very subjective, very different, and there will be no one silver bullet. So it won’t be large scale changes you’re making.


But by making those changes you learn. You understand what worked and what didn’t and you start making changes to student learning (without a long-term, large scale project where nothing happens for a long time) and by working with teachers you are helping change how they plan because they have new tools and approaches that respond to their needs.


If you’re lucky and you do it well this process may in turn lead to changes “up the ladder”, but those changes will take some time to happen and will require some work.

There is no simple, easy solution.


It is important to recognise that you are doing a lot of these interventions quickly and learning from them all. This isn’t a one off process. Each time you try some interventions you gain new insights – you and other actors learn – and you apply that learning to make further improvements and start upon a cycle of on-going learning and change.


A cycle that goes on. Remember, you also operate in an environment of change with technology that is also changing. All this requires your organisation to be able to change in response.


By now I’m hoping you can see which school of thought around process we’re leaning towards as being most appropriate for learning analytics within higher education.


In part, this is based on the view of the teacher as a primary change agent. But it’s also in recognition that learning and teaching in a modern university is increasingly complex and requires this type of approach.


It’s also influenced by the growing observation that workload for teaching staff in universities is increasing. We believe this is largely due to an over-emphasis on the “do it to” and “do it for” paths and not enough on “do it with”. The people who are driving institutional e-learning are ignorant of the complexity of teaching and are providing tools, policies and approaches that are inappropriate.


It also comes back to the common insights about learning analytics where knowledge of the context in which learning and teaching takes place is essential to deploying learning analytics.


And that any institution-wide approach to learning analytics must be based on a thorough knowledge of the context.


A common argument against this type of bottom up approach is that it doesn’t provide the type of large-scale, strategic advantage – the competitive difference – that senior management demand for their organisation. This is of course an ignorant claim based on a lack of knowledge of the reality of gaining strategic advantage from information technology.

You do not gain strategic advantage or competitive difference by adopting the same enterprise business intelligence/data warehouse system from the same vendor as all your competitors, then following the best-practice advice for large scale enterprise systems (as followed by all university information technology divisions) and implementing those systems as vanilla. This approach removes all distinction between organisations.

We know from the information systems research that competitive advantage from ICTs only comes for organisations that are able to convert those systems into unique, practical and situated knowledge for action. In fact, it’s this type of knowledge from which the original “strategic information systems” arose, not from buying an enterprise tool from a vendor.


And now to some frameworks that are a bit more specific to learning analytics and can provide some more insight into how you implement learning analytics in ways that avoid faddism.


This is a model of analytics published by George Siemens in his 2013 journal article. It’s a fairly standard representation of the cycle of learning analytics. As such, we think it suffers from the same over-emphasis that much of learning analytics suffers from. We think this over-emphasis is bad if your aim with learning analytics is improving student learning.

We’re in the process of developing the IRAC framework (an early paper explaining IRAC) which we hope/think will help. We’re going to use this framework to outline the shortcomings of the common thinking about learning analytics.


The I in IRAC stands for information. This involves the gathering, normalising and analysis of the data and information your institution has. This is the foundation of learning analytics, without information and its analysis you can’t generate insights. This is perhaps why so much of learning analytics is focused at this stage. More than 3/4 of George’s model focuses on this stage. While this stage is important, we think this over emphasis gets in the way of using learning analytics to improve learning and teaching.


The R in IRAC stands for representation. Once you’ve gathered and analysed your information you have to represent the findings in ways that people will understand. This is often where most of the analytics literature stops. We think this is lazy and short-sighted and it’s good to see George’s model has more that follows representation.

We also think that the amount of thinking and work that goes into representation is too limited. It’s almost as if once we have a dashboard, it’s all good. By the way, we think dashboards suck.


The A in IRAC stands for Affordances. Very little of the learning analytics literature considers this. George’s model does include the notion of action. i.e. you need to undertake some sort of action based on the insights that were represented as a result of your analysis of the information.

We argue that in order to encourage appropriate action it is essential that both the learning analytics applications and the broader institutional context offer affordances for that action. i.e. they need to make it easier and more obvious for learners and teachers to take the appropriate action.

There’s much more to learning analytics than dashboards.


The C in IRAC is change. George’s analytics model captures this through its cyclical nature. Once you’ve carried out some action you will get more information that you have to analyse, represent and that will inform new action. A cycle that goes on and on.

While change is there in George’s model it is implicit. It’s not clearly stated in the diagram in the same way that “Action” is (for example). We believe an understanding of the need for on-going change is so central to harnessing learning analytics that we make it an explicit part of the framework.

Change will come in many different forms. At the very least, the Information, Representation and Affordances offered by your institution’s learning analytics applications will need the capacity to change. This change should not just be first order change, it should enable and support second order change. The changing of underlying assumptions and conceptions.

After all the argument here is that we need to take a learning approach to learning analytics and this involves conceptual change.


Just briefly, as argued above we think that much of the learning analytics literature has the various weights of the different parts of the IRAC framework wrong.

Information is foundational/central to learning analytics. But the gathering and analysis of that information is only a very small part of the process of improving learning and teaching with learning analytics.

There needs to be a much greater focus on the representation, affordances and change components of the IRAC framework. An organisation needs to invest much more time on the R, A and C components if they are to actually make significant and on-going improvements to learning and teaching.


Earlier on in the presentation you saw parts of an email George Siemens sent asking about examples of systemic approaches to learning analytics. On the screen you can see excerpts from a reply to George’s email. A reply that encapsulates much of what we’ve argued here.

A top-down systemic approach is only going to alienate folk. It’s also going to limit you to very broad (and poor) indicators of learning – GPA and retention rates.

Instead, you need to provide a foundation of resources that will enable a range of approaches to develop in ways that acknowledge (and leverage) the idiosyncrasies of learning and teaching.


So some conclusions.


A couple of weeks ago I watched an online video of a talk given by Bret Victor. A smart guy, ex-Apple engineer who is doing some really great stuff with technology. In this talk Bret assumed the persona of a software developer giving a talk in the early 1970s – he used an OHP and slides for period authenticity. The talk was aimed at highlighting a few interesting technical breakthroughs from the late 1960s/early 1970s that promised to revolutionise how software was developed. None of that revolution has happened.

Throughout his talk Bret showed how even folk as technically proficient as software developers resist technological change.


Amongst his reflections/conclusions was the observation that while technology changes quickly, people change their minds about how to use that technology very slowly. Even the most technically proficient of people – the people developing technology – show this.

I’d like to add that organisations – where you’re talking about groups of people changing their minds – change even more slowly. Perhaps this is why University IT systems are still focusing on single systems and hierarchical command and control while the world has moved into networked ways of being.


It is perhaps why Australian universities are ignorant of the learning school of thought around process.


Bret’s second point (or at least the 2nd one I took away) was that as a creative person, the most dangerous thought you can have is that you know what you are doing. Similarly, I would argue that this is the most dangerous thought an Australian university can have about learning analytics and e-learning more generally.

The trouble is that the over-emphasis on the planning approach to process is based on this very assumption: that the institution knows what it is doing. This approach by its very nature will limit the impact of learning analytics or any form of innovation.


This is why the “do it with” approach is so essential. It acknowledges that we don’t know what we’re doing when it comes to learning analytics and thus opens us up to all sorts of possibilities that will arise from productive engagement with the reality of learning and teaching.


As part of this process, we think that getting the information right is a good first step, but that it is the step on which we should spend the least amount of time and energy. That representation, affordances and on-going change should consume a lot more attention and resources.

Because in the end we don’t know much about learning analytics and it’s important that we question everything and learn as much as we can.



Abrahamson, E., & Fairchild, G. (1999). Management fashion: Lifecycles, triggers and collective learning processes. Administrative Science Quarterly, 44(4), 708–740.

Arnott, D., & Pervan, G. (2005). A critical analysis of decision support systems research. Journal of Information Technology, 20(2), 67–87. doi:10.1057/palgrave.jit.2000035

Birnbaum, R. (2000). Management Fads in Higher Education: Where They Come From, What They Do, Why They Fail. San Francisco: Jossey-Bass.

Bright, S. (2012). eLearning lecturer workload: working smarter or working harder? In M. Brown, M. Hartnett, & T. Stewart (Eds.), ASCILITE’2012. Wellington, NZ.

Ciborra, C. (2002). The Labyrinths of Information: Challenging the Wisdom of Systems. Oxford, UK: Oxford University Press.

Clow, D. (2012). The learning analytics cycle. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge – LAK ’12, 134–138. doi:10.1145/2330601.2330636

Clow, D. (2013). An overview of learning analytics. Teaching in Higher Education, (August), 1–13. doi:10.1080/13562517.2013.827653

Dawson, S., Bakharia, A., Lockyer, L., & Heathcote, E. (2011). “Seeing” networks: visualising and evaluating student learning networks. Final Report 2011. Main. Canberra.

Dyckhoff, A. L., Lukarov, V., Muslim, A., Chatti, M. A., & Schroeder, U. (2013). Supporting action research with learning analytics. In Proceedings of the Third International Conference on Learning Analytics and Knowledge – LAK ’13 (pp. 220–229). New York, New York, USA: ACM Press. doi:10.1145/2460296.2460340

Elias, T. (2011). Learning Analytics: Definitions, Processes and Potential. Learning.

Geoghegan, W. (1994). Whatever happened to instructional technology? In S. Bapna, A. Emdad, & J. Zaveri (Eds.), (pp. 438–447). Baltimore, MD: IBM.

Goldfinch, S. (2007). Pessimism, computer failure, and information systems development in the public sector. Public Administration Review, 67(5), 917–929.

Lockyer, L., Heathcote, E., & Dawson, S. (2013). Informing Pedagogical Action: Aligning Learning Analytics With Learning Design. American Behavioral Scientist, XX(X), 1–21. doi:10.1177/0002764213479367

Lodge, J., & Lewis, M. (2012). Pigeon pecks and mouse clicks: Putting the learning back into learning analytics. In M. Brown, M. Hartnett, & T. Stewart (Eds.), Future challenges, sustainable futures. Proceedings ascilite Wellington 2012 (pp. 560–564). Wellington, NZ.

March, J. (1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71–87.

Mor, Y., & Mogilevsky, O. (2013). The learning design studio: collaborative design inquiry as teachers’ professional development. Research in Learning Technology, 21.

Sharples, M., McAndrew, P., Weller, M., Ferguson, R., Fitzgerald, E., & Hirst, T. (2013). Innovating Pedagogy 2013: Open University Innovation Report 2. Milton Keynes: UK.

Siemens, G. (2013). Learning Analytics: The Emergence of a Discipline. American Behavioral Scientist, (August). doi:10.1177/0002764213498851

Siemens, G., & Long, P. (2011). Penetrating the Fog: Analytics in Learning and Education. EDUCAUSE Review, 46(5).

Suthers, D., & Verbert, K. (2013). Learning analytics as a middle space. In Proceedings of the Third International Conference on Learning Analytics and Knowledge – LAK ’13 (pp. 2–5).

Swanson, E. B., & Ramiller, N. C. (2004). Innovating mindfully with information technology. MIS Quarterly, 28(4), 553–583.

Trigwell, K. (2001). Judging university teaching. The International Journal for Academic Development, 6(1), 65–73.

Getting started with Weka

And now for the second MOOC of the year – Data Mining with Weka from the University of Waikato. Weka is an open source data mining application. Time to find out a bit more about one of the technical aspects of learning analytics. Hoping this might be useful in terms of research with The Indicators Project (we should really update that site).

First up is “Class 1 – Getting started with Weka”


YouTube video introduction – 9 minutes. Powerpoint slides with talking head in the corner. Some of it devolves to reading the slides.

Data mining is a mature technology. The course is trying to take the mystery out of it. Look at some common algorithms and see what practical use they might be.

Data mining is about going from raw data to information that can be useful for predictions.

Shopping, loyalty cards and artificial insemination as examples.

Data mining applications use machine learning algorithms.

Weka – the software used here – is a data mining workbench. Algorithms for a range of necessary purposes.

5 classes in the course, 6 lessons in each class, with an activity – fairly standard xMOOC format

  1. Getting started with Weka
  2. Evaluation
  3. Simple classifiers
  4. More classifiers
  5. Putting it all together

There’s a signed certificate from the University if you get 70% or more.

The instructor has a published book. Used in the course with parts being made available with permission from publisher.

Typical NZ flipped world map.

And now the activity. A simple true/false quiz which is more about recognising that data mining is applicable in a wide array of applications. Promoting the book a bit.

Data mining with Weka

Almost 5 minutes of the video is spent downloading and installing the Weka software, which I’d already done. So a bit of skipping.

Weka has four applications

  1. Explorer – the focus of this course
  2. Experimenter – performance comparisons of machine learning algorithms on different data sets
  3. KnowledgeFlow – graphical interface
  4. Simple CLI – yea, the command line.

So, the play data set about the weather

  • instances – 14 rows associated with a particular day.
  • attributes – 5 columns, 4 of which are weather, 1 is whether or not play occurs. The aim is to predict the play variable.

Of course the bit about the standard data sets was hidden in the installation process that I skipped. But usefully they have the transcript, download that and a quick search found the solution.

All very simple and accessible and just a bit slow for anyone with a modicum of technical/CS experience.

The activity is also simple, but it does test what was covered in the video.

Exploring datasets

Classification or supervised learning problem – trying to predict the class value – i.e. play in the weather data. The convention is that the last attribute is the class value.

Classification problem

“Good to get down and dirty with your data” – checking the reasonable value for the data. The preprocess stage of the Weka explorer is a useful way of doing this.

Would be interesting to get the data set for the MCQs in the activities. I have a feeling that doing some data mining on that might reveal B to be a good random choice.

The questions in this MCQ do involve using and demonstrating your understanding of the concepts and the software. Also take a bit more time. This is better than the “stats” MOOC I started a little while ago.

Building a classifier

Using a J48 classifier to analyse.

Running it on the Glass data set generates a success rate of 68%. It also produces a confusion matrix that gives some idea of where the errors are.
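To make the link between the success rate and the confusion matrix concrete, here is a minimal Python sketch (plain Python, not Weka itself). The 3-class matrix and the class labels are made up for illustration and are not the Glass results. It assumes rows are actual classes and columns are predicted classes.

```python
def accuracy(confusion):
    """Fraction of instances that sit on the diagonal (correctly classified)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def most_confused(confusion, labels):
    """Return (actual, predicted, count) for the largest off-diagonal cell."""
    worst = None
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i != j and (worst is None or count > worst[2]):
                worst = (labels[i], labels[j], count)
    return worst


# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
labels = ["a", "b", "c"]
confusion = [
    [50, 15, 5],
    [10, 40, 10],
    [4, 6, 60],
]

print(f"success rate: {accuracy(confusion):.0%}")
print("biggest source of errors:", most_confused(confusion, labels))
```

The success rate only says how many instances were right overall. The off-diagonal cells say which classes are being mistaken for which.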

Using a filter

An introduction to part of the pre-processing stage: manipulating the data set to get it ready for classification etc.

Visualising your data

More useful methods for examining the data.

Java/Weka doesn’t seem to handle the “right-click” problem for a Mac consistently.

And that’s class 1 finished. Found this a better designed experience, or perhaps that’s just my prejudices showing through.

University data isn’t that “big”, what are the implications?

Learning analytics is being enabled/driven/sparked by the concept of “big data”, but for a while I’ve wondered just how big is the data being gathered by Universities. Preparations for a workshop earlier in the week provided an opportunity to find out.

Big data

The most recent spark for this query was Clow (2013) who gave the example of the Large Hadron Collider at CERN producing 23 petabytes of data in 2011. Some other examples found via Google included

For a sense of scale, if you had 1 petabyte of MP3 music, you’d still be listening to that music after 2000 years of continuous play.
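As a rough sanity check of that figure, here is a back-of-the-envelope sketch in Python. The bitrate (128 kbit/s) and the decimal definition of a petabyte (10^15 bytes) are my assumptions; the original source of the claim doesn’t state either.

```python
# Assumptions (not from the source of the claim): a constant 128 kbit/s MP3
# bitrate and a decimal petabyte of 10**15 bytes.
BYTES_PER_PETABYTE = 10 ** 15
MP3_BITRATE_BITS_PER_SECOND = 128_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_of_mp3(petabytes, bitrate=MP3_BITRATE_BITS_PER_SECOND):
    """Years of continuous playback stored in `petabytes` of MP3 audio."""
    bytes_per_second = bitrate / 8
    seconds = petabytes * BYTES_PER_PETABYTE / bytes_per_second
    return seconds / SECONDS_PER_YEAR


print(f"1 petabyte is roughly {years_of_mp3(1):.0f} years of music")
```

Under those assumptions the answer lands in the neighbourhood of 2000 years. A higher bitrate shortens it proportionally.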

University “big” data?

In comparison let’s assume you have the following for a single Australian University

  1. All the database tables and files uploaded to an LMS for the years 2004 through 2009.
  2. All the database tables and files uploaded to another LMS for the years 2009 to now.
  3. All the student demographic and age data for the same institution since around 2001.

How big is that “big” data?

I’ve been reliably informed that it is 665 gigabytes in size.

i.e. 0.000634193 petabytes.

Kind of not in the same league.
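The conversion behind that number can be sketched in a few lines of Python. The figure above matches binary units (1 petabyte = 1024 × 1024 gigabytes). The decimal option and the comparison with the 23 petabytes Clow (2013) quotes for the LHC are added here only for illustration.

```python
def gigabytes_to_petabytes(gigabytes, binary=True):
    """Convert gigabytes to petabytes.

    binary=True  -> 1 PB = 1024 * 1024 GB (matches the figure quoted above)
    binary=False -> 1 PB = 1000 * 1000 GB
    """
    gb_per_pb = 1024 ** 2 if binary else 1000 ** 2
    return gigabytes / gb_per_pb


university_pb = gigabytes_to_petabytes(665)
print(f"{university_pb:.9f} petabytes")

# How many times bigger is the LHC's 2011 output (23 PB, per Clow 2013)?
lhc_pb = 23
print(f"the LHC figure is about {lhc_pb / university_pb:,.0f} times larger")
```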


I wonder how this impacts the use of big data techniques to analyse university e-learning? How “big” does big data need to be to be useful?

The above is the combined data for the majority of online courses for a university since 2005 or so. What does that say about how “big” the data for an individual course is? Do big data techniques and assumptions break down at this size?


Clow, D. (2013). An overview of learning analytics. Teaching in Higher Education, (August), 1–13. doi:10.1080/13562517.2013.827653

Is institutional e-learning a bit like teenage sex?

2013 post updated for 2015 to (un)scientifically test for any changes over time.

Professor Mark Brown (@mbrownz), formerly of Massey University and now at the National Institute for Digital Learning at Dublin City University, is quoted in a New Zealand newspaper as saying this about institutional e-learning

E-learning’s a bit like teenage sex. Everyone says they’re doing it but not many people really are and those that are doing it are doing it very poorly.

As it happens, this was a point I’m aiming to make/use in a presentation at (the 2013 version of) this conference (which Prof Brown also happens to be speaking at). It’s a point that will be strengthened by Prof Brown’s comments. However, I’m wondering if I can gather some data to explore this further.

2015 update: Results of the 2013 surveys (this one and one live during the presentation) are provided in a link at the end of this post. I’ll “release” the results of this survey in a week or two. Perhaps live during my dlrn15 session on Sunday 16th October.

The survey

To that end please feel free to complete this brief survey.

It’s short, doesn’t store any identifying information and I’ll share the results here later in the week. The survey doesn’t explore all aspects of the relationship between institutional e-learning and teenage sex, it’s just a quick exploration of the topic.

Three questions

  1. Your institution’s country?
  2. Multiple choice – “What percentage of your institution’s online courses do you consider good?”

    Where good is defined loosely as “good enough to show other folk without being overly embarrassed”.

    (This isn’t meant to be a well designed survey or formal research).

  3. Your role was/is within the institution?

2015 update: Results of the 2013 survey.

Useful “analytics” – Faces as an example

A couple of weeks ago I expressed one of my reservations with the large buckets of money and time universities (and others) are currently investing in learning analytics in a blog post titled Bugger analytics, just give me useful information. My reservation is that the learning designers, data scientists, commercial software vendors, management and other members of Geoghegan’s (1994) technologists’ alliance are so enamoured of the theoretical future possibilities of learning analytics that they are ignoring (or are perhaps completely ignorant of) the current information needs of learners and teachers. The following is an attempt at a small example.

Faces and photo galleries

I currently teach pre-service teachers. As part of my course the students head out for three weeks of Professional Experience in a school. One of the most common problems the pre-service teachers face in the initial stages of their time in a classroom is learning the names of their students. For many this has been made simpler because many schools now have information systems that can print out a couple of pages with students’ names next to student photos. These pages are amongst the first requests that many of the pre-service teachers make and they then spend time memorising names.

This has been a problem in University education as well. Reek and Reek (1996, p. 15) writing in the context of a course with 350 students

A vital part of the laboratory instructor’s job is to make the students feel comfortable and welcome. This requires that they learn the students’ names quickly. Many of our faculty have difficulty matching names and faces, as we have a large number of students, To assist the faculty in learning to recognize their students and to facilitate students getting to know their classmates, we decided to build an electronic class photo album containing pictures and short personal sketches for each student in the class

They then proceed to tell the story of how they used their Silicon Graphics Indys to take student photos as part of an ice breaker at the start of labs. The photos were displayed using this new thing called the World-Wide Web via “Mosaic or Netscape”. (One of their big problems was running out of disk space, what with 250+ students and “massive files” averaging 220Kb, they had to dedicate a disk to the project).

When you move to teaching by distance education or online learning and have students you never meet face-to-face, having photos of students can be beneficial. During the late 90s and early this century most university enterprise systems did not provide this feature. Even though these systems actually stored student ID card photos in a central database, the idea that providing access to these photos to teaching staff would be beneficial never got to the implementation phase.

We’re now in 2013 and I work in an institution where I cannot get access to a page that lists basic information about my students (name, campus, location, etc) and their photo. The institution’s multi-million dollar ERP system – the last fad promised to revolutionise the information needs of universities – doesn’t provide it. Moodle, the institution’s LMS, does provide access to photos – if the students update their profile – but not necessarily in a way that will usefully fulfil the above need.

Enter Faces, a Moodle block that was announced today. It’s described as

It can be difficult to put a name to so many faces! Faces is a simple Moodle block that allows teachers to print out a collection of names and profile photos of those students who attend their classes.

Faces can print a collection of profile photos in grid format for the entire class or by group. It can be useful for use in the classroom or meetings when its useful to ‘put a name to a face’

Small examples and indefinite postponement

Faces and the desire to associate names to faces is, in the scheme of things, a fairly small-scale need. That’s certainly how I’d imagine it would be seen by the governance structures around institutional e-learning. Governance processes serve an important purpose, but they also suffer a flaw (or more). A governance process is, from one perspective, a priority scheduling algorithm. One of its tasks is to prioritise needs to ensure efficient utilisation of a scarce resource. Anyone forced to sit through an Operating Systems course will know that priority scheduling algorithms run the risk of a problem called indefinite postponement (aka starvation).

In indefinite postponement, objects with low priority are likely to be starved of the scarce resource. They never get a go because there is always an object of a higher priority.
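For anyone who skipped that Operating Systems course, here is a toy Python simulation of the problem. Everything in it is hypothetical: the priority numbers, the request names and the one-request-per-tick resource are made up for illustration. The `aging` option is the standard textbook remedy (a waiting request slowly gains priority). It is included only as a contrast and is not something drawn from any actual governance process.

```python
def simulate(ticks, aging=0):
    """Run a toy priority scheduler that serves one request per tick.

    A single low-priority request (priority 10) is queued at the start and a
    new high-priority request (priority 1) arrives on every tick. Lower
    numbers are served first. With aging > 0, a request's effective priority
    number drops by `aging` for every tick it has waited.

    Returns the tick at which the low-priority request is served, or None if
    it is still waiting when the simulation ends.
    """
    queue = [(10, 0, "small need")]  # (priority, arrival tick, name)
    for tick in range(ticks):
        queue.append((1, tick, f"big project {tick}"))
        # Pick the request with the lowest effective priority number,
        # breaking ties in favour of whoever has waited longest.
        best = min(queue, key=lambda r: (r[0] - aging * (tick - r[1]), r[1]))
        queue.remove(best)
        if best[2] == "small need":
            return tick
    return None


print("no aging:", simulate(1000))            # small need never gets a go
print("with aging:", simulate(1000, aging=1))  # eventually reaches the front
```

Without aging the small need is still waiting after 1000 ticks, because there is always a higher priority request ahead of it. With aging it eventually gets served.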

Learning analytics can be defined as the provision of useful information to people in ways that action can be taken to improve learning. As learning analytics increasingly becomes an enterprise concern, it is increasingly likely to suffer this problem of indefinite postponement and miss opportunities like this. Opportunities like this are important, because a University learning ecosystem is a complex adaptive system. Small changes in complex adaptive systems can lead to significant changes.

So, some possible suggestions to address this problem might include

  • Don’t focus so much on the multi-million dollar vendor product and the complex analysis methods that you forget about the simple information needs of the learners and teachers.
  • Modify the governance process to purposely allow some of the small scale projects to get a go.
  • Supplement your large scale, centralised enterprise learning analytics platform project with lots of safe-fail probes.
  • Set up your learning platform to break out of the constraints of scarcity. e.g. break out of the single integrated system (single point of failure) approach and adopt a network mindset


Geoghegan, W. (1994). Whatever happened to instructional technology? In S. Bapna, A. Emdad, & J. Zaveri (Eds.), (pp. 438–447). Baltimore, MD: IBM.

Reek, M., & Reek, K. (1996). An Electronic Class Photo Album. ACM SIGCSE Bulletin, 28(4), 15–18.

The importance of “We don’t know what we’re doing”

The video below is of a talk by Bret Victor on “The Future of Programming”. But don’t let that stop you, underpinning the talk is an important message for folk involved in learning, teaching and most things. The two main points I took away from it are (as applied to my area of interest):

  1. “Technology changes quick, peoples’ minds change slowly”.

    This is evident all the time in e-learning. Learners and teachers still operating on prior assumptions, their minds haven’t changed. Of course, the problem extends beyond that to the technologists’ alliance pushing e-learning (in its various forms).

  2. “We don’t know what we’re doing”

    That the ability to say this and reimagine what is possible is perhaps the most important capability for not only changing minds but also redefining what it means to learn and to teach.

    In a “strategic management” culture it is impossible for leaders to admit that they don’t know what they’re doing. With academic researchers only slightly less likely to admit this. Two big barriers.

    Which obviously makes my cluelessness a major strength.

The talk is based on the assumption that it is 1973 and the talk is being given by a programmer of that age. It introduces four important ideas that had arisen over the prior 10 years and which promise to be essential to the future of computing. The point is that they haven’t. In fact, most of computing has focused on the opposite, more limited approaches. But the real point is not the failure of these ideas, but the fact these insightful and interesting ideas arose when we didn’t know what we were doing. The opposite to what is happening today, where everyone (at a certain level) is certain about what they are doing.

Learning Analytics: The Emergence of a Discipline

Continuing my reading of some recent learning analytics journal articles, the following is a summary and some thoughts on

Siemens, G. (2013). Learning Analytics: The Emergence of a Discipline. American Behavioral Scientist, (August). doi:10.1177/0002764213498851


A really good, detailed overview of learning analytics as an “emerging discipline”. Even some good coverage of the institutional adoption of learning analytics.

But I have some niggles which I’m still trying to express. Here’s the first go and then into the summary of the paper.

The biggest niggle I have after reading this is trying to marry up a discipline like learning analytics that arises from and perhaps even revels in the complexity and variability of big data with a very traditional teleological/enterprise conception of organisational implementation.

If the ultimate goal of learning analytics is to reject the sort of rationalist/modernist decomposition of knowledge into programs and courses into a network of knowledge where computational methods guide the learner through this complex network, then why must the enterprise implementation of learning analytics depend on rationalist/modernist implementation approaches that break the process down into separate components?

Related or perhaps a re-phrasing. If the organisational context within which learning analytics is applied (e.g. a university) is a complex adaptive system then why is implementation positioned as a heavyweight, teleological exercise in project management with complete ignorance of the best way to explore, encourage and discover appropriate change in such a system – safe-fail probes. Now, this paper doesn’t actually suggest a heavyweight, teleological exercise in project management, but I can see most management looking at the “learning analytics model” proposed (see image below) and the identified importance of institutional support within the paper and making the leap to a heavyweight, teleological exercise in project management.

George actually argues for the need to keep human and social processes central in LA activities and yet the focus of the field still seems to be on the techniques and applications rather than whether or not these will be used, or used effectively by learners and teachers. Wrapped up in that, is the whole idea of learning analytics focusing on some ideal future embodied in its applications and techniques or whether it is directly addressing the current problems and needs.


The paper’s abstract follows with my emphasis added on what I’m interested in reading about

Recently, learning analytics (LA) has drawn the attention of academics, researchers, and administrators. This interest is motivated by the need to better understand teaching, learning, “intelligent content,” and personalization and adaptation. While still in the early stages of research and implementation, several organizations (Society for Learning Analytics Research and the International Educational Data Mining Society) have formed to foster a research community around the role of data analytics in education. This article considers the research fields that have contributed technologies and methodologies to the development of learning analytics, analytics models, the importance of increasing analytics capabilities in organizations, and models for deploying analytics in educational settings. The challenges facing LA as a field are also reviewed, particularly regarding the need to increase the scope of data capture so that the complexity of the learning process can be more accurately reflected in analysis. Privacy and data ownership will become increasingly important for all participants in analytics projects. The current legal system is immature in relation to privacy and ethics concerns in analytics. The article concludes by arguing that LA has sufficiently developed, through conferences, journals, summer institutes, and research labs, to be considered an emerging research field.


Starts with Philip W Anderson’s point about “more is different” which as Wikipedia summarises

in which he emphasized the limitations of reductionism and the existence of hierarchical levels of science, each of which requires its own fundamental principles for advancement.

In this case the point is “that quantity of an entity influences how researchers engage with it”. This is what is happening with “big data”, leading to the view that it can “transform economies and increase organisational productivity and increase competitiveness”. Education is lagging behind, but the explosion of interest is happening.

Aside: I’ve heard of Anderson’s “more is different”. In fact, I’ve heard George use it before. I’ve just skimmed a bit of Anderson’s article. I find the notion of “quantity influences how you engage with it” interesting/troubling on a couple of fronts. Need to think more on this. But does/should the presence of more data alone fundamentally change what’s happening? If yes, then is there any University using learning analytics to fundamentally change what they are doing – beyond adding new systems and another range of support staff to perform tasks that the teaching staff aren’t doing?

Brings in Kuhn to talk about the evolution of science, knowledge etc. The idea of a “network of theory” and the importance of connections between entities as representative of knowledge is pushed.

Analytics is positioned then as “another approach, or cognitive aid” that can assist folk “to make sense of the connective structures that underpin their field of knowledge”. Large data sets have changed the method of science and the question investigated. Transition from Yahoo (hierarchical classification) to Google (big data and algorithms) mentioned.

Aside: so how long and how will the hierarchical classification of education (programs and courses) last before transforming into a “big data and algorithms” (or something else) approach?

With the increasing use of e-learning, comes more and more data – which may be imperfect on any number of fronts – that “offer an opportunity to explore learning from new and multiple angles”.

This view of “a new mode of thinking and a new model of discovery” is traced back (in part) to AI and machine learning and has some good quotes from authors in those fields about the “unreasonable effectiveness of data” (Halevy, Norvig, and Pereira, 2009) and “the emergence of a new approach to science” (Hey, Tansley and Tolle, 2009).

Defining learning analytics and tracing historical roots

Starts with the SOLAR definition of learning analytics

Learning analytics is the measurement, collection, analysis, and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs.

Adds another from a business intelligence perspective aimed at “developing actionable insights”.

Differentiates LA from Educational Data Mining (EDM) by suggesting that EDM has a “more specific focus on reductionist analysis” but that there will be overlap.

Identifies layers of analytics drawing on Buckingham Shum’s (2012) three layers

  • Micro (process-level): mostly learner, but perhaps teacher. Social network analysis, NLP, assessing individual engagement levels.
  • Meso (institutional): department, university. Risk detection, intervention and support services.
  • Macro (cross-institutional): region, state, international. Optimising and external comparison.

The point of the above is that “as the organisational scale changes, so too do the tools and techniques used for analysing learner data alongside the types of organisational challenges that analytics can potentially address”.

Aside: I wonder/believe that the further you go up the organisational scale the greater the tendency there is for normative approaches which along with other factors limit change. i.e. to get “good” measures across all courses in a department, you have to be measuring similar (or even the same) stuff. Assuming a level of commonality that isn’t and perhaps shouldn’t be present.

Historical contributions

Mentions broader influences as: AI, statistical analysis, machine learning and business intelligence. But aims to talk about fields/research within education that contribute to Learning Analytics, including

  • citation analysis
  • Social network analysis
  • User modeling
  • Education/cognitive modeling
  • Intelligent tutoring systems

    Mentions Burns’ 1989 description of three levels of “intelligence” for these systems: domain knowledge, learner knowledge evaluation, and pedagogical intervention.

  • Knowledge discovery in databases
  • Adaptive hypermedia
  • e-learning

Aside: I get the feeling that as learning analytics develops there will be decisions/work that ignore some of the findings that arose in these historical influences. e.g. the finding from Decision Support Systems that DSS should be built with evolutionary methods.

Aside: The absence of Decision Support Systems suggests a few possibilities, including:

  • The DSS field has failed to give itself a profile outside of research, even though it covers data warehouses etc.
  • Learning analytics is still focused on the information and how to analyse and represent it, rather than how to actually use it to make decisions.
  • Or perhaps the focus of LA is to automate decisions, rather than support human decision makers.

LA tools, techniques and applications


Commercial tools

  • Statistical software packages – SPSS, NVivo etc.
  • Tools added to existing LMS type systems – S3, Course Signals
  • Web analytics tools
  • Tableau Software, Infochimps, also Many Eyes is mentioned

    Positioned as tools explicitly written to reduce the complexity of analytic tasks.

    Proposes that as these tools improve in ease of use, affordability etc. adoption will increase.

    Aren’t these just improved BI systems? But Tableau is pushing into education, giving it away to full-time students and instructors, only downside is that it’s a Windows application. But they also have Tableau online.

Research/open – “not as developed as commercial offerings and typically do not target systems-level adoption”

Aside: Fairly limited number of research/open tools listed. Wonder if that indicates something about the quality of the research tools or just the list the author had?

Techniques and applications

Proposes that these are two overlapping components:

  • Technique – specific algorithms and models for conducting analytics.

    An area of basic research “where discovery occurs through models and algorithms”. Then leading to application.

  • Applications – “ways in which techniques are used to impact and improve teaching and learning”.

    Influence curriculum, social network analysis and discourse analysis

    Much arising from source disciplines, but some arising from LA researchers now.

A distinction that “is not absolute but instead reflects the focus of researchers”.

The two lists below summarise the LA/EDM literature drawn upon to identify the relevant areas of each. Technique focuses on the type of analysis a researcher conducts. This distinction indicates the difficulty of definitions and taxonomies of analytics. The lack of maturity around this reflects the youth of the emerging field/discipline.

Technique (Baker and Yacef, 2009)

  • Prediction
  • Clustering
  • Relationship mining
  • Distillation of data for human judgement
  • Discovery with models

Application (Bienkowski et al, 2012)

  • Modeling user knowledge, behaviour and experience
  • Creating profiles of users
  • Modeling knowledge domains
  • Trend analysis
  • Personalisation and adaptation

Aside: I wonder what it says about me when I can’t – without a bit of reflection and perhaps further reading – see the connection between the list of applications and my need as a teacher?

  • Am I simply dumb/not read enough/not thinking at the right level of abstraction?

    Should read Bienkowski et al (2012).

  • Is there a gap between the research and practice?
  • Wondering if I see myself more as a teacher than a LA researcher, hence the difficulty here?

Of course, the following summary offers the missing translation

Learning analytics (LA) Techniques and Applications (adapted from Siemens, 2013, p. 9)

LA approach: examples

  • Modeling: attention metadata; learner modeling; behavior modeling; user profile development
  • Relationship mining: discourse analysis; sentiment analysis; A/B testing; neural networks
  • Knowledge domain modeling: natural language processing; ontology development; assessment (matching user knowledge with knowledge domain)
  • Trend analysis and prediction: early warning, risk identification; measuring impact of interventions; changes in learner behavior, course discussions, identification of error propagation
  • Personalization/adaptive learning: recommendations (content and social connections); adaptive content provision to learners; attention metadata
  • Structural analysis: social network analysis; latent semantic analysis; information flow analysis

Scope of data capture

“quality” data are required

“quality” == “captured as learners are engaged in authentic learning, where collection is unobtrusive”

Two obvious sources – Student Information Systems and LMS. Wearable computer devices – pedometers – and the quantified self get a mention.

Criticises existing projects

  • rely on data automatically collected
  • often incomplete and mere static snapshots in time

Suggests that to be effective, future projects “must afford the capacity to include additional data through observation and human manipulation of the existing data sets”.

Aside: Now that is interesting.

Makes the point that lecture hall data are limited to a few variables: who attended, seating patterns, SRS data, observational data. Whereas video lecture data is richer: frequency of access, playback, pauses and so on. The EdX type MOOC stuff adds to this by allowing linkages with the errors students make on questions.

“A single data source or analytics method is insufficient when considering learning as a holistic and social process”. Calls for multiple analytic approaches to provide more information to educators and students. Uses the example of network analysis

researchers are using multiple methods to evaluate activity within a network, including detailing different node types, direction of interaction, and human and computer nodes
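Aside: to make that concrete for myself, here is a minimal sketch of what "node types" and "direction of interaction" might look like in code. Everything in it is my own assumption rather than anything from the paper: the names, the made-up interaction log and the two helper functions are purely hypothetical, and it only counts in/out links rather than doing any proper network analysis.

```python
from collections import Counter

# Hypothetical node types: who (or what) takes part in the network.
NODE_TYPES = {
    "alice": "human",
    "bob": "human",
    "carol": "human",
    "forum_bot": "computer",
}

# Hypothetical interaction log: (source, target) means source replied to target.
INTERACTIONS = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("forum_bot", "carol"),
    ("alice", "forum_bot"),
]


def degree_counts(edges):
    """Return (out_degree, in_degree) Counters for a directed edge list."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return out_degree, in_degree


def edges_by_type_pair(edges, node_types):
    """Count interactions grouped by (source type, target type)."""
    return Counter((node_types[s], node_types[t]) for s, t in edges)


out_degree, in_degree = degree_counts(INTERACTIONS)
type_pairs = edges_by_type_pair(INTERACTIONS, NODE_TYPES)

for node in sorted(NODE_TYPES):
    print(node, NODE_TYPES[node], "out:", out_degree[node], "in:", in_degree[node])
print(dict(type_pairs))
```

Even this toy version shows why direction matters: who initiates and who receives are different questions, and splitting by human/computer nodes separates peer interaction from system-generated activity.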

Knowledge domain modeling

Picks up on the argument for more “community-centric and multidisciplinary approaches” for the complex problems to be faced. This can be helped by mapping and defining knowledge areas. Linked to the need for “data structures and computational techniques”. Google’s Knowledge Graph is mentioned.

Having a knowledge domain mapped then enables further analysis through the use of learner data, profile information, curricular data etc. Perhaps enabling prediction, intervention, personalisation and adaptation. Adaptation may not be automated. Enabling sensemaking and wayfinding has some value.

Aside: a potential link to curriculum mapping, outcomes, AITSL standards and eportfolios.

Organisational Capacity

Rare to have people with the right mix of skills. An analytics project requires: accessing, cleaning, integrating, analysing and visualising data – before sensemaking. Needing programming, statistical knowledge, familiarity with the data and the domain before being able to ask relevant questions.

Requires organisational support, outlines the requirements for an “intervention” system. Any initiative requires faculty support. Greller and Drachsler (2012, p43) and their six dimensions to be considered “to ensure appropriate exploitation of LA in an educational beneficial way”:

  • Stakeholders
  • Objectives
  • Data
  • Instruments
  • External limitations
  • Internal limitations

Aside: I prefer IRAC, but then I’m biased

This is perhaps the most difficult and under-appreciated point

The effective process and operation of learning analytics require institutional change that does not just address the technical challenges linked to data mining, data models, server load, and computation but also addresses the social complexities of application, sensemaking, privacy, and ethics alongside the development of a shared organizational culture framed in analytics.

LA Model

LA Model (see image below) is a systemwide approach to analytics. Resources are systematized. Points out that interventions and predictive models aren’t possible without top-down support.

Does suggest that “bottom-up approach” has always gone on. i.e. teaching staff using available tools to gather insights.

Siemens (2013) Learning Analytics Model

Aside: I can see the value in this model. It matches the typical enterprise, rational approach. But I also fear that it will suffer exactly the same sort of problems every enterprise approach has followed before. Personally, I think the relative amount of space taken up by the “information” component of IRAC speaks volumes. Would be good to overlay IRAC on this. The “Action” stuff is too small to capture the complexity and the variety of what goes on there.


The most significant challenges facing analytics in education are not technical.

Amongst the significant concerns are

  • Data quality
  • Enough data to accurately reflect the learning experience.
  • Privacy
  • Ethics

Aside: My own personal barrow has me asking why adoption and effective use are not included in the challenges. I mean it’s not as if educational technology has a long list of successful adoption stories about new technologies to share.

Data quality and scope

Scope – scope of capture of data from alternative sources – wearable computing and mobile devices.

Data interoperability poses a problem because of: privacy concerns, diversity of data sets and sources, lack of standard representation which makes sharing difficult.

Distributed and fragmented data are a problem. Data trails cover different systems. Good quote from Suthers and Rosen (2011) used

since interaction is distributed across space, time, and media, and the data comes in a variety of formats, there is no single transcript to inspect and share, and the available data representations may not make interaction and its consequences apparent (p. 65).

Leads into discussion of gRSShopper and the idea of “recipes” for capturing and evaluating distributed data citing (Hawksey, 2012; Hirst 2013)

Aside: An obvious avenue here for work around BIM.


With interactions online reflecting a borderless and global world for information flow, any approach to data exchange and data privacy requires a global view (World Economic Forum, 2011, p. 33).

New opportunities arising from technology yet to be fully addressed by the legal system – e.g. copyright and IP. A low level of legal maturity (Kay et al 2012) around privacy, copyright, IP and data ownership. Privacy laws differ from nation to nation. Raising interesting questions with cross-border education.

Learner control is important. MyData Button is an initiative allowing learners to download their data. Includes the following list of questions

  • Who has access to analytics?
  • Should a student be able to see what an institution sees?
  • Should educators be able to see the analytics from other courses?
  • Should analytics be available to prospective employers?
  • What happens when a learner transfers to a different institution?
  • How long is the data kept and can it be shared with other institutions?

Raises the transaction idea. i.e. students may choose to share their data in return for enhanced services.

The dark side

Aside: Ahh, now this is interesting and likely to be overlooked by most. It even forms just one paragraph in a long paper.

Draws on Ellul’s argument that technique and technical processes strive for the “mechanisation of everything it encounters”. Argues for the need to keep human and social processes central in LA activities. Learning is essentially social. Learning is creative requiring the generation of new ideas, approaches and concepts. Analytics only identifies what has happened. And this

The tension between innovation (generating something new) and analytics (evaluating what exists in data) is one that will continue to exist in the foreseeable future.

A personal reflection

Covers the emergence of the LA field.


Questions how the field will emerge. Recaps back to Anderson’s “more is different” and how educators deal with this will encourage the emergence of the field and related tools/approaches.


Anderson, P. (1972). More is Different. Science, 177(4047), 393–396.

Baker, R., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.

Bienkowski, M., Feng, M., & Means, B. (2012). Enhancing teaching and learning through educational data mining and learning analytics.

Buckingham Shum, S. (2012). Learning analytics. UNESCO policy brief. Retrieved from

Siemens, G. (2013). Learning Analytics: The Emergence of a Discipline. American Behavioral Scientist, (August). doi:10.1177/0002764213498851