The state of educational data mining in 2009

The following is a summary and some initial reflections on the paper

Baker, R.S.J.D., Yacef, K. (2009) The State of Educational Data Mining in 2009: A Review and Future Visions: http://www.educationaldatamining.org/JEDM/images/articles/vol1/issue1/JEDMVol1Issue1_BakerYacef.pdf

It’s another reading for the first week of the LAK11 MOOC.

The format I use for these posts is that the overview section is essentially my summary/reflections on the paper. The rest of the sections are my potted summary of each of the sections of the paper.

EDM == Educational Data Mining

Disclaimer I let this post stew unfinished for a couple of weeks while I progressed the thesis. I’m posting it now unfinished. Time to move on with more recent things.

Overview

Gives a good overview/feel for the field. As a high-level description it can’t provide much detail about specific areas, but it does provide the references to go digging.

Abstract

  • Methodological profile of early EDM research compared with 2008/2009 research.
  • Trends and shifts include:
    • increased emphasis on prediction;
    • emergence of attempts to use existing models to make scientific discoveries;
    • reduction in the frequency of relationship mining.
  • Examines two ways of categorising the diversity in EDM research.
  • Reviews the research problems addressed by the methods.
  • Lists and discusses the most-cited EDM papers.

Introduction

EDM field is growing, conferences, new journal. Time to review.

What is EDM?

Definition from http://www.educationaldatamining.org/

Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.

Suggests EDM is different from data mining (references an in press publication of one of the authors)

due to the need to explicitly account for (and the opportunities to exploit) the multi-level hierarchy and non-independence in educational data.

Which means that models drawn from psychometrics are often used in educational data mining.

Now, I don’t know enough to be comfortable that I understand that, which means I should try to follow up on that publication.

Baker, R.S.J.D. (in press). Data mining for education (a pre-print version). In B. McGaw, P. Peterson & R. Baker (Eds.), International Encyclopedia of Education (3rd ed.). Oxford, UK: Elsevier.

EDM Methods

Drawn from a variety of fields. Two attempts to categorise the methods are introduced, but Baker (in press) is the one the paper goes with.

Percentages in brackets represent the percentage of EDM papers (1995-2005) using each method.

  1. Prediction (28%)
    • Classification
    • Regression
    • Density estimation
  2. Clustering (~15%)
  3. Relationship mining (43%)
    • Association rule mining
    • Correlation mining
    • Sequential pattern mining
    • Causal data mining
  4. Distillation of data for human judgement (~18%)
  5. Discovery with models
    A model is developed through any process that can be validated; it is then used as a component in further analysis or mining.

First 3 are common in data mining.

Distillation of data for human judgement is not widely accepted in data mining, but it matches a category in the other EDM categorisation scheme, which suggests it is common in EDM.

“Discovery with models” is the most unusual from a data mining perspective.

Relationship mining is the most prominent in EDM (a toy sketch of this category follows).
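
To make that most prominent category concrete, here is a minimal sketch of correlation mining, one form of relationship mining. The per-student numbers and variable names are made up purely for illustration; real EDM work would draw on far richer data.

```python
# Minimal sketch of correlation mining (one form of relationship mining).
# All data below is hypothetical, purely for illustration.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-student measures: forum posts, logins, final grade.
measures = {
    "posts":  [12, 3, 7, 0, 9, 15, 4],
    "logins": [40, 10, 25, 5, 30, 55, 12],
    "grade":  [78, 52, 65, 40, 70, 85, 55],
}

# Mine pairwise relationships between each activity variable and the grade.
for var in ("posts", "logins"):
    r = pearson(measures[var], measures["grade"])
    print(f"correlation({var}, grade) = {r:.2f}")
```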

Key applications of EDM methods

EDM research comes from various fields: individual learning from software, CSCL, computer adaptive testing, and student failure/retention.

Key areas

  • Student models;
    Improvement of student models is a key application. Such models represent student characteristics; knowing how students differ enables different responses, which suggests a way of improving student learning. Some models can be used in real time. Applications include (all with references): whether students are gaming the system, experiencing poor self-efficacy, off-task, or bored.

    On predicting student failure, the paper gives three references.
  • Domain knowledge models;
    Psychometric modelling frameworks plus space-searching algorithms are used to develop automated approaches from data.
  • Studying pedagogical support;
    i.e. which are most effective in which situations for which students.
  • Looking for empirical evidence to refine/extend educational theories/phenomena;
    e.g. Perera et al (2009) use Big 5 theory for teamwork to search for successful patterns of interaction in student teams.

Important trends in EDM research

Prominent papers from early years

Based on Google Scholar citations, the paper looks at the most prominent early papers.

  • (1st) Zaiane (2001) was one of the first to propose and evangelise EDM.
  • (2nd) Zaiane (2002) – a proposal – and (4th) Tang and McCalla (2005) – an instantiation – examine how EDM methods can help develop sensitive/effective e-learning systems.
  • (3rd) Baker, Corbett and Koedinger (2004) case study of EDM methods to open new research areas. e.g. scientific study of gaming the system.
  • (5th) Merceron and Yacef (2003) and (6th) Romero et al (2003) present tools to support EDM.
  • (7th) Beck and Woolf (2000) use EDM prediction methods to develop student models.

Shift in paper topics over the years

Analytics, semantic web and cognitive science

I’m currently reading a draft of my wife’s PhD thesis. The thesis uses metaphor to examine the concepts that underpin research within the Information Systems discipline. It finds that research within the discipline appears to have a very heavy emphasis on techno-rational type conceptions of organisations, individuals and artifacts. There are various connections between this work and that of learning analytics and some of the assumptions behind the semantic web. This is an initial attempt to make some of these connections. Given limited time (I have to get back to commenting on the thesis), this has become more a place-holder of thoughts and ideas I need to explore more fully.

This post was prompted by this quote by Merlin Donald that is included in the thesis (emphasis added)

It is far more useful to view computational science as part of the problem, rather than the solution. The problem is understanding how humans can have invented explicit, algorithmically driven machines when our brains do not operate this way. The solution, if it ever comes, will be found by looking inside ourselves.

This captures some of my concerns when I start hearing computer scientists talk about intelligent tutors, the semantic web and other “big” applications of artificial intelligence. I don’t doubt the usefulness of these techniques in their appropriate place, however, I think it increasingly unlikely that they can effectively replace/mirror/simulate a human being outside of those limited places.

Another interesting quote from Merlin Donald’s home page

His central thesis is that human beings have evolved a completely novel cognitive strategy: brain-culture symbiosis. As a consequence, the human brain cannot realize its design potential unless it is immersed in a distributed communication network, that is, a culture, during its development. The human brain is, quite literally, specifically adapted for functioning in a complex symbolic culture.

Sounds like there are some interesting potential connections with connectivism and distributed cognition. A connection which – after a very quick skim – this paper (Donald, 2007) seems to make.

The first Donald quote mentioned above comes from the book The way we think: Conceptual blending and the mind’s hidden complexities (Fauconnier & Turner, 2003). A book that argues that conceptual blending is at the core of human thinking, or at least what makes us distinctive.

Lots more to read and ponder. For now, some questions:

  • Is there a fit here with connectivism and/or distributed cognition (or similar)?
  • What implications do these ideas have for analytics and how it can make a difference?
  • What critiques are there of these ideas?

References

Donald, M. (2007). The slow process: A hypothetical cognitive adaptation for distributed cognitive networks. Journal of Physiology (Paris), 101, 214-222.

Fauconnier, G. & Turner, M. (2003). The way we think: Conceptual blending and the mind’s hidden complexities. New York, NY: Basic Books.

The power of organisational structure

I find myself in an interesting transitional period in learning. I’m in the final stages of my part-time PhD study, just waiting for the copy editor to check the last two chapters, and then it’s submission time. I’m participating – participation that has recently been negatively impacted by the desire to get the thesis finalised – in a MOOC, LAK11, and looking at returning to full-time study as a high school teacher in training. It is from within this context that the following arises.

Yesterday I read a reflection on week 2 of LAK11 by Hans de Zwart, in which he quotes from an MIT Sloan Management Review article on big data and analytics. The quote:

The adoption barriers that organizations face most are managerial and cultural rather than related to data and technology. The leading obstacle to wide-spread analytics adoption is lack of understanding of how to use analytics to improve the business, according to almost four of 10 respondents.

This doesn’t come as a great surprise. After all, I think the biggest problem for universities when approaching many new technologies is grappling with the fact that most new technologies have biases that challenge the managerial and cultural assumptions upon which the institution operates. Being aware of, and responding effectively to, those challenges is what most institutions and those in power do really badly.

One contributing factor to this is that organisations and those in power work on assumptions that seek to maintain and reinforce their importance. Let’s use my experience as a starting university student as an example. As a new student at the university I am receiving all sorts of messages designed to help me make the transition back to study. Do you want to know what strikes me most about these messages and the transition assistance being provided?

That the organisation and communication of these help/transition resources correspond more to the structure of the organisation than to what might actually be useful to a new student. Some examples.

The “we’re here to help” message is a list of the different organisational units, which perhaps is not that surprising. But how about the “guide for students”.

Structure of a university guide for students

How would you expect a University guide for new students to be structured?

  1. By program?
    i.e. I’m enrolled in a Graduate Diploma in Learning and Teaching, a guide for those students?
  2. By discipline?
    i.e. The GDL&T is within the education discipline, a guide for those students?
  3. By organisational unit?
    This university divides academic staff into schools and then schools into faculties (e.g. the Faculty of Arts, Business, Informatics and Education)
  4. One for the whole university?

Which would make the most sense? The more specific the guide, probably the more useful. But that might require more work (each program having its own guide) and lead to some fragmentation within the institution.

A guide for the whole university would reduce the workload and increase the commonality between students; however, it would fail to capture the diversity inherent in the disciplines. I’m pretty sure that, as a graduate education student, I’ll need to know things that are a bit different from what an undergraduate engineer needs.

At this institution it is by organisational unit, by faculty. The institution only has two faculties. So there are two guides.

Content of the university guide

So, if the student guide is divided by faculty, then it must contain faculty-specific information. Otherwise, why would there be a division?

The first really specific information is mentioned on page 12 of 19, covering residential schools for GDL&T students. However, some students in the sciences and engineering do residential schools as well. On page 18 of 19 there is mention that law students need to use a special referencing style. Apart from that, there is no information that isn’t generic to all students. Must check what’s in the other student guide.

Oh, this one starts differently: it has a letter from the Dean of the Faculty. Of course, it was written only a couple of months into 2010 (by the way, both guides are still the 2010 guides; the 2011 guides haven’t been uploaded yet, even though a global “have you read the guides” message has been sent to all students) and the (acting) Dean has since moved on to another role.

Another difference: this one mentions clothing and safety within laboratories and on field work, and there is a lot more mention of RPL in this guide. Ahh, specific information for engineering students. Must be a great help to all those non-engineering students in the faculty. And this one has screenshots of how students are to get assignment cover sheets, rather than the paragraph of text in the other guide.

So it does contain some different stuff, but still mostly institution level information and information that is already available in other forms elsewhere.

Why have these two guides?

In short, my answer would be that the management of the two faculties has to do something. There doesn’t appear to be any other explanation for why the student guides are provided at this level. Not to mention that, given they simply repeat information that is available elsewhere (and have yet to be updated for 2011), there’s probably no need for them at all. But it is something that has been done in the past, so it must be done now.

Organisational and cultural influences and problems for learning analytics

For me, this is an example of how organisational and cultural influences impact upon the effective delivery of learning and teaching within universities. Much of what is done, and why it is done, says more about the existing cultures, structures and agendas within the management of the institution than it does about what is best for learning and teaching.

And it won’t be any different for learning analytics. In many universities, the questions that will be asked of analytics will be those deemed important by management. It will be difficult for the questions asked to be designed to cater for the diversity of needs at the levels of discipline, program, teacher or student.

Which is why I’m worried when the Sloan article recommends this solution

Instead, organizations should start in what might seem like the middle of the process, implementing analytics by first defining the insights and questions needed to meet the big business objective and then identifying those pieces of data needed for answers.

The insights and questions that are defined are more likely to say something about the organisational and cultural influences of the host institution, than about what is best for learning and teaching.

The difference between utopian and dystopian visions

As part of the LAK11 course Howard Johnson has commented on an earlier post of mine. This post is a place holder for a really nice quote from Howard’s post, an example from recent media reports, and perhaps a bit of a reflection on responses to analytics.

The quote, some reasons and an example

I like this quote because it summarises what I see as the most common problem with the institutions I’ve been associated with. Especially in recent years as there’s been a much stronger move toward the adoption of more techno-rational approaches to management.

A utopian leaning vision can only be achieved with hard work and much effort, but a dystopian vision can be achieved with only minimal effort.

Improving learning and teaching within a modern university context is a complex task. There is no one right solution, there is no simple solution, no silver bullet. Improving learning and teaching is really hard work.

The trouble is that short-term contracts for senior management (which at some institutions now reach down to what were essentially head-of-school roles) and other characteristics of the organisational context mean that it is simply not possible for that really hard work to be undertaken. The organisational characteristics of Australian universities are increasingly biased towards the easy route: something that can be implemented quickly, appear to return good results, and enable a senior manager to boast about it when attempting to renew his/her contract and/or apply for a better job at a better institution.

Based on this argument, when I read this article (via @clairebroooks) and especially this quote from the article

Poor and disadvantaged students were clear winners, with university offers to students from low socio-economic backgrounds increasing by 8 per cent, following the higher participation targets set by the federal government after the 2008 Bradley review of higher education.

I find it very hard to believe that all of these institutions have adopted a utopian vision that has seen their learning and teaching practices, policies, resourcing and systems appropriately updated to respond to the very different needs and backgrounds of these students. Including the necessary re-visiting of the curriculum and learning designs used in their large introductory courses. The courses these students are going to be facing first and which traditionally, at most institutions in most disciplines, have significant failure rates already.

Instead, I see it much more likely that they’ve simply changed who they’ve accepted. At best, they may have thrown some additional resources (an extra warm body or two) to some central support division that is responsible for helping these students. These folk may even have had a couple of meetings with staff who teach those first year courses.

This is not to suggest there aren’t some brilliant folk doing fantastic work, both in the central divisions responsible for the bridging and orientation of these students and in the teaching of large first-year courses. It is to suggest that this work is often/usually in spite of the organisational vision, not because of it. It is also to suggest that such work is almost certainly not repeatable or sustainable. My guess is you could go to any institution boasting about how well it is serving these students and, by selectively removing a handful of people, cause the edifice of good practice to fall apart. The institutional systems wouldn’t be able to continue the good practice in the absence of those key folk.

The utopian vision professed by these institutions will be the result of the hard work of a few who have generally had to battle against the institutional vision and context.

One utopian vision for learning analytics

As Howard suggests, much of the discussion of analytics has focused on the dystopian vision. It’s a vision I see as the most likely outcome, at least in the current institutional context.

But at the same time, I do believe that some applications of analytics can help improve the learning and teaching experience of students and staff. It’s important to be aware of and keep highlighting the dystopian vision, but it’s also important – and perhaps past time – to develop and move towards a utopian vision, or at least to learn from trying. The following is an early formulation of one such vision. It connects with some of what I’ve been trying to do. It does assume an institutional context for learning – that’s what I’m familiar with – and I’m not sure how much of it would be useful outside an institutional learning context.

Having just listened to John Fritz’s presentation via the LAK11 podcast, I’d like to pick up the notion he mentioned of the self-regulated learner and the idea that analytics can provide useful assistance to that learner. A brief and incomplete summary of one aspect of John’s point would be that there is value in providing the learner with the information produced by analytics in order to enable the learner to make their own decisions.

I would like, however, to expand that idea to the notion of the self-regulated teacher and the potential benefits that analytics can provide them. From my perspective there are at least three broad types of learner involved in any institutional learning context. They are:

  1. The formal student learner enrolled in a course/program.
    These folk are primarily interested in learning the “content” associated with the course.
  2. The formal teacher learner charged with running the course/program.
    These folk are/should be primarily interested in learning how they can improve the learning experience of the student.
  3. The institutional learner within which the course/program is offered.
    These “folk” are/should be primarily interested in learning how to improve the learning experience of the students and teachers within the institution. Similar to Biggs’ (2001) quality feasibility ideas. Though they are more often primarily interested in defining the learning experience, rather than engaging with and improving existing practices.

At this stage, I’m interested in how analytics can be used to help learner types 1 and 2. I’m keen on changing the learning/teaching environment for these learners in ways that help them improve their own practice (what I see as the task for learner type 3, and the task they aren’t doing). For right or wrong, at most of the higher education institutions I’m associated with, the learning environment means the LMS – at least in terms of the contributions I might be able to make.

My small-scale utopian vision is the modification of the LMS environment to effectively bake in analytics informed services and modifications that can help student and teacher learners become more aware of possibly relevant improvements to their practice. Some examples include:

However, I don’t think these examples go far enough. There’s something missing. Additional thought needs to be given to the insights from the behaviour change literature which suggests that simply knowing about something isn’t sufficient to encourage change in behaviour.

This comes to the idea of scaffolding conglomerations. One idea for such a conglomeration might be to:

  • Embed SNAPP into an LMS (e.g. Moodle).
    At the moment, SNAPP is a browser-based tool, so it can only generate visualisations based on data in courses that the user has access to. For most people in most LMSs this means you are limited by the course division fundamental to LMS design: you can’t see and act upon the social networks evident in other courses.
  • Build around SNAPP some responses based on common patterns.
    One example might be a “prompt all isolated students” feature that would present the academic with a template email (designed on the basis of theory or experience) that can be sent automatically to all discussion forum participants who aren’t connected to others (a rough sketch follows this list). It might automatically include some statistics showing the difference in success rates between students who are isolated and those who are connected.
  • Enable user-contribution of common responses.
    Enable staff to add their own pattern response sequences.
  • Link SNAPP data with other Moodle and institutional data.
    Allow staff and students to see additional anonymised information with the SNAPP visualisations. e.g. shade red all those students who exhibit network connections similar to those who have failed the course previously.
  • Provide links to resources about good practice.
    When SNAPP detects a pattern where one person (e.g. the teacher) is the focal point of all interaction within a discussion forum, it provides a link to the literature and instructional design practice that suggests this is wrong and identifies approaches to modify practice.
  • Make SNAPP data visible to other teachers within a cohort.
    All teachers within the psychology courses could see the network visualisations in each other’s courses, thereby making the social norms within those courses visible and open for discussion.
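
As a rough illustration of the “prompt all isolated students” idea, here is a minimal sketch using the networkx library to find forum participants with no reply connections. The graph data, names and message template are all hypothetical; in the imagined conglomeration, the reply network would come from SNAPP/Moodle data.

```python
# Hypothetical sketch: find isolated forum participants and draft a prompt.
# Assumes who-replied-to-whom data has already been extracted from the
# forum; the names and the message template are made up for illustration.
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["alice", "bob", "carol", "dan", "eve"])  # participants
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("alice", "carol")])

TEMPLATE = (
    "Hi {name}, I noticed you haven't yet connected with others in the "
    "discussion forum. Students who discuss their posts with others tend "
    "to do better -- can I help you get started?"
)

# nx.isolates() yields nodes with no edges, i.e. the "isolated" students.
for student in nx.isolates(G):
    print(TEMPLATE.format(name=student))
```

The same isolated-node check is what a “prompt all isolated students” button would run before filling in the template with course statistics.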

Time to stop worrying about the dystopian vision (and to stop writing about a potential utopian one) and start doing something. As per the Alan Kay quote:

Don’t worry about what anybody else is going to do… The best way to predict the future is to invent it.

Analytics creating too much transparency? A two-edged sword?

Have been listening to a Dave Snowden podcast of a “101 organic KM course”. Amongst many familiar themes is the mention of the pitfalls of too much transparency hurting innovation.

He uses the example of expense accounts to illustrate the point. At one stage he had a large expense account which could be used to fund interesting and unusual approaches around his work. The innovation was possible because there was no itemisation/justification of the expense. Upon moving into a large company there came a requirement for itemisation, and that itemisation killed off innovation.

This rings a bell at the moment, because of the current discussion about the problems with learning analytics and in particular George Siemens’ list of concerns.

In the dim dark past of the 90s, when I was an innovative young university academic, no-one took any notice of what I did within the courses I was teaching. I could do a lot of very different things, which are documented in my publications from that time. Not all of them worked as I planned, but they all helped something interesting grow.

In part this was possible because of the very problem that often worried me about some of my colleagues. At that time, there were at least 2 or 3 of my fellow academics who were fairly widely known as really bad educators. Even though one or two claimed to be great teachers, even a cursory glance at their practice and resources, or a chat with a range of their students, would confirm some really, really bad practice. What annoyed me at the time was that the system allowed their practice to be opaque. As long as they met various deadlines (even though they were often late) and had a reasonable grade distribution, their practice was allowed to continue.

What I am only now starting to realise is that if that system hadn’t been opaque – if it had been too transparent – I probably wouldn’t have undertaken any of the innovative work I did. One explanation why not arises from Siemens’ list of concerns. In a university with analytics baked in and heavily relied upon by management to “manage”:

  • The act of providing a quality learning experience has been reduced to a set of numbers and graphs that specify certain activities and tasks. In response to known patterns from analytics I am expected to perform certain tasks, perhaps even push certain buttons at certain times to encourage those patterns to happen again.
  • What is measured is what is accepted, and it has become the target. Anyone moving away from the established pattern is fighting the inertia of the organisation and its systems. (This was actually one of the problems I faced in the mid-1990s while attempting to use the Internet within an institution with a history of industrial print-based education.)
  • Different interpretations of what is good learning/teaching – arising from the diversity inherent in disciplines, concepts, individual students and teachers – are lost. You (and the students) are expected to follow the standard patterns that analytics has established as effective. (This is also my problem with the LMS. For some institutions it has become the case that you can do any online learning you want, as long as the functionality is provided within the LMS restrictions of quiz, discussion forum, assignment management, etc.)
  • The smart/pragmatic academics and students will have identified what “analytics patterns” are required and figured out the least painful way to provide those requirements.
  • When something like the recent Queensland floods occurs, it will throw the analytics system into meltdown, as the expected patterns won’t be there. For example, the two “late” letters I received in the post today (the first post since before Christmas due to the floods) from Video Ezy asking for their DVD back, regardless of the floods cutting off all possibility of me returning it.
  • The “analytics patterns” will drive management to change policy and funding for practices so that only those patterns can be re-created. Anything that falls outside that norm will not be funded. (e.g. this is one of the major, unsolved problems the industrial, print-based distance education university had with online, it kept funding for f-t-f and DE, never figuring out that online could be different).
  • Since the “analytics patterns” have been established and the funding routinised management are able to treat the folk responsible for designing and delivering teaching like building blocks that can be replaced as needed.

And there’s more.

Learning analytics looks like being a two-edged sword.

Creating a podcast for LAK11 presentations

I’m currently participating in the Learning and Knowledge Analytics MOOC being run by George Siemens and others. This post outlines the process I used to create a podcast of the presentations (click on that link if you want to subscribe to the podcast) being given as part of the course.

Why?

The presentations are taking place within Elluminate, and the Elluminate recordings are made available. So why a podcast? Simply put, the asynchronous and audio-only nature better matches my preferences and context. So I’ve repeated a process I used for the PLE/PLN symposium. More details below.

How?

The basic process is (a rough sketch of the feed-filtering idea follows the list):

  • Bookmark the mp3 files on del.icio.us using the tag lak11podcast.
  • Pass the RSS feed that del.icio.us produces for that tag through feedburner to generate a podcast.
  • Subscribe to the podcast using iTunes or other software.
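
For the curious, a minimal sketch of the core idea: filter a tag feed down to entries that point straight at mp3 files. It uses the feedparser library; the feed URL follows the del.icio.us pattern of the time and is an assumption, so substitute whatever feed your bookmarking service actually provides.

```python
# Rough sketch: list the mp3 links behind a del.icio.us tag feed.
# The feed URL is an assumption based on the historical del.icio.us
# pattern -- swap in the feed your bookmarking service actually provides.
import feedparser

FEED_URL = "http://feeds.delicious.com/v2/rss/tag/lak11podcast"  # assumed

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Keep only bookmarks that point straight at an mp3 file; anything
    # hidden behind a login redirect (see below) won't work for iTunes.
    if entry.link.lower().endswith(".mp3"):
        print(entry.title, "->", entry.link)
```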

The one difference between this podcast for LAK11 and the PLE/PLN podcast is that I couldn’t bookmark the original mp3 files. These files are made available via the LAK11 Moodle course. Attempting to access the files directly results in a redirect to the home page of the SCOPE Moodle instance, where you can log in as a guest and view the files.

That works fine if you are a person on the web, but podcast software like iTunes isn’t that smart.

The solution I adopted was to copy the MP3 files out of the Moodle course into a location without a redirect – in this case Dropbox. I was a bit reluctant to do this as these aren’t my files; however, I’m assuming that, given the nature of the MOOC, this should be okay. If not, the files will be removed.

Limitations

At the moment, production of the podcast relies on new mp3 files being tagged by me with the tag lak11podcast. It would probably be more responsive if feedburner were set up to use anything anyone tagged with lak11podcast. For now, I’m keeping the restriction simply to save time and let me get on with some more reading. Happy to change it if people ask.

Introducing Hunch

One of the activities for the first week of the lak11 MOOC is to get started with using Hunch and reflect on it as a model for learning.

What is Hunch?

From the Hunch about page, it is an application of machine learning that provides recommendations to users about what might be of interest to them on the web. It’s the work of a bunch of self-confessed MIT “nerds”.

Using Hunch

Creating an account on Hunch starts with logging in with either a Facebook or Twitter account; I went with Twitter. Some of the other LAK11 participants have queried the privacy implications of this. Account setup then continues with answering questions.

The site then asks a range of questions using a fun(ish) photo-based approach, which increases interest somewhat. It also provides feedback on what others have answered.

As others have noted there is a North American cultural bias to the questions.

Interestingly, only 4% of respondents said they didn’t have a Facebook account.

After answering a few more than the minimum 20 questions, Hunch presents a selection of recommendations – in this case, five recommendations each for magazines, TV shows and books. I’m assuming that the categories were also chosen based on my answers. The recommendations are all good or close matches; all three categories included examples I had read/watched and enjoyed.

So it appears that Hunch is designed with badges to earn as you use the site more and provide more information. There are other features that seek to encourage connections and feedback between users. After all, that would appear to be the currency Hunch needs to generate its recommendations: the more connections, the better the maths, the better the recommendations.

And perhaps that is the problem. I don’t feel the need for a site like Hunch to get the recommendations I want. I already have strategies, social networks and information sources that I use, and I can’t see myself expending the effort on this sort of site. The question is how many others might be bothered to provide this information.

That said, it does appear to be working fairly well already.

Reflections

After using Hunch, the LAK11 syllabus asks

What are your reactions? How can this model be used for teaching/learning?

and suggests sharing views in the discussion forum. I’m going to reflect here first and then check the discussion forum, mainly because the following will be more stream-of-consciousness dumping than well-considered insight.

The obvious academic question to ask is what is meant by teaching/learning. Most of my experience has been/will be with more formal areas of learning and teaching and thus my reflections are likely to be coloured/biased by that experience.

My first observation (taking the viewpoint of a teacher) would be that any additional information about my students would be useful, especially if a system like Hunch were able to provide useful recommendations. Such recommendations would be useful to the students as well, but I wonder how much freedom they would have to take up those recommendations within a formal educational setting. It would seem that what freedom does exist lies with the teaching staff.

Such information in an L&T situation might feel somewhat similar to some of the learning-style surveys that are around. Similarly, I wonder how much these types of things would reinforce existing categories/beliefs, rather than offering new paths or opportunities.

I’m feeling somewhat ill-informed about the nature and capabilities of Hunch, and thus ill-equipped to reflect on its applicability to learning and teaching. Drawing some conclusions from the little I know: it is building models based on answers to the questions, then comparing those with models of the items to come up with matching recommendations.
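
My naive mental model of that matching, as a minimal sketch: represent the user and each item as vectors over the same attributes (derived from the answers) and recommend the closest items. To be clear, this is a guess at how such a system might work, not a description of Hunch’s actual algorithm, and the attributes and items below are made up.

```python
# Naive sketch of answer-based matching -- my guess, not Hunch's algorithm.
# The user and each item are vectors over the same made-up attributes;
# recommend the items most similar to the user.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# attributes: [likes_scifi, likes_sport, likes_travel]  (hypothetical)
user = [1, 0, 1]
items = {
    "New Scientist": [1, 0, 0],
    "Sports Illustrated": [0, 1, 0],
    "Lonely Planet Magazine": [0, 0, 1],
}

ranked = sorted(items, key=lambda name: cosine(user, items[name]), reverse=True)
print("recommendations:", ranked)
```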

I wonder how difficult building these models would be for learning and teaching. It’s my understanding that disciplines such as physics have built fairly complex conceptual models of the domain, in particular for undergraduate studies. But it’s also my belief that the construction of such models is a fairly resource-intensive task. Will that resource-intensive nature make it difficult to implement an L&T-focused Hunch? Making the connections between the other models would then seem difficult too. Hunch, after all, hasn’t handled the cross-cultural aspects all that well (it probably was designed to retain the North American emphasis), and it operates in an area (commercial products and services) in which there has been a lot of research and a lot of commercial interest/resources.

From the perspective of a motivated learner, an L&T-flavoured Hunch could be very useful. But what percentage of learners would use such a system, given, for example, my reservations about using the current Hunch? Especially since Hunch relies somewhat on the contributions users make to the system. Given the limited percentage of folk who contribute content to social networking sites, this is likely to limit an L&T-flavoured Hunch even further.

This perhaps sums up my cynical view of the difficulty of effectively and appropriately applying analytics in L&T.

Let’s see if the Moodle discussion forum has more positive contributions.

Applying “learning analytics” to BIM

The following floats/records some initial ideas for connecting two of my current projects, BIM and LAK11. The ideas arise out of some discussion with my better half, who is currently using BIM in one of the courses she is teaching.

Some brief background: BIM is a Moodle module that allows teaching staff to manage and encourage the use of individual student blogs. The blogs are hosted by an external blog provider of the student’s choice, not within Moodle. Typical use is to encourage reflection and to make student work visible for comments and discussion.

BIM participation as indicator

The discussion started with the observation that by the second or third required blog post it is generally possible to identify the students in a course who will do really well and those who will do really badly. How and when students provide their blog posts is a good indicator of their overall result.

This correlation was first observed with my first use of BAM in 2006 (BIM stands for BAM into Moodle) and matches some findings of others.

The correlation itself is not new; similar patterns exist with most educational practices. The difference is that the nature of BIM assignments generally makes the pattern more obvious. The discussion turned to what this pattern actually tells us.

Students with good practices

We ended up agreeing (as much as we ever do) that what this pattern shows is not that some students are smart and some are not. Instead, it shows that the “really good” students simply have really good study practices. They are the ones reading the material, reflecting upon it and engaging with the assessment requirements. The “really bad” students just never get going, for whatever reason. The rest of the students generally engage with the work at a surface level.

So, if use of BIM is making this pattern more obvious, what should be done about it?

Encouraging connections

The tag line for the lak11 course is

Analyzing what can be connected

A thought that “connects” with me and with what I think analytics might be good for. More specifically, my interest in analytics is focused on the idea of

Using analysis to encourage connections

Which, going by the definitions given in one of the early readings, is close to what is meant by action analytics.

In the case of BIM, the idea consists of two tasks

  1. Analyse what is going on within BIM to identify patterns; and then
  2. Bring those patterns/analysis to the attention of the folk associated with a course in order to encourage action.

Some ideas

This leads to some ideas for additional features for bim. None, some or all of them might get implemented.

Connect students with evidence of good practice

  1. Add a due date to each question a student is meant to respond to within a bim activity.
  2. Allow academic staff to choose (or perhaps create) a warning regimen.
  3. A warning regimen would specify a list of messages to send to individual students based on the due date and the student’s own contributions to the bim activity (a rough sketch follows this list). The specification might include:
    • Time when to send messages.
      e.g. 1 week, 3 days and on the day.
    • Teacher provided content of the message.
    • Some bim analysis around the activity.
      e.g. it might include the number of students who have already submitted answers to the question, perhaps some summary of connections from previous uses of bim between when posts are submitted and overall performance. Some statistics or data about the posts so far e.g. amount of words, some textual statistics etc.
    • Links to other posts.
      This one could be seen as questionable. Links to other student posts could act as scaffolding for students not really sure what to post. Of course, the “scaffolding” could result in “copying”.
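
As a sketch of what a warning regimen might look like in practice, the following treats a regimen as data (a list of offsets and message templates) plus a daily check. Everything here – the field names, the offsets, the statistics – is a hypothetical design for bim, not existing functionality.

```python
# Hypothetical sketch of a bim "warning regimen" -- a proposed design,
# not existing bim functionality. A regimen is a list of
# (days-before-due, message-template) pairs; a daily job sends whichever
# messages fall due to students who haven't yet posted.
from datetime import date, timedelta

regimen = [
    (7, "A week to go -- {submitted} of your classmates have already posted."),
    (3, "Three days left on your blog post. Need a hand getting started?"),
    (0, "Your blog post is due today."),
]

due_date = date(2011, 3, 14)             # made-up due date for one question
students_yet_to_post = ["s123", "s456"]  # would come from bim's records
submitted_count = 18                     # ditto: bim's analysis of the activity

today = date.today()
for days_before, template in regimen:
    if today == due_date - timedelta(days=days_before):
        for student in students_yet_to_post:
            # str.format ignores unused keyword arguments, so one call
            # works for every template in the regimen.
            message = template.format(submitted=submitted_count)
            print(f"send to {student}: {message}")
```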

The idea is that being aware of what other students are posting, or of what is considered good practice, would make it more likely that students consider such practice themselves.

This is very close to the idea behind Michael De Raadt’s progress bar for Moodle.

What “theories” exist?

One of the initial readings identified four main classes of components for learning analytics. One of these is theory, which includes the statistical and data mining techniques that can be applied to the data.

I need to spend some time looking at what theories exist that might apply to BIM. e.g. I’m wondering if some of the textual analysis algorithms might provide a good proxy for evaluating the quality of blog posts, and whether or not there might be some patterns/correlations with final/overall student results.
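
As a first stab at the kind of textual proxy I mean, here are some crude surface statistics for a post; the outputs could later be correlated with student results. This is purely illustrative – real measures of post quality would need far more than word counts.

```python
# Crude, illustrative surface statistics for a blog post -- a very rough
# proxy for quality, whose values could be correlated with final results.

def text_stats(post: str) -> dict:
    words = post.split()
    sentences = [s for s in post.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    unique = {w.lower().strip(".,;:") for w in words}
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "lexical_diversity": len(unique) / max(len(words), 1),
    }

sample = ("Reflecting on this week's reading, I was struck by how the "
          "author frames analytics. It connects with my own teaching.")
print(text_stats(sample))
```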

Learning analytics: Definitions, processes and potential

The following is the summary of my first reading for the LAK11 MOOC and follows on from my initial thoughts.

I decided to start with the paper titled Learning analytics: Definitions, processes and potential, as the combination of the publication date (Jan 2011) and the title suggested it would give the most current overview. It’s also written by one of the course facilitators, so it should have some connection to the course.

Summary

The paper essentially

  • defines some terms/concepts;
  • abstracts from some published “analytics processes” a common set of 7 processes/tasks;
  • identifies four types of resources; and
  • combines them in the following model.

A model for learning analytics

The paper closes with what seems to be the ultimate goal of most of the folk involved with learning analytics: automated, individualised education. I’m not sure that this is a helpful aim. First, because I have my doubts that it can ever be achieved in the real world, as opposed to a closed system (i.e. a laboratory experiment). Second, because I think there is a chance that having this as the ultimate aim will result in less focus on what I think is the more fruitful approach: working out how analytics can supplement the role of human beings in the teaching process.

Mm, that’s probably got a few assumptions within it that need to be unpicked.

The following is a slightly expanded summary of the paper.

Introduction

It starts by defining learning as “a product of interaction”, with the nature of the interaction differing broadly depending on the assumptions underpinning the learning design.

Regardless, we want to know how well things went. Traditional methods – student evaluation, grade analysis, instructor perceptions – all have limitations and problems.

Question: What are the limitations and problems with learning analytics? There is no silver bullet.

As more learning is computer-facilitated, there’s interest in seeing how the accumulated data can be used to improve L&T, leading to learning analytics. The application of statistics to rich data sources to identify patterns is already being used in other fields to predict future events.

The paper aims to review literature on analytics and define it, its processes and potential.

Learning analytics and related concepts defined

The cynic in me finds the definition of business intelligence particularly frightening/laughable. I do need to learn to control that.

  • Learning analytics – “emerging field in [which] sophisticated analytic tools are used to improve learning and education”, drawing from other fields of study.
  • Business intelligence – an established process through which decision makers in the business world integrate strategic thinking with information technology to synthesize vast amounts of data into powerful decision-making capabilities.
  • Web analytics – using web site usage data to understand how well the site is achieving its goals.
  • Academic analytics – the application of the principles and tools of business intelligence to academia; or, more narrowly for other authors, the examination of issues around student success.
  • Action analytics – a greater emphasis on generating ‘action’, i.e. applying data in a “forward thinking manner”.

The paper does mention the problems faced when implementing these types of strategies within existing institutional arrangements, especially around data/system ownership, and suggests that learning analytics is intended more specifically to address those issues – especially in terms of providing the data/analysis to students/teachers within the teaching context, right up to some of the automated/intelligent tutoring type approaches.

Thus, the study and advancement of learning analytics involves: (1) the development of new processes and tools aimed at improving learning and teaching for individual students and instructors, and (2) the integration of these tools and processes into the practice of teaching and learning.

I can live with that. It’s what I’m interested in. Sounds good.

Learning analytics processes

Essentially a collection of four different models/abstractions of how to do this stuff, and then a synthesis into a common set of 7 processes of learning analytics (a toy walk-through follows the list):

  1. select
  2. capture
  3. aggregate and report
  4. predict
  5. use
  6. refine
  7. share
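
To make the ordering concrete, here is a toy walk-through of those seven processes over made-up LMS data. Each step is drastically simplified (the “prediction” is a crude below-average rule), and the last two steps are only sketched in comments; it is meant to show the flow, not a real implementation.

```python
# Toy walk-through of the seven processes over hypothetical LMS records.

records = [  # 1. select & 2. capture: the data chosen and collected
    {"student": "s1", "logins": 42, "grade": 71},
    {"student": "s2", "logins": 3,  "grade": 45},
    {"student": "s3", "logins": 28, "grade": 63},
]

# 3. aggregate and report
avg_logins = sum(r["logins"] for r in records) / len(records)
print(f"average logins: {avg_logins:.1f}")

# 4. predict: a deliberately crude rule -- below-average logins = at risk
at_risk = [r["student"] for r in records if r["logins"] < avg_logins]

# 5. use: trigger an intervention for the predicted at-risk students
for student in at_risk:
    print(f"intervene with {student}")

# 6. refine: compare predictions against actual grades and adjust the rule
# 7. share: publish the refined approach for others to reuse
```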

Knowledge continuum

This is the DIKW (Data/Information/Knowledge/Wisdom) stuff that some of the KM folk, including Dave Snowden, don’t have a lot of time for. In fact, they argue strongly against it (Fricke, 2007).

TO DO: There is much of interest in Fricke (2007). I have not read it through, and some of it appears heavy going, but I should take the time. An interesting reference/quote is this one:

Results from data mining should be treated with skepticism

drawn from some work that I should chase up and describe more fully here.

The DIKW stuff is connected to learning analytics through some work that suggests things like “Through analysis and synthesis that (sic) information becomes knowledge capable of answering the questions why and how”.

Another to do: Snowden’s thoughts on DIKW and his work suggest another “process” for learning analytics. I should take some time to look at that.

Web analytics objectives

From Hendricks, Plantz and Pritchard (2008), the “four objectives essential to the effective use of web analytics in education”:

  1. define the goals or objectives;
  2. measure the outputs and outcomes;
  3. use the resulting data to make improvements; and
  4. share the data for the benefit of others.

Five steps of analytics

Campbell and Oblinger (2008)

  1. capture
  2. report
  3. predict
  4. act
  5. refine

Collective application model

Summary of a Dron and Anderson model

Learning analytics tools and resources

Draws on various sources to suggest that “learning analytics consists of”:

  • Computers;
    Includes an interesting overview of the different bits of technology (and their limitations) that are currently available, including some references criticising dashboards.
  • People;
    Interestingly, this is the smallest section of the four, but perhaps the most important. In particular, the observation that developing effective interventions remains dependent on people.
  • Theory;
    Points to the various “kernel theories” for analytics and the observation by MacFadyen and Dawson (2010) that there’s little advice on which of these work well from a pedagogical perspective.
  • Organisations.
    The importance of the organisation in developing analytics, and some of the standard “leadership is important” stuff.

A start to the “Introduction to Learning and Knowledge Analytics” MOOC

So, the year of study begins. First up is an attempt to engage in a MOOC (Massive Open Online Course) on Learning and Knowledge Analytics. This first post aims to contain some reflection on the course syllabus and what I hope to get out of the course.

The problem and the promise

As the course description suggests

The growth of data surpasses the ability of organizations or individuals to make sense of it

This is a general observation, but it also applies to learning and teaching related activities.

The promise is that analytics through techniques such as modelling, data mining etc will aid the analysis of this data and help people and organisations to make sense of all the data. To improve their decision making, learning and other tasks.

The aim of the course is to be

a conceptual and exploratory introduction to the role of analytics in learning and knowledge development

It is an introductory course, with no heavy math.

My reservations

I’ve dabbled in work that is close to analytics but have always had some reservations about its promise. One of the aims of engaging in the course is to encourage me to read and reflect more on these reservations. A quick summary/mind-dump of those reservations includes:

  • The data is not complete;
    At the moment, the data available for analytics is limited. e.g. data from an LMS gives only a very small picture of the learning and learning-related activities that are going on. Consequently, data-driven decision making is overly influenced by the data that is available, rather than the data that is important.
  • Models and abstractions are by nature lossy;
    A lot of analytics is based on mathematical/AI models or abstractions. By definition these “abstract away” details that are deemed to be unimportant, i.e. information is lost.
  • Not every system is causal, except in retrospect;
    There often seems to be an assumption of (near) causality in some of this work, yet there are some events/systems/processes which simply aren’t causal. There is no regular, repeating pattern of “a leading to b”; just because a led to b this time doesn’t mean it will next time. Some of this is related to the previous two points, but some of it is also related to the nature of the systems, especially when they are complex adaptive systems. It will be interesting to hear the take of Dave Snowden (one of the invited speakers) on this later in the course, as this reservation is directly influenced by his presentations.
  • People aren’t rational;
    Personally, I don’t think most people are rational. This isn’t to suggest that people aren’t somewhat sensible in making their decisions: one’s decisions always make sense to oneself, but they are almost certainly not the decisions that someone else would have made in the same situation. As part of that, I think our experiences constrain/influence our decision making and actions.

    This generates two concerns about analytics. First, I wonder just how much change in decision outcomes will arise from the folk seeing all the nice, lovely new visualisations produced by analytics. Are people going to make new decisions or simply use the visualisations to justify the same sub-set of decisions that their experiences would have led them to make. Second, how common amongst learners will be the patterns, models and correlations that arise from analytics? Just because the model says I did “A-B-C” does that really imply I was doing it for the same reasons as the other 88% of the population?

  • Is there enough information?
    I believe, at this currently ill-informed stage, that some (much?) of the usefulness of analytics arises from a reliance on big-number statistics, i.e. there’s so much data that you get useful correlations and patterns. How many existing institutions are going to have sufficiently big data to usefully apply these techniques?
  • The technologists alliance;
    Geoghegan suggests there is a technologists’ alliance that has alienated the mainstream through its inability to produce an application of technology that is of absolutely compelling value in pragmatic, mainstream terms and that provides a compelling reason to adopt. I think it’s important that there be researchers and innovators pushing the boundaries, but too little thought is given to the majority and to applications of innovations/new technologies/fads that they would see as useful. SNAPP is a good start, but there’s more work to be done.
  • Yet another fad;
    Analytics is showing all the hallmarks of a fad. There will almost certainly be some interesting ideas here, but the combination of the previous reservations will result in it being misapplied, misunderstood and ultimately having limited widespread impact on practice.

    As evidence of the fad, I offer the photo below that comes from this blog post (which I reference again below).

    [photo: heads of data explosion/exploitation]

  • Ethics-related questions;
    A post from Johnathan MacDonald on “The Fallacy of Data Bubble Ignorance” includes the following quote:

    People don’t want to be spied on. It’s an abuse of civil liberty. The fact that people don’t realise they are being spied on, is not justification to do so. Betting on a business model that goes against how society really works, will ultimately end in disaster.

    If this holds, does it hold for analytics? Will the exploitation of learning analytics lead to blowback from the learners?

    For some of the above reasons, I am not confident in the ability of most organisations to engage in the use of analytics in ways not destined to annoy and frustrate learners. Many are struggling to implement existing IT systems, let alone manage something like this. I can see the possibility of disasters.

  • Teleological implementation.
    This remains my major reservation about all these types of innovations. In the end, they will be applied to institutional contexts through teleological processes, i.e. the change will be done to the institution and its members to achieve some set plan. Implementation will have little contextual sensitivity, and thus will achieve limited quality adoption and will be blind to some of the really interesting contextual innovations that could have arisen.

A bit of duplication and perhaps some limited logic, but a start.

Onto the week 1 readings.