Learning Analytics and Data Privacy in Education Technology

Global spending on Education Technology (EdTech) is reported to reach $250bn by the end of 2020 (EdTechXGlobal). It is a fast-growing sector, with students all over the world increasingly conducting at least some of their learning on purpose-built online learning platforms. Students learning online include school (K-12) and university students, as well as people in a wide variety of adult education settings, from traineeships and apprenticeships to career development and re-training.

As the number of students using EdTech increases, so does the pool of data which the students leave on these online platforms. This data is increasingly seen as the key to better understanding how students learn. The relatively new field of Learning Analytics (LA) uses the information from online EdTech platforms to gain insights into, and generate models of, our students and their learning processes. Lines of enquiry are extremely broad and include, for example, modelling skills development and competencies, understanding student motivations and engagement, and mining student activities and attitudes to develop and test theoretical models of learning. The promise of LA is a significant increase in the quality and effectiveness of online learning.

Even for the skilled and initiated, current approaches to data privacy in EdTech are inadequate and ad hoc. Privacy preservation approaches are hard to find and difficult to understand.

The good news is that the global research community has made significant advances in the development of effective and provable privacy risk measurement and reduction. This project pulls together a multi-disciplinary team of researchers and education industry leaders to build on and adapt these technologies to Education and Learning Analytics. We are a passionate group with a shared vision to build a data privacy risk management platform and enable the education industry to grow and meet the needs of our students and educators in the 21st century.

Our Project Goals

We aim to develop the tools and techniques for easy and automated data privacy management so that educators and EdTech providers can use their data in innovative ways that will enable them to grow their businesses and improve the learning outcomes of their students.

Based on the original research of our collaboration partners, CSIRO/Data61 and the University of South Australia.

What if there were a way to consistently measure privacy risk across any data? What if we could analyse student data and come up with one or two universal measures of the risk of personal re-identification? A consistent measure of privacy risk would allow us to develop standards or measures of “fitness” for our data, based on our risk appetites, and consistent, policy-based data management and sharing approaches. It would allow us to establish data sharing agreements in a similar way. Most importantly, it would allow us to measure how effective our privacy risk reduction mechanisms really are before releasing the data to the public or distributing it within our organisations for analytics and reporting.
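To make the idea of a single, comparable risk number concrete, here is a minimal sketch in Python. It is not the project's own measure, which is not described here; it assumes a simple k-anonymity-style view in which a record's re-identification risk is one divided by the number of records sharing the same quasi-identifier values. The function name `reidentification_risk`, the field names and the sample records are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) for a list of dict records.

    Assumption: a record's risk is 1 / (size of its equivalence class),
    where an equivalence class is the set of records that share the same
    values on every quasi-identifier.
    """
    if not records:
        return 0.0, 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    risks = [1.0 / class_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


# Hypothetical student records; postcode and birth_year act as quasi-identifiers.
students = [
    {"postcode": "5000", "birth_year": 2004, "course": "Maths"},
    {"postcode": "5000", "birth_year": 2004, "course": "Physics"},
    {"postcode": "5006", "birth_year": 2003, "course": "Maths"},
    {"postcode": "5006", "birth_year": 2001, "course": "History"},
]

max_risk, avg_risk = reidentification_risk(students, ["postcode", "birth_year"])
print(f"max risk: {max_risk:.2f}, average risk: {avg_risk:.2f}")
```

A policy could then be phrased against these numbers, for example "release only if the maximum risk is below our agreed threshold", which is the kind of risk-appetite-based rule the paragraph above has in mind.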

What if there were a set of privacy risk reduction tools? Tools which abstracted away the very technical and specialised techniques required for privacy risk reduction today, which offered a range of risk-reduction options depending on your specific needs and risk appetite, and which helped strike the right balance between measurable (and provable) privacy risk reduction and adequate utility of the data for analytics purposes? These tools, in conjunction with a consistent measure of privacy risk and risk reduction, would help lift the capability of the Education and EdTech industry in data privacy risk management and enable a “privacy first” approach to Learning Analytics projects. In so doing, they would allow us to share and make greater use of learning data, and to innovate and improve online learning, whilst measurably protecting the privacy of our learners, teachers and mentors.

Tangible Outcomes

Tangible outcomes of this project will include a Privacy-preserving Learning Analytics Platform providing data preparation, de-identification and analysis of EdTech data, comprising:

  1. Data privacy risk measurement service – providing a consistent measure, across any dataset, of the risk of identification of an individual or group using standard re-identification attack mechanisms. It will also provide recommendations on risk reduction approaches based on the proposed use of the data as well as the risk appetite of the user.
  2. Data preparation and de-identification service – targeted specifically to Learning Analytics techniques. Data will be measurably de-identified for demonstrable and verifiable protection of personal and sensitive student information.
  3. Specialist LA algorithms targeting complex learning data and tools for validation of the results.
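As a rough illustration of what "measurably de-identified" in item 2 could look like, the sketch below coarsens two quasi-identifiers and reports the size of the smallest group of indistinguishable records before and after. Generalisation and the k-anonymity-style check are assumptions chosen for illustration, not the platform's actual techniques; `generalise`, `smallest_class` and the sample data are hypothetical.

```python
from collections import Counter


def smallest_class(records, quasi_identifiers):
    """Size of the smallest group of records sharing all quasi-identifier values (k)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalise(record):
    """Return a copy with coarser quasi-identifiers; analytic fields are left untouched."""
    out = dict(record)
    out["postcode"] = record["postcode"][:2] + "**"       # keep only the region prefix
    out["birth_year"] = f"{record['birth_year'] // 10 * 10}s"  # year -> decade label
    return out


# Hypothetical learning records: score is the value analysts care about.
raw = [
    {"postcode": "5000", "birth_year": 2004, "score": 71},
    {"postcode": "5006", "birth_year": 2003, "score": 64},
    {"postcode": "5010", "birth_year": 2001, "score": 88},
    {"postcode": "5082", "birth_year": 2005, "score": 59},
]
qi = ["postcode", "birth_year"]

k_before = smallest_class(raw, qi)
released = [generalise(r) for r in raw]
k_after = smallest_class(released, qi)
print(f"k before: {k_before}, k after: {k_after}")
```

The scores survive unchanged, so aggregate analysis is still possible, while every released record is now indistinguishable from the others on the coarsened fields. Real datasets would need a more careful trade-off between how much is generalised and how much analytic utility is kept.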

The primary use of this platform is to provide a standardised and verifiable data protection capability for learning data, enabling it to be safely shared with analysts and researchers, and merged with other datasets. 

Learning analytics algorithms may also be adapted to work with the anonymised datasets. The platform will therefore include a set of algorithms purpose-built to perform Learning Analytics tasks on the de-identified data.