This research project will adapt and extend a number of existing approaches to data privacy and security to develop a privacy-preserving analytics solution for the EdTech industry. The aim of the research is to create measurably and non-reversibly de-identified datasets for Learning Analytics purposes. The solution must cover all types of information:
- Numerical and categorical data
- Rich data including video and voice
- Unformatted text
The de-identified datasets could theoretically be used for other “secondary use” purposes as well (such as test data preparation or demonstration data), but the focus of this project is Learning Analytics. The de-identified data must meet a set of predefined “utility” and “efficacy” criteria for specified analytical goals.
Tangible outcomes of this project will include a Privacy-preserving Learning Analytics Platform for the preparation, de-identification and analysis of EdTech data, comprising:
- Data privacy risk measurement service: provides a consistent measure, across any dataset, of the risk that an individual or group can be identified using standard re-identification attack mechanisms. It will also recommend risk reduction approaches based on the proposed use of the data as well as the risk appetite of the user.
- Data preparation and de-identification service, targeted specifically at Learning Analytics techniques. Data will be measurably de-identified to provide demonstrable and verifiable protection of personal and sensitive student information.
- Specialist Learning Analytics (LA) algorithms targeting complex learning data, and tools for validating their results.
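As an illustration of what a consistent risk measure could look like, the following minimal Python sketch scores a dataset by the size of its equivalence classes over a chosen set of quasi-identifiers (a k-anonymity style measure under a prosecutor attack model). The proposal does not name a specific metric, so this choice, the function names, the sample records and the age-banding step are all assumptions made for illustration only.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def reidentification_risk(records, quasi_identifiers):
    """Return (k, max_risk, avg_risk) for a prosecutor-style attack.

    k        : size of the smallest equivalence class
    max_risk : 1 / k, the risk for the most exposed record
    avg_risk : number of classes / number of records
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    k = min(sizes.values())
    return k, 1 / k, len(sizes) / len(records)


def generalise_age(record, width=10):
    """Replace an exact age with a band such as '20-29' (hypothetical de-identification step)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical student records; "age" and "course" are treated as quasi-identifiers.
students = [
    {"age": 21, "course": "maths", "score": 71},
    {"age": 24, "course": "maths", "score": 64},
    {"age": 27, "course": "maths", "score": 80},
    {"age": 33, "course": "physics", "score": 58},
    {"age": 38, "course": "physics", "score": 90},
]
qi = ["age", "course"]

before = reidentification_risk(students, qi)
after = reidentification_risk([generalise_age(s) for s in students], qi)

print(before)  # (1, 1.0, 1.0)
print(after)   # (2, 0.5, 0.4)
```

In this toy data every student is unique before generalisation, so the worst-case risk is 1.0. Banding ages into decades leaves classes of three and two records, which lowers the worst-case risk to 0.5 and gives a before-and-after figure of the kind "measurably de-identified" implies.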
The primary use of this platform is to provide a standardised and verifiable data protection capability for learning data, enabling it to be safely shared with analysts and researchers, and merged with other datasets.
Learning Analytics algorithms may also need to be adapted to work with the anonymised datasets. The platform will therefore include a set of algorithms purpose-built to perform Learning Analytics tasks on the de-identified data.