DATE: MARCH 22, 2022 – DAY 2, 12 PM TO 3 PM PDT
MOTIVATION AND GOALS
The advancement of learning analytics is directly linked to our ability to collect and process unprecedented amounts of data. The widespread use of technology and connected devices in the learning environment keeps increasing the complexity and depth of the data collected (Joksimović et al., 2019). We are currently able to record and store every event and interaction within a learning environment. At the same time, it would be impossible to interpret or analyse these complex datasets without the support of the advanced analytical techniques provided by learning analytics. This includes the ability to store, share and combine the data we have.
Learning analytics, as a field of research and practice, is currently positioned at the intersection of two opposing realities. Recent technological advances allow for unprecedented data collection, both in terms of quantity and quality (Joksimović et al., 2019). However, ethical and privacy concerns about how the available data are used represent a critical issue that needs to be addressed before learning analytics can be adopted more widely. How pertinent this issue is can be seen in recent examples, as well as in events that coincided with the emergence of learning analytics. Specifically, the ideas put forward by the former educational technology company inBloom Inc were almost perfectly aligned with the goals outlined in the learning analytics manifesto. Nevertheless, despite enormous funding and political support, inBloom failed to gain public trust and ended in a backlash over “inBloom’s intended use of student data, surfacing concerns over privacy and protection” (Bulger et al., 2017, p. 4). More recent events, such as the Facebook and Cambridge Analytica case, numerous data breach scandals resulting in billions of dollars of damages and fines, and failures to use data in an ethical way, do not help raise trust in data and analytics in general, or in learning analytics in particular.
The aim of this workshop is to demonstrate our recent work on developing privacy-preserving learning analytics. This goes beyond anonymization alone, as we also account for the re-identification risk that arises from the uniqueness of individuals’ attributes. We will discuss a variety of methods that provide measurable, policy-driven, and provable mitigation mechanisms for maintaining learners’ privacy. We will also show that applying these mitigation solutions to the data does not prevent us from achieving our utility goals with learning analytics (LA).
Participants will have an opportunity to explore in practice a learning analytics toolbox built on “privacy by design” principles and incorporating some of those novel algorithms.
Dr. Srecko Joksimovic, University of South Australia
Dr. Djazia Ladjal, Practera
Dr. Chen Zhan, University of South Australia
Dr. Thierry Rakotoarivelo, CSIRO
Alison Li, Practera
The workshop will run as a 3-hour session, with an introduction and presentations during the first half and a hands-on activity with discussion and feedback during the second half.
The full session will be recorded and the recording will be posted on this website.
|30 min|Keynote + Q&A|
|20 min|Presentation: Data Privacy Risks and Mitigations|
|20 min|Presentation: Privacy-enhanced Learning Analytics|
|1 h|Hands-on activity: assessing and mitigating re-identification risk in LA datasets|
|30 min|Discussion and feedback|
WORKSHOP HANDS-ON ACTIVITY
During the hands-on activity, participants will have access to our online Privacy Aware LA Toolbox and will be able to:
- Assess the re-identification risk of an LA dataset.
- Test different privacy risk reduction strategies.
- Compare the data before and after applying privacy risk reduction.
- Assess the impact of privatising data on LA utility when applying a regression-based predictive model.
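To give a feel for the first three steps, here is a minimal sketch in Python. It is not the toolbox's algorithm: it assumes a simple uniqueness measure (the share of records whose attribute combination appears only once, plus the worst-case risk 1/k for the smallest group) and uses age banding as one possible risk reduction strategy. The function names, the attribute names and the toy records are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, attributes):
    """Return (share of unique records, worst-case risk 1/k).

    Records sharing the same values on `attributes` form a group;
    k is the size of the smallest group. Illustrative measure only.
    """
    keys = [tuple(record[a] for a in attributes) for record in records]
    group_sizes = Counter(keys)
    unique = sum(1 for key in keys if group_sizes[key] == 1)
    smallest_group = min(group_sizes.values())
    return unique / len(records), 1 / smallest_group


def generalise_age(records, width=10):
    """Hypothetical mitigation: replace exact ages with bands of `width` years."""
    banded = []
    for record in records:
        low = (record["age"] // width) * width
        banded.append({**record, "age": f"{low}-{low + width - 1}"})
    return banded


# Toy data, invented for illustration.
learners = [
    {"age": 21, "country": "AU", "score": 72},
    {"age": 24, "country": "AU", "score": 65},
    {"age": 27, "country": "AU", "score": 80},
    {"age": 33, "country": "NZ", "score": 58},
    {"age": 38, "country": "NZ", "score": 91},
]

before = reidentification_risk(learners, ["age", "country"])
after = reidentification_risk(generalise_age(learners), ["age", "country"])
print("before:", before)  # every learner is unique on (age, country)
print("after: ", after)   # banding merges learners into larger groups
```

In this toy case every learner is unique before banding, and none is afterwards, while the `score` column used for modelling is left untouched. Real datasets usually involve a trade-off between the two.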
We will provide test datasets for the demonstration but participants are welcome to use their own datasets. If you would like to use your own data, please review requirements in the next section.
If you would like to use your own data for the hands-on activity
To make sure that the dataset is compatible and can run in our toolbox on the day of the workshop, you will need to send the dataset to us for checks by February 15, 2022 (contact us first at email@example.com).
To ensure that the demonstration runs smoothly, we will pre-upload all the datasets to the toolbox and make them available to all participants. For this reason, we advise you not to send any dataset that you are not comfortable sharing with the workshop organisers and participants. The data should not contain any Personally Identifiable Information (PII). We will ensure that all datasets are used only to prepare for the workshop and during the workshop hands-on activity. All shared data will be deleted afterwards.
In the interest of time, we require each test dataset to be limited to 5 attributes with a maximum of 5,000 data points. For the risk assessment portion of the demonstration, attributes can be numerical, logical or textual.
For the LA regression modelling portion of the demonstration, attributes need to be in numerical form, with a maximum of 4 independent variables and 1 dependent variable to be predicted.
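If you want to check a dataset against these limits before sending it, a quick self-check along the following lines may help. This is only a sketch: it assumes the data is a CSV file with a header row and reads "data points" as rows, neither of which is stated above, and the function name `check_dataset` is hypothetical. The organisers' own compatibility checks may differ.

```python
import csv
import io

MAX_ATTRIBUTES = 5   # also covers 4 independent + 1 dependent variable
MAX_ROWS = 5000      # assumption: "data points" is read as rows


def check_dataset(csv_text, for_regression=False):
    """Return a list of problems found; an empty list means no issue detected."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["dataset is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    if len(header) > MAX_ATTRIBUTES:
        problems.append(f"{len(header)} attributes, maximum is {MAX_ATTRIBUTES}")
    if len(data) > MAX_ROWS:
        problems.append(f"{len(data)} rows, maximum is {MAX_ROWS}")
    if for_regression:
        # The regression portion needs every attribute in numerical form.
        for index, name in enumerate(header):
            for row in data:
                try:
                    float(row[index])
                except (ValueError, IndexError):
                    problems.append(f"column {name!r} has non-numeric values")
                    break
    return problems


# Toy example, invented for illustration.
sample = "hours,quizzes,grade\n3,2,61\n5,4,78\n"
print(check_dataset(sample, for_regression=True))
```

Textual or logical columns pass the check when `for_regression` is left as `False`, matching the looser requirement for the risk assessment portion.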
If you would like to use your own data, please get in touch with us by emailing firstname.lastname@example.org and we will coordinate the upload of the dataset with you.
Bulger, M., McCormick, P., & Pitcan, M. (2017). The Legacy of InBloom. Data & Society. https://datasociety.net/pubs/ecl/InBloom_feb_2017.pdf
Joksimović, S., Kovanović, V., & Dawson, S. (2019). The Journey of Learning Analytics. HERDSA Review of Higher Education, 6, 27–63.