TI4 - 2021/22 - Critical - Moodle SRE
Size | Medium |
---|---|
Budget Epic Name | CTP Maintenance Budget |
Feature Lead | David Kwaw & Nikola Bozhkov |
Team | Alistair Spark, Ehsan Anwar, David Kwaw, Nikola Bozhkov |
This feature covers the need for Moodle to be proactively monitored and for performance issues to be dealt with before they cause any CIs.
This ties in with the idea of an LA Data Availability team and, more generally, an Application SRE (Site Reliability Engineering) function: https://cloud.google.com/blog/products/devops-sre/how-sre-teams-are-organized-and-how-to-get-started
Key deliverables for TI4:
- Backport of MDL-72837 load tested & applied in prod
- Formalise Load Event Investigation & write up standard to be consistent
- SRE/Ops Team training - ongoing
- Expand SRE to Assessment@UCL, UCL eXtend (tbc)
Some of the key activities that still need to be progressed:
- Post CI strands of work (Catalyst development but exchange and test)
- Regrading issue - https://wrms.catalyst.net.nz/wr.php?request_id=378838
- CloudFront / S3 signed URLs - if not completed in TI2
- Active monitoring of Redis, the frontends, etc. during peaks of load
- Drill through any blips in response times and document causes
- Push for resolution of any identified flaws
- Explore options for automating load testing (will need to time bound the effort on this)
- Improve the CI comms channel: ISD News editing by SO, and reach out to Mike Haward about the Status page to get it reset to be generic - https://www.ucl.ac.uk/isd/moodle-under-maintenance
- Create a Moodle maintenance/outage page that can be used for traffic redirection in the event of a Moodle outage. This page needs to be editable by the Moodle team. Consider setting up a Moodle_Status Twitter feed as a short term measure if we are unable to obtain an editable page.
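As a starting point for the "drill through any blips in response times" activity above, the following is a minimal sketch of how blips might be flagged in a series of response-time samples. It assumes the samples are already available as a list of millisecond values; the function name `find_blips`, the window size, and the threshold factor are all illustrative assumptions, not part of any existing Moodle or monitoring tooling.

```python
from statistics import median


def find_blips(samples, window=5, factor=3.0):
    """Return the indices of samples that exceed `factor` times the
    median of the preceding `window` samples.

    `window` and `factor` are hypothetical defaults chosen for
    illustration; real thresholds would need tuning against actual data.
    """
    blips = []
    for i in range(window, len(samples)):
        # Baseline is the median of the samples just before this one,
        # so a single earlier spike does not distort it much.
        baseline = median(samples[i - window:i])
        if baseline > 0 and samples[i] > factor * baseline:
            blips.append(i)
    return blips


# Made-up response times in milliseconds, with one obvious spike.
response_ms = [120, 130, 125, 118, 122, 900, 127, 121]
print(find_blips(response_ms))  # prints [5]
```

Each flagged index could then be matched against Redis and frontend metrics for the same time period when documenting the cause.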
Moodle uptime is critical and this feature will always come before anything else. We currently rely on Catalyst to develop fixes for us. This will change over time, but we are well resourced, so it should not be seen as a barrier.
This information is provided by Digital Education (https://www.ucl.ac.uk/isd/digital-education-team-information) and licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.