TI1 - 2022/23 - Critical - Moodle SRE

Size

Medium 

Budget Epic Name

CTP Maintenance Budget

Feature Lead

Alistair Spark

Team

David Kwaw

Nikola Bozhkov

This feature covers the need to monitor Moodle proactively and to deal with performance issues before they cause any CIs.

This ties in with the idea of an LA Data Availability team and, more generally, an Application SRE (Site Reliability Engineering) function: https://cloud.google.com/blog/products/devops-sre/how-sre-teams-are-organized-and-how-to-get-started

Key areas of focus for TI1:

  • Active monitoring during start of academic year and end of term assessment period 
  • Engage auto-scaling and monitor that it scales quickly enough for peaks
  • Refine the coursemodinfo caching infrastructure to reduce cost while fulfilling the bandwidth requirement
  • Drive load testing of Moodle 4.1 and pipeline-based load testing
  • Continue pushing performance-related core trackers
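
One way to judge whether auto-scaling is fast enough for peaks is to measure how long demand stays above available capacity before the scale-out catches up. The following is a minimal sketch only: the Sample fields, the metric names and the acceptable lag are assumptions for illustration and are not taken from our monitoring setup.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float         # seconds since the start of the observation window
    demand: float    # hypothetical load metric, e.g. requests per second
    capacity: float  # hypothetical load the running fleet can serve at time t


def scaling_lags(samples):
    """Return the length in seconds of each period where demand exceeded capacity."""
    lags = []
    shortfall_start = None
    for s in samples:
        if s.demand > s.capacity and shortfall_start is None:
            # Demand has just overtaken capacity: a shortfall begins.
            shortfall_start = s.t
        elif s.demand <= s.capacity and shortfall_start is not None:
            # Capacity has caught up: record how long the shortfall lasted.
            lags.append(s.t - shortfall_start)
            shortfall_start = None
    # A shortfall still open at the end of the data is not counted here.
    return lags


def scaling_is_adequate(samples, max_lag_seconds):
    """True if every recorded shortfall was resolved within max_lag_seconds."""
    return all(lag <= max_lag_seconds for lag in scaling_lags(samples))
```

Fed with samples exported from whatever monitoring is in place, the lag list gives a figure that can be tracked across the start of the academic year and the assessment period.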

Some of the key activities that still need to be progressed:

  • Post-CI strands of work (Catalyst development but exchange and test)
  • Regrading issue - https://wrms.catalyst.net.nz/wr.php?request_id=378838
  • CloudFront / S3 signed URLs - if not completed in TI2
  • Active monitoring of Redis, the frontends, etc. during load peaks
  • Drill through any blips in response times and document causes
  • Push for resolution of any identified flaws
  • Explore options for automating load testing (will need to time bound the effort on this)
  • Improve the CI comms channel: arrange ISD News editing by SO, and reach out to Mike Haward about the Status page to get it reset to be generic - https://www.ucl.ac.uk/isd/moodle-under-maintenance
  • Create a Moodle maintenance/outage page that can be used for traffic redirection in the event of a Moodle outage. This page needs to be editable by the Moodle team. Consider setting up a Moodle_Status Twitter feed as a short-term measure if we are unable to obtain an editable page.
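
For the activity of drilling through blips in response times, a first pass could flag samples that are well above the recent baseline so that each one can be investigated and its cause documented. This is a rough sketch under stated assumptions: the window size, the factor of 2 and the function name find_blips are illustrative choices, not an agreed definition of a blip.

```python
from statistics import median


def find_blips(response_times_ms, window=5, factor=2.0):
    """Return indices of samples exceeding factor x the median of the preceding window."""
    blips = []
    for i in range(window, len(response_times_ms)):
        # The baseline is the median of the previous `window` samples, so a
        # single earlier spike does not distort it much.
        baseline = median(response_times_ms[i - window:i])
        if response_times_ms[i] > factor * baseline:
            blips.append(i)
    return blips


# Example with made-up response times in milliseconds.
times = [100, 110, 105, 95, 100, 400, 102, 98, 101, 99]
print(find_blips(times))  # prints [5]
```

Using a trailing median rather than a mean keeps one spike from masking the next; the thresholds would need tuning against real Moodle response-time data.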


Moodle uptime is critical and this feature will always come before anything else. We currently rely on Catalyst to develop fixes for us. This will change over time, but we are well resourced, so this should not be seen as a barrier.