Site Reliability Engineer

The Site Reliability Engineer (or SRE) will work closely with the Platform Group, delivery streams and other departments to build the resources needed to influence the SRE transformation. Other tasks include automating recurring tasks, troubleshooting and performing technical investigations to support the reliability of our Fraedom products and services once in production.

Areas of Responsibility:

Software Engineering

  • Develop object-oriented .NET code.
  • Follow development standards and practices such as peer review and TDD.
  • Contribute to software development initiatives.
  • Get up to speed quickly on new topics such as software frameworks or networking.

Influencing Reliability on the Platform

  • Contribute to a monitoring platform to teach and influence what reliability looks like.
  • Collaborate on SLAs and required error budgets to ensure SLAs are met or exceeded.
  • Contribute to and influence all aspects of the value stream.
  • Monitor systems to ensure service level objectives are visible to the entire organization.
  • Continuously improve, and drive standards for, the Fraedom monitoring processes and tools.
  • Assist in automating recurring tasks such as job scheduling and event response.

DevOps Practices

  • Understand the value stream of the business.
  • Use pipeline models for implementing SRE initiatives.
  • Understand the flow of work: how we work best together to complete the work.
  • Use testing practices such as unit and spec testing.

Major Incident and Problem Management

  • Troubleshoot and contribute to resolving incidents.

Professional Development and Teamwork

  • Develop advanced product knowledge of Fraedom technologies and the Fraedom platform.
  • Create and maintain relevant documentation to ensure the team knowledge base is continuously improving.
  • Share knowledge with the wider team.

Valuable Experience:

  • Site Reliability Engineering (SRE).
  • DevOps, Improvement Kata and experimental approaches.
  • Operational understanding of distributed systems.
  • Troubleshooting experience, especially on production systems.
  • Critical thinking skills and understanding of complex systems.
  • Familiarity with, and understanding of, infrastructure.
  • Agile development processes.
  • Monitoring systems.
  • Source control (we use Git and SVN).
  • Scripting languages (PowerShell).
  • Transactional systems, e.g. banking, finance, telecommunications.
  • Experience with Cloud/SaaS environments.
  • Experience working with teams across multiple time zones.

Valuable Attributes:

  • Willingness to learn, and to be open to the scope of that learning.
  • Flexible and able to adapt to a changing environment.
  • Accurate, with attention to detail.
  • Excellent comprehension and communication skills, both verbal and written.
  • Able to develop and maintain strong relationships, both internally and externally.
  • Can work autonomously on time-critical tasks.
  • Able to evaluate situations, gather and analyse facts, and determine critical issues.