Site reliability engineer (SRE)
BBD is looking for site reliability engineers (SREs) with cloud-native experience as well as fluency in Java / JavaScript / Golang / Python to assist in the delivery of software solutions.
The company
BBD is an international software solutions company that solves real-world problems with innovative solutions and modern technology stacks. Our experience spans the education, financial services, gaming, insurance, telecoms and public sectors. We maintain our track record by using our vast business domain knowledge and world-class skills to successfully deliver digital solutions for clients.
The complex problems we solve are balanced out by our flexible working culture and flat management structure. Being a part of BBD means working on dynamic project teams, while pursuing your own career growth through our Continuous Learning Programme.
The role
BBD is looking for SREs to assist in the delivery of software solutions to various clients. This includes:
- Focussing mainly on delivering and maintaining production-grade systems which adhere to the pillars of SRE
- Maintaining a calm temperament in the face of potential incidents, and communicating well with clients and other team members
- Working in an Agile team and interacting closely with engineering teams to guide and assist them as they deliver business functionality
- Driving highly available and resilient architecture decisions, then implementing them with your team
- Mentoring and advising junior SREs as they grow in the field
Requirements:
- 3+ years of experience as a DevOps or site reliability engineer
- 2+ years of experience as a software engineer with experience in popular languages such as Java / JavaScript / Golang / Python
- An understanding of distributed systems, service architectures and cloud-native systems, the problems they attempt to solve, and the related trade-offs, in order to contribute to feature and service design
- Familiarity with the implementation of monitoring and observability solutions (logging, metrics and distributed tracing)
- Public cloud experience (AWS / Azure / GCP), certification preferable
- Local infrastructure experience (virtualisation, *NIX systems)
- Knowledge of important networking concepts (HTTP and REST, SSL / TLS, SSH, etc.)
- Container experience
- Experience with container and CNCF tooling such as Docker and Kubernetes
- Experience with implementing, maintaining and troubleshooting service mesh solutions, such as Istio
- Proven experience with production systems and dealing with production issues
- Out-of-the-box thinking to solve infrastructure and operational problems
Expert knowledge of:
- Configuration management tools, and Infrastructure-as-Code tooling and practices
- Systems monitoring, alerting and analytics (New Relic, Graphite, ELK, EFK, Nagios, Ganglia, Grafana, Prometheus, etc.)
- Implementing SLIs and maintaining SLOs / SLAs
- Incident management, on-call responsibilities, and post-mortems
- Production readiness reviews of microservice workloads
- Toil automation using one of the aforementioned programming languages
- Intersystem integration mechanisms (REST APIs, SSH file delivery, site-to-site VPNs)
BBD is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, family, gender identity or expression, genetic information, marital status, political affiliation, race, religion or any other characteristic protected by applicable laws, regulations or ordinances.