Senior Site Reliability Engineer
Company: BrainGu
Location: Boston
Posted on: March 21, 2025
Job Description:
Overview
This role sits within the Engineering Operations Value
Stream (EngOps) supporting our flagship Developer Experience
Platform, SmoothGlue. As a member of the EngOps team, you will be
responsible for working towards our SRE strategy and operating
model and helping to mature our SRE discipline.

Building iteratively with a strong understanding of the trade-offs
required to implement SRE frameworks and capabilities is a must-have,
as is a strong willingness to collaborate. Automating yourself out of
a job is not viewed as a risk but rather as a worldview that is
required in this role.

You will work closely with our EngOps CTO and team as well as
our Platform Product team to help inform and drive roadmaps,
metrics, and overall organizational maturity.

Responsibilities
- System Architecture and Design
  - Design, implement, and manage highly available, scalable, and
    fault-tolerant systems.
  - Collaborate with software engineering teams to optimize
    application performance and reliability.
  - Evaluate and recommend appropriate technologies, tools, and
    infrastructure solutions.
- Infrastructure Automation
  - Develop and maintain infrastructure as code (IaC) using tools
    like Terraform, Ansible, or similar.
  - Automate deployment, configuration, and scaling of applications
    and services.
  - Implement continuous integration and continuous deployment
    (CI/CD) pipelines.
- Monitoring and Incident Management
  - Establish and maintain comprehensive monitoring, alerting, and
    logging systems.
  - Respond to incidents, troubleshoot issues, and ensure timely
    resolution to minimize downtime.
  - Participate in on-call rotations and post-incident analysis to
    drive continuous improvement.
- Performance Optimization
  - Analyze system performance and identify bottlenecks; implement
    optimizations.
  - Conduct capacity planning to anticipate future resource needs
    and scalability requirements.
  - Implement strategies to improve system response times and
    overall efficiency.
- Security and Compliance
  - Collaborate with security teams to implement best practices for
    system and data protection.
  - Ensure compliance with industry standards and regulations
    relevant to the company's operations.
- Mentorship and Collaboration
  - Provide guidance, mentorship, and technical leadership to
    junior SREs and engineering teams.
  - Foster a collaborative environment by sharing knowledge and
    promoting best practices.

Requirements
- Bachelor's degree or equivalent work experience.
- 6+ years of relevant work experience.
- Highly motivated self-starter with excellent interpersonal and
communication skills. Able to communicate effectively at multiple
levels of seniority.
- Highly developed documentation skills.
- Experience working in customer-facing roles; customers may be
end-users, developers, or organizational leadership.
- Certification or formal training in site reliability
engineering concepts and practices.
- Prior experience working towards SLIs, SLOs, and observability
capabilities at a large scale.
- Experience working on observability, logging, and metrics
toolsets.
- Experience with Kubernetes (k8s) and container technologies such
as Docker, OpenShift, RKE, and EKS.
- Experience troubleshooting routing and networking in a cloud
environment (AWS, GCP, or Azure).
- Experience with Secrets products such as HashiCorp Vault or
CyberArk.
- Highly effective at navigating large and complex
organizations.
- Ability to work under pressure and manage tight deadlines or
unexpected changes in expectations or requirements.
- Experience working in CISO or security-led organizations is
desirable but not essential.
- AWS Solutions Architect - Associate certification is
preferred.

Tech Stack
- Kubernetes, Docker, CRI-O, containerd, or other container
technologies.
- Major programming or scripting languages.
- Istio, Linkerd, Consul, or other service mesh.
- Ansible, Terraform, Helm, Kustomize, or other Infrastructure as
Code (IaC) and Configuration as Code (CaC).
- AWS, Azure, GCP, or other cloud technologies.

Specific Job Needs
- Located in one of the following locations: Boston,
Massachusetts.
- Willing to obtain and maintain a Top Secret Clearance.
- Willing to travel up to 50%.
- Expected base salary of $150,000 - $170,000.