BostonRecruiter Since 2001
the smart solution for Boston jobs

Senior Site Reliability Engineer

Company: BrainGu
Location: Boston
Posted on: March 21, 2025

Job Description:

Overview

This role sits within the Engineering Operations Value Stream (EngOps), supporting our flagship Developer Experience Platform, SmoothGlue. As a member of the EngOps team, you will be responsible for working towards our SRE strategy and operating model and helping to mature our SRE discipline.

Building iteratively, with a strong understanding of the trade-offs required to implement SRE frameworks and capabilities, is a must-have, as is a strong willingness to collaborate. Automating yourself out of a job is not viewed as a risk but rather as a worldview that is required in this role.

You will work closely with our EngOps CTO and team, as well as our Platform Product team, to help inform and drive roadmaps, metrics, and overall organizational maturity.

Responsibilities
  • System Architecture and Design
    • Design, implement, and manage highly available, scalable, and fault-tolerant systems.
    • Collaborate with software engineering teams to optimize application performance and reliability.
    • Evaluate and recommend appropriate technologies, tools, and infrastructure solutions.
  • Infrastructure Automation
    • Develop and maintain infrastructure as code (IaC) using tools like Terraform, Ansible, or similar.
    • Automate deployment, configuration, and scaling of applications and services.
    • Implement continuous integration and continuous deployment (CI/CD) pipelines.
  • Monitoring and Incident Management
    • Establish and maintain comprehensive monitoring, alerting, and logging systems.
    • Respond to incidents, troubleshoot issues, and ensure timely resolution to minimize downtime.
    • Participate in on-call rotations and post-incident analysis to drive continuous improvement.
  • Performance Optimization
    • Analyze system performance and identify bottlenecks; implement optimizations.
    • Conduct capacity planning to anticipate future resource needs and scalability requirements.
    • Implement strategies to improve system response times and overall efficiency.
  • Security and Compliance
    • Collaborate with security teams to implement best practices for system and data protection.
    • Ensure compliance with industry standards and regulations relevant to the company's operations.
  • Mentorship and Collaboration
    • Provide guidance, mentorship, and technical leadership to junior SREs and engineering teams.
    • Foster a collaborative environment by sharing knowledge and promoting best practices.

Requirements
  • Bachelor's degree or equivalent work experience.
  • 6+ years of relevant work experience.
  • Highly motivated self-starter with excellent interpersonal and communication skills. Able to communicate efficiently at multiple levels of seniority.
  • Highly developed documentation skills.
  • Experience working in customer-facing roles; customers may be end users, developers, or organizational leadership.
  • Certification or formal training in site reliability engineering concepts and practices.
  • Prior experience working towards SLIs, SLOs, and observability capabilities at a large scale.
  • Experience working on observability, logging, and metrics toolsets.
  • Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, RKE, and EKS.
  • Experience troubleshooting routing and networking in a cloud environment (AWS, GCP, or Azure).
  • Experience with secrets management products such as HashiCorp Vault or CyberArk.
  • Highly effective at navigating large and complex organizations.
  • Ability to work under pressure and manage tight deadlines or unexpected changes in expectations or requirements.
  • Experience working in CISO or security-led organizations is desirable but not essential.
  • AWS Solutions Architect - Associate certification is preferred.

Tech Stack
  • Kubernetes, Docker, CRI-O, containerd, or other container technologies.
  • Major programming or scripting languages.
  • Istio, Linkerd, Consul, or other service mesh.
  • Ansible, Terraform, Helm, Kustomize, or other Infrastructure as Code (IaC) and Configuration as Code (CaC) tools.
  • AWS, Azure, GCP, or other cloud technologies.

Specific Job Needs
  • Located in one of the following locations: Boston, Massachusetts.
  • Willing to obtain and maintain a Top Secret Clearance.
  • Willing to travel up to 50%.
  • Expected base salary of $150,000 - $170,000.
