Senior Infrastructure Engineer

Posted on: August 8, 2025

Job description

Description

A Senior Infrastructure Engineer is responsible for delivering scalable infrastructure for product development teams and secure, performant network infrastructure for global teams and services. This role champions observability and performance, and involves designing, building, managing, and supporting the infrastructure services that underpin Skedulo’s internal and customer-facing services, ensuring high availability, reliability, and security for all critical systems.

Responsibilities

Day to Day

  • Anticipate and communicate blockers, delays, and cost overruns to relevant stakeholders, proactively seeking solutions.
  • Monitor resource consumption for supported services within AWS accounts and identify potential bottlenecks in applications or Kubernetes cluster performance.
  • Contribute to multi-team decisions aligned with Product Development (PD) goals, actively participate in discussions, back collective decisions, and take responsibility for their successful execution and outcomes.
  • Ensure commitments are realistic, clearly understand priority and urgency of tasks, and consistently deliver high-quality work within agreed-upon timelines.
  • Participate in the on-call rotation for supported services, providing timely and effective incident response and resolution to minimize downtime. This specifically includes troubleshooting issues related to AWS infrastructure, service failures, and overall Kubernetes cluster health.

Technical Duties

  • Assist in supporting and managing Skedulo’s infrastructure setup from Development to Production.
  • Provide effective input to technical discussions and decisions impacting the company's ability to deliver quality engineering solutions, offering insights based on deep technical expertise.
  • Consult and assist with implementation projects across PD, providing infrastructure expertise, guidance, and hands-on support.
  • Contribute to observability efforts across multiple groups, aligning with PD's overarching observability approach by implementing and maintaining monitoring, logging, and alerting systems.
  • Deliver infrastructure enhancements with technical assistance, working independently or collaboratively to implement improvements and new features.
  • Contribute to incident response, recovery, review, and remediation, actively participating in post-incident analysis to identify root causes and implement preventative measures.

Communication Duties

  • Communicate effectively, clearly, and concisely (written and verbal) within your squad on both technical and non-technical subjects, ensuring information is understood by all team members.
  • Collaborate effectively with your squad to resolve blockers and complete tasks, fostering a team-oriented environment.

Leadership Duties

  • Model Skedulo and PD values by demonstrating integrity, accountability, and a commitment to continuous improvement.
  • Mentor junior engineers, sharing knowledge and best practices to foster their professional growth.
  • Act as a technical leader within the team, driving technical discussions and decisions, and championing architectural best practices.
  • Contribute to the continuous improvement of infrastructure processes and tools, proactively identifying areas for enhancement.

Job requirements

Minimum Qualifications

  • 3+ years cloud experience (AWS preferred, particularly in multi-account organizations): Proven experience designing, deploying, and managing scalable and secure infrastructure on Amazon Web Services (AWS), including core services like EC2, S3, RDS, VPC, IAM, and Lambda. Familiarity with multi-account strategies and governance.
  • 2+ years container orchestration experience (Kubernetes preferred): Hands-on experience with containerization technologies and orchestration platforms, specifically Kubernetes, including deploying, managing, and troubleshooting containerized applications.
  • 2+ years cloud automation experience (Terraform preferred): Demonstrated proficiency in Infrastructure as Code (IaC) principles and tools, with significant experience using Terraform for provisioning and managing cloud resources.
  • Strong understanding of Linux operating systems.
  • Experience with CI/CD pipelines (e.g., GitHub Actions, CircleCI, Jenkins).

Desired Skills & Experience

  • Software development ability (Python and Node.js in particular) for automation and tooling.
  • Strong understanding of networking concepts, including TCP/IP, DNS, routing, and load balancing.
  • Ability to communicate complex technical concepts and opinions effectively to both technical and non-technical audiences.
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog).
  • Familiarity with configuration management and provisioning tools (e.g., Ansible, Terraform, CloudFormation).

Additional Qualifications

  • Must be authorized to work without visa sponsorship to be considered for this position.
  • Must have a reliable internet connection and an at-home workstation setup (for remote positions).