*This position is on-site in San Antonio, Texas*
*This position requires a TS/SCI security clearance and a full-scope polygraph*
Job Summary:
As a Site Reliability Engineer (SRE) at Team Cymru, you’ll be at the forefront of maintaining our user-facing services and production systems. In this role, you’ll blend operational expertise with software craftsmanship, applying cutting-edge engineering principles, operational discipline, and innovative automation to both our environments and our codebase.
As an SRE, you’ll focus on a range of systems including operating systems, storage subsystems, and networking. You’ll champion best practices for availability, reliability, and scalability, all while delving into algorithms and distributed systems.
Supervisory Responsibilities:
Duties/Responsibilities:
- Study product characteristics or customer requirements to determine validation objectives and standards.
- Analyze validation test data to determine whether systems or processes have met validation criteria or to identify root causes of production problems.
- Develop validation master plans, process flow diagrams, test cases, or standard operating procedures.
- Prepare detailed reports or design statements, based on results of validation and qualification tests or reviews of procedures and protocols.
- Conduct validation or qualification tests of new or existing processes, equipment, or software in accordance with internal protocols or external standards.
- Communicate with regulatory agencies regarding compliance documentation or validation results.
- Prepare, maintain, or review validation and compliance documentation, such as engineering change notices, schematics, or protocols.
- Recommend resolution of identified deviations from established product or process standards.
- Design validation study features, such as sampling, testing, or analytical methodologies.
- Create, populate, or maintain databases for tracking validation activities, test results, or validated systems.
- Install racked equipment, including labeling and cable management.
- Resolve testing problems by modifying testing methods or revising test objectives and standards.
- Conduct audits of validation or performance qualification processes to ensure compliance with internal or regulatory requirements.
- Direct validation activities, such as protocol creation or testing.
- Coordinate the implementation or scheduling of validation testing with affected departments and personnel.
- Participate in internal or external training programs to maintain knowledge of validation principles, industry trends, or novel technologies.
Required Skills/Abilities:
- General knowledge of four of the technical areas below, with deep knowledge of one
- Chef (basic syntax, recipes, cookbooks) and Ansible (basic syntax, tasks, playbooks)
- Terraform (basic syntax) and CI/CD (configuration, pipelines, jobs)
- Cloud resource provisioning and configuration through CLI/API
- Kubernetes (basic understanding, CLI, service re-provisioning)
- Provisioning and setup of metrics, alerts, and silences in Prometheus, Thanos, and Grafana
- Provisioning and setup of logging, including log queries to answer general questions
- Linux operating system configuration, package management, startup, and troubleshooting
- Block and object storage configuration
- Datacenter installation processes, equipment management requirements, and cable management requirements
- Networking (VPCs, proxies, and CDNs)
Education and Experience:
- High school diploma or equivalent.
- At least two years of related experience.
Physical Requirements:
- Prolonged periods of sitting at a desk and working on a computer.
Location: