Role: Senior Site Reliability Engineer
Location: Dublin, Ireland
Working Model: Hybrid
The ideal candidate is self-driven, data-driven, and able to work in a distributed team. This professional has strong knowledge of Site Reliability Engineering and DevOps methodologies related to Delivery solutions and Platform Automation. In this role, you will be part of the Site Reliability team, sharing your field experience with our Delivery, Support, Product Engineering, and Infrastructure teams. You will focus both on technical excellence and on quickly delivering value to customers who have deployed our software in production. The person who fills this role is a subject matter expert who excels at collaboration, open communication, and reaching across functional boundaries.
Responsibilities
- Provide support for Site Reliability/DevOps-driven solutions in cloud and on-premises environments, and troubleshoot issues with applications and middleware components
- Take ownership of feeding learnings from the field, including improvements and fixes, back into the product and the WorkFusion documentation
- Work on Ansible-based product installers and automation scripts
- Build and support monitoring systems for the product, as well as highly available and scalable services
- Architect, implement, and continuously improve high availability (HA), disaster recovery (DR), and backup solutions
- Create the vision for, and improve, the whole lifecycle of services: from inception and design through deployment, operation, and refinement. This includes researching gaps in automation and laying out a plan to close them.
- Recommend and implement strategies, policies, and procedures by evaluating organizational outcomes, identifying problems, evaluating trends, and anticipating requirements.
- Take ownership of customer success in managing the WorkFusion platform.
- Partner with the Delivery, Engineering, and Product teams to steer SRE alignment and strategy and to ensure the reliability of WorkFusion platform deployments
- Respond to client reliability concerns and drive agile problem resolution.
- Lead resource management and efficiency strategy. Support technical leadership development and recruitment within the organization.
- Collaborate with our growing DevOps/Infra team to build and iterate on our infrastructure to improve reliability and performance
- Identify parts of the system that do not scale, provide immediate palliative measures, and drive long-term resolution of these issues.
- Identify Service Level Indicators (SLIs) that align the team with its availability and latency objectives.
- Maintain advanced knowledge of the platform's features and functionality
- Provide L3 escalation support, offering expert guidance to minimize business outages.
- Document solutions and techniques for resolving issues, ensuring the information is available to the team through technical notes and the internal knowledge base
Requirements
- Strong expertise (7-10 years) in administration and engineering of Linux and Windows operating systems (Amazon Linux, Red Hat, CentOS, Windows Server 2016)
- Hands-on experience (>3 years) working with Tomcat or other Java servlet containers
- Practical knowledge of administering and tuning web servers (Nginx), application servers, and databases (MySQL, PostgreSQL, MongoDB, MSSQL)
- Proficiency in Bash (>3 years)
- Familiarity with Windows systems and their services (Microsoft SQL Server experience a huge plus)
- Strong knowledge of AWS, Azure, or GCP cloud services (EC2, ASG, ELB, KMS, S3, Route 53, Azure Load Balancer)
- Solid experience (>2 years) with Ansible or a similar configuration management tool
- Deep understanding of CI/CD tools (Jenkins, SonarQube, Nexus)
- Secrets management software (HashiCorp Vault), RabbitMQ, Marathon, Mesos
- VMware (Virtualization/Hypervisor)
- Advanced Storage Knowledge
- Scripting languages (Bash, PowerShell)
- System Analysis
- Monitoring and alerting experience (ELK)
- Databases (MSSQL, MySQL, PostgreSQL)
- Network administration, DNS, TCP/IP, security, and PKI certificate management
Would be a plus
- Familiarity with the ELK stack, Grafana, or Splunk
- Practical knowledge of HashiCorp Vault
- Experience with Java development
- Deep understanding of the Linux kernel and networking