Site Reliability Engineer - Cloud

 

You may not know our name, but you have surely used our innovations and solutions.

 

Our mission is to unlock the world and make it safer through cutting-edge identity technologies. Every day, around the globe, we enable citizens and consumers alike to perform their critical daily activities (such as paying, connecting and traveling), in the physical as well as the digital space. We are transforming their lives by making the world more secure and yet also more streamlined.

 

We have brought together complementary know-how and technologies that have never been combined before for both the physical and digital era: secured connectivity, secured payments and secured identity management. Cybersecurity, biometrics, large-scale distributed systems and Cloud computing, analytics and smart devices are at the core of both our physical products and our software and systems.

 

We serve our clients in 180 countries thanks to our 15,000 employees worldwide. 

 

Purpose

This role is responsible for maintaining the service-level agreements of critical production platforms and products, and for providing automated operations to ensure the service to our clients is always of the best quality.

 

Key Missions

 

  • Run the production environment by monitoring availability and taking a holistic view of system health in both Azure and private data centers
  • Design and maintain scalable infrastructure as code using Terraform
  • Recover platforms during production incidents to meet targeted SLOs; perform detailed root cause analysis to prevent regressions
  • Troubleshoot, evaluate and resolve operational challenges and support escalation
  • Maintain platforms after go-live by measuring and monitoring their availability, performance and overall system health
  • Document operational knowledge
  • Scale systems through automation, improving change velocity and reliability
  • Leverage technical skills to partner with team members and be comfortable diving into a problem as needed
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Participate in system design consulting, platform management, and capacity planning
  • Work with feature teams on day-to-day design and development activities (e.g. review architectural changes and their impact on platform OA&M, challenge security decisions, provide feedback and propose improvements related to operational aspects of the applications)
  • Develop auxiliary tools automating or simplifying platform Ops
  • Take responsibility for platform availability, performance and overall system health; manage the platform's error budget
  • Provide technical expertise on IDEMIA products and support processes to internal and external customers, including defining SLIs/SLOs acceptable to all involved parties
  • Validate readiness and maturity of new rollouts through development, execution and verification of automated smoke test suites

Profile & Other Information

 

Qualifications:

  • Bachelor’s degree in computer science or other highly technical, scientific discipline
  • Experience in one or more of the following: Java, Python, Go, Perl, or shell scripting
  • Experience with Azure Cloud services
  • Terraform experience is a plus
  • Experience with Azure and Unix/Linux operating system internals and administration
  • Experience with Kubernetes, Docker, Helm, Ansible, Kong, OpenStack, Puppet, and other cloud-based deployment tools and services
  • Expertise in analyzing and troubleshooting large-scale distributed systems
  • Ability to debug and optimize a variety of code, languages, and automation tools
  • Knowledge and experience designing and developing applications that take into account scalability, reliability and extensibility
  • Test automation experience with unit/integration or functional API testing integrated into a continuous delivery tool
  • Experience in production environments supporting mission-critical applications
  • Willingness to participate in an on-call rota on a monthly basis
  • Strong communication skills with the ability to articulate technical details to different audiences
  • Experience with SRE culture and practices

 

Personality Profile:

  • Technology Genius approach
  • Self-starter
  • Problem solver
  • Must be able to communicate effectively and professionally with internal and external customers and other third-party companies
  • Able to provide training to teams
  • Excellent attention to detail
  • Ability to work and interact effectively in an international team environment

 

By choosing to work at IDEMIA, you can join the journey of a unique tech company. You can seize all the opportunities of our fast-paced environment. You can add your distinctive qualities to our global community. You can contribute to a safer world.

 

We deliver cutting-edge, future-proof innovation that reaches the highest technological standards. We’re well established, and yet still agile. We aren’t too big, and we aren’t too small. And we’re transforming, fast, to stay a leader in a world that’s changing fast, too.

 

At IDEMIA, people can develop their expertise and feel a sense of ownership and empowerment, in a global environment, as part of a company with the ambition and the ability to change the world.

 

Our teams are close and collaborative; maintaining a dialogue and developing human connections matter to us. We are truly international and we know that diversity is a key driver of innovation and performance. We welcome people from all walks of life, regardless of how they look, where they come from, who they love, or what they think.

 

Each of our locations has its own advantages and offers a collaborative and friendly work environment.

 

IDEMIA. Expect the unexpected. Join the journey of a unique tech company.