Tech Lead - Cloud Reliability Engineering

Employer:
Mendix
Region:
Rotterdam

Job description
About the Team:

If you are an experienced developer and want to make a difference for tens of thousands of developers in our community, we have an opportunity for you! As a company, we’re in constant communication with our community, but we need your field experience to improve our product and take it to the next level.

As the backbone for countless enterprises, the Mendix Cloud hosts tens of thousands of mission-critical customer applications. These include vital systems for insurance, comprehensive supply chain optimization, advanced real estate solutions, AI-driven decision-making tools, enterprise SaaS platforms, and sophisticated industrial automation, all relying on our cloud for unparalleled reliability, performance, and the agility to integrate next-generation technologies.

The team is responsible for delivering and supporting a high-quality, highly available public cloud platform where our customers can run their Mendix apps. We develop and run the Mendix Cloud infrastructure and services that offer deployment, operations and monitoring.

About the Role:

You'll help drive digital innovation by:

  • Writing software/scripts to automate operations on our platform, reducing support requests and engineering time.
  • Solving operational issues for our customers, both by investigating them technically and by liaising between Mendix Support (1st line) and other development teams in R&D.
  • Providing out-of-hours support for critical customer issues on an on-call basis.
  • Creating and maintaining monitoring & alerting systems to provide real-time visibility into the performance and availability of the platform (SRE).
  • Developing and maintaining dashboards & reports to track key performance indicators and identify trends and issues.

You’re the innovator we need if:

  • You have experience with Site Reliability Engineering (SRE).
  • You have coding skills, ideally in Python; it’s a plus if you also have experience with Golang.
  • You have good knowledge of infrastructure (AWS).
  • You have experience with Infrastructure as Code (IaC), preferably Terraform or OpenTofu.
  • You have strong experience with containerization technologies, primarily Kubernetes.
  • You're comfortable writing Python scripts that automate complex tasks and reduce manual effort.
  • You have excellent communication and people skills, both written and verbal.
  • You have the ability to spearhead, manage and explain complex technical issues and reduce them to a form that less technical customers & colleagues can understand.
  • You have a deep understanding of cloud architecture, deployment, and infrastructure services such as web servers, load balancing, and SSL/TLS/X.509.
  • You have experience with monitoring and logging tools such as CloudWatch, ELK, Grafana, Datadog or Prometheus.
  • We use Datadog, Prometheus, CloudWatch, PagerDuty and Grafana internally, but experience with other tools such as Splunk or New Relic is welcome too.
  • You have proven experience administering, developing against, or architecting on a cloud platform. AWS is the platform we use, but experience with GCP or Azure is welcome too.
  • You have strong experience with containers and Linux/Unix systems.
  • You are familiar with SQL/databases (we primarily use PostgreSQL).
  • You have a passion for investigating complex issues and finding solutions on a platform with many distributed applications.

It's nice (but not essential) if:

  • You have experience with the Mendix low-code platform!

Benefits

    We provide generous benefits that support our employees’ wellbeing, no matter where they’re located. We also offer region-specific benefits, such as fully stocked office kitchens, commuter perks, and more!

  • Flexible, hybrid working
  • Flexible paid time off
  • Medical, dental, and vision coverage
  • 401(k) matching
  • Wellness benefits
  • Siemens employee discounts
  • Bravely career coaching

    Benefits may vary depending on location.