Site Reliability Engineer (all genders)

Permanent employee, Full-time · Remote (Germany)

You as part of the tonies:

As a Site Reliability Engineer (all genders) within the Production Systems team at tonies, you will be responsible for ensuring the reliability, availability, and performance of our on-premise bare-metal and cloud systems.

Your tasks and responsibilities will include:

Infrastructure Management: Design, deploy, and manage our on-premise bare-metal and cloud infrastructure, ensuring the reliability, scalability, and performance of our systems.
Deployment and Automation: Streamline and automate deployment processes, leveraging CI/CD pipelines and automation tools. Ensure reliable and consistent deployment of system services and custom applications across cloud and on-premise environments.
Cloud and On-Premise Services Optimization: Manage and optimise our hybrid infrastructure, including AWS cloud services and on-premise bare-metal systems. Ensure cost-effective and efficient operation of all infrastructure components.
Reliability Engineering: Design and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services, ensuring they are met or exceeded. Implement error budgets and policies to balance reliability with feature development.
Incident Management: Lead incident response efforts, conduct post-incident reviews (PIRs), and identify opportunities for proactive improvements, covering incidents in both cloud and on-premise environments.
Monitoring and Alerting: Develop and maintain robust monitoring and alerting systems to detect and respond to issues in real time, covering all components of the hybrid infrastructure. Ensure early problem detection and resolution.
Collaboration: Work closely with software engineers and cross-functional teams to continuously improve the reliability and performance of our hybrid infrastructure through automated testing, monitoring, and proactive maintenance.
Documentation: Create and maintain documentation for system configurations, processes, and best practices, encompassing all infrastructure components. Facilitate knowledge sharing within the team.

What we are looking for:

5+ years of progressive experience in Reliability Engineering with proven experience in implementing SLOs, SLIs and other best practices from the SRE methodology
Proficiency in programming languages like Python, Go, or Rust
Proficiency in Linux-based systems administration and scripting
Experience with AWS cloud services and infrastructure
Experience with on-premise bare-metal infrastructure
Hands-on experience with CI/CD pipelines (GitLab CI/CD is a plus)
Knowledge of infrastructure-as-code (IaC) tools like Terraform or Ansible
Experience with containerization and orchestration technologies such as Docker and Kubernetes
Demonstrated ability to manage and optimise critical systems in challenging environments
Expertise in monitoring, alerting, and logging tools
Strong problem-solving skills and a proactive approach to system reliability
Excellent communication and collaboration skills, very good knowledge of English
Leadership experience or a proven track record of guiding reliability improvements is a plus
Bachelor's degree in a relevant field or equivalent practical experience is a plus

Why tonies?

Global Teamwork: We collaborate across departmental and country borders on our vision to bring the Toniebox into every child's room in the world.
Come as you are: This applies not only to the dress code but also to everything else. Because only where you truly feel comfortable can you give your best.
Mobility: Choose the option that suits you best: a Deutschlandticket (public transport ticket) for unlimited mobility, a monthly contribution of fifty euros towards an office parking space, a leasing bicycle, or a remote work subsidy.
Enhanced Security: Benefit from subsidies for company pension plans, occupational pension schemes, and occupational disability insurance.
Rest & Time Off: Enjoy 30 days of paid annual leave as well as three additional days off such as Rosenmontag, Christmas Eve, and New Year's Eve. After one year of employment, you can also use up to 10 "toniecation days" (unpaid leave days).
Flexible Working: With your own individual equipment, you can work remotely for up to 5 days in consultation with your team, depending on your area of responsibility. And if you're up for a workation, you can work from abroad for up to 4 weeks per year.
Continuous Learning: Benefit from our internal and external training opportunities as well as an individual learning budget to continuously expand your knowledge.
Language Learning & Relaxation: Improve your communication skills with the language learning app Babbel and find relaxation through our access to the meditation app Calm.
Discounts: Benefit from attractive discounts on our entire range of tonies products.

Good to know:

As part of our principles, we are committed to supporting inclusion and diversity at tonies®. We actively celebrate our colleagues’ different abilities, ethnicities, faith and gender. Everyone is welcome and supported in their development at all stages in their journey with us.

We look forward to hearing from you!

Esther Sommerfeld
Talent Acquisition Lead

About us

tonies® is the world’s largest interactive audio platform for children, with more than 6.8 million Tonieboxes and 82 million Tonies sold. The intuitive and award-winning audio system has changed the way young children play and learn independently with its child-safe, wireless, and screen-free approach. Tonieboxes have been activated in over 100 countries, and the content portfolio includes more than 1,100 Tonies figurines in several languages.

We are glad you are interested in the company behind the Tonies. Please fill out the following short form. If you have difficulty uploading your data, please email us at jobs@tonies.com.