Job Description
- We count on our site reliability engineers (SREs) to give users a rich feature set, high availability, and stellar performance so they can pursue their missions.
- As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time.
- Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.
Objectives of this role:
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
Responsibilities:
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service-level objectives.
Required skills and qualifications:
- Bachelor’s degree (or equivalent) in computer science or related discipline.
- Ability to program (structured and object-oriented) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript.
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Preferred skills and qualifications:
- Previous success in technical engineering.
- Coding experience beyond simple scripts.
Job Category: NFS, HDFS, Ceph, Amazon S3, dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)
Job Type: Full Time
Job Location: Hyderabad
Country: India
Experience: 10 years
