Metaforge

Site Reliability Engineer (SRE)

Job Description
  • We count on our site reliability engineers (SREs) to give users a rich feature set, high availability, and stellar performance as they pursue their missions.
  • As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time.
  • Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.

Objectives of this role:

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Build software and systems to manage platform infrastructure and applications.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational support and engineering for multiple large-scale distributed software applications.


Responsibilities:

  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and upgrades.
  • Balance feature development speed and reliability with well-defined service-level objectives.


Required skills and qualifications:

  • Bachelor’s degree (or equivalent) in computer science or related discipline.
  • Ability to program (structured and OOP) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript.
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.


Preferred skills and qualifications:

  • Previous success in technical engineering.
  • Coding experience beyond simple scripts.

Job Category: Amazon S3, Ceph, HDFS, NFS, dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)
Job Type: Full Time
Job Location: Hyderabad
Country: India
Experience: 10
