- Develop simple software components to address complex challenges in distributed machine learning training.
- Contribute to and maintain core tools by implementing new features, optimizing existing solutions, fixing bugs, and participating in architectural design.
- Conduct code reviews to uphold high standards of code quality.
- Measure and optimize existing components so that they meet end-user requirements.
- Strong foundation in computer science fundamentals and a proven track record of shipping production-grade code.
- Willingness to learn Rust and proficiency in at least one other systems programming language.
- Deep understanding of distributed systems or systems at scale.
- Strong foundation in operating systems, preferably Linux or macOS.
- High motivation and excellent verbal and written communication skills.
- Experience with UNIX APIs and networking protocols.
- Experience working in high-growth startup or scale-up environments.
- Experience working with metrics, spans, and traces.