Job Overview

**Job Type:** Full-time
**Japanese Level:** None Required
**Category:** Tech & Engineering

Description
**About the company:** Treasure Data (Minato-ku, Tokyo) is the only enterprise Customer Data Platform that harmonizes an organization’s data, insights, and engagement technology stacks to drive relevant, real-time customer experiences throughout the entire customer journey.

**Responsibilities:**
- Designing, building, scaling, and maintaining the core services that power the Worker Platform, prioritizing reliability, high availability, resilience, and performance, while optimizing for cloud costs.
- Leading the modernization and simplification of complex legacy systems to improve maintainability and developer velocity.
- Leading and participating in system design discussions to help the team make the right tradeoffs for a large-scale, multi-tenant, distributed system.
- Collaborating with Tech Leads, Product Managers, and engineers from other teams (such as Plazma, Integrations, and SRE) to break down complex projects into deliverable milestones.
- Mentoring and coaching other engineers through pairing, constructive code reviews, and technical discussions.
- Proactively identifying and solving platform challenges, contributing to the team’s operational excellence and long-term roadmap.
- Improving our engineering standards, tooling, and CI/CD processes, ensuring we can deliver value safely and quickly.

**Requirements:**
- A minimum of 5 years of professional experience building and operating large-scale, distributed systems in production.
- Strong software engineering fundamentals and proficiency in JVM-based languages (our primary language is Kotlin).
- Practical experience with concurrent programming, including a solid grasp of JVM-specific synchronization, thread-safety, and resource locking.
- Experience with cloud infrastructure (AWS preferred) and container orchestration patterns (e.g., Kubernetes, ECS), specifically regarding resource management and autoscaling.
- Strong background in observability (e.g., Datadog, CloudWatch) to diagnose bottlenecks and drive data-driven decisions.
- Excellent communication and collaboration skills, with the ability to work effectively across time zones and language barriers.
- A proven ability to work both independently and collaboratively as part of a high-performing team.

**Nice to have:** While not specifically required, tell us if you have any of the following.
- Experience with non-blocking I/O and modern JVM concurrency models, such as Kotlin Coroutines or Java Virtual Threads (Project Loom).
- Experience working in highly distributed teams, across large time zone differences.
- A deep understanding of the common failure modes in complex, distributed systems and experience conducting Root Cause Analysis (RCA).
- A “FinOps” mindset: a proven track record of reducing infrastructure costs by optimizing system throughput and resource utilization via efficient concurrency models.
- A student’s curiosity about complex systems theory and how to build resilient and adaptive systems.
- An interest in or experience with applying GenAI/LLMs to improve developer productivity.
- Having read and enjoyed books like “Designing Data-Intensive Applications”, “The Staff Engineer’s Path”, “Nonviolent Communication”, “High Output Management”, or “Systems Performance: Enterprise and the Cloud”.
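As a hypothetical illustration of the kind of JVM thread-safety work the requirements above describe, the sketch below uses standard `java.util.concurrent` primitives to run concurrent increments without lost updates (Java shown for brevity; the team’s primary language is Kotlin, and the class and counts here are invented for the example):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SafeCounter {
    public static void main(String[] args) throws InterruptedException {
        // AtomicLong gives lock-free, thread-safe updates; a plain long
        // incremented from multiple threads would silently lose writes.
        AtomicLong counter = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(8);
        int tasks = 1000;
        for (int i = 0; i < tasks; i++) {
            pool.submit(counter::incrementAndGet);
        }

        // Drain the pool so every increment completes before we read.
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println(counter.get()); // prints 1000: no lost updates
    }
}
```

The same pattern scales up to the contended shared state (queues, schedulers, quota counters) typical of a multi-tenant worker platform; the choice between atomics, `synchronized` blocks, and explicit locks is exactly the tradeoff the concurrency requirement refers to.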
