Type: Full-time Contract
Location: District 2, HCMC, Vietnam
Working mode: On-site
About DataXight
DataXight, headquartered in Mountain View, CA, is on a bold mission to transform healthcare through groundbreaking data-driven solutions. We're passionate about empowering our team to work on projects that not only push boundaries but also make a tangible difference in the lives of millions. Are you ready to be part of something greater?
The Role
We’re looking for a Mid-Senior Data Engineer to join our digital transformation team. You’ll build and optimize data pipelines, develop robust data models, and deliver reliable analytics layers that power insights across the company.
Key Responsibilities
Large-Scale Data Movement: Design and engineer highly efficient data pipelines that move petabytes of data reliably across systems. Focus on reducing I/O overhead by minimizing deep copies (e.g., leveraging shallow cloning, symlinks, or advanced data lake table formats).
Spark Workflow Optimization: Partner closely with the Bioinformatics team to refactor, tune, and productionize their research code. Allocate distributed compute resources in Apache Spark effectively to handle data skew, optimize shuffles, and reduce execution times.
CI/CD & Workflow Automation: Design and implement robust CI/CD pipelines to automate the testing, validation, and deployment of data science workflow templates. Specifically, build automated testing frameworks to validate Jupyter Notebooks and research scripts before they are promoted to production.
Data Platform Engineering: Orchestrate complex, resilient data workflows across AWS using modern cloud-native automation tools.
Architecture & Modeling: Collaborate with business and technology stakeholders to establish design principles. Develop scalable, fit-for-purpose data models on open table formats (Delta Lake, Iceberg, or similar) that support both high-throughput analytics and strict compliance.
Data Integrity & Fallback Mechanisms: Build fault-tolerant systems that guarantee data delivery. Implement robust validation frameworks (e.g., automated MD5/checksum verification), dead-letter queues, and automated fallback/retry mechanisms to ensure zero data loss during transit.
Ownership & Documentation: Take end-to-end ownership of key infrastructure initiatives. Maintain clear, thorough documentation for data flows, disaster recovery protocols, and optimization standards.
Must-have Skills & Qualifications
Experience: 3 to 5 years of professional experience in data engineering, with a focus on high-volume environments.
Spark Mastery: Deep expertise in Apache Spark (PySpark or Scala). You must understand the Catalyst optimizer, memory management, how to read execution plans, and how to resolve common bottlenecks (spill, skew, excessive shuffling). The ability to customize Spark is a big plus.
CI/CD & Notebook Testing: Proven experience building automated testing and deployment pipelines for data workflows (e.g., using GitHub Actions). Hands-on experience parameterizing and testing Jupyter Notebooks in a programmatic pipeline (using tools like papermill or pytest).
Advanced Data Movement: Proven track record of optimizing data storage and transit. Experience with columnar file formats (Parquet, ORC) and implementing strategies that avoid unnecessary data duplication. Experience handling bioinformatics file types (e.g., .vcf, .gvcf).
Fault-Tolerant Engineering: Strong experience building automated integrity checks, checksum validations, and automated recovery/fallback pipelines.
Python & SQL: Exceptional proficiency in Python for scalable ETL/ELT workflows and advanced SQL for complex data manipulation.
Source Control: Proficient with Git/GitHub for version control, code reviews, and collaborative development.
Global Collaboration: Experience working with a globally distributed team (our colleagues are in the US and the Czech Republic) and managing coordinated projects asynchronously in English.
Education: Degree in Computer Science, Data Engineering, or a related technical field.
Nice-to-have Qualifications
SaaS Experience: Experience as part of a SaaS provider leveraging consumption billing is highly desired.
FAIR Knowledge: Knowledge of FAIR (Findable, Accessible, Interoperable, and Reusable) principles.
Dashboards & Reporting: Experience creating dashboards and reports (e.g., cloud spend, margins) using tools like Tableau, Superset, and Grafana.
Lifelong Skills
Inquisitive Mindset: Demonstrates a deep-rooted curiosity, constantly seeking to understand project details and underlying principles, indicative of a continuous learning approach.
Clarity and Insight: Actively seeks clarity and understanding of project requirements to ensure accurate and effective implementation.
Depth of Perception: Has a talent for looking beyond initial requests, intuitively grasping and addressing the core needs and objectives underlying a project.
Implementation Insight: Skilled in outlining and describing the phases of project implementation, breaking down complex tasks into manageable steps.
Articulate Communication: Exhibits strong verbal and written communication skills and tailors messages effectively for different audiences, ensuring that each message is not only clear but also contextually relevant and empathetic.
Why Join DataXight
This isn’t just another job; it’s an opportunity to join a team that's earnest about making a global impact in human health. We provide room for growth and the chance to make your mark in an industry ripe for innovation, in a dynamic, supportive, and friendly workplace that fosters collaboration and personal growth.
We Offer:
Competitive compensation
Flexible time off, including floating holidays
Private health insurance
Full contribution to social, health, and unemployment insurance
Professional development support
Company-provided lunches along with a selection of coffee, tea, and snacks
Other employee benefits that exceed the requirements of Vietnamese labor law, reflecting our commitment to your growth and well-being
If you’re someone who thrives on solving complex problems and actively wants to transform healthcare, apply to DataXight today. Send your resume, cover letter, and relevant work samples or GitHub profile to careers@dataxight.com. We look forward to reviewing your application and potentially welcoming you to our globally connected team!
