About
Senior Data Engineer with 5 years of experience designing, building, and optimizing scalable data pipelines and data architectures. Proven ability to turn complex raw data into actionable insights, improving data accessibility, system performance, and business decision-making across industries. Seeking to apply advanced technical and leadership skills to deliver data solutions in a challenging environment.
Skills
Programming Languages
Python, SQL, Scala, Java.
Cloud Platforms
AWS (S3, Redshift, Glue, Lambda, EMR, Kinesis), Azure (Data Lake, Synapse, Data Factory), Google Cloud Platform (BigQuery, Dataflow, Cloud Storage).
Big Data Technologies
Apache Spark, Apache Kafka, Hadoop, Hive, Presto.
Data Warehousing & Databases
Snowflake, Redshift, PostgreSQL, MySQL, NoSQL (MongoDB, Cassandra).
ETL/ELT & Orchestration
Apache Airflow, dbt, ETL/ELT Development, Data Integration.
DevOps & MLOps
Docker, Kubernetes, Terraform, Jenkins, Git, CI/CD.
Data Modeling & Governance
Dimensional Modeling, Data Quality, Data Lineage, Metadata Management.
Work
San Francisco, CA, US
Summary
Led the design, development, and optimization of enterprise-level data platforms and ETL/ELT pipelines, ensuring high availability and performance for critical business intelligence initiatives.
Highlights
Architected and implemented a new cloud-native data lake on AWS S3 and Redshift, improving data query performance by 40% and reducing storage costs by 25% within 12 months.
Developed and deployed real-time data streaming pipelines using Apache Kafka and Spark Streaming, enabling near-real-time analytics for fraud detection and increasing detection accuracy by 15%.
Mentored a team of 3 junior data engineers in data modeling, pipeline orchestration (Airflow), and data governance best practices, resulting in a 20% increase in team productivity.
Automated complex ETL processes integrating data from 10+ disparate sources, reducing manual data preparation time by 80% and increasing data refresh frequency from daily to hourly.
Implemented CI/CD pipelines for data infrastructure using Terraform and Jenkins, decreasing deployment times by 50% and minimizing human error in production environments.
Seattle, WA, US
Summary
Designed, built, and maintained scalable data pipelines and data warehousing solutions to support business intelligence and machine learning initiatives across various departments.
Highlights
Developed and optimized SQL queries and stored procedures, improving report generation speeds by an average of 30% for key business dashboards.
Managed and maintained Snowflake data warehouses holding over 50 TB of data, ensuring data integrity and availability for a user base of 200+ analysts and data scientists.
Collaborated with data scientists to operationalize machine learning models by building data ingestion pipelines, reducing model deployment time by 25%.
Implemented data quality checks and monitoring systems, reducing data discrepancies by 90% and improving reliability of analytics reports.
Contributed to the migration of on-premises data infrastructure to Google Cloud Platform, successfully migrating 10 TB of historical data with zero downtime.
Languages
English
Interests
Technology
AI/ML, Cloud Computing, Distributed Systems.
Hobbies
Hiking, Photography, Chess.