Experience
Senior Data & Platform Engineer
Aug 2023 - Present
Plume Design Inc
- Led multi-cloud architecture to onboard ISP customers on GCP while maintaining AWS parity; enabled 15M customers.
- Reduced infrastructure cost by 30% and data quality issues by 50% by migrating YugabyteDB ingestion from Lambda to Spark Streaming at 40 GB/hour.
- Designed a deep archival strategy for 50 PB of data; identified a 20% cost reduction (~$150k/month).
- Saved ~$2M in Databricks costs by shifting workloads to EKS-based clusters and optimizing DBUs.
- Open-sourced Project Lumos, a pluggable metadata and governance framework; saved 400+ hours/year and reduced bug triage effort across teams.
Senior Data & Platform Engineer
May 2021 - Jul 2023
Morgan Stanley
- Built a self-service data platform for global trade settlements; reduced reporting turnaround by 1.5 weeks.
- Improved OLTP SQL Server latency by 40% via hourly archival and optimized indexing on Type-2 SCD tables.
- Saved 120 hours/week by automating ownership assignment of trade fails using a Drools-based CQRS pipeline.
- Owned a critical data warehouse across Snowflake/DB2/Sybase/CDC pipelines; saved $10M with governance and quality protocols.
- Led an innovation project using Neo4j knowledge graphs for fraud detection; placed 3rd among 70 teams.
Data Engineer
Apr 2019 - May 2021
Fractal Analytics
- Built a COVID-19 India data platform with PMO/NITI Aayog/NASSCOM; improved prediction accuracy by 45% with district-level granularity.
- Designed serverless ingestion and modeling on AWS Athena/Glue/Lambda for national dashboards.
- Led cross-functional discussions with Mapbox and Infosys on privacy/security for movement data.
- Delivered a European health tech data platform; reduced infra cost by 40% and improved pipeline runtime 3x.
- Automated campaign analytics, saving 24 hours/week across regions.
Data Engineer
Mar 2017 - Apr 2019
Infosys Limited
- Developed SQL ETLs and Spark-Scala pipelines on 500 PB clusters; optimized batch Hive loads for reliability.
About
Senior Data & Platform Engineer focused on multi-cloud architectures, scalable data platforms, and cost-aware systems design.
Programming
Scala
Python
Java
SQL
Cypher
Bash
Perl
Data Engineering Frameworks
Spark
Databricks
Hadoop
EMR
EC2
Lambda
MapReduce
Spark MLlib
YARN
Sqoop
Kafka
GCP Dataproc
BigQuery
Pub/Sub
Databases and Table Formats
Hive
Snowflake
Delta Lake
Glue
Athena
Neo4j
DB2
Sybase
SQL Server
MySQL
YugabyteDB
Orchestration and DevOps
Step Functions
Shell Scripting
Git
Autosys
Liquibase
Jenkins
Jira
Azure DevOps
Bitbucket
Airflow
Cloud, Storage, and Formats
AWS
Azure
HDFS
S3
EBS
ADLS
Blob Storage
Cloud Storage
Parquet
ORC
Delta
Avro
JSON
XML
CSV
Columnar
Linux