Bharath Thati
Featured Projects
smriti — self-hosted habits / expenses / to-do
Live at /tracker. A personal tracker in daily use, and the system serving this very page. Built end-to-end: HTTP server, SQLite schema with foreign-key constraints, scrypt password hashing with session auth, login rate limiting, constant-time login checks, a recurring-habits engine, expense aggregation, to-do urgency scoring, and Telegram and email notifications via cron.
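The smriti auth description (scrypt hashing, rate-limited login, constant-time checks) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not smriti's actual code: the in-memory `users` dict, the `LoginRateLimiter` class, the `login` function, and the scrypt cost parameters are all hypothetical. Hashing a dummy password for unknown usernames keeps the failure path doing the same work as the success path, and `hmac.compare_digest` avoids an early-exit byte comparison.

```python
import hashlib
import hmac
import os
import secrets
import time

# Hypothetical scrypt cost parameters (roughly 16 MiB of memory per hash).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # raise OpenSSL's memory cap so these parameters fit


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=SCRYPT_N,
                          r=SCRYPT_R, p=SCRYPT_P, maxmem=MAXMEM, dklen=32)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage."""
    salt = os.urandom(16)
    return salt, _scrypt(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # compare_digest takes time independent of where the bytes first differ.
    return hmac.compare_digest(_scrypt(password, salt), expected)


class LoginRateLimiter:
    """Sliding window: at most max_failures failed logins per key per window_s seconds."""

    def __init__(self, max_failures: int = 5, window_s: float = 300.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures: dict[str, list[float]] = {}

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        recent = [t for t in self._failures.get(key, []) if now - t < self.window_s]
        self._failures[key] = recent
        return len(recent) < self.max_failures

    def record_failure(self, key: str) -> None:
        self._failures.setdefault(key, []).append(time.monotonic())


# Hypothetical dummy credentials so unknown usernames still cost one scrypt call.
_DUMMY_SALT = os.urandom(16)
_DUMMY_KEY = _scrypt("not-a-real-password", _DUMMY_SALT)


def login(users, limiter, username, password, client_key):
    """Return a new session token, or None if rate limited or credentials are wrong."""
    if not limiter.allow(client_key):
        return None
    record = users.get(username)
    salt, key = record if record is not None else (_DUMMY_SALT, _DUMMY_KEY)
    # Always run the hash check first, then reject unknown users.
    ok = verify_password(password, salt, key) and record is not None
    if not ok:
        limiter.record_failure(client_key)
        return None
    return secrets.token_urlsafe(32)
```

Session storage (mapping the returned token to a user, with an expiry) is left out; in a real deployment the rate-limit state would also need to survive restarts, for example by living in the SQLite database.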
Technical Skills
- Cloud
- AWS, Lake Formation
- AWS Services
- S3, DynamoDB, Redshift, Aurora Postgres, MySQL, Lambda, EMR, ECS, Glue, Step Functions, Airflow, API Gateway, SNS, SQS, Kinesis, Cognito, SageMaker, IAM, VPC, CloudFormation, CDK
- Languages
- Python, PySpark, SQL, TypeScript, Node.js, Hive, Scala
- Data & Big Data
- Apache Spark, Hive, EMR, Airflow, dbt, ETL, Data Lakes, Distributed Processing
- Lakehouse & Streaming
- Apache Iceberg, Apache Hudi, Kinesis, Incremental Processing
- Frameworks
- React, Node.js (Express)
- Architecture
- Serverless, Event-Driven, Platform Engineering, IaC
- Ops
- CloudWatch, Git, CI/CD, Jira, On-call
- Analytics
- QuickSight
Professional Summary
Senior Data Engineer with 10+ years building enterprise-scale data platforms on AWS — across Fintech, Asset Management, Compliance, and E-commerce.
Deep hands-on experience with distributed ETL/ELT, data lakes, and warehouses using Python, PySpark, and SQL on the AWS data stack (EMR, Lambda, Step Functions, ECS, Redshift, Aurora Postgres, DynamoDB, S3) alongside Spark, Airflow, and dbt.
Built reusable platform frameworks, workflow orchestration systems, and event-driven architectures for mission-critical financial reporting and compliance workloads. Comfortable owning systems end-to-end: design, build, deploy, monitor, on-call.
Professional Experience
Goldman Sachs — Senior Data Engineer
Feb 2024 – Present
- rave: Ingests 100+ financial feed types (Aladdin / BlackRock) through S3-triggered Node.js Lambdas that write to DynamoDB, with a React UI behind API Gateway + Cognito, spanning 7+ repos.
- apollo — Owned a 200+ model dbt warehouse on Aurora Postgres. Step Functions orchestrate per-process extract → load → dbt-run on ECS; infra deployed via AWS CDK.
- nnip2statpro — Python Lambdas pull from internal APIs, write to S3; a third-party app moves files from S3 to external (non-Goldman) consumer drives.
- Production support across all three: monitoring, triage, root-cause analysis.
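The rave ingestion pattern (an S3 event triggers a Lambda that records each arriving file in DynamoDB) is sketched below in Python for consistency with the other examples, although the bullet says the real Lambdas are Node.js. The item schema (`pk`, `feed_type`, `size`, `event_time`), the feed-type-from-prefix convention, and the `FEED_TABLE` environment variable are hypothetical; the event shape follows the standard S3 notification format, whose object keys arrive URL-encoded.

```python
import os
from urllib.parse import unquote_plus


def feed_type_from_key(key: str) -> str:
    """Hypothetical convention: the first path segment names the feed type."""
    return key.split("/", 1)[0]


def build_items(event: dict) -> list[dict]:
    """Turn each S3 notification record into one DynamoDB item."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # S3 URL-encodes keys in events
        items.append({
            "pk": f"{bucket}/{key}",
            "feed_type": feed_type_from_key(key),
            "size": s3["object"].get("size", 0),
            "event_time": record.get("eventTime"),
        })
    return items


def handler(event, context, table=None):
    """Lambda entry point; `table` can be injected for local testing."""
    if table is None:
        import boto3  # only needed when running inside AWS
        table = boto3.resource("dynamodb").Table(os.environ["FEED_TABLE"])
    items = build_items(event)
    for item in items:
        table.put_item(Item=item)
    return {"processed": len(items)}
```

Keeping the parsing in `build_items` separate from the AWS call makes the event handling testable without credentials; only `handler` touches boto3.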
Amazon — Data Engineer
Aug 2017 – Jun 2023
- yuno: Reusable AWS CDK construct library (TypeScript) adopted across data engineering teams. Users submit a spark-submit command or a PySpark script; yuno provisions EMR, Airflow or Step Functions orchestration, and VPC/IAM/S3 resources, with tracking, retries, and alarms built in.
- dejavu: Data lake of 60+ OFA tables. Daily EDX parcels feed two PySpark jobs on EMR: the first stages the data, the second deduplicates it with row_number() and writes partitioned Parquet.
- daxter — YAML-driven Python ETL framework. Multi-source / multi-destination (S3, MySQL, Redshift, WorkDocs, DataNet) with SQL transforms. Started on EC2, moved to ECS.
- docket — SNS → SQS → Lambda → S3 (CSV); failures to DLQ; daily Glue compaction → Parquet for reporting.
- purchase register — Audit-ready report across 60+ OFA tables. Built a Redshift Airflow operator and a DAG of 40+ tagged tasks; enabled accounting to claim $100M / month in Input Tax Credit.
- fintech fold — Python monitor for daxter / yuno / horizon job platforms via their APIs — health, retries, alarms.
- Designed and supported the broader Amazon Finance data platform on EMR + S3 with secured access. Built Airflow ETL pipelines generating monthly / daily compliance reports.
- Improved Spark and Hive performance through partitioning strategies and large-scale job tuning.
- Built self-service analytics dashboards in QuickSight; partnered with IN / EU stakeholders on reports for country tax officials.
- On-call: incidents, RCA, operational troubleshooting. Architecture reviews, code reviews, platform standardization.
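The dejavu dedupe step (keep one row per key using row_number()) is the same window-function pattern in Spark SQL and in SQLite, so it can be shown runnably with Python's built-in `sqlite3` (SQLite 3.25 or newer is needed for window functions). The `staged` table, its columns, and the "latest `updated_at` wins" rule are assumptions for illustration. In PySpark the equivalent is `F.row_number().over(Window.partitionBy("id").orderBy(F.col("updated_at").desc()))`, followed by a filter on 1.

```python
import sqlite3

# Keep the most recent row per id: number rows within each id, newest first.
DEDUPE_SQL = """
SELECT id, amount, updated_at
FROM (
    SELECT id, amount, updated_at,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM staged
)
WHERE rn = 1
ORDER BY id
"""


def dedupe_latest(rows):
    """Load (id, amount, updated_at) rows into a staging table and return one row per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE staged (id INTEGER, amount INTEGER, updated_at TEXT)")
        conn.executemany("INSERT INTO staged VALUES (?, ?, ?)", rows)
        return conn.execute(DEDUPE_SQL).fetchall()
    finally:
        conn.close()
```

Rows that tie on `updated_at` are numbered in arbitrary order, so a production job would add a tiebreaker column to the ORDER BY.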
Amazon — Support Engineer / Lead Generation Analyst
Aug 2014 – Aug 2017
- Automated 35% of goal backfill by integrating internal ML attribute-prediction services, with no manual audit needed.
- Built ML text-classification models (Scala, Python, AWS SageMaker) for attribute backfilling at scale.
- Backfilled missing / incorrect attributes for top-glance-view ASINs at scale.
- Built an internal website to lift auditor productivity; published weekly metrics on goal coverage and auditor performance.
- Lead generation: identified seller prospects by scraping Google, Bing, and DuckDuckGo results, deduplicated them against already-onboarded sellers, and tuned a pattern classifier to categorize them.
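One way to do the "dedupe against onboarded sellers" step from the lead-generation bullet is to reduce every scraped URL to a normalized domain and skip anything already in the set of known seller domains. The normalization rule (lower-case host, strip `www.` and the port) and the function names are hypothetical, and matching on domain alone is a simplification.

```python
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Lower-case hostname without a leading 'www.' or port (hypothetical matching key)."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    return host[4:] if host.startswith("www.") else host


def new_prospects(scraped_urls, onboarded_urls):
    """Return unseen, not-yet-onboarded domains in first-seen order."""
    onboarded = {normalize_domain(u) for u in onboarded_urls}
    seen = set()
    result = []
    for url in scraped_urls:
        domain = normalize_domain(url)
        if domain and domain not in onboarded and domain not in seen:
            seen.add(domain)
            result.append(domain)
    return result
```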
Learning
- Apache Iceberg
- Apache Hudi
- Streaming Architectures with Kinesis
- AI Agents & Agentic Workflow Systems
- Modern Lakehouse Architectures
Education
- B.Tech (Mechanical Engineering) — NIT Surat — 2014
- Intermediate (MPC) — Sri Chaitanya Jr Kalashala — 2009
- Matriculation — Siddhartha Grammar School — 2007