Bharath Thati

India

Now Exploring Apache Iceberg · Hudi · Kinesis streaming · AI agents & agentic workflow systems · modern lakehouse architectures

Architecture Showcase

rave · Goldman Sachs

Ingests 100+ financial feed types (liquidity stress, fund measures, etc.) from Aladdin into DynamoDB. React UI for query + scheduled reports.

Node.js · Lambda · DynamoDB · API Gateway · React · Cognito · 7+ repos

apollo · Goldman Sachs

200+ model dbt warehouse on Aurora Postgres. S3 PUT → CloudWatch Events (EventBridge) rule → router Lambda (builds input JSON from the filename) → per-process Step Function → ECS (extract / load / dbt). Infra via AWS CDK.

S3 · EventBridge · Lambda · Step Functions · ECS · dbt · Aurora Postgres · AWS CDK
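A minimal sketch of the router step, for illustration only. The filename convention (`<process>_<YYYYMMDD>.<ext>`), the field names in the input JSON, and the function name `build_execution_input` are assumptions made for this example, not the production contract.

```python
import re
from pathlib import PurePosixPath

# Hypothetical filename convention: <process>_<YYYYMMDD>.<ext>,
# e.g. fund_measures_20240131.csv
FILENAME_PATTERN = re.compile(
    r"^(?P<process>[a-z0-9_]+)_(?P<date>\d{8})\.(?P<ext>\w+)$"
)


def build_execution_input(bucket: str, key: str) -> dict:
    """Derive a per-process state machine input from an S3 object key."""
    name = PurePosixPath(key).name  # drop any prefix "folders"
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised filename: {name}")
    return {
        "process": match["process"],      # selects which state machine to start
        "business_date": match["date"],   # passed through to extract / load / dbt
        "source": {"bucket": bucket, "key": key},
    }
```

In a real Lambda the returned dict would be serialized with `json.dumps` and handed to the Step Functions `start_execution` call as its `input`; that wiring is omitted here so the sketch stays runnable without AWS credentials.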

yuno · Amazon

AWS CDK construct (TypeScript). After the central platform team wound down, yuno let 100+ data-engineering teams self-deploy and own their entire stack — orchestrator (Airflow or Step Functions), EMR compute, full plumbing (VPC + IAM + S3 buckets), and observability (CW Logs, metrics, dashboards, alarms). One CDK deploy, production-grade environment.

AWS CDK · TypeScript · MWAA / Airflow · Step Functions · Lambda · EMR / Spark · VPC + IAM + S3 · CloudWatch · 100+ teams

dejavu · Amazon

Data lake of 60+ OFA tables. Daily EDX parcels feed two PySpark EMR jobs: the first writes to staging, the second dedupes (rownum) and writes partitioned Parquet.

EMR · PySpark · S3 · Parquet · 60+ tables
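The rownum dedupe can be illustrated in plain Python. This is a sketch only: the actual job runs in PySpark on EMR, and the column names (`id`, `updated_at`) and the "keep the newest row per key" rule are assumptions, since the ordering column is not stated above.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_latest(rows, key="id", order_by="updated_at"):
    """Keep one row per key: the row that would receive row number 1
    when each key's rows are ordered by `order_by` descending."""
    # Newest first overall...
    ordered = sorted(rows, key=itemgetter(order_by), reverse=True)
    # ...then a stable sort by key keeps newest-first order inside each key.
    ordered.sort(key=itemgetter(key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


rows = [
    {"id": 1, "updated_at": "2024-01-01", "amount": 10},
    {"id": 1, "updated_at": "2024-01-02", "amount": 12},
    {"id": 2, "updated_at": "2024-01-01", "amount": 7},
]
latest = dedupe_latest(rows)  # one row for id 1 (the 2024-01-02 version), one for id 2
```

In Spark the same idea is usually written with `row_number()` over a window partitioned by the key and ordered by the timestamp, followed by a filter on row number 1.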

daxter · Amazon

YAML-driven Python ETL framework. Reads from multiple sources, transforms via SQL, loads to multiple destinations. Originally on EC2, later moved to ECS.

Python · YAML · SQL · ECS · S3 / MySQL / Redshift / WorkDocs / DataNet
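A rough sketch of the config-driven pattern, under stated assumptions: the `JOB` dict stands in for a parsed YAML file, every key name in it is hypothetical, and an in-memory SQLite database stands in for whatever SQL engine the framework actually uses.

```python
import sqlite3

# Stand-in for a parsed YAML job file; every key name here is hypothetical.
JOB = {
    "sources": {"orders": [(1, "IN", 120.0), (2, "IN", 80.0), (3, "EU", 50.0)]},
    "schema": {"orders": "id INTEGER, region TEXT, amount REAL"},
    "transform": (
        "SELECT region, SUM(amount) AS total "
        "FROM orders GROUP BY region ORDER BY region"
    ),
    "destinations": ["memory"],
}


def run_job(job: dict, writers: dict) -> None:
    """Load each source into a scratch SQL engine, run the transform,
    then fan the result out to every configured destination."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, rows in job["sources"].items():
            conn.execute(f"CREATE TABLE {table} ({job['schema'][table]})")
            if rows:
                placeholders = ", ".join("?" for _ in rows[0])
                conn.executemany(
                    f"INSERT INTO {table} VALUES ({placeholders})", rows
                )
        result = conn.execute(job["transform"]).fetchall()
    finally:
        conn.close()
    for name in job["destinations"]:
        writers[name](result)  # each writer is a callable keyed by destination name


collected = []
run_job(JOB, {"memory": collected.extend})
```

Adding a destination then means registering one more writer callable rather than touching the pipeline code, which is the main appeal of driving the job from configuration.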

docket · Amazon

SNS → SQS → Lambda fan-in. Writes CSV to S3, failures to a DLQ. A daily Glue compaction job rolls CSVs into Parquet for reporting.

SNS · SQS · Lambda · DLQ · Glue · S3 · Parquet
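A minimal sketch of the Lambda stage, with several assumptions: the record schema in `FIELDS` is invented, `upload` is a stand-in for an S3 put, SNS raw message delivery is taken to be off (so each SQS body is an SNS envelope), and failures are assumed to reach the DLQ through SQS redrive after being reported as partial batch failures.

```python
import csv
import io
import json

FIELDS = ["id", "type", "amount"]  # hypothetical record schema


def handler(event, context=None, upload=print):
    """Process one SQS batch; `upload` stands in for an S3 put."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    failures = []
    for record in event.get("Records", []):
        try:
            envelope = json.loads(record["body"])      # SNS envelope delivered via SQS
            message = json.loads(envelope["Message"])  # the publisher's payload
            writer.writerow({field: message[field] for field in FIELDS})
        except (KeyError, ValueError, TypeError):
            # Report only this message as failed; SQS retries it and,
            # once the redrive limit is hit, moves it to the DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    if buffer.getvalue():
        upload(buffer.getvalue())
    return {"batchItemFailures": failures}
```

Returning `batchItemFailures` only has an effect when the event source mapping has `ReportBatchItemFailures` enabled; without it, one bad message would cause the whole batch to be retried.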

smriti · personal · deploy: CloudFront + EC2

Domain registered outside AWS (cheaper). Route 53 hosts the zone; CloudFront for TLS + edge cache; single EC2 runs Express + cron + SQLite. Closest path to "live URL" — no app rewrite.

Route 53 · CloudFront · EC2 (t4g.small) · SQLite on EBS · ~$10/mo

smriti · personal · deploy: S3 + API GW + Lambda + DDB

Serverless rewrite. CloudFront fans out: static résumé / app shell from S3; API requests to API Gateway → Lambda → DynamoDB. Scales to zero when idle. Needs a real SQLite → DDB rewrite.

Route 53 · CloudFront · S3 (static) · API Gateway · Lambda · DynamoDB · ~$0–5/mo

Featured Projects

smriti — self-hosted habits / expenses / to-do

Live · /tracker →

A personal tracker the author uses daily — and the system serving this very page. Built end-to-end: HTTP server, SQLite schema with foreign-key constraints, scrypt + session auth with rate limit and constant-time login, recurring habits engine, expense aggregation, to-do urgency scoring, Telegram + email notifications via cron.

Node.js · Express · better-sqlite3 · Vanilla JS · cron · Telegram Bot
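The scrypt and constant-time login idea can be sketched as below. smriti itself is written in Node.js; this Python version only illustrates the technique, and the cost parameters, the dummy-record trick for unknown users, and all function names are assumptions rather than the app's actual code.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Cost parameters are illustrative assumptions, not the values smriti uses.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare without leaking timing information."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


# Dummy record so unknown usernames cost the same work as known ones.
_DUMMY_SALT, _DUMMY_DIGEST = hash_password("placeholder")


def login(users: dict, username: str, password: str) -> bool:
    """`users` maps username -> (salt, digest)."""
    salt, digest = users.get(username, (_DUMMY_SALT, _DUMMY_DIGEST))
    ok = verify_password(password, salt, digest)
    return ok and username in users
```

Hashing against a dummy record for unknown usernames keeps response time roughly the same whether or not the account exists, which is one common reading of "constant-time login".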

Technical Skills

Cloud: AWS · Lake Formation
AWS Services: S3, DynamoDB, Redshift, Aurora Postgres, MySQL, Lambda, EMR, ECS, Glue, Step Functions, Airflow, API Gateway, SNS, SQS, Kinesis, Cognito, SageMaker, IAM, VPC, CloudFormation, CDK
Languages: Python, PySpark, SQL, TypeScript, Node.js, Hive, Scala
Data & Big Data: Apache Spark, Hive, EMR, Airflow, dbt, ETL, Data Lakes, Distributed Processing
Lakehouse & Streaming: Apache Iceberg, Apache Hudi, Kinesis, Incremental Processing
Frameworks: React, Node.js (Express)
Architecture: Serverless, Event-Driven, Platform Engineering, IaC
Ops: CloudWatch, Git, CI/CD, Jira, On-call
Analytics: QuickSight

Professional Summary

Senior Data Engineer with 10+ years building enterprise-scale data platforms on AWS — across Fintech, Asset Management, Compliance, and E-commerce.

Deep hands-on experience with distributed ETL/ELT, data lakes, and warehouses using Python, PySpark, SQL, and the AWS data stack: Spark, EMR, Lambda, Step Functions, Airflow, ECS, Redshift, Aurora Postgres, DynamoDB, S3, dbt.

Built reusable platform frameworks, workflow orchestration systems, and event-driven architectures for mission-critical financial reporting and compliance workloads. Comfortable owning systems end-to-end: design, build, deploy, monitor, on-call.

Professional Experience

Goldman Sachs — Senior Data Engineer

Feb 2024 – Present

rave — Ingests 100+ financial feed types (Aladdin / BlackRock) via S3-triggered Node.js Lambdas writing to DynamoDB; React UI + API Gateway + Cognito across 7+ repos.
apollo — Owned a 200+ model dbt warehouse on Aurora Postgres. Step Functions orchestrate per-process extract → load → dbt-run on ECS; infra deployed via AWS CDK.
nnip2statpro — Python Lambdas pull from internal APIs, write to S3; a third-party app moves files from S3 to external (non-Goldman) consumer drives.
Production support across all three: monitoring, triage, root-cause analysis.

Amazon — Data Engineer

Aug 2017 – Jun 2023

yuno — Reusable AWS CDK construct library (TypeScript) adopted across data engineering teams. Users submit a spark-submit or PySpark file; yuno provisions EMR + Airflow/Step Functions + VPC/IAM/S3 with tracking, retries, and alarms built in.
dejavu — Data lake of 60+ OFA tables. Daily EDX parcels → two PySpark EMR jobs (stage, then dedupe via rownum and write partitioned Parquet).
daxter — YAML-driven Python ETL framework. Multi-source / multi-destination (S3, MySQL, Redshift, WorkDocs, DataNet) with SQL transforms. Started on EC2, moved to ECS.
docket — SNS → SQS → Lambda → S3 (CSV); failures to DLQ; daily Glue compaction → Parquet for reporting.
purchase register — Audit-ready report across 60+ OFA tables. Built a Redshift Airflow operator and a DAG of 40+ tagged tasks; enabled accounting to claim $100M / month in Input Tax Credit.
fintech fold — Python monitor for daxter / yuno / horizon job platforms via their APIs — health, retries, alarms.
Designed and supported the broader Amazon Finance data platform on EMR + S3 with secured access. Built Airflow ETL pipelines generating monthly / daily compliance reports.
Spark and Hive performance — partitioning strategies, large-scale tuning.
Self-service analytics dashboards with QuickSight; partnered with IN / EU stakeholders on reports for country tax officials.
On-call: incidents, RCA, operational troubleshooting. Architecture reviews, code reviews, platform standardization.

Amazon — Support Engineer / Lead Generation Analyst

Aug 2014 – Aug 2017

Automated 35% of goal backfill by integrating internal ML attribute-prediction services — no manual audit needed.
Built ML text-classification models (Scala, Python, AWS SageMaker) for attribute backfilling at scale.
Backfilled missing / incorrect attributes for top-glance-view ASINs at scale.
Built an internal website to lift auditor productivity; published weekly metrics on goal coverage and auditor performance.
Lead generation: identified seller prospects via Google / Bing / DuckDuckGo web scraping, deduped against onboarded sellers, tuned a pattern-classifier for categorization.

Learning

Apache Iceberg
Apache Hudi
Streaming Architectures with Kinesis
AI Agents & Agentic Workflow Systems
Modern Lakehouse Architectures

Education

B.Tech (Mechanical Engineering) — NIT Surat — 2014
Intermediate (MPC) — Sri Chaitanya Jr Kalashala — 2009
Matriculation — Siddhartha Grammar School — 2007