Bharath Thati

India
Now exploring: Apache Iceberg · Hudi · Kinesis streaming · AI agents & agentic workflow systems · modern lakehouse architectures

Featured Projects

smriti — self-hosted habits / expenses / to-do

Live · /tracker →

A personal tracker I use daily, and the system serving this very page. Built end-to-end: HTTP server, SQLite schema with foreign-key constraints, scrypt password hashing with session auth, login rate limiting, constant-time login, a recurring-habits engine, expense aggregation, to-do urgency scoring, and Telegram + email notifications via cron.

Node.js · Express · better-sqlite3 · Vanilla JS · cron · Telegram Bot
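
The "scrypt + constant-time login" idea can be sketched roughly as follows. This is a minimal illustration in Python rather than smriti's own Node.js code; the cost parameters, the `users` mapping, and the function names are all assumptions made for the example. The point it shows: derive the key with scrypt, compare with a constant-time function, and do the same amount of work for unknown usernames so response time does not reveal which accounts exist.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple

# Illustrative scrypt cost parameters (assumptions, not smriti's settings).
SCRYPT_N, SCRYPT_R, SCRYPT_P, KEY_LEN = 2**14, 8, 1, 32
MAX_MEM = 64 * 1024 * 1024  # headroom above the ~16 MiB these parameters need


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is used if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.scrypt(
        password.encode("utf-8"), salt=salt,
        n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, maxmem=MAX_MEM, dklen=KEY_LEN,
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key and compare without an early exit on the first mismatch."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected_key)


# Dummy credentials so unknown usernames still pay the full scrypt cost.
_DUMMY_SALT, _DUMMY_KEY = hash_password("dummy-password")


def login(users: Dict[str, Tuple[bytes, bytes]], username: str, password: str) -> bool:
    """Hypothetical login check over an in-memory {username: (salt, key)} store."""
    salt, key = users.get(username, (_DUMMY_SALT, _DUMMY_KEY))
    matches = verify_password(password, salt, key)
    return matches and username in users
```

A real deployment would pair this with the rate limiting and session handling the project description mentions; those are left out here to keep the sketch short.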

Technical Skills

Cloud: AWS, Lake Formation
AWS Services: S3, DynamoDB, Redshift, Aurora Postgres, MySQL, Lambda, EMR, ECS, Glue, Step Functions, Airflow, API Gateway, SNS, SQS, Kinesis, Cognito, SageMaker, IAM, VPC, CloudFormation, CDK
Languages: Python, PySpark, SQL, TypeScript, Node.js, Hive, Scala
Data & Big Data: Apache Spark, Hive, EMR, Airflow, dbt, ETL, Data Lakes, Distributed Processing
Lakehouse & Streaming: Apache Iceberg, Apache Hudi, Kinesis, Incremental Processing
Frameworks: React, Node.js (Express)
Architecture: Serverless, Event-Driven, Platform Engineering, IaC
Ops: CloudWatch, Git, CI/CD, Jira, On-call
Analytics: QuickSight

Professional Summary

Senior Data Engineer with 10+ years building enterprise-scale data platforms on AWS — across Fintech, Asset Management, Compliance, and E-commerce.

Deep hands-on experience with distributed ETL/ELT, data lakes, and warehouses using Python, PySpark, SQL, and the AWS data stack: Spark, EMR, Lambda, Step Functions, Airflow, ECS, Redshift, Aurora Postgres, DynamoDB, S3, dbt.

Built reusable platform frameworks, workflow orchestration systems, and event-driven architectures for mission-critical financial reporting and compliance workloads. Comfortable owning systems end-to-end: design, build, deploy, monitor, on-call.

Professional Experience

Goldman Sachs — Senior Data Engineer

Feb 2024 – Present
  • rave — Ingests 100+ financial feed types (Aladdin / BlackRock): Node.js Lambdas triggered by S3 write to DynamoDB; React UI + API Gateway + Cognito, spread across 7+ repos.
  • apollo — Owned a 200+ model dbt warehouse on Aurora Postgres. Step Functions orchestrate per-process extract → load → dbt-run on ECS; infra deployed via AWS CDK.
  • nnip2statpro — Python Lambdas pull from internal APIs, write to S3; a third-party app moves files from S3 to external (non-Goldman) consumer drives.
  • Production support across all three: monitoring, triage, root-cause analysis.

Amazon — Data Engineer

Aug 2017 – Jun 2023
  • yuno — Reusable AWS CDK construct library (TypeScript) adopted across data engineering teams. Users submit a spark-submit or PySpark file; yuno provisions EMR + Airflow/Step Functions + VPC/IAM/S3 with tracking, retries, and alarms built in.
  • dejavu — Data lake of 60+ OFA tables. Daily EDX parcels feed two PySpark EMR jobs: stage, then dedupe via row number and write partitioned Parquet.
  • daxter — YAML-driven Python ETL framework. Multi-source / multi-destination (S3, MySQL, Redshift, WorkDocs, DataNet) with SQL transforms. Started on EC2, moved to ECS.
  • docket — SNS → SQS → Lambda → S3 (CSV); failures to DLQ; daily Glue compaction → Parquet for reporting.
  • purchase register — Audit-ready report across 60+ OFA tables. Built a Redshift Airflow operator and a DAG of 40+ tagged tasks; enabled accounting to claim $100M / month in Input Tax Credit.
  • fintech fold — Python monitor for the daxter / yuno / horizon job platforms, polling their APIs for health, retries, and alarms.
  • Designed and supported the broader Amazon Finance data platform on EMR + S3 with secured access. Built Airflow ETL pipelines generating monthly / daily compliance reports.
  • Spark and Hive performance — partitioning strategies, large-scale tuning.
  • Self-service analytics dashboards with QuickSight; partnered with IN / EU stakeholders on reports for country tax officials.
  • On-call: incidents, RCA, operational troubleshooting. Architecture reviews, code reviews, platform standardization.
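
The row-number dedupe mentioned for dejavu can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` instead of PySpark so it runs anywhere; the table name, columns, and sample rows are hypothetical, and the actual jobs may order or partition differently. The window clause shown is standard SQL that Spark SQL also accepts.

```python
import sqlite3

# In-memory stand-in for a staged table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged (order_id TEXT, amount REAL, loaded_at TEXT)")
conn.executemany(
    "INSERT INTO staged VALUES (?, ?, ?)",
    [
        ("A1", 10.0, "2023-01-01"),
        ("A1", 12.5, "2023-01-02"),  # later version of the same key
        ("B7", 99.0, "2023-01-01"),
    ],
)

# Number the rows within each key, newest first, and keep only row 1.
DEDUPE_SQL = """
SELECT order_id, amount, loaded_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY loaded_at DESC
           ) AS rn
    FROM staged
)
WHERE rn = 1
ORDER BY order_id
"""

deduped = conn.execute(DEDUPE_SQL).fetchall()
for row in deduped:
    print(row)
```

Each key keeps only its most recently loaded row; in a Spark job the surviving rows would then be written out as partitioned Parquet.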

Amazon — Support Engineer / Lead Generation Analyst

Aug 2014 – Aug 2017
  • Automated 35% of goal backfill by integrating internal ML attribute-prediction services — no manual audit needed.
  • Built ML text-classification models (Scala, Python, AWS SageMaker) for attribute backfilling at scale.
  • Backfilled missing / incorrect attributes for top-glance-view ASINs at scale.
  • Built an internal website to lift auditor productivity; published weekly metrics on goal coverage and auditor performance.
  • Lead generation: identified seller prospects via Google / Bing / DuckDuckGo web scraping, deduped against onboarded sellers, tuned a pattern-classifier for categorization.

Learning

  • Apache Iceberg
  • Apache Hudi
  • Streaming Architectures with Kinesis
  • AI Agents & Agentic Workflow Systems
  • Modern Lakehouse Architectures

Education

  • B.Tech (Mechanical Engineering) — NIT Surat — 2014
  • Intermediate (MPC) — Sri Chaitanya Jr Kalashala — 2009
  • Matriculation — Siddhartha Grammar School — 2007