Deep Learning Budget Forecasting

TensorFlow DNN predicting infrastructure budgets at 94% R² across 5000+ projects - deployed serverless on AWS Lambda

94% R² Score

Problem

Infrastructure project managers estimated budgets manually, which led to frequent overruns. A rules-based estimator could not capture the non-linear interactions between project scale, region, material costs, and timeline. In addition, 35% of historical records were incomplete.

Solution

  • Processed 30K+ historical records from S3 CSVs: Pandas merge on ProjectId, outlier detection, and missing-value imputation for the 35% of records that were incomplete.
  • Designed a TensorFlow/Keras DNN with multi-hot encoding and custom preprocessing layers for cost normalization.
  • Achieved 94% R² on holdout with 90/10 train-test-validation split.
  • Deployed as a containerized AWS Lambda API (ECR) with sub-500ms latency and JWT authorizer (JWE decryption).
  • S3 model storage with SSM Parameter Store versioning and weekly automated retraining via Airflow DAG on K8s with rollback safeguards.
  • 60% infrastructure cost reduction vs server-based deployment.
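
The data-preparation step in the first bullet can be sketched roughly as follows. This is a minimal illustration rather than the project's actual pipeline: only the merge key `ProjectId` comes from the write-up, while the column names `Region` and `MaterialCost`, the IQR outlier rule, and median imputation are assumptions chosen for the example.

```python
import pandas as pd


def prepare_records(projects: pd.DataFrame, costs: pd.DataFrame) -> pd.DataFrame:
    """Merge two tables on ProjectId, drop cost outliers, impute missing values."""
    # A left join keeps every project even when its cost row is missing.
    df = projects.merge(costs, on="ProjectId", how="left")

    # IQR rule (an assumed choice): flag costs far outside the middle 50%.
    q1, q3 = df["MaterialCost"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["MaterialCost"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Rows with a missing cost are kept so they can be imputed below.
    df = df[in_range | df["MaterialCost"].isna()].copy()

    # Median imputation for the numeric gap, a sentinel label for the categorical gap.
    df["MaterialCost"] = df["MaterialCost"].fillna(df["MaterialCost"].median())
    df["Region"] = df["Region"].fillna("unknown")
    return df.reset_index(drop=True)
```

Imputing after outlier removal keeps a single extreme value from skewing the fill value. A production pipeline would more likely fit the imputation statistics on the training split only.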

System Flow

  • Data: S3 CSVs, Pandas Merge
  • Preprocessing: Feature Engineering, Normalization
  • Model: TF/Keras DNN, Train-Test Split
  • Deployment: Lambda + ECR, JWT Authorizer
  • Retraining: Airflow DAG, K8s Pod
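
The write-up mentions rollback safeguards on the retraining stage but does not describe how they work. One plausible policy, sketched here purely as an assumption, is to promote a retrained model only if its holdout R² has not regressed beyond a small tolerance. The names `ModelVersion`, `choose_serving_version`, and `max_regression` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    version: str        # hypothetical label, e.g. a value kept in a parameter store
    holdout_r2: float   # R² measured on the same holdout set for both models


def choose_serving_version(
    current: ModelVersion,
    candidate: ModelVersion,
    max_regression: float = 0.01,
) -> ModelVersion:
    """Return the version that should serve traffic after a retraining run."""
    # Promote the candidate unless it is worse than the current model
    # by more than the allowed tolerance; otherwise keep (roll back to) current.
    if candidate.holdout_r2 >= current.holdout_r2 - max_regression:
        return candidate
    return current
```

In a setup like the one described, the returned version string would be what the retraining job writes back as the active model pointer, so a failed candidate never changes what the API serves.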

Architecture

  • 01 TensorFlow/Keras DNN with multi-hot encoding and custom preprocessing layers
  • 02 30K+ records from S3 CSVs - Pandas merge, outlier detection + imputation on 35% incomplete data
  • 03 90/10 train-test-validation split
  • 04 Containerized AWS Lambda API (ECR) - sub-500ms latency, JWT authorizer
  • 05 S3 model storage with SSM Parameter Store versioning + automated retraining
  • 06 Weekly automated retraining via Airflow DAG on K8s with rollback safeguards
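
For readers unfamiliar with multi-hot encoding (item 01), the idea is that a record carrying several categorical tags at once becomes a single 0/1 vector with one slot per known tag. The sketch below uses NumPy and a made-up materials vocabulary; the write-up does not say which features were encoded this way, so the feature and the function name `multi_hot` are assumptions.

```python
import numpy as np


def multi_hot(tags_per_row, vocabulary):
    """Encode each row's set of tags as a 0/1 vector over a fixed vocabulary."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    encoded = np.zeros((len(tags_per_row), len(vocabulary)), dtype=np.float32)
    for row, tags in enumerate(tags_per_row):
        for tag in tags:
            if tag in index:  # tags outside the vocabulary are ignored in this sketch
                encoded[row, index[tag]] = 1.0
    return encoded


# Hypothetical example: three projects, each using a different mix of materials.
vocabulary = ["steel", "concrete", "timber"]
rows = [["steel", "timber"], [], ["concrete", "asphalt"]]
features = multi_hot(rows, vocabulary)
```

Inside a Keras model, the built-in `tf.keras.layers.CategoryEncoding` layer with `output_mode="multi_hot"` performs a comparable transformation on integer category indices, which keeps the encoding inside the saved model.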

Impact

  • 94% R² on holdout set across 5000+ projects
  • 40% reduction in infrastructure estimation errors
  • Sub-500ms inference latency on Lambda - 1000+ monthly predictions
  • 60% infrastructure cost reduction vs server-based deployment
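
As a reminder of what the headline metric measures: R² is 1 minus the ratio of the residual sum of squares to the total sum of squares, so 94% means the model accounts for 94% of the variance in holdout budgets. The function below is a plain-Python illustration with made-up numbers, not the project's evaluation code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean (the baseline).
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only: every prediction is off by 0.5.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
score = r_squared(actual, predicted)
```

scikit-learn's `sklearn.metrics.r2_score` computes the same quantity and is the more usual choice in practice.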

Tech Stack

Python, TensorFlow, Keras, scikit-learn, AWS Lambda, S3, SSM, ECR, EKS, Docker, JWT, PostgreSQL