Data Engineer, Data Scientist & Web Developer.
Specializing in productionized machine learning, MLOps, data analytics, and data engineering. Reach out to me using the contact form or email me at email@example.com.
Projects that I have completed in various domains.
Machine Learning Operations (MLOps) on AWS: Design and Development
Designed and developed a machine learning operations (MLOps) solution on AWS.
End-to-End Forecasting Pipelines for a Large Restaurant Franchise's Supply Chain
machine learning, mlops
Inventory forecasting solution for a large restaurant franchise that estimates ingredient usage up to a week in advance so the right stock can be ordered.
Document Classification Pipelines with Retraining and Deployment to an Inference Endpoint
machine learning, mlops
NLP model development, testing, and productionization using MLOps practices to programmatically retrain, deploy, and monitor a model that classifies documents uploaded through a digital mailroom for a national mortgage lender.
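To illustrate the monitoring side of such a retraining loop, here is a minimal sketch that computes daily exact-match accuracy for multilabel predictions and lists the days that fall below a threshold. The function names, the 0.85 threshold, and the document labels are hypothetical; the project's actual trigger logic is not described here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical threshold; the real project's retraining trigger is not described.
ACCURACY_THRESHOLD = 0.85


def daily_accuracy(records):
    """Exact-match accuracy per day for multilabel predictions.

    `records` is an iterable of (day, predicted_labels, true_labels) tuples.
    A prediction counts as correct only if the full label set matches.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for day, predicted, actual in records:
        total[day] += 1
        if set(predicted) == set(actual):
            correct[day] += 1
    return {day: correct[day] / total[day] for day in total}


def days_needing_retraining(accuracy_by_day, threshold=ACCURACY_THRESHOLD):
    """Return, in date order, the days whose accuracy fell below `threshold`."""
    return sorted(day for day, acc in accuracy_by_day.items() if acc < threshold)


# Example with made-up document labels.
records = [
    (date(2021, 1, 4), {"w2"}, {"w2"}),
    (date(2021, 1, 4), {"paystub"}, {"paystub"}),
    (date(2021, 1, 5), {"w2"}, {"bank_statement"}),
    (date(2021, 1, 5), {"paystub", "w2"}, {"paystub", "w2"}),
]
accuracy = daily_accuracy(records)
print(days_needing_retraining(accuracy))  # [datetime.date(2021, 1, 5)]
```

Exact-match accuracy is a strict choice for multilabel output; a per-label metric such as Hamming accuracy would be more forgiving of partially correct label sets.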
Replicating Data from the Data Warehouse into Domain-Specific Redshift Instances
Data replication from the central data warehouse Redshift cluster to department-specific Redshift instances for a large restaurant franchise, using Apache Airflow to orchestrate the ELT pipelines.
Hands-on skills with popular cloud technologies to solve data and machine learning challenges in production at scale.
You can view my resume to see whether I am a good fit for what you're looking for.
I like to build awesome things. Software engineering is my passion. I am a pragmatist rather than an idealist. A working solution is more important than a perfect one.
Master of Computer Science
2019 - 2021
Georgia Institute of Technology, Atlanta, Georgia
Specialization in machine learning
Senior Data Engineer
2021 - Present
- I am building DAGs that ingest data from Athena/Redshift, apply business logic to transform the data, and run data preprocessing (normalizing values, flagging outliers), model training, and model inference. Inference runs as batch processing rather than in real time, because the client does not need real-time forecasts.
- For another client, I used the Azure Machine Learning studio to create DAGs for data ingestion, data preprocessing, model training, and model deployment. Deployment was done on a Kubernetes cluster (AKS) with the model exposed as a Dockerized REST API for real-time inference.
- I built a data mart that lays the foundation for the enterprise data needed to build every dataset used by the client's supply chain forecasting models. I constructed ELT pipelines with Airflow to replicate data from multiple data lake sources into the supply chain data mart, then created materialized views on top of the raw data so the datasets are ready for data scientists and other machine learning engineers to consume. I used Flyway to create all tables and views so that schema evolution is version-controlled. I tuned the distribution keys and sort keys of the Redshift tables based on the most frequently run queries, making those queries orders of magnitude faster. I also created tables that model what a phase 2 approach requires, in which the client can use Arize and MLflow for automated model monitoring, retraining, deployment, and A/B testing.
- I am working on architectures that make the batch forecasts scale better. The simplest architecture uses AWS Glue (PySpark), with distributed worker nodes processing partitions of the data in parallel. Kafka or another streaming technology is an option for online, real-time forecasting.
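The preprocessing step mentioned in the first bullet (normalizing values, flagging outliers) can be sketched in plain Python. This is an illustrative sketch, not the client pipeline: it assumes z-score normalization and a median absolute deviation (MAD) outlier rule with the commonly used 3.5 cutoff, and the usage numbers are made up.

```python
import statistics


def zscore_normalize(values):
    """Scale values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant series: nothing to scale
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def flag_outliers_mad(values, threshold=3.5):
    """Flag outliers using the modified z-score 0.6745 * |x - median| / MAD."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the values equal the median; any other value has an
        # unbounded modified z-score, so flag it.
        return [v != median for v in values]
    return [0.6745 * abs(v - median) / mad > threshold for v in values]


# Made-up daily usage counts for one ingredient, with one suspicious spike.
daily_usage = [120, 118, 125, 122, 119, 121, 480]
print(flag_outliers_mad(daily_usage))
# [False, False, False, False, False, False, True]
normalized = zscore_normalize(daily_usage)
```

The MAD rule is used for flagging because a single spike inflates the standard deviation: with seven points, no population z-score can exceed sqrt(6) ≈ 2.45, so a "3 sigma" rule could never flag the 480 here.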
Machine Learning Engineer
2020 - 2021
United Wholesale Mortgage
- Developed, tested, and integrated into production an end-to-end multilabel classification model that asynchronously processes an average of 1,000,000 requests per day with 91% (±5%) average daily accuracy.
- Models are deployed as a microservice on-prem and consumed by the document management tools. Concurrency, scalability, fault tolerance, and network security are built into the microservice.
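As a rough illustration of asynchronous, bounded-concurrency request handling with simple fault tolerance, here is a minimal asyncio sketch. The `classify` coroutine is a hypothetical stand-in for the model call, and the concurrency limit and retry policy (three attempts, exponential backoff on `ConnectionError`) are assumptions, not details of the production service.

```python
import asyncio


async def classify(document):
    """Hypothetical stand-in for a request to the model microservice."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    return ["long"] if len(document) > 20 else ["short"]


async def call_with_retries(func, *args, attempts=3, base_delay=0.05):
    """Retry `func` on ConnectionError with exponential backoff (assumed policy)."""
    for attempt in range(attempts):
        try:
            return await func(*args)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def classify_all(documents, max_concurrency=8):
    """Classify documents concurrently with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(doc):
        async with semaphore:
            return await call_with_retries(classify, doc)

    # gather returns results in the same order as the input documents
    return await asyncio.gather(*(bounded(d) for d in documents))


results = asyncio.run(classify_all(["short doc", "a considerably longer document"]))
print(results)  # [['short'], ['long']]
```

The semaphore caps how many requests hit the model at once, which protects the service under bursty load, while `gather` still lets all other requests wait without blocking each other.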
Machine Learning Engineer
2017 - 2020
Ford Motor Company
- Built analytics on IoT sensor data to predict machine failures in plants worldwide. Led the development of the machine learning time-series predictions, the data ingestion/ETL pipeline, provisioning of the PCF/AE5 deployment platform, and updates to a web dashboard that displays refreshed sensor data with summary statistics.
- Developed Python machine learning pipelines that classify text data using NLP libraries (gensim, spaCy, NLTK).
Consulting on a broad range of topics.
I can build classification models for text data.
I can help you set up a data warehouse.
Using Airflow or similar orchestration tools, I can build ETL and ELT pipelines.
I can develop object detection, segmentation, and custom vision models.
Using a combination of SQL, Python, and visualization packages, I can set up dynamic dashboards that provide insights into your business operations.
I can build time-series models with features engineered around the domain your business operates in.
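For a sense of what domain-driven time-series features look like, here is a minimal sketch that turns a daily series into lag, rolling-mean, and day-of-week features. The choice of lags 1 and 7 (yesterday and the same weekday last week) and a 7-day window is an assumption suited to weekly patterns such as restaurant demand; `build_features` is a hypothetical helper.

```python
from datetime import date, timedelta


def build_features(series, lags=(1, 7), window=7):
    """Build lag, rolling-mean, and calendar features from a daily series.

    `series` is a list of (day, value) tuples sorted by day with no gaps.
    Rows without enough history for every lag and the window are skipped.
    """
    values = [v for _, v in series]
    start = max(max(lags), window)
    rows = []
    for i in range(start, len(series)):
        day, target = series[i]
        row = {"day": day, "target": target, "day_of_week": day.weekday()}
        for lag in lags:
            row[f"lag_{lag}"] = values[i - lag]
        # Mean of the `window` observations strictly before day i (no leakage).
        row[f"rolling_mean_{window}"] = sum(values[i - window:i]) / window
        rows.append(row)
    return rows


series = [(date(2021, 1, 1) + timedelta(days=k), float(k)) for k in range(10)]
rows = build_features(series)
print(rows[0])
# {'day': datetime.date(2021, 1, 8), 'target': 7.0, 'day_of_week': 4,
#  'lag_1': 6.0, 'lag_7': 0.0, 'rolling_mean_7': 3.0}
```

Computing the rolling mean only from earlier days keeps the target out of its own features, which matters when the same features are later built for future dates at forecast time.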
You can reach out to me using the email below.
Alternatively, fill in the form below to send me a message.