Malav
Patel.

I build scalable data pipelines and real-time analytics systems. MS Computer Science at UMass Dartmouth.

proof_of_work.json
351K
rows promoted to Silver analytics layer
926K
MTA bus arrivals analyzed
14%↑
sales growth driven with SQL
256GB
genomic data processed
PySpark
Delta Lake
Microsoft Fabric
Azure Functions
Power BI
Python
SQL
Azure Event Hub
KQL (Kusto)
Medallion Architecture
Star Schema
Docker
About

Pipelines, patterns,
and a bit of precision.

I am a Data Engineer & Analyst completing my MS in Computer Science at the University of Massachusetts Dartmouth (May 2026). I specialize in end-to-end data pipelines, real-time streaming architectures, and analytics platforms built to scale.

"I want to make data infrastructure invisible — so the insights are all people ever have to think about."

My work spans cloud-native engineering on Azure and Microsoft Fabric, machine learning applied to genomics and healthcare, and business intelligence that connects raw numbers to real decisions.

Outside of data, I published an app with 10,000+ downloads, competed in national hackathons, and represented my school in state and national-level sports.

2021
Started B.Tech in IT — Charusat University, India
Dived into C, C++, Python. Built a note-taking app with OTP authentication.
2022
Regional Hackathon Win
Built a smart bus service prototype for Surat Municipal Corporation.
2023
Data Scientist Intern — Genomics Research
Processed 256 GB of genomic data for personalized kidney transplant medication at MPUH.
2024
Graduated B.Tech in IT (9.07 CGPA) · Charusat University, India
Began MS in CS at UMass Dartmouth. Data Analytics internship at Atliq Technologies.
Oct 2024
Dual Hackathon Awards
Best Data Extraction Accuracy + Best Collaboration at the Intelligent Data Discovery Hackathon.
May 2026
Graduated MS in CS · University of Massachusetts Dartmouth, USA · Open to Work
Seeking full-time Data Engineering roles. Available immediately.
Skills

Tools I reach for
when the data is messy.

Languages
Python
SQL
C++
Java
C
Data Engineering
PySpark
Delta Lake
Microsoft Fabric
Medallion Architecture
Apache Spark
ETL / ELT Design
Data Factory
Hadoop
Cloud & Streaming
Azure Functions
Azure Event Hub
Azure Blob Storage
KQL (Kusto)
AWS
BigQuery
Databases
MySQL
Star Schema Design
MongoDB
DuckDB
Visualization & BI
Power BI
DAX
Matplotlib
Seaborn
Plotly
ML & Science
TensorFlow / Keras
Scikit-learn
PyTorch
Pandas
NumPy
Experience

Where I have shipped
something real.

Data Analyst Intern
Atliq Technologies · Vadodara, India
Jan 2024 — Apr 2024
SQL · Power BI · DAX · KPI Analysis · Stakeholder Reporting
  • Conducted in-depth user behavior & store performance analysis — identified insights that boosted sales by 14.21%
  • Evaluated 2 years of sales data via SQL to surface KPIs: total profit, average basket size, and spending patterns
  • Identified the top 3 employees responsible for 27% of total sales; recommended a performance-based incentive model
  • Prepared and delivered weekly + final reports distilling technical findings for non-technical stakeholders
Case Study →
Data Scientist Intern
Muljibhai Patel Urological Hospital · Nadiad, India
May 2023 — Jun 2023
Python · Keras · NumPy / Pandas · Pharmacogenomics · Manhattan Plots
  • Conducted genome research under Chief Scientist Sachchidanand Pandey — pharmacogenomics for kidney transplant patients
  • Processed and analyzed 256 GB of unstructured genomic data, using Manhattan plots to identify genetic variants affecting drug response
  • Built ML models (Keras) to predict optimal medication dosing, improving personalized treatment precision
  • Collaborated with medical experts to align data-driven models with clinical requirements
Letter of Recommendation ↗
Projects

Systems I designed
and built myself.

02
Data Engineering · Transit Analytics · Real-Time
NYC MTA Bus Reliability Tracker
Real-time pipeline analyzing 926K+ bus arrivals. Detects ghost buses, delays, and bunching across 4 NYC routes. 64.6% system on-time rate. 88% false-positive reduction. 40 pytest tests passing.
Python · DuckDB · Streamlit · Docker · MTA API · pytest
03
Big Data · Azure · Real-Time Streaming
Real-Time Traffic Data Analytics
Azure-native pipeline collecting traffic data every 10 minutes from HERE API + live weather scraping. 1,440+ daily data points. Zero manual effort. Interactive Power BI dashboard across 3 cities.
Azure Functions · Event Hub · Blob Storage · KQL · Power BI · BeautifulSoup
Recognition

Awards &
Certifications.

Best Data Extraction Accuracy
Intelligent Data Discovery Hackathon · FUCHS & UMass Dartmouth
OCT 2024
View Post ↗
Best Collaboration & Teamwork
Intelligent Data Discovery Hackathon · FUCHS & UMass Dartmouth
OCT 2024
View Post ↗
Grand Finale Selection
Azadi ka Amrit Mahotsav Hackathon
OCT 2022
View Post ↗
Google Data Analytics Professional Certificate · Verify ↗
Machine Learning, DeepLearning.AI · Verify ↗
Get in touch

Let us build
something together.