Nikhil Dhiman - Data Engineer

Danish Bank Spark Batch ETL

End-to-end Apache Spark batch ETL pipeline that reads data from HDFS, transforms it, and loads it into target dimension and fact tables on MySQL.


  • Engineered a dimensional data model comprising four dimensions and one fact table to efficiently analyze 2.5 million ATM transactions enriched with real-time weather data.
  • Transformed and integrated raw transactional and weather data from 113 ATMs into a scalable analytical schema for deep insight extraction.
  • Enabled time-series, location-based, and card-type-specific insights by architecting normalized dimensions for ATM, location, date, and card type.

Tech Stack: Apache Spark with Scala, Sqoop, EMR, AWS, RDS
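
A minimal sketch of the dimensional-modeling idea behind this project, written in plain Python rather than the Spark and Scala used in the pipeline itself. It splits flat transaction rows into four dimensions (ATM, location, date, card type) and one fact table by assigning surrogate keys. The field names (`atm_id`, `city`, `temperature_c`, and so on) and the sample rows are hypothetical and only illustrate the shape of the model.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Maps each distinct natural key to a stable integer surrogate key."""
    name: str
    keys: dict = field(default_factory=dict)

    def key_for(self, natural_key):
        # Assign the next integer the first time a natural key is seen.
        if natural_key not in self.keys:
            self.keys[natural_key] = len(self.keys) + 1
        return self.keys[natural_key]


def build_star_schema(raw_rows):
    """Split flat transaction rows into four dimensions and one fact table."""
    dims = {name: Dimension(name) for name in ("atm", "location", "date", "card_type")}
    facts = []
    for row in raw_rows:
        facts.append({
            "atm_key": dims["atm"].key_for(row["atm_id"]),
            "location_key": dims["location"].key_for(row["city"]),
            "date_key": dims["date"].key_for(row["date"]),
            "card_type_key": dims["card_type"].key_for(row["card_type"]),
            # Measures stay on the fact row; descriptive attributes live in dimensions.
            "amount": row["amount"],
            "temperature_c": row["temperature_c"],
        })
    return dims, facts


# Hypothetical sample rows, not taken from the real dataset.
sample = [
    {"atm_id": "A1", "city": "Aarhus", "date": "2017-01-01", "card_type": "Visa",
     "amount": 500, "temperature_c": 2.0},
    {"atm_id": "A1", "city": "Aarhus", "date": "2017-01-02", "card_type": "MasterCard",
     "amount": 200, "temperature_c": 1.5},
    {"atm_id": "B7", "city": "Odense", "date": "2017-01-01", "card_type": "Visa",
     "amount": 1000, "temperature_c": 3.1},
]
dims, facts = build_star_schema(sample)
```

Because every fact row carries only integer keys and measures, time-series, location-based, and card-type queries reduce to joins and group-bys on the relevant dimension.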








Real-Time Trip Insights Engine

Modern data pipeline built with Apache Kafka, Spark, and Snowflake to process user interactions and destination trends for real-time travel insights and dashboarding.


  • Integrated external data sources including weather APIs and geolocation metadata to enrich user interaction datasets, enhancing the accuracy of travel trend and behavior analytics.
  • Streamed and transformed user interaction events using Apache Kafka and Spark Structured Streaming, ensuring low-latency data delivery into Snowflake for near real-time analytics.
  • Empowered business intelligence through curated data marts and dashboards, enabling location-based, time-series, and preference-driven insights for travel trend analysis.
  • Orchestrated end-to-end ETL workflows using Apache Airflow to automate data ingestion, transformation, and loading pipelines with dependency management and failure recovery.

Tech Stack: Apache Spark with Python (PySpark), Kafka, Snowflake, Apache Airflow
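
The streaming aggregation described above can be illustrated with a small plain-Python sketch of a tumbling-window count, the kind of grouping a Spark Structured Streaming job would apply to incoming events. This is not the pipeline's actual code: the 5-minute window size, the event fields (`ts`, `destination`), and the sample events are all assumptions.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # assumed 5-minute tumbling window


def window_start(ts: datetime, size: int = WINDOW_SECONDS) -> datetime:
    """Floor a timestamp to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def count_by_window(events):
    """Count events per (window start, destination) pair."""
    counts = Counter()
    for event in events:
        counts[(window_start(event["ts"]), event["destination"])] += 1
    return counts


# Hypothetical interaction events.
events = [
    {"ts": datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc), "destination": "Lisbon"},
    {"ts": datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc), "destination": "Lisbon"},
    {"ts": datetime(2024, 1, 1, 12, 6, tzinfo=timezone.utc), "destination": "Lisbon"},
    {"ts": datetime(2024, 1, 1, 12, 7, tzinfo=timezone.utc), "destination": "Kyoto"},
]
counts = count_by_window(events)
```

Each window is independent of the others, so results for a closed window can be written downstream as soon as that window ends, which is what keeps latency low.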








Customer Acquisition Analytics Pipeline

Data pipeline built with Apache Spark, dbt, and Google BigQuery to unify multi-channel marketing, ad performance, and CRM data for real-time customer acquisition insights and campaign optimization.


  • Integrated external data sources including Google Ads, Meta Ads, and HubSpot CRM to consolidate marketing performance data and customer journey metrics into a centralized analytics platform.
  • Processed ad engagement and conversion streams using Apache Flink for real-time transformation and enrichment, enabling low-latency visibility into CAC, ROI, and conversion funnels.
  • Automated ETL workflows and attribution logic using Apache Airflow, ensuring reliable daily updates and real-time pipelines with failure recovery and lineage tracking.

Tech Stack: Apache Spark with Python (PySpark), dbt, Google BigQuery, Looker
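
The metrics named above can be made concrete with a short sketch. It uses common definitions (CAC as spend per acquired customer, ROI as net return over spend); the exact attribution logic used in the pipeline is not shown here, and the channel names and figures are hypothetical.

```python
def cac(spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend divided by customers acquired."""
    if new_customers == 0:
        return float("inf")  # no customers acquired, so the cost per customer is unbounded
    return spend / new_customers


def roi(revenue: float, spend: float) -> float:
    """Return on investment as a fraction: (revenue - spend) / spend."""
    if spend == 0:
        raise ValueError("ROI is undefined when spend is zero")
    return (revenue - spend) / spend


# Hypothetical per-channel totals after consolidating ad and CRM data.
channels = {
    "google_ads": {"spend": 1000.0, "new_customers": 50, "revenue": 3000.0},
    "meta_ads": {"spend": 500.0, "new_customers": 10, "revenue": 400.0},
}

report = {
    name: {
        "cac": cac(c["spend"], c["new_customers"]),
        "roi": roi(c["revenue"], c["spend"]),
    }
    for name, c in channels.items()
}
```

In this made-up example, `google_ads` acquires customers at 20 per customer with an ROI of 2.0, while `meta_ads` costs 50 per customer and loses money (ROI of -0.2).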








Sqoop Automation

Ingestion Automation Framework


  • Automated ingestion framework using Apache Sqoop and Shell scripting to streamline data transfer between MySQL and Hadoop HDFS with support for job creation, query execution, and conditional imports.
  • Developed a modular Bash-based interface to automate full and incremental Sqoop imports, exports, job execution, and code generation for multiple MySQL tables.
  • Streamlined ingestion workflows by integrating dynamic user inputs, metadata handling, and result logging, reducing manual overhead and improving reusability.

Tech Stack: Apache Sqoop, Bash Shell Scripting
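
The framework itself is written in Bash; the sketch below uses Python only to illustrate how a full import differs from an incremental one on the Sqoop command line. The function name, JDBC URL, paths, and table are hypothetical. The flags themselves (`--connect`, `--password-file`, `--incremental append`, `--check-column`, `--last-value`, and the rest) are standard Sqoop 1 import options.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, target_dir,
                       check_column=None, last_value=None, mappers=1):
    """Build the argument list for a full or incremental `sqoop import`."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]
    if check_column is not None:
        # Incremental append: only rows whose check column exceeds last_value.
        start = 0 if last_value is None else last_value
        args += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", str(start),
        ]
    return args


# Hypothetical connection details for illustration only.
common = dict(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password_file="/user/etl/.mysql.password",
    table="orders",
    target_dir="/data/raw/orders",
)
full = build_sqoop_import(**common)
incremental = build_sqoop_import(**common, check_column="id", last_value=100)

# shlex.join quotes each argument safely for display or logging.
print(shlex.join(incremental))
```

Building the command as a list rather than a single string avoids quoting mistakes when table names or paths come from user input.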








Java-Based HDFS File Processing Tool

Standalone Java utility to read, process, and display file contents stored in Hadoop Distributed File System (HDFS) using the Hadoop FileSystem API.


  • Developed a Java-based CLI application to connect with HDFS, access target files, and display their contents in a structured format using core Hadoop libraries.
  • Implemented custom exception handling and file system validation to ensure robust interaction with HDFS and avoid runtime errors in distributed environments.
  • Enabled flexible file path input and streamlined HDFS connectivity for local testing and integration into larger Hadoop-based data processing pipelines.

Tech Stack: Java, MapReduce, EMR

My Data Engineering Expertise Includes, but Is Not Limited To

Apache Spark

Hands-on experience with Spark actions and transformations using Scala and Python to process large-scale datasets efficiently.

Data Warehouse

Skilled in building data warehouse solutions using Apache Hive, leveraging partitioning and bucketing for efficient querying and analytics.
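
As a rough illustration of why partitioning and bucketing speed up queries: a partition maps to its own directory, so a filter on the partition column reads only that directory, and a bucket is chosen by hashing a key modulo the bucket count, so equal keys always land in the same bucket. The sketch below uses CRC32 as a stand-in hash; it is not Hive's actual hash function, and the table path and column names are hypothetical.

```python
import zlib


def partition_path(table_dir: str, partition_col: str, value: str) -> str:
    """Directory for one partition, following the col=value layout."""
    return f"{table_dir}/{partition_col}={value}"


def bucket_for(key, num_buckets: int) -> int:
    """Pick a bucket with a stable hash (CRC32 here, as a stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


# Hypothetical table location and keys.
path = partition_path("/warehouse/transactions", "dt", "2024-01-01")
buckets = {atm: bucket_for(atm, 4) for atm in ("A1", "B7", "C3")}
```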

Data Orchestration

Experienced in managing data pipelines with Apache Airflow, building robust DAGs for scheduling and monitoring ETL workflows.
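
The two ideas at the core of a DAG-based scheduler, dependency ordering and retry on failure, can be sketched with the Python standard library alone. This is not Airflow code, and the task names and retry count are hypothetical.

```python
from graphlib import TopologicalSorter


def run_with_retries(task, retries=2):
    """Call task(), retrying up to `retries` extra times before giving up."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the failure


def run_pipeline(dependencies, tasks, retries=2):
    """Run tasks so that every task starts after all of its upstream tasks."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        run_with_retries(tasks[name], retries)
    return order


attempts = {"transform": 0}


def flaky_transform():
    # Fails once to simulate a transient error, then succeeds.
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")


# Each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "refresh_dashboard": {"load"},
}
tasks = {
    "extract": lambda: None,
    "transform": flaky_transform,
    "load": lambda: None,
    "refresh_dashboard": lambda: None,
}
order = run_pipeline(dependencies, tasks)
```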

Cloud Data Engineer

Cloud-native data engineering on AWS, GCP, and Azure services.

My Reviews

What people say about working with me

Dr. Armando Beltran

Assistant Professor of Computer Science

Nikhil has consistently distinguished himself through his exceptional programming skills, critical thinking, and advanced mathematical knowledge, which are essential for solving complex problems in artificial intelligence. His academic performance in these courses has been outstanding, and based on his achievements, I would rank him in the top 1% of all students I have taught at Cal State LA.

Chandrapal Singh

Director & Co-Founder

I’m happy to be working with Nikhil; he has good problem-solving skills. I recommend him and his team.

Abichal Jha

iOS Developer @ Gartner

Nikhil is one of the best among all the people I have ever worked with. As I remember, he was a very productive, hardworking, broad-minded, and forward-thinking individual. Intelligent, ambitious, energetic, and a proactive perfectionist. His desire for proficiency and education makes him a valuable asset to the team. Working with him is a signature of success.

Upendra Kumar Tiwari

Data Scientist | Machine Learning Engineer

You have been a hardworking and sincere student of mine. Whichever assignment was given to you, you completed it on time. Your hard work, your constant effort to learn new technology, and your simplicity will always help you reach your goals.

Michael Hassey

Chief Technical Officer

I had the pleasure of working with Nikhil on the MPC project, where his technical expertise, proactive problem-solving, and collaborative mindset were truly remarkable. He consistently demonstrated a deep understanding of complex systems, offering innovative solutions that drove the project's success. Nikhil's ability to communicate effectively and his dedication to delivering high-quality results made him an invaluable team member. It was an absolute privilege to work alongside someone so skilled and committed, and I highly recommend him for any future opportunities.

Counters

Driven by passion, defined by results

A results-driven engineer with 4+ years of experience turning complex problems into simple, impactful solutions across software, mobile, data, and AI.

150+

Global collaboration across roles and domains, fueled by cross-functional teamwork.

10+

Awards & recognitions for innovation, dedication, and impact across projects and teams.

12+

Successfully completed projects spanning multiple domains and industries

8+

Client and Employer reviews reflecting trust, satisfaction, and success

Contact Me

Got a question or an idea? Feel free to reach out! Fill out the form below, and I’ll get back to you soon.

  • United States of America (USA)

  • hello@nikhildhiman.me