Data Scientist | Data Analyst
Overview: Led the development of an e-commerce sales forecasting model using time-series forecasting techniques including SARIMAX, Facebook Prophet, XGBoost, and LSTM neural networks. Visualized the model's findings in interactive Tableau dashboards to give stakeholders actionable insights.
Tools:
Category: Sales Forecast, Prediction
Year: Feb 2023
Figure 1.1 Snapshot from Tableau Public Dashboard - Product Popularity by Month
Link to my Tableau Public Profile
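XGBoost has no built-in notion of time, so a common way to use it for forecasting is to reframe the series as a supervised problem whose features are lagged values. It also helps to judge every model against a simple seasonal-naive baseline on a chronological holdout. The plain-Python sketch below illustrates both ideas. It assumes a monthly series with a 12-month season, and the sales figures and function names are hypothetical, not taken from the original project.

```python
from typing import List, Tuple


def make_lag_features(series: List[float], lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (X, y) rows where each row holds the previous `lags` values."""
    X, y = [], []
    for t in range(lags, len(series)):
        X.append(series[t - lags:t])  # values at t-lags .. t-1 become the features
        y.append(series[t])           # the value at t is the target
    return X, y


def seasonal_naive_forecast(history: List[float], horizon: int, season: int) -> List[float]:
    """Forecast each future step with the value observed exactly one season earlier."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical monthly sales: one seasonal shape, repeated with 10% growth.
    base = [100, 90, 110, 120, 130, 150, 160, 155, 140, 135, 170, 220]
    sales = base + [v * 1.1 for v in base]

    # Chronological split: time-series data is never shuffled before splitting.
    train, test = sales[:-6], sales[-6:]

    X, y = make_lag_features(train, lags=3)
    print(len(X), X[0], y[0])

    baseline = seasonal_naive_forecast(train, horizon=len(test), season=12)
    print(f"Seasonal-naive MAPE: {mape(test, baseline):.1f}%")
```

The lagged (X, y) rows can be passed to any regressor. A candidate model is only worth keeping if it beats the baseline's MAPE on the same holdout.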
Overview: Built a recipe recommender that uses content-based filtering with cosine similarity to suggest the recipes most similar to a user's input, whether a recipe name or a list of ingredients. The recommender draws on Recipe Box, a dataset of ~125K scraped recipes.
Tools:
Category: Recommender System, Unsupervised NLP-Based Content-Based Filtering
Year: Sep 2023
Figure 2.1 Word Cloud - Ingredients
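Content-based filtering with cosine similarity works in three steps. Each recipe's text (name plus ingredients) becomes a TF-IDF vector, and the user's query is vectorized the same way. Recipes are then ranked by the cosine of the angle between their vector and the query's. The dependency-free sketch below assumes a simple tokenizer, a smoothed TF-IDF weighting, and a few made-up recipes. The original project may well have used a library vectorizer such as scikit-learn's `TfidfVectorizer` instead.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms never seen in the corpus are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: str, recipes: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k recipe names most similar to a query (a name or ingredient list)."""
    names = list(recipes)
    docs = [tokenize(name + " " + recipes[name]) for name in names]
    idf = build_idf(docs)
    vectors = [tfidf(doc, idf) for doc in docs]
    q = tfidf(tokenize(query), idf)
    scored = [(name, cosine(q, vec)) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy recipes: name -> ingredient text.
    recipes = {
        "Garlic Butter Chicken": "chicken breast, garlic, butter, parsley",
        "Tomato Basil Pasta": "spaghetti, tomato, basil, garlic, olive oil",
        "Chocolate Chip Cookies": "flour, butter, sugar, chocolate chips, egg",
    }
    for name, score in recommend("chicken with garlic", recipes, k=2):
        print(f"{score:.3f}  {name}")
```

Because the query passes through the same vectorizer as the recipes, one function handles both recipe-name and ingredient-list inputs. For ~125K recipes, the recipe vectors would normally be computed once and cached rather than rebuilt per query.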
Overview: Analyzed U.S. mass shooting data to better understand this critical issue and surface insights that could help spark change. Beyond the analysis, I built an interactive Dash application with Plotly charts to present the findings.
Tools:
Category: Data Analysis
Year: Jul 2023
Overview: In this project, I applied big-data tools to analyze a large real-world dataset: Google Ngrams, which covers a substantial portion of the books digitized throughout history and is hosted on Amazon S3. I set up an EMR cluster with Hadoop, Spark, Hive, JupyterHub, and Livy, then used PySpark to load the CSV data from S3, explore and filter it, and plot token occurrences over time. I also compared Hadoop and Spark as distributed data-processing frameworks, highlighting their key advantages and differences.
Tools:
Category: Big Data Analysis
Year: Feb 2023
Figure 4.1 Project Flow Diagram
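Both Hadoop MapReduce and Spark can express the core query here, total occurrences of one token per year, as a map step that filters rows and emits (year, count) pairs, followed by a shuffle-and-reduce step that sums counts per year. A key difference is that Spark can keep intermediate data in memory across stages, whereas MapReduce writes it to disk between jobs. The single-machine sketch below mirrors that pattern in plain Python. The CSV layout (token, year, frequency, pages, books) is an assumption, and the rows are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical sample rows in an assumed (token, year, frequency, pages, books) layout.
SAMPLE_CSV = """token,year,frequency,pages,books
data,1990,120,100,80
data,1990,30,25,20
data,2000,500,400,300
science,2000,200,150,100
"""


def map_phase(rows: Iterable[Dict[str, str]], token: str) -> Iterator[Tuple[int, int]]:
    """Keep only rows for one token and emit (year, frequency) pairs."""
    for row in rows:
        if row["token"] == token:
            yield int(row["year"]), int(row["frequency"])


def shuffle_and_reduce(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Group the pairs by year and sum their frequencies, sorted by year."""
    totals: Dict[int, int] = defaultdict(int)
    for year, freq in pairs:
        totals[year] += freq
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    print(shuffle_and_reduce(map_phase(rows, "data")))
    # A PySpark equivalent on a DataFrame `df` (not run here):
    # df.filter(df.token == "data").groupBy("year").sum("frequency")
```

The per-year totals are what a "token occurrences over time" plot needs. On the cluster, the same filter and aggregation run in parallel across partitions instead of in a single loop.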
Overview: Used NLP techniques to preprocess hotel review text, then performed data wrangling and feature engineering. Trained several machine learning classifiers, including KNN, logistic regression, and a decision tree, to predict the likely rating of each review. Evaluated their predictions with appropriate classification metrics and visualized the confusion matrix.
Tools:
Category: Classification
Year: Jan 2023
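A confusion matrix, and the precision, recall, and F1 scores derived from it, comes directly from counting (actual, predicted) label pairs. The minimal sketch below works for any hashable labels. The sentiment labels and predictions in the example are hypothetical, and in practice scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def confusion_matrix(actual: Sequence[Hashable], predicted: Sequence[Hashable],
                     labels: List[Hashable]) -> List[List[int]]:
    """Rows are actual classes and columns are predicted classes, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def per_class_metrics(matrix: List[List[int]],
                      labels: List[Hashable]) -> Dict[Hashable, Dict[str, float]]:
    """Precision, recall and F1 for each class, read off the confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp  # predicted `label`, actually another class
        fn = sum(matrix[i]) - tp                                  # actually `label`, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


if __name__ == "__main__":
    labels = ["negative", "positive"]
    actual = ["positive", "positive", "positive", "negative", "negative", "positive"]
    predicted = ["positive", "negative", "positive", "negative", "positive", "positive"]
    m = confusion_matrix(actual, predicted, labels)
    print(m)
    print(per_class_metrics(m, labels))
    print(f"accuracy = {accuracy(m):.3f}")
```

Per-class scores matter for review data because rating classes are often imbalanced. In that case accuracy alone can look good while the minority class is rarely predicted correctly.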
Overview: Used statsmodels (chi-squared test, ANOVA, and linear and logistic regression) to perform statistical EDA and interpret how the variables relate to West Nile virus (WNV) presence in Chicago between 2008 and 2019.
Tools:
Category: Hypothesis Testing, Statistical Analysis
Year: Dec 2023
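A chi-squared test of independence compares the observed counts in a contingency table with the counts expected if the two variables were unrelated: chi² = Σ (O − E)² / E, where E for each cell is (row total × column total) / grand total. The hand-rolled sketch below illustrates the calculation. Its 2×2 table (a hypothetical binary condition versus WNV-positive/negative trap results) is made up, and no continuity correction is applied. The p-value shortcut p = erfc(√(chi²/2)) is valid only for one degree of freedom; the project itself relied on library routines.

```python
import math
from typing import List, Tuple


def chi2_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with exactly 1 degree of freedom."""
    # With 1 dof, chi² is a squared standard normal, so P(chi² > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: rows = condition present / absent, columns = WNV positive / negative.
    table = [[10, 20], [30, 40]]
    stat, dof = chi2_independence(table)
    print(f"chi2 = {stat:.4f}, dof = {dof}, p = {p_value_df1(stat):.4f}")
```

For larger tables (more than one degree of freedom), the p-value needs the full chi-squared distribution, which statsmodels or SciPy provide.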
Overview: Wrangled the BIXI bike-share data and performed EDA using MySQL and Tableau to identify the factors influencing bike usage volume and overall company growth, then communicated the findings through an interactive dashboard.
Tools:
Category: Data Analysis Using SQL, Dashboards
Year: Nov 2023
Figure 7.1 Tableau Dashboard on BIXI Bike Usage
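Questions such as how usage varies by month are usually answered with a GROUP BY aggregation before the results are visualized in Tableau. The minimal sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL. The `trips` table, its columns, and the rows are hypothetical. In MySQL, the month key would come from `DATE_FORMAT(start_date, '%Y-%m')` rather than SQLite's `strftime`.

```python
import sqlite3
from typing import List, Tuple

# Monthly trip volume, average trip length, and share of trips taken by members.
MONTHLY_USAGE_SQL = """
SELECT strftime('%Y-%m', start_date) AS month,
       COUNT(*)                      AS trips,
       AVG(duration_sec) / 60.0      AS avg_minutes,
       AVG(is_member)                AS member_share
FROM trips
GROUP BY month
ORDER BY month;
"""


def monthly_usage(conn: sqlite3.Connection) -> List[Tuple[str, int, float, float]]:
    """Run the monthly aggregation against a connection holding a `trips` table."""
    return conn.execute(MONTHLY_USAGE_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (start_date TEXT, duration_sec INTEGER, is_member INTEGER)")
    # Hypothetical trips: (start timestamp, duration in seconds, 1 if a member else 0).
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?, ?)",
        [("2016-05-01 08:15:00", 600, 1),
         ("2016-05-02 17:40:00", 1200, 0),
         ("2016-06-10 09:05:00", 900, 1)],
    )
    for row in monthly_usage(conn):
        print(row)
```

Aggregating in SQL keeps the extract that feeds the dashboard small, while Tableau handles filtering and drill-down on the summarized data.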
Figure - Evaluation Metrics