Kulwinder Kaur


Data Scientist | Data Analyst

View My GitHub Profile

View My LinkedIn Profile

Contact me

Portfolio

Projects

Project 1 - E-commerce Startup Sales Forecast Model Assessment

Overview: Led the development of an e-commerce sales forecast model, applying time-series techniques including SARIMAX, FB Prophet, XGBoost, and LSTM neural networks. To give stakeholders actionable insights, I visualized the model’s findings in interactive Tableau dashboards.

Tools : Python Jupyter Notebook Plotly TensorFlow Scikit-Learn SARIMAX LSTM FB Prophet XGBoost Tableau
Category : Sales Forecast, Prediction
Year : Feb 2023
Figure 1.1 Snapshot From Tableau Public Dashboard - Product Popularity By Month
Link to my Tableau Public Profile

Project 2 - NLP Based Recipe Recommendation System

Overview: Built a content-based recommender that uses cosine similarity to suggest the recipes most similar to a user's input, whether that input is a recipe name or a list of ingredients. It is built on Recipe Box, which contains ~125K scraped recipes.
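To illustrate the idea behind content-based filtering with cosine similarity, here is a minimal sketch in plain Python. It is not the project code: the project's tool list names TFIDF-Vectorizer, while this version hand-rolls a smoothed TF-IDF so it runs without dependencies. The four recipes, the `recommend` function, and the ingredient-only query are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the real project used ~125K Recipe Box recipes.
RECIPES = {
    "Tomato Basil Pasta": "pasta tomato basil garlic olive oil",
    "Garlic Butter Shrimp": "shrimp garlic butter lemon parsley",
    "Margherita Pizza": "dough tomato basil mozzarella olive oil",
    "Lemon Herb Chicken": "chicken lemon parsley garlic thyme",
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP preprocessing)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; tokens unseen in the corpus are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, recipes=RECIPES, top_n=2):
    """Return the top_n (name, score) pairs most similar to the query ingredients."""
    names = list(recipes)
    docs = [tokenize(recipes[name]) for name in names]
    idf = build_idf(docs)
    query_vec = tfidf(tokenize(query), idf)
    scored = [(name, cosine(query_vec, tfidf(doc, idf))) for name, doc in zip(names, docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name, score in recommend("tomato basil mozzarella"):
        print(f"{name}: {score:.3f}")
```

Because the score depends only on the text of each recipe, no user ratings are needed, which is what makes the approach content-based.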

Tools : Python Jupyter Notebook NLP NLTK Gensim TFIDF-Vectorizer Word2Vec LDA
Category : Recommender System, Unsupervised NLP based Content-based filtering
Year : Sept 2023
Figure 2.1 Word Cloud - Ingredients

Project 3 - Empowering Change through Data Analysis, Dash, and Interactive Charts

Overview: Set out to make a difference by analyzing U.S. mass shooting data, exploring this critical issue for insights that could inform change. Beyond the analysis, I built a Dash application with interactive Plotly charts.

Tools : Python Jupyter Notebook Plotly Dash
Category : Data Analysis
Year : Jul 2023

Project 4 - Big Data Wrangling with Google Books Ngrams

Overview: In this project, I applied big data wrangling skills to a large real-world dataset. The dataset, Google Ngrams, represents a substantial portion of digitized books throughout history and is hosted on Amazon S3. My workflow involved setting up an EMR cluster with Hadoop, Spark, Hive, JupyterHub, and Livy. I loaded, filtered, and visualized the data, which was stored in CSV format on S3. Using PySpark, I performed data exploration, filtering, and analysis, including plotting token occurrences over time. Additionally, I compared Hadoop and Spark as distributed processing frameworks, highlighting their key advantages and differences.
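The "token occurrences over time" step boils down to a filter followed by a per-year sum. The sketch below shows that logic in plain Python on a tiny in-memory CSV rather than in PySpark on S3, so it is a stand-in for the technique, not the project code. The column names (`token`, `year`, `frequency`, `pages`, `books`), the sample rows, and the `occurrences_by_year` helper are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the column names are assumed, not taken from the project.
SAMPLE_CSV = """token,year,frequency,pages,books
data,1990,120,100,80
data,1991,150,130,95
Data,1991,10,9,7
model,1990,60,55,40
data,1990,30,25,20
"""


def occurrences_by_year(csv_text, token):
    """Filter rows to one token (case-sensitive) and sum its frequency per year."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["token"] == token:
            totals[int(row["year"])] += int(row["frequency"])
    # Sort by year so the result is ready to plot as a time series.
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(occurrences_by_year(SAMPLE_CSV, "data"))
```

On a Spark DataFrame with the same assumed columns, the equivalent would be a `filter` on the token followed by `groupBy("year")` and a sum over `frequency`, with the aggregation distributed across the cluster.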

Tools : Python Jupyter Notebook PySpark Spark SQL HDFS Boto AWS S3 Bucket EMR Cluster
Category : Big Data Analysis
Year : Feb 2023
Figure 4.1 : Project flow diagram

Project 5 - Hotel Review Sentiment Analysis

Overview: Used NLP to preprocess the hotel review text, then performed data wrangling and feature engineering. Trained several machine learning algorithms, including KNN, Logistic Regression, and Decision Tree, to predict review ratings. Evaluated the model predictions using appropriate evaluation metrics and visualized the confusion matrix.
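As a small illustration of the evaluation step, the sketch below builds a confusion matrix and derives accuracy, precision, and recall from it in plain Python. It is not the project code: the binary labels (1 = good rating, 0 = bad rating), the sample predictions, and the helper names are assumptions made for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def precision_recall(matrix, index):
    """Precision and recall for the class at position `index` in the label list."""
    true_positive = matrix[index][index]
    predicted = sum(row[index] for row in matrix)  # column total
    actual = sum(matrix[index])                    # row total
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels: 1 = good rating, 0 = bad rating.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
    matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
    print(matrix)
    print(accuracy(matrix))
    print(precision_recall(matrix, index=1))
```

Reading precision and recall off the matrix, rather than relying on accuracy alone, matters when one rating class is much more common than the other.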

Tools : Python Jupyter Notebook NLP KNN Logistic Regression Decision Tree Grid Search
Category : Classification
Year : Jan 2023

Project 6 - West Nile Virus Dataset Statistical Analysis

Overview: Used statsmodels (chi-squared test, ANOVA, linear and logistic regression) to perform statistical EDA and interpret the relationships between variables and West Nile Virus (WNV) presence in Chicago between 2008 and 2019.
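For readers unfamiliar with the chi-squared test of independence mentioned above, here is a minimal sketch that computes the statistic by hand for a contingency table. It is not the project code, and the counts are invented for illustration (two hypothetical trap groups against WNV present or absent); they are not results from the Chicago data.

```python
def chi_square(observed):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    `observed` is a list of rows of counts. No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical counts: rows = trap group A / B, columns = WNV present / absent.
    table = [[30, 70], [10, 90]]
    statistic, dof = chi_square(table)
    print(statistic, dof)
```

With one degree of freedom, a statistic above roughly 3.84 would reject independence at the 5% level. Library routines may report a slightly different value for 2x2 tables because some apply a continuity correction by default.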

Tools : Python Jupyter Notebook statsmodels Chi2 Linear Regression Logistic Regression
Category : Hypothesis Testing, Statistical Analysis
Year : Dec 2023

Project 7 - Bixi Data Exploration using MySQL and Tableau

Overview: Manipulated the BIXI data and performed EDA using MySQL and Tableau to understand the factors that influence bike usage volume and overall company growth, and communicated the findings via an interactive dashboard.

Tools : MySQL Tableau
Category : Data Analysis using SQL, Dashboards
Year : Nov 2023
Figure 7.1 Tableau dashboard on Bixi Bike usage

Hackathon

Enhancing Customer Engagement : Aeroplan Air Canada

Figure - Evaluation Metrics