Hi! Welcome to my portfolio!

Shahriar Namvar
Data Analyst | Chemical Engineer
About me
Data-driven professional with expertise in Python, SQL, and data visualization, backed by 3+ years of research experience in energy storage systems and electrochemistry. Strong background in chemical engineering, with hands-on experience in data analysis and energy-related research. Highly motivated to grow in data science and engineering, with a passion for applying analytics to solve complex challenges in the energy sector.
Skills
Data analytics
Python
SQL
MATLAB
Power BI
Excel
Tableau
Git
Data Modeling
Data Visualization
Database Design
Education
Doctor of Philosophy (Ph.D.)
Chemical Engineering, University of Illinois Chicago, 2025 (Expected)
Master of Science
Mechanical Engineering, University of Illinois Chicago, 2024
Bachelor of Science
Chemical Engineering, 2019
Projects
Python
Generator Temperature Monitoring Dashboard
An interactive dashboard in Python using Dash and Plotly packages to monitor time-series data.


Python & SQL
Library Management System Database
Database design using Python and PostgreSQL to manage and track library resources and client interactions.
SQL & Power BI
Pizza Sales Analysis
Data cleaning and querying of pizza sales data with SQL, and dashboard design with Power BI.


Python
Life Expectancy Data Analysis and Forecasting
Exploratory data analysis of WHO life expectancy data with Python
Power BI
E-commerce Sales Dashboard
An interactive dashboard in Power BI showing the 2020 sales of a US-based e-commerce business.


Power BI & Python
Chicago Crime Analysis
Exploratory data analysis of Chicago crimes between 2003 and 2018 with Python, and dashboard design with Power BI.
Python
Web Scraping Using Python
A Python script to collect food names, recipes, serving sizes, and more from Tesco.com.


Python
Most Streamed Spotify Songs 2023 EDA
Analyzing the Hottest Spotify Tracks in 2023 with Python.
Thanks!
I appreciate you taking the time to visit my portfolio!
If you'd like to chat about me joining your data team, feel free to email me!
Python | Life Expectancy Data Analysis and Forecasting
In this project, we analyze factors that impact life expectancy across 193 countries. We first dive deep into the data, conducting thorough cleaning and processing, and then build a regression model to predict life expectancy.

Data Source and description
This dataset was obtained from Kaggle; the original data was collected by the Global Health Observatory (GHO) data repository under the World Health Organization (WHO). The dataset includes variables such as adult mortality, alcohol consumption, BMI, and population for 193 countries, with 22 columns and 2,937 rows in total.
The source code can be found in the link below:
Key takeaways
Results reveal that average life expectancy is increasing over time.

To fill the null values, a correlation matrix was constructed to find, for each column with missing data, the variable most strongly correlated with it. Based on the correlation matrix results, linear regression was then used to estimate the missing values.
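For illustration, a minimal sketch of this imputation step might look like the following (the file name, the target column, and the use of scikit-learn are assumptions for the example, not the project's exact code):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hedged sketch: correlation-guided imputation (file and column names are placeholders).
df = pd.read_csv("life_expectancy.csv")
target_col = "GDP"  # example column with missing values

# 1. Find the numeric column most strongly correlated with the target column.
corr = df.corr(numeric_only=True)[target_col].drop(target_col).abs()
best_predictor = corr.idxmax()

# 2. Fit a linear regression on rows where both columns are present.
train = df.dropna(subset=[target_col, best_predictor])
model = LinearRegression().fit(train[[best_predictor]], train[target_col])

# 3. Predict the missing target values from the best predictor.
missing = df[target_col].isna() & df[best_predictor].notna()
df.loc[missing, target_col] = model.predict(df.loc[missing, [best_predictor]])
```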

We plotted histograms to better understand the distribution of the variables. The histograms reveal noticeable outliers in certain columns, such as Population; the outlier in the Population column stems from India's significantly higher population compared with other countries.

We used barplots to detect outliers.

The data highlights higher life expectancy in developed countries.

Unexpectedly, the consumption of alcohol is notably higher in developed countries, where life expectancy is also higher, compared to developing countries.

In the examination of factors influencing life expectancy, it is observed that years of schooling tend to be higher in developed countries as opposed to developing countries.

Python | Chicago Crime Exploratory Data Analysis
For a variety of reasons, crime in Chicago is an intriguing topic to investigate. Personally, I've lived in Chicago for almost three years, and crime is usually a topic of discussion with friends and family. Another factor is the abundance of publicly available (high quality) crime datasets available for data scientists to mine and study, such as this one.

Data Source and description
This dataset was obtained from Kaggle; the original data was collected by the City of Chicago and released for public use, covering all crimes recorded between 2001 and 2023. The data contains the date of each crime, where it was committed, and the type of crime. In total, the dataset has 22 columns and 7,784,664 rows.
The Python code can be found at the link below:
Here is the dashboard I created in Power BI, as well as the results from data analysis in Python.
Key takeaways
Results indicate that only 25% of the crimes recorded since 2001 have led to an arrest.

There is a decreasing trend in the overall number of crimes and arrests from 2011 to 2018. This could mean that, contrary to common belief, Chicago is actually becoming a safer place to live!

Here we found the distribution of primary crime types. Based on the chart, Theft, Battery, and Criminal Damage have the highest frequencies.

To gain further insight into theft, the most frequently occurring crime, we found the distribution of thefts across the hours of the day. Results indicate that almost 105,000 thefts happened around noon!
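As an illustration, the hourly breakdown could be computed along these lines (a hedged sketch; the file and column names follow the public Chicago crime data but are assumptions here):

```python
import pandas as pd

# Hedged sketch: count thefts per hour of the day (file and column names are assumptions).
crimes = pd.read_csv("chicago_crimes.csv", parse_dates=["Date"])

thefts = crimes[crimes["Primary Type"] == "THEFT"]
thefts_per_hour = thefts["Date"].dt.hour.value_counts().sort_index()

print(thefts_per_hour)  # index 12 would hold the count of thefts around noon
```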

To get an idea of the safety status and crime occurrence in each district, we calculated the total number of crimes per district. According to the results, District 8 has the highest number of crimes. This district is close to Midway Airport, on the south side of the city.

Python | Web Scraping Project
As a data geek and a foodie at heart, I set out to create a comprehensive database of food names, recipes, and related information from a popular culinary website (realfood.tesco.com). Through web scraping, I gathered, organized, and analyzed this wealth of culinary knowledge, presenting it in an accessible format for fellow food enthusiasts, aspiring chefs, and data-driven culinarians alike.

Data Source and description
This website hosts comprehensive food data for breakfast, lunch, snacks, salads, and dinner. For now, I only scraped the lunch ideas. Using for loops, I iteratively went through each page and obtained the needed information.
Source code can be found in the link below:
Key takeaways
First, I used the Requests and BeautifulSoup packages to connect to the website and parse its HTML.

This is where the game begins! This chunk of code shows how to build the first table, which includes food name, rating, and serving size.
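For a flavor of that step, here is a minimal, hedged sketch using Requests and BeautifulSoup (the URL pattern and CSS selectors are illustrative placeholders, not the exact ones used for realfood.tesco.com):

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

rows = []
for page in range(1, 4):  # page count is an assumption for the example
    # Illustrative URL pattern, not necessarily the site's real pagination scheme.
    url = f"https://realfood.tesco.com/recipes/collections/lunch.html?page={page}"
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    # The CSS selectors below are placeholders; the real ones depend on the site's HTML.
    for card in soup.select("div.recipe-card"):
        name = card.select_one("h3")
        rating = card.select_one(".rating")
        serves = card.select_one(".serves")
        rows.append({
            "name": name.get_text(strip=True) if name else None,
            "rating": rating.get_text(strip=True) if rating else None,
            "serving_size": serves.get_text(strip=True) if serves else None,
        })

recipes = pd.DataFrame(rows)
print(recipes.head())
```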

Here is a snapshot of the final output.

I then extracted additional information for each food, such as calories per serving, preparation time, cooking time, and the categories each food falls into.

Here are the outputs of the code.


Finally, as a simple recommendation system, the user enters as many ingredients as desired, and the code generates a list of foods that include the user's requested ingredients.
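A minimal sketch of that filtering logic might look like this (the DataFrame layout and column names are assumptions for the example):

```python
import pandas as pd

# Hedged sketch: recommend foods containing every requested ingredient
# (the DataFrame layout here is an assumption).
recipes = pd.DataFrame({
    "name": ["Chicken salad", "Tomato soup", "Garlic pasta"],
    "ingredients": ["chicken, lettuce, tomato", "tomato, garlic, basil", "pasta, garlic, olive oil"],
})

def recommend(recipes, wanted):
    """Return recipes whose ingredient list contains every requested ingredient."""
    mask = pd.Series(True, index=recipes.index)
    for ingredient in wanted:
        mask &= recipes["ingredients"].str.contains(ingredient.strip(), case=False, regex=False)
    return recipes[mask]

user_input = input("Enter ingredients separated by commas: ")
wanted = [item for item in user_input.split(",") if item.strip()]
print(recommend(recipes, wanted)["name"].tolist())
```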

Python | Most streamed Spotify songs
In this project, I conducted an in-depth analysis of the most streamed songs on Spotify in 2023. Exploring the musical landscape of the year, I uncovered the top artists and songs, as well as the key features of the tracks that captivated listeners the most. From genres to popularity metrics, the analysis provides a comprehensive view of the music that defined the streaming scene in 2023.

Data Source and description
This data was obtained from Kaggle. The dataset includes various information such as artist name, song name, mode, key, bpm, etc.
The source code can be found in the link below:
Key takeaways
Here are the top 10 artists:

The top 20 songs with the highest number of streams in 2023:
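For illustration, the top-songs and top-artists tables can be produced with a few lines of pandas (a hedged sketch; the file and column names follow the Kaggle dataset but are assumptions here):

```python
import pandas as pd

# Hedged sketch: top songs and artists by stream count (file and column names are assumptions).
songs = pd.read_csv("spotify-2023.csv", encoding="latin-1")
songs["streams"] = pd.to_numeric(songs["streams"], errors="coerce")

top_songs = songs.nlargest(20, "streams")[["track_name", "artist(s)_name", "streams"]]
top_artists = songs.groupby("artist(s)_name")["streams"].sum().nlargest(10)

print(top_songs)
print(top_artists)
```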

The distribution of solo artists and collaborative songs.

Distribution of keys in songs. C# proves to be the most popular one!

As might be expected, songs written in a major mode outnumber those written in a minor mode.

Let's see the distribution of danceability in the songs. It seems that a fairly high number of songs have a relatively high degree of danceability.

A quick glance at all of the data columns as histograms.

SQL & Power BI | Pizza Sales Analysis and Visualization
This portfolio project centers on a comprehensive analysis of pizza sales, leveraging SQL for robust data querying and manipulation and Power BI for dashboard visualization.

Data Source and description
The dataset for this project was obtained from Maven Analytics. The data consists of four tables: orders, order details, pizzas, and pizza types.
Methodology
Part of the data manipulation and processing was done in SQL; for the dashboard design, DAX functions were employed alongside the SQL queries for further analysis.
The SQL code can be found here:

Key takeaways
The top 3 revenue-driving pizza styles are all Chicken (Thai Chicken, Barbeque Chicken, and California Chicken), with a cumulative revenue of $127.61K. This suggests that Chicken is actually the most popular pizza category and should be considered when creating future promotions to increase revenue.
Classic pizzas produced the highest revenue of $220.05K. Supreme pizzas produced the 2nd highest revenue of $208.20K. Chicken pizzas produced the 3rd highest revenue of $195.92K. Veggie pizzas produced the lowest revenue of $193.69K (see the sketch after these takeaways for how this kind of aggregation can be computed).
Large pizzas have the highest number of sales, followed by medium and small. This trend shows a clear customer preference for large pizzas.
Fridays have the highest sales of any day of the week. One likely cause is the start of the weekend, when people are most likely to get together; more promotions and specials should be run on Fridays.
We also examined the monthly sales trend: July had the highest sales in 2015, followed by May, while September and October sales were below the monthly average.
I also calculated the most frequently used ingredients across pizza types; garlic and tomatoes top the list, so the restaurant should consider renegotiating wholesale prices on these ingredients to increase margins.
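For illustration, the revenue-by-category breakdown above could be reproduced with an aggregation along these lines; the project itself ran this as SQL queries, and the file and column names below are assumptions based on the Maven Analytics schema:

```python
import pandas as pd

# Hedged pandas sketch of the revenue-by-category aggregation (the project used SQL;
# file and column names are assumptions based on the Maven Analytics pizza dataset).
order_details = pd.read_csv("order_details.csv")   # order_id, pizza_id, quantity
pizzas = pd.read_csv("pizzas.csv")                 # pizza_id, pizza_type_id, size, price
pizza_types = pd.read_csv("pizza_types.csv")       # pizza_type_id, name, category

sales = order_details.merge(pizzas, on="pizza_id").merge(pizza_types, on="pizza_type_id")
sales["revenue"] = sales["quantity"] * sales["price"]

revenue_by_category = sales.groupby("category")["revenue"].sum().sort_values(ascending=False)
print(revenue_by_category)
```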
Power BI | E-commerce sales analysis
This interactive dashboard was designed for a US-based e-commerce company to monitor several KPIs, such as year-to-date (YTD) sales, YTD profit, and YTD profit margin.

Data Source and description
The dataset for this project was obtained from Kaggle. The data consists of two tables: one containing sales information such as customer ID, order date, and delivery status, and another containing geographical information for US states.
Methodology
For this project, Excel was used for data cleaning and manipulation, and the dashboard design and further exploratory data analysis were carried out in Power BI using DAX and Power Query.

Questions for the analysis
What are the trends for YTD sales, profit, and profit margin?
What are the top 5 products with the highest YTD sales?
What is the sales trend per region?
Key takeaways
The data shows an overall positive trend in YTD profit. However, the quantity of items sold is ~7% lower than in the previous year.
The furniture category is the only category with positive year-over-year (YOY) growth, while the office supplies and technology categories have declined.
YTD sales by region data demonstrates that the west coast holds the highest share among the four regions, with California having the highest YTD sales among all states.
Among the shipping types, standard shipping has by far the highest YTD sales.
Tools used
Power Query in Power BI for loading and transforming the data
Creating a date table
DAX functions to create measures for calculating YTD and YOY (see the sketch after this list)
Data modeling to connect the three tables
Data visualization and dashboard creation
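As referenced in the list above, here is a pandas analogue of the YTD and YOY logic; the dashboard itself implements these as DAX measures, and the file and column names below are assumptions:

```python
import pandas as pd

# Hedged pandas analogue of the YTD / YOY measures (the dashboard implements these in DAX;
# file and column names here are assumptions).
orders = pd.read_csv("ecommerce_sales.csv", parse_dates=["order_date"])
orders["year"] = orders["order_date"].dt.year

# Year-to-date sales: cumulative sum of daily sales within each year.
daily = orders.groupby(["year", "order_date"], as_index=False)["sales"].sum()
daily["ytd_sales"] = daily.groupby("year")["sales"].cumsum()

# Year-over-year growth: compare each year's total sales with the previous year's.
yearly = orders.groupby("year")["sales"].sum()
yoy_growth_pct = yearly.pct_change() * 100

print(daily.tail())
print(yoy_growth_pct)
```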
Python & PostgreSQL | Library Management System Database
This Library Management System project was developed using Python and PostgreSQL to manage and track library resources and client interactions. We implemented a relational database in PostgreSQL to store and manage the data, and used Python to handle application logic, data queries, and interactions with the database, enabling efficient library operations.

Methodology
We began by gathering and analyzing the requirements for a typical library management system. This step involved understanding the entities, relationships, and functions essential to managing library resources and client interactions. The requirements were documented, identifying the main entities such as Document, Author, Publisher, Copy, and Client. Here's a link to the GitHub Repository:
Entity-Relationship (ER) diagram
Based on the requirements, we designed an Entity-Relationship (ER) diagram to define the structure of the PostgreSQL database. Key relationships were established as depicted below:

Relational schema
The ER diagram was then translated into the relational schema to define tables, primary and foreign keys.

Data Definition Language (DDL) commands

Python script
Python was used as the backend programming language to interact with the PostgreSQL database. Using libraries such as psycopg2, we developed functions to handle common operations in the system. Here's the link to the Python code:
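For a flavor of what such functions look like, here is a minimal psycopg2 sketch (connection details, table, and column names are simplified assumptions, not the project's exact schema); it also includes a CREATE TABLE statement of the kind referenced in the DDL section above:

```python
import psycopg2

# Hedged sketch of the database layer (connection details, table, and column names
# are simplified assumptions, not the project's exact schema).
conn = psycopg2.connect(dbname="library", user="postgres", password="secret", host="localhost")

def create_client_table():
    """Example DDL executed from Python."""
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS client (
                client_id SERIAL PRIMARY KEY,
                name      TEXT NOT NULL,
                email     TEXT UNIQUE
            );
        """)
    conn.commit()

def add_client(name, email):
    """Insert a client and return the generated id."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO client (name, email) VALUES (%s, %s) RETURNING client_id;",
            (name, email),
        )
        new_id = cur.fetchone()[0]
    conn.commit()
    return new_id

create_client_table()
print(add_client("Ada Lovelace", "ada@example.com"))
```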

Tools used
Python: Used as the primary programming language to implement the backend logic of the library management system.
PostgreSQL: This relational database system was used to create, store, and manage the library's data.
psycopg2: A PostgreSQL adapter for Python, psycopg2 enabled Python to connect to the PostgreSQL database, execute SQL queries, and manage transactions.
Lucidchart: Used to create the Entity-Relationship (ER) diagram, which visually represented the database structure, entities, and relationships.
Git and GitHub: Version control was managed using Git, which allowed for tracking code changes and collaborating effectively.
Python | Generator Temperature Monitoring Dashboard
This Generator Temperature Prediction Dashboard was developed using Python, leveraging Dash and Plotly to create an interactive and user-friendly interface. The dashboard enables users to monitor and predict generator temperatures by adjusting model parameters and selecting specific time frames via a date picker. The project integrates both backend and frontend functionalities within Python, using a variety of packages to handle data processing, model predictions, and visualizations.

Methodology
We identified the key parameters influencing generator temperature to effectively predict overheating and potential failure. Ambient temperature and electrical current were selected as primary inputs due to their strong impact on generator temperature dynamics. The dashboard was then designed to provide users with intuitive controls, allowing them to adjust parameters and select specific date ranges for focused analysis within any desired timeframe.
Here's the link to the Python code on GitHub:
Dashboard demo
Below is a short demo showing the functionalities of the dashboard.
Python script
The image below shows a snippet of the Python code. The full code is available at the GitHub link below:
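For readers who prefer text to screenshots, here is a minimal, self-contained Dash sketch in the same spirit as the dashboard (synthetic data; the real app's layout, prediction model, and controls are more involved):

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

# Synthetic hourly temperature data; the real dashboard loads measured generator data.
data = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=240, freq="h"),
    "generator_temp": [60 + (i % 24) * 1.5 for i in range(240)],
})

app = Dash(__name__)
app.layout = html.Div([
    html.H3("Generator Temperature Monitoring"),
    dcc.DatePickerRange(
        id="date-range",
        start_date=data["timestamp"].min().date(),
        end_date=data["timestamp"].max().date(),
    ),
    dcc.Graph(id="temp-graph"),
])

@app.callback(Output("temp-graph", "figure"),
              Input("date-range", "start_date"),
              Input("date-range", "end_date"))
def update_graph(start_date, end_date):
    # Filter the time series to the selected window and redraw the line chart.
    mask = (data["timestamp"] >= start_date) & (data["timestamp"] <= end_date)
    return px.line(data[mask], x="timestamp", y="generator_temp",
                   title="Generator temperature over time")

if __name__ == "__main__":
    app.run(debug=True)
```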

Tools used
Jupyter Notebook: For developing and prototyping Python code for the dashboard.
Dash & Plotly: To create the dashboard layout and interactive visualizations.
HTML & CSS: For customizing the frontend design and layout.
Pandas: Used for data processing and data analysis.
Scikit-Learn: To build and train the predictive model for temperature forecasting.