Pandas GUI

Rishit Javia Data Scientist @ Infocusp
9 min read  .  13 January 2025

PandasGUI: A Visual Playground for Data Exploration

You have 10 hours to analyze a dataset. Would you spend 40% of that time on repetitive tasks like cleaning data and writing basic code? Or would you prefer to get those tasks done efficiently and use 95% of your time on meaningful analysis? PandasGUI provides a straightforward interface to handle the groundwork quickly, allowing you to focus on generating valuable insights.

What is PandasGUI?

PandasGUI is a Python library that provides a graphical user interface (GUI) for exploring and analyzing Pandas DataFrames. It's designed to simplify data exploration tasks, making it accessible to both beginner and experienced data scientists.

Key Features

  • Intuitive Interface: A simple and intuitive interface that requires minimal coding.
  • Interactive Exploration: Dynamically explore your data by zooming, panning, and filtering.
  • Data Manipulation: Perform data cleaning, transformation, and reshaping operations.
  • Statistical Summaries: Gain insights into your data's distribution, central tendency, and variability.

Getting Started with PandasGUI

Installation: Installing PandasGUI is as simple as running a single command in your terminal:

pip install pandasgui

Importing and Displaying Data:

  1. Import the necessary libraries:
import pandas as pd
from pandasgui import show
  2. Load your data:
    We will use the Kaggle Ship Fuel Consumption dataset for this hands-on tutorial.
df = pd.read_csv("ship_fuel_efficiency.csv")
  3. Launch the GUI:
show(df)

Let’s understand PandasGUI with this sample data. The given dataset provides comprehensive data on fuel consumption and CO2 emissions for diverse ship types operating in Nigerian waterways. It includes crucial parameters like routes, engine efficiency, and monthly CO2 emissions, enabling in-depth analysis for various applications.

Pandas Dataframe View

Exploring the PandasGUI Interface

Once you launch PandasGUI, you'll see four main tabs:

DataFrame:

View dataframe:

Upon running show(df) you will land on this page, which displays the entire dataframe as it is.

Pandas GUI Dataframe View

Edit raw data:

Update the data values

Edit the data

As shown in the example, the ship_type column had a single anomalous value, Tanker. To align it with the other values, we can update the record directly in the UI, and the change is reflected in the exploratory analysis immediately. In the bar chart, the separate Tanker bar disappears and its record is merged into the Tanker Ship bar.
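The same clean-up can be sketched in plain pandas for cases where many records need the fix. This is a minimal illustration on made-up rows; only the ship_type column name and the Tanker / Tanker Ship labels come from the example above.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has many more records.
df = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Tanker", "Surfer Boat", "Tanker Ship"],
})

# Map the stray label onto the label used by the other records.
df["ship_type"] = df["ship_type"].replace({"Tanker": "Tanker Ship"})

print(df["ship_type"].value_counts())
```

After the replacement, the stray category no longer appears in the value counts, which mirrors what happens to the bar chart in the GUI.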

Filter the data:

Isolate relevant information based on specific criteria.

  • Let's say we want to see data for Tanker Ship only, during a specific period.
    • To do that, we can write a filter in a simple query language in the Filters section:

Filters Example
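The filter expressions resemble those accepted by pandas' DataFrame.query, so the same selection can be sketched outside the GUI. The month and distance columns and the sample rows below are assumptions made for illustration, not the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows; column names other than ship_type are assumed.
df = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Surfer Boat", "Tanker Ship", "Surfer Boat"],
    "month": ["January", "January", "March", "February"],
    "distance": [210.5, 80.2, 190.0, 35.7],
})

# Keep only tanker ships that sailed in January or February.
filtered = df.query(
    "ship_type == 'Tanker Ship' and month in ['January', 'February']"
)

print(filtered)
```

Writing the condition as a query string keeps it close to what you would type into the Filters box, so it is easy to move a filter between the GUI and a script.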

Sort, Delete columns, Parse dates:

  • Right-clicking a column header opens a menu with options for sorting, deleting, and parsing a column that holds date values.
  • Deleting lets you drop columns that are not relevant to the analysis.
  • Sorting by a column is a simple yet useful operation, and it is available from the same right-click menu.
  • Date parsing converts text strings into date and time values, provided the text follows a recognizable format.

Various Options
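For reference, the three menu operations correspond roughly to the following pandas calls. This is a sketch on invented data: trip_date and notes are hypothetical columns added only to demonstrate date parsing and column deletion.

```python
import pandas as pd

# Hypothetical sample; trip_date and notes are not columns from the article.
df = pd.DataFrame({
    "ship_id": ["NG001", "NG002", "NG003"],
    "trip_date": ["2024-03-15", "2024-01-10", "2024-02-20"],
    "distance": [120.0, 45.5, 210.3],
    "notes": ["", "", ""],
})

# Sort: longest trips first.
sorted_df = df.sort_values("distance", ascending=False)

# Delete: drop a column that is irrelevant to the analysis.
trimmed = sorted_df.drop(columns=["notes"])

# Parse dates: convert text to datetime, stating the expected format explicitly.
trimmed["trip_date"] = pd.to_datetime(trimmed["trip_date"], format="%Y-%m-%d")

print(trimmed.dtypes)
```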

Statistics:

Here we have enough records with numerical data. Let's check: what is the longest distance a ship covered in a single trip, and what is the average engine efficiency? The total number of ships is also worth knowing, to get an overall picture before delving deeper into the analysis.

To see the statistical summary, click the second tab at the top, named Statistics.

Statistical Summary

  • It shows output similar to df.describe().
  • It helps you quickly check key statistics such as min, max, mean, unique count, and total count, along with each column's data type.
  • The standard deviation helps in understanding the spread. For example, Fuel_Consumption has a very large standard deviation relative to its mean, which implies a very wide range in fuel consumption.
  • However, to get a deeper understanding, we can turn to various graphs.
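A rough pandas equivalent of the Statistics tab is sketched below. The rows are made up, and the distance column name is an assumption; the ratio of standard deviation to mean (the coefficient of variation) is one simple way to quantify "large spread relative to the mean".

```python
import pandas as pd

# Hypothetical sample values, not taken from the real dataset.
df = pd.DataFrame({
    "ship_type": ["Surfer Boat", "Surfer Boat", "Tanker Ship", "Tanker Ship"],
    "distance": [35.0, 80.0, 210.0, 190.0],
    "Fuel_Consumption": [500.0, 1500.0, 9000.0, 8000.0],
})

# include="all" adds unique/top/freq for text columns next to the numeric stats.
summary = df.describe(include="all")
print(summary)

# Coefficient of variation: standard deviation expressed as a fraction of the mean.
cv = df["Fuel_Consumption"].std() / df["Fuel_Consumption"].mean()
print(f"Fuel_Consumption std/mean: {cv:.2f}")
```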

Grapher:

Earlier we set out to understand the distribution of Fuel_Consumption. That raises several questions:

  • What is its range?
  • How is it related to the distance covered by a ship?
  • Does it depend on fuel type, ship type, or weather conditions?
  • What types of ships go on very long trips?
  • Is there any pattern between the month of the year and the distance covered or the ship type?
  • What about CO2_emissions, another crucial factor here?

We should be able to answer many questions about the data before we make any business decision or build any subsequent model or data pipeline. Let's answer these questions using various graphs.

Here is a basic guide to choosing a plot based on the objective. Importantly, plot choice heavily depends on the given data and the specific objective.

Which visualization to use?

[Image taken from this page]

To start playing around with the plots, click on Grapher to launch this view.

Grapher Landing Page

To see the correlation between distance and fuel consumption, a scatter plot is a good choice. Once you create a scatter plot with distance on the x-axis and fuel consumption on the y-axis, an obvious question arises: “Why do the points with very long distances form such a clear pattern? Is it because of the ship type or the fuel type?”

Scatter Plot

Now we know that only Tanker ships cover distances greater than 200 units. This makes us curious to explore tanker ships further. You can isolate the tanker ship data by applying a filter.

Another column we are interested in is CO2_emissions. Let's see how much CO2 these ships emit every month.
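The observation read off the scatter plot can also be checked numerically. The sketch below uses invented rows and an assumed distance column; the 200-unit threshold is the one mentioned above.

```python
import pandas as pd

# Hypothetical sample rows chosen to mimic the pattern seen in the scatter plot.
df = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Tanker Ship", "Surfer Boat", "Surfer Boat"],
    "distance": [250.0, 180.0, 90.0, 30.0],
})

# Which ship types appear among trips longer than 200 units?
long_trips = df[df["distance"] > 200]
types_over_200 = sorted(long_trips["ship_type"].unique())

# Longest single trip per ship type, largest first.
max_by_type = df.groupby("ship_type")["distance"].max().sort_values(ascending=False)

print(types_over_200)
print(max_by_type)
```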

Line Plot

  • It was expected that tanker ships emit the most CO2, but the interesting observation is that their emissions decline over time, while the other ship types' CO2 emissions remain steady.
  • However, before drawing any conclusion we should check whether the number of trips is also decreasing over time; if it is, the decline in CO2 emissions is expected. A natural first step would be a bar chart with months on the x-axis and the number of shipments on the y-axis. However, this is not feasible in PandasGUI when the data lacks a dedicated ‘Shipment Count’ column, which is one of the tool's limitations.
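One possible workaround is to compute the count in pandas first and then explore the resulting table. The sketch below assumes a month column and uses made-up rows; shipment_count is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample rows; one row per trip.
df = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Tanker Ship", "Tanker Ship",
                  "Surfer Boat", "Surfer Boat"],
    "month": ["January", "January", "February", "February", "March"],
    "CO2_emissions": [25000.0, 24000.0, 22000.0, 1400.0, 1500.0],
})

# Count trips per month and ship type, producing the missing count column.
trip_counts = (
    df.groupby(["month", "ship_type"])
      .size()
      .reset_index(name="shipment_count")
)

print(trip_counts)

# The aggregated table could then be opened in the GUI next to the original,
# e.g. show(df, trip_counts), assuming show() is given both DataFrames.
```

Note that groupby orders the months alphabetically; converting month to an ordered categorical beforehand would keep them in calendar order.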

Now, going back to the basic question we wanted to answer, let's check CO2_emissions by fuel type. A bar chart is the most helpful plot for this.

As you can see, HFO (heavy fuel oil) emits ~20% less CO2 compared to Diesel, on average.

Bar Chart
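The comparison behind the bar chart boils down to a group-wise mean. Here is a minimal sketch with invented numbers (they do not reproduce the dataset's result) and an assumed fuel_type column.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "fuel_type": ["Diesel", "Diesel", "HFO", "HFO"],
    "CO2_emissions": [1000.0, 1200.0, 850.0, 950.0],
})

# Average emissions per fuel type: the quantity the bar chart displays.
mean_co2 = df.groupby("fuel_type")["CO2_emissions"].mean()

# Relative difference of HFO with respect to Diesel (negative means lower).
relative_diff = (mean_co2["HFO"] - mean_co2["Diesel"]) / mean_co2["Diesel"]

print(mean_co2)
print(f"HFO vs Diesel: {relative_diff:.1%}")
```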

Continuing our basic exploration, let's see what range of distance a ship covers in a trip. A box plot suits this, since it shows the minimum, maximum, median, and quartiles.

Box Plot

  • This implies that tanker ships are used for trips of more than ~150 units, with some exceptions. It also helps explain the wide spread in fuel consumption, which was our first question.

  • While we are looking at fuel consumption, let's also see whether the fuel type (Diesel or HFO) has any effect.

    Grouped Box Plot

  • This plot shows that no Surfer Boat uses the HFO fuel type, which is a point worth noting.
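A missing category combination like this one can be confirmed with a cross-tabulation. The rows below are made up, and fuel_type is an assumed column name.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Tanker Ship", "Surfer Boat", "Surfer Boat"],
    "fuel_type": ["Diesel", "HFO", "Diesel", "Diesel"],
})

# Count rows for every ship_type / fuel_type pair; absent pairs show up as 0.
combo_counts = pd.crosstab(df["ship_type"], df["fuel_type"])

# List the combinations that never occur in the data.
missing = [
    (ship, fuel)
    for ship in combo_counts.index
    for fuel in combo_counts.columns
    if combo_counts.loc[ship, fuel] == 0
]

print(combo_counts)
print(missing)
```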

After playing around with the data, we learned that there could be a relation between fuel consumption, distance, and CO2 emissions. This can be analysed with the help of a 3D scatter plot.

3D Plot

  • As shown in the GIF, fuel consumption and CO2 emissions are directly correlated. The points also lie close to a 2D plane, which indicates a clear relationship among the three variables.
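The visual impression can be backed up with a correlation matrix. This is a sketch on invented values in which CO2 emissions were deliberately made roughly proportional to fuel consumption; distance is an assumed column name.

```python
import pandas as pd

# Hypothetical values; CO2_emissions is constructed to track Fuel_Consumption.
df = pd.DataFrame({
    "distance": [35.0, 80.0, 210.0, 190.0, 120.0],
    "Fuel_Consumption": [500.0, 1500.0, 9000.0, 8000.0, 3000.0],
    "CO2_emissions": [1400.0, 4100.0, 25500.0, 22300.0, 8500.0],
})

# Pairwise Pearson correlation between the three numeric columns.
corr = df[["distance", "Fuel_Consumption", "CO2_emissions"]].corr()

print(corr.round(3))
```

A value close to 1 between two columns indicates a strong linear relationship, but it says nothing about causation.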

You may have noticed that the axis labels and chart titles don't match the column names or what we want to represent. These are default values set by PandasGUI, but we can change them to suit our needs by passing custom arguments.

In fact, you can also pass other custom arguments, such as the number of bins for a histogram, as shown in the example below.

Custom Argument

Export visualizations: Save plots in different formats.

  • PandasGUI lets you download the figure in high resolution, export the Python code that generates it, and export the plot arguments.

Reshaper:

Pivot tables:

You can use this to summarize and aggregate data. It is very easy to pivot a DataFrame and analyze it further according to your requirements. The result is added as a new DataFrame in your PandasGUI session, and you can combine it with the original one by merging or concatenating. The limitation is that a pivot can have only one index and one column.
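For comparison, a pivot with one index and one column looks like this in pandas. The rows are invented, and fuel_type is an assumed column name.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Tanker Ship", "Tanker Ship", "Surfer Boat"],
    "fuel_type": ["Diesel", "Diesel", "HFO", "Diesel"],
    "CO2_emissions": [1000.0, 1200.0, 900.0, 100.0],
})

# One index (ship_type), one column (fuel_type), mean CO2 in each cell.
pivot = df.pivot_table(
    index="ship_type",
    columns="fuel_type",
    values="CO2_emissions",
    aggfunc="mean",
)

print(pivot)
```

Combinations that never occur in the data appear as NaN. In code, pivot_table also accepts lists for index and columns, which is one way around the single-index limitation.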

Melt/Unpivot:

Reshape data from wide to long format.

Pivot-Unpivot

[Image taken from this page]
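Melting is the reverse of pivoting. As a minimal sketch on a made-up wide table (one column per fuel type), the long format can be produced like this:

```python
import pandas as pd

# Hypothetical wide table: one row per ship type, one column per fuel type.
wide = pd.DataFrame({
    "ship_type": ["Tanker Ship", "Surfer Boat"],
    "Diesel": [1100.0, 100.0],
    "HFO": [900.0, None],
})

# Turn the fuel-type columns into rows: one (ship_type, fuel_type) pair per row.
long = wide.melt(
    id_vars="ship_type",
    var_name="fuel_type",
    value_name="CO2_emissions",
)

print(long)
```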

Merge and concatenate:

Combine DataFrames based on common columns or rows.

Pivoting
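The two operations differ in direction: merging joins tables side by side on a shared key, while concatenating stacks tables with the same columns. The sketch below uses invented tables, and ship_id is an assumed key column.

```python
import pandas as pd

# Hypothetical tables sharing a ship_id key.
trips = pd.DataFrame({
    "ship_id": ["NG001", "NG002", "NG001"],
    "distance": [120.0, 45.5, 210.3],
})
ships = pd.DataFrame({
    "ship_id": ["NG001", "NG002"],
    "ship_type": ["Tanker Ship", "Surfer Boat"],
})

# Merge: attach ship_type to each trip using the common ship_id column.
merged = trips.merge(ships, on="ship_id", how="left")

# Concatenate: stack two tables that have identical columns.
january = pd.DataFrame({"ship_id": ["NG001"], "distance": [120.0]})
february = pd.DataFrame({"ship_id": ["NG002"], "distance": [45.5]})
combined = pd.concat([january, february], ignore_index=True)

print(merged)
print(combined)
```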

Why Choose PandasGUI?

  1. User-Friendly Interface: No coding required for basic exploration.
  2. Powerful Visualization Tools: Create stunning visualizations with ease.
  3. Data Manipulation Capabilities: Reshape and transform data effortlessly.
  4. Open-source and Free: Accessible to everyone.

Limitations to Consider:

  1. Limited Visualization Flexibility: While it provides a solid foundation for data exploration, it may not be the best choice for complex visualizations. Features like secondary y-axes and subplots, which are essential for in-depth analysis, are currently unavailable.
  2. Performance Issues with Large Datasets: When working with large datasets, it can hit performance bottlenecks that lead to freezing or slow response times. This can significantly hinder productivity and make it difficult to reach timely insights.

In data analysis, time is your most valuable resource. With PandasGUI, you can reclaim the hours spent on repetitive groundwork and redirect them toward meaningful analysis. It is more than just a tool: it is your ally in simplifying data wrangling and creating essential visualizations with little effort. Whether you have 10 hours or 10 minutes, PandasGUI will help you make the most of your time and find valuable insights.
