📕
Dan Fitz's Notes
  • README
  • Ai
    • Supervised Machine Learning
      • Introduction To Machine Learning
      • Regression With Multiple Input Variables
      • Classification
  • Csharp
    • C Sharp Advanced
      • Generics
      • Delegates
      • Lambda Expressions
      • Events
    • C Sharp Fundamentals
      • Intro To C
      • Primitive Types And Expressions
      • Non Primitive Types
      • Control Flow
      • Arrays And Lists
      • Working With Dates
      • Working With Text
      • Working With Files
      • Debugging Applications
    • C Sharp Intermediate
      • Classes
      • Association Between Classes
      • Inheritance
      • Polymorphism
      • Interfaces
  • Java
    • Inheritance Data Structures Java
      • Inheritance Polymorphism Using Overriding And Access Modifiers
      • Abstract Classes And Debugging
      • File I O And Exceptions
      • Collections Maps And Regular Expressions
    • Intro To Java
      • Introduction To Java Classes And Eclipse
      • Unit Testing Arrays And Array Lists
      • Static Variables Methods And Polymorphism Using Overloading
  • Javascript
    • Algorithms Data Structures
      • Big O Notation
      • Analyzing Performance Of Arrays And Objects
      • Problem Solving Approach
      • Problem Solving Patterns
      • Recursion
      • Searching Algorithms
      • Bubble Selection And Insertion Sort
      • Merge Sort
      • Quick Sort
      • Radix Sort
      • Data Structures Introduction
      • Singly Linked Lists
      • Doubly Linked Lists
      • Stacks And Queues
      • Binary Search Trees
      • Tree Traversal
      • Binary Heaps
    • Complete Nodejs
      • Understanding Node.js
      • REST AP Is And Mongoose
      • API Authentication And Security
      • Node.js Module System
      • File System And Command Line Args
      • Debugging Node.js
      • Asynchronous Node.js
      • Web Servers
      • Accessing API From Browser
      • Application Deployment
      • Mongo DB And Promises
    • Complete React Native
      • Working With Content
      • Building Lists
      • Navigating Users Between Screens
      • State Management
      • Handling Screen Layout
      • Setting Up An App
      • More On Navigation
      • Advanced Statement Management With Context
      • Building A Custom Express API
      • In App Authentication
    • Epic React
      • React Fundamentals
      • React Hooks
      • Advanced React Hooks
      • Advanced React Patterns
      • React Performance
    • Fireship Firestore
      • Firestore Queries And Data Modeling Course
      • Model Relational Data In Firestore No SQL
    • Functional Light Javascript
      • Intro
      • Function Purity
      • Argument Adapters
      • Point Free
      • Closure
      • Composition
      • Immutability
      • Recursion
      • List Operations
      • Transduction
      • Data Structure Operations
      • Async
    • Js Weird Parts
      • Execution Contexts And Lexical Environments
      • Types And Operators
      • Objects And Functions
      • Object Oriented Java Script And Prototypal Inheritance
      • Defining Objects
    • Mastering Chrome Dev Tools
      • Introduction
      • Editing
      • Debugging
      • Networking
      • Auditing
      • Node.js Profiling
      • Performance Monitoring
      • Image Performance
      • Memory
    • React Complete Guide
      • What Is React
      • React Basics
      • Rendering Lists And Conditionals
      • Styling React Components
      • Debugging React Apps
      • Component Deep Dive
      • Building A React App
      • Reaching Out To The Web
      • Routing
    • React Testing
      • Intro To Jest Enzyme And TDD
      • Basic Testing
      • Redux Testing
      • Redux Thunk Testing
    • Serverless Bootcamp
      • Introduction
      • Auction Service Setup
      • Auction Service CRUD Operations
      • Auction Service Processing Auctions
    • Testing Javascript
      • Fundamentals Of Testing
      • Static Analysis Testing
      • Mocking Fundamentals
      • Configuring Jest
      • Test React Components With Jest And React Testing Library
    • Typescript Developers Guide
      • Getting Started With Type Script
      • What Is A Type System
      • Type Annotations In Action
      • Annotations With Functions And Objects
      • Mastering Typed Arrays
      • Tuples In Type Script
      • The All Important Interface
      • Building Functionality With Classes
    • Web Performance With Webpack
      • Intro
      • Code Splitting
      • Module Methods Magic Comments
  • Other
    • Algo Expert
      • Defining Data Structures And Complexity Analysis
      • Memory
      • Big O Notation
      • Logarithm
      • Arrays
      • Linked Lists
      • Hash Tables
      • Stacks And Queues
      • Strings
      • Graphs
      • Trees
    • Aws Solutions Architect
      • AWS Fundamentals IAM EC 2
    • Fundamentals Math
      • Numbers And Negative Numbers
      • Factors And Multiples
      • Fractions
    • Mysql Bootcamp
      • Overview And Installation
      • Creating Databases And Tables
      • Inserting Data
      • CRUD Commands
      • The World Of String Functions
      • Refining Our Selections
      • The Magic Of Aggregate Functions
    • Random Notes
      • Understanding React Hooks
  • Python
    • Data Analysis Using Python
      • Loading Querying And Filtering Data Using The Csv Module
      • Loading Querying Joining And Filtering Data Using Pandas
      • Summarizing And Visualizing Data
    • Intro To Python
      • Course Introduction Intro To Programming And The Python Language Variables Conditionals Jupyter Notebook And IDLE
      • Intro To Lists Loops And Functions
      • More With Lists Strings Tuples Sets And Py Charm
      • Dictionaries And Files
Powered by GitBook
On this page
  • pandas
  • Loading data
  • Inspecting data
  • Querying data
  • Casting data
  • Cleaning data
  • Joining data
  • Filtering data
  • Computing data
  • Updating/creating data
  1. Python
  2. Data Analysis Using Python

Loading Querying Joining And Filtering Data Using Pandas

pandas

pandas is a library that provides a lot of data analysis functionality.

Loading data

pandas allows you to interact with databases and Excel files.

Here's how you'd load an Excel file:

import pandas as pd
xls = pd.ExcelFile('file.xlsx')
df = xls.parse('sheet_name') # creates a DataFrame for a specific sheet

Inspecting data

Here are some useful attributes and methods to give you details about your data:

len(df) # number of rows
df.shape # size of rows and columns
df.count() # count of values in each column
df.columns # access column headers
df.dtypes # view data types for each column

df.describe() # built-in summary statistics for numerical values
df.head() # first 5 rows
df.head(100) # first 100 rows

df = df.drop_duplicates() # removes all duplicate rows (ALL cells identical)

Querying data

Here are some useful operations to query data:

# Returns column values for every row
df["column_name"]

# Returns specified column values for every row
atts = ["col1", "col2", "col3"]
df[atts]

Casting data

Sometimes the data type for a DataFrame column is not what we expect. So we may need to cast the values into another data type.

pd.to_numeric method:

df['salary'] = pd.to_numeric(df['salary'], errors='coerce')
# The errors argument determines how to handle data that fails casting

astype method:

df['name'] = df['name'].astype(str) # casts to string

Cleaning data

If your data in a column is dirty and you need to clean it up, you can use data type-related methods to mutate the values.

# This example removes trailing whitespace
df['name'] = df['name'].str.strip()

If, on the other hand, you have missing values in a column, you can either drop those rows...

df.dropna(inplace = True)

or you can fill in those missing values with some default.

df.fillna('N/A')

Joining data

Given two DataFrames, you can join data together based on some shared key:

pd.merge(left=df1, right=df2, how="inner", left_on="key1", right_on="key2")

In the example above, we inner join (only keeps data where match is found) df2 into df1 using a shared key. That means the data in df2 gets merged into df1.

Filtering data

You can get a slice of a DataFrame's rows:

second_100_rows = df[100:200]
last_row = df[-1 : len(df)-11]
last_row_again = df.iloc[-1]

You can also filter a DataFrame using boolean indexing:

# Creates a Series containing True/False values for each row
# to determine which passes the logic check
conditionA = df["age"] >= 30
conditionB = df["city"].isin(["Toronto", "Edmonton"])

# Returns only those rows where the row was True in the Series
df[conditionA]
df[conditionB]

# Setting multiple conditions
cond1 = df["name"] = "Dan"
cond2 = df["city"] = "Toronto"
df[conditionA | conditionB] # "|" represents bitwise OR
df[conditionA & conditionB] # "&" represents bitwise AND

Computing data

A column gives you access to the following methods:

df["income"].sum() # returns total income
df["rating"].mean() # returns average rating
df["city"].unique() # filters out duplicates
df["city"].value_counts() # returns number of rows for each city
df["city"].nunique() # returns number of unique values?

Updating/creating data

You can mutate a DataFrame by adding/updating columns:

df["new_column"] = <some_value>

df["concat_column"] = df["col1"].str.concat(df["col2"], sep=",")

def add_two(x):
  return x + 2
df["amount"] = df["amount"].apply(add_two)
PreviousLoading Querying And Filtering Data Using The Csv ModuleNextSummarizing And Visualizing Data

Last updated 3 years ago