5.1 NumPy Basics
NumPy is the fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object, the ndarray, and tools for working with these arrays.
import numpy as np
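# Creating arrays
a = np.array([1, 2, 3])  # 1-D array built from a Python list
b = np.zeros((2, 2))     # 2x2 array filled with 0.0
# Vectorized operations apply element by element, with no explicit loop
c = a * 2  # [2, 4, 6]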
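To make the idea of a multidimensional array more concrete, the short sketch below inspects a small 2-D array and applies a few elementwise and aggregate operations. The array `m` and its values are made up for illustration.

```python
import numpy as np

# Illustrative 2x3 array; the values are arbitrary
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.shape)  # (2, 3): two rows, three columns
print(m.ndim)   # 2

# Elementwise arithmetic between arrays of the same shape
doubled = m + m

# Aggregations over the whole array or along one axis
total = m.sum()               # 21
column_sums = m.sum(axis=0)   # [5, 7, 9]
row_means = m.mean(axis=1)    # [2.0, 5.0]
```

Here `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row).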
5.2 Pandas for Data Manipulation
Pandas provides high-performance, easy-to-use data structures and data analysis tools. Its primary data structure is the DataFrame, a two-dimensional table with labeled rows and columns.
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})
# Filtering: keep only the rows where Age is greater than 18
adults = df[df['Age'] > 18]
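Filters can also combine several conditions and pick out specific columns. The following is a small sketch that rebuilds the same DataFrame with one extra row; the name 'Carol', her age, and the thresholds are assumptions chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],  # 'Carol' is a hypothetical extra row
    'Age': [25, 30, 17],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
in_range = df[(df['Age'] > 18) & (df['Age'] < 30)]

# .loc selects rows by a boolean mask and columns by label in one step
adult_names = df.loc[df['Age'] > 18, 'Name']

# Derived column computed with a vectorized comparison
df['IsAdult'] = df['Age'] >= 18
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.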
5.3 Data Visualization with Matplotlib
Matplotlib is a plotting library for Python. It works with plain Python sequences as well as NumPy arrays.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("Simple Plot")
plt.show()
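The same plot can also be built through Matplotlib's object-oriented interface, which makes it easier to label axes and manage several plots at once. This is a minimal sketch using the same data; the axis label text and the marker style are illustrative choices.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# plt.subplots() returns a Figure and a single Axes to draw on
fig, ax = plt.subplots()
ax.plot(x, y, marker='o')  # line plot with a marker at each data point
ax.set_title("Simple Plot")
ax.set_xlabel("x")         # illustrative axis labels
ax.set_ylabel("y")
```

Calling plt.show() afterwards displays the figure, as in the example above.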
5.4 Exploratory Data Analysis (EDA)
EDA is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
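A first pass at EDA often starts with a few summary calls in Pandas before any plotting. The sketch below assumes a small, made-up dataset; the column names and values are invented for illustration.

```python
import pandas as pd

# Small made-up dataset, used only for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'Age': [25, 30, None, 41],
    'City': ['Paris', 'Lima', 'Paris', 'Oslo'],
})

print(df.shape)   # number of rows and columns
print(df.dtypes)  # data type of each column

summary = df['Age'].describe()            # count, mean, std, min, quartiles, max
missing = df.isna().sum()                 # missing values per column
city_counts = df['City'].value_counts()   # frequency of each category
```

Printing `summary`, `missing`, and `city_counts` gives a quick picture of the numeric range, the data quality, and the categorical distribution.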
🎯 Practical Exercise
Load a CSV file using Pandas, clean missing values, and create a simple bar chart visualizing the distribution of a categorical column.
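One possible way to approach the exercise is sketched below. To keep the example self-contained it reads CSV text from memory instead of a file; with a real file, its path would be passed to pd.read_csv. The column names, the values, and the cleaning policy (median for numbers, an 'Unknown' label for categories) are all assumptions, and other choices may suit a given dataset better.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# In-memory stand-in for a CSV file; columns and values are invented
csv_text = """Name,Age,Department
Alice,25,Sales
Bob,,Engineering
Carol,41,
Dan,33,Sales
"""

df = pd.read_csv(io.StringIO(csv_text))

# Clean missing values: fill numeric gaps with the median and
# label missing categories explicitly (one policy among several)
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Department'] = df['Department'].fillna('Unknown')

# Count the rows in each category and draw one bar per category
counts = df['Department'].value_counts()

fig, ax = plt.subplots()
ax.bar(counts.index.tolist(), counts.tolist())
ax.set_title("Rows per Department")
ax.set_xlabel("Department")
ax.set_ylabel("Count")
```

Calling plt.show() afterwards displays the chart. Dropping incomplete rows with df.dropna() is an alternative to filling them, at the cost of losing data.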