NumPy Basics

Efficient Numerical Computing Library — The Foundation of Pandas

What is NumPy?

NumPy (Numerical Python) is the foundational library for scientific computing in Python. It provides a fast N-dimensional array type and the functions that operate on it.

Why learn NumPy?

Pandas is built on top of NumPy
Vectorized array operations are often 10-100 times faster than equivalent loops over Python lists
Supports matrix operations (linear algebra)

Analogies:

Stata: Built-in matrix functionality
R: Vector operations
Python: NumPy arrays

Installation and Import

bash

pip install numpy

python

import numpy as np  # Standard alias

Creating Arrays

1. Creating from Lists

python

import numpy as np

# 1D array
ages = np.array([25, 30, 35, 40])
print(ages)  # [25 30 35 40]
print(type(ages))  # <class 'numpy.ndarray'>

# 2D array (matrix)
data = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print(data)
# [[1 2 3]
#  [4 5 6]]

2. Special Arrays

python

# All zeros
zeros = np.zeros(5)  # [0. 0. 0. 0. 0.]

# All ones
ones = np.ones((2, 3))  # 2x3 matrix of ones

# Sequence
seq = np.arange(0, 10, 2)  # [0 2 4 6 8]

# Linear spacing
lin = np.linspace(0, 1, 5)  # [0.   0.25 0.5  0.75 1.  ]

# Random numbers
rand = np.random.rand(5)  # 5 uniform random numbers in [0, 1)
randn = np.random.randn(5)  # 5 draws from the standard normal distribution
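
Recent NumPy versions also offer a Generator-based interface, which the NumPy documentation recommends for new code. The following is a minimal sketch, assuming NumPy 1.17 or later; the seed value and variable names are illustrative only.

```python
import numpy as np

# Create a seeded generator so the draws are reproducible
rng = np.random.default_rng(42)  # 42 is an arbitrary example seed

uniform = rng.random(5)                # 5 uniform draws from [0, 1)
normal = rng.standard_normal(5)        # 5 standard normal draws
incomes = rng.normal(70000, 15000, 5)  # 5 draws with mean 70k, std 15k

print(uniform.shape, normal.shape, incomes.shape)  # (5,) (5,) (5,)
```

Passing the generator around explicitly keeps random state out of global variables, which makes analyses easier to reproduce.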

Array Attributes

python

data = np.array([[1, 2, 3], [4, 5, 6]])

print(data.shape)   # (2, 3) - shape
print(data.ndim)    # 2 - number of dimensions
print(data.size)    # 6 - total number of elements
print(data.dtype)   # int64 on most platforms - data type
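
NumPy infers the data type from the values passed in, and every element of an array shares that type. The sketch below illustrates this along with explicit conversion and reshaping; the variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2, 3])  # a single float makes the whole array float

print(ints.dtype.kind)  # i (an integer type; the exact width is platform dependent)
print(floats.dtype)     # float64

# astype returns a new array with the requested type
as_float = ints.astype(np.float64)
print(as_float.dtype)   # float64

# reshape changes the shape without changing the data
grid = np.arange(6).reshape(2, 3)
print(grid.shape)       # (2, 3)
```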

Array Operations

Vectorized Operations (Faster than Loops)

python

incomes = np.array([50000, 60000, 75000, 80000])

# Vectorized (fast)
after_tax = incomes * 0.75
log_incomes = np.log(incomes)

# Loop (slow): same values, computed one element at a time
after_tax_loop = []
for income in incomes:
    after_tax_loop.append(income * 0.75)

Basic Operations

python

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a + b)   # [11 22 33 44]
print(a * b)   # [10 40 90 160]
print(a ** 2)  # [1 4 9 16]
print(a > 2)   # [False False  True  True]
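
The operators above all work element by element. For the matrix operations mentioned earlier, NumPy uses the `@` operator. Here is a small sketch with two made-up 2x2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)  # element-wise product
# [[ 5 12]
#  [21 32]]

print(A @ B)  # matrix product
# [[19 22]
#  [43 50]]

print(A.T)    # transpose
# [[1 3]
#  [2 4]]
```

Mixing up `*` and `@` is a common source of bugs for users coming from matrix-oriented tools, so it is worth checking which one a formula needs.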

Statistical Functions

python

scores = np.array([85, 92, 78, 90, 88])

print(scores.mean())   # 86.6 - mean
print(scores.std())    # 4.88 - standard deviation (population, ddof=0)
print(scores.min())    # 78 - minimum
print(scores.max())    # 92 - maximum
print(scores.sum())    # 433 - sum
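
By default, `std()` divides by n (the population formula), whereas R's `sd()` and Stata's `summarize` divide by n - 1 (the sample formula). The sketch below shows how the `ddof` argument switches between the two:

```python
import numpy as np

scores = np.array([85, 92, 78, 90, 88])

pop_std = scores.std()           # divides by n (ddof=0, NumPy's default)
sample_std = scores.std(ddof=1)  # divides by n - 1

print(f"{pop_std:.2f}")     # 4.88
print(f"{sample_std:.2f}")  # 5.46
```

The same `ddof` argument is accepted by `np.var`.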

Practical Examples

Example 1: Standardizing Data

python

# Raw income data
incomes = np.array([50000, 60000, 75000, 80000, 95000])

# Z-score standardization
mean = incomes.mean()
std = incomes.std()
incomes_std = (incomes - mean) / std

print(f"Standardized: {incomes_std}")
# Values are approximately -1.40, -0.77, 0.19, 0.51, 1.47
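
A quick way to check a z-score transformation is to confirm that the result has mean 0 and standard deviation 1. This is a minimal sketch that reuses the income values from this example:

```python
import numpy as np

incomes = np.array([50000, 60000, 75000, 80000, 95000])
incomes_std = (incomes - incomes.mean()) / incomes.std()

# Floating-point results are rarely exactly 0 or 1, so compare with a tolerance
print(np.isclose(incomes_std.mean(), 0))  # True
print(np.isclose(incomes_std.std(), 1))   # True
```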

Example 2: Batch Calculations

python

# Income of 100 respondents
np.random.seed(42)
incomes = np.random.normal(70000, 15000, 100)  # Mean 70k, std 15k

# Calculate after-tax income (25% tax rate)
after_tax = incomes * 0.75

# Statistics
print(f"Average after-tax income: ${after_tax.mean():,.0f}")
print(f"Median: ${np.median(after_tax):,.0f}")
print(f"Standard deviation: ${after_tax.std():,.0f}")

Example 3: Conditional Filtering

python

ages = np.array([22, 35, 45, 28, 55, 30, 48])

# Filter ages 30-50
mask = (ages >= 30) & (ages <= 50)
middle_aged = ages[mask]
print(middle_aged)  # [35 45 30 48]

# Statistics
print(f"Ages 30-50: {len(middle_aged)} people")
print(f"Percentage: {len(middle_aged)/len(ages)*100:.1f}%")
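
Because `True` counts as 1 and `False` as 0, a boolean mask can be summed or averaged directly, and `np.where` can turn it into labels. The sketch below shows both; the label strings are arbitrary examples.

```python
import numpy as np

ages = np.array([22, 35, 45, 28, 55, 30, 48])
mask = (ages >= 30) & (ages <= 50)

print(mask.sum())            # 4 (number of True values)
print(f"{mask.mean():.1%}")  # 57.1% (share of True values)

# np.where picks one of two values per element, depending on the condition
groups = np.where(mask, "30-50", "other")
print(groups)
# ['other' '30-50' '30-50' 'other' 'other' '30-50' '30-50']
```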

NumPy vs Python Lists

python

import time

# Create large array
size = 1000000
py_list = list(range(size))
np_array = np.arange(size)

# Python list (slow)
start = time.perf_counter()  # perf_counter is intended for timing short intervals
result = [x * 2 for x in py_list]
print(f"List: {time.perf_counter() - start:.4f}s")

# NumPy array (fast)
start = time.perf_counter()
result = np_array * 2
print(f"NumPy: {time.perf_counter() - start:.4f}s")

# NumPy is typically 10-100 times faster!
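
A single timing can be noisy. As a hedged alternative, the standard library's `timeit` module repeats the measurement, and taking the best run gives a steadier comparison. The actual speedup depends on the machine, the array size, and the operation, so no specific figure is promised here.

```python
import timeit
import numpy as np

size = 1_000_000
py_list = list(range(size))
np_array = np.arange(size)

# Run each approach 5 times and keep the fastest run
list_time = min(timeit.repeat(lambda: [x * 2 for x in py_list], number=1, repeat=5))
numpy_time = min(timeit.repeat(lambda: np_array * 2, number=1, repeat=5))

print(f"List:  {list_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"Speedup: {list_time / numpy_time:.0f}x")
```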

Common Functions

Mathematical Functions

python

x = np.array([1, 4, 9, 16, 25])

np.sqrt(x)    # Square root
np.log(x)     # Natural logarithm
np.log10(x)   # Base-10 logarithm
np.exp(x)     # e^x
np.abs(x)     # Absolute value

Aggregation Functions

python

data = np.array([85, 92, 78, 90, 88, 76, 95])

np.mean(data)      # Mean
np.median(data)    # Median
np.std(data)       # Standard deviation
np.var(data)       # Variance
np.min(data)       # Minimum
np.max(data)       # Maximum
np.percentile(data, 25)  # 25th percentile
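
On 2D arrays, these aggregation functions also accept an `axis` argument to summarize by column or by row. The sketch below uses a small made-up table in which rows are students and columns are exams:

```python
import numpy as np

# Hypothetical scores: 2 students (rows) x 3 exams (columns)
scores = np.array([
    [85, 92, 78],
    [90, 88, 76],
])

overall = scores.mean()         # one number for the whole table
per_exam = scores.mean(axis=0)  # collapse rows: one mean per column
per_student = scores.mean(axis=1)  # collapse columns: one mean per row

print(per_exam.shape)     # (3,)
print(per_student.shape)  # (2,)
print(per_exam)           # column means: 87.5, 90.0, 77.0
```

A useful way to remember the convention: `axis` names the dimension that disappears from the result.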

NumPy in Data Analysis

Application 1: Correlation Coefficient

python

# Age and income
ages = np.array([25, 30, 35, 40, 45, 50])
incomes = np.array([50000, 60000, 75000, 80000, 90000, 95000])

# Calculate correlation coefficient
correlation = np.corrcoef(ages, incomes)[0, 1]
print(f"Correlation: {correlation:.3f}")  # 0.988 (strong positive)
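
To see what `np.corrcoef` computes, the Pearson coefficient can also be built by hand from deviations around the mean. This is a sketch for illustration, not a replacement for the library function:

```python
import numpy as np

ages = np.array([25, 30, 35, 40, 45, 50])
incomes = np.array([50000, 60000, 75000, 80000, 90000, 95000])

# Deviations from each variable's mean
age_dev = ages - ages.mean()
income_dev = incomes - incomes.mean()

# Pearson r = sum of cross-products / sqrt(product of the sums of squares)
r_manual = (age_dev * income_dev).sum() / np.sqrt(
    (age_dev ** 2).sum() * (income_dev ** 2).sum()
)

print(f"{r_manual:.3f}")  # 0.988
print(np.isclose(r_manual, np.corrcoef(ages, incomes)[0, 1]))  # True
```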

Application 2: Group Statistics

python

# Income by gender
male_incomes = np.array([60000, 75000, 80000, 90000])
female_incomes = np.array([55000, 70000, 72000, 85000])

print(f"Male average: ${male_incomes.mean():,.0f}")
print(f"Female average: ${female_incomes.mean():,.0f}")
print(f"Gender gap: ${male_incomes.mean() - female_incomes.mean():,.0f}")
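
In real datasets the groups usually arrive in one array with a separate label array rather than as two pre-split arrays. The sketch below assumes that layout and uses hypothetical `"M"` and `"F"` labels paired with the same income values:

```python
import numpy as np

# Same incomes as above, interleaved, with a parallel array of group labels
incomes = np.array([60000, 55000, 75000, 70000, 80000, 72000, 90000, 85000])
gender = np.array(["M", "F", "M", "F", "M", "F", "M", "F"])

# np.unique returns the distinct labels in sorted order
means = {}
for g in np.unique(gender):
    means[g] = incomes[gender == g].mean()  # boolean mask selects one group
    print(g, means[g])
# F 70500.0
# M 76250.0

print(means["M"] - means["F"])  # 5750.0
```

Pandas wraps this pattern in `groupby`, which is covered in the next section.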

Practice Exercises

python

# Exercise 1: Creating and manipulating arrays
# Create an array of the integers 1 to 100, keep the even numbers, and compute the sum of their squares

# Exercise 2: Statistical analysis
ages = np.array([22, 25, 28, 30, 35, 40, 45, 50, 55, 60])
# Calculate: mean, median, standard deviation, 25th and 75th percentiles

# Exercise 3: Data standardization
scores = np.array([65, 72, 85, 90, 78, 88, 92, 70, 95, 82])
# Apply Min-Max scaling to map the scores onto the 0-1 range

Next Steps

NumPy is the foundation. In the next section, we'll learn Pandas (a data analysis library built on NumPy), which is the core tool for social science data analysis!

Keep going!
