Variables and Data Types
From Stata/R to Python — Understanding the Essence of Variables
What are Variables?
In most programming languages, a variable is a named "container" for storing data.
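Strictly speaking, a Python variable is a name that refers to a value. It works more like a label than a box. The minimal sketch below shows what that means for simple values such as numbers. After `y = x`, both names refer to the same value. Reassigning `x` afterwards moves only the label `x` and leaves `y` unchanged. The names `x` and `y` are only illustrative.

```python
x = 25       # the name x refers to the integer 25
y = x        # y now refers to the same value as x
x = 30       # rebinding x points it at a new value...
print(x)     # 30
print(y)     # 25 (...while y still refers to the original value)
```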
Comparative Understanding
| Concept | Stata | R | Python |
|---|---|---|---|
| Create variable | gen age = 25 | age <- 25 or age = 25 | age = 25 |
| Variable type | Auto-inferred | Auto-inferred | Auto-inferred |
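| View variable | display age | print(age) | print(age) |

Note that Stata's `gen` creates a dataset variable: a column with one value per observation. The R and Python lines create a single value. The closest Stata equivalent of a single value is `scalar age = 25`, which you can display with `display age`.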
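"Auto-inferred" means you never declare a type when you create a variable. In Python the type belongs to the value, not the name, so you can reassign a name to a value of a different type. The built-in `type()` function shows what the name currently refers to. Here is a minimal sketch; `value` is just an illustrative name.

```python
# No type declaration: Python infers the type from the value
value = 25
print(type(value))  # <class 'int'>

# The same name can later refer to a value of another type
value = "twenty-five"
print(type(value))  # <class 'str'>
```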
Basic Variable Usage in Python
1. Creating Variables (Assignment)
```python
# Create variables
age = 25
name = "Alice"
income = 50000.5
is_student = True

# Print variables
print(age)         # Output: 25
print(name)        # Output: Alice
print(income)      # Output: 50000.5
print(is_student)  # Output: True
```

Python Rules:
- Variable names can contain letters, digits, and underscores
- Variable names must start with a letter or an underscore
- Variable names cannot contain spaces or other special characters (underscores are the only exception)
- Variable names cannot be Python reserved words (like `if`, `for`, `class`)
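Python can check these rules for you. `str.isidentifier()` covers the character and first-character rules, and the standard-library `keyword` module covers reserved words. The minimal sketch below combines the two; the helper name `is_valid_name` is hypothetical. Note that `isidentifier()` also accepts letters from other alphabets, which Python allows in names.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier(): only letters, digits, underscores, not starting with a digit
    # iskeyword(): rejects reserved words such as if, for, class
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_name("student_age"))  # True
print(is_valid_name("_tmp"))         # True
print(is_valid_name("2020_data"))    # False (starts with a digit)
print(is_valid_name("student age"))  # False (contains a space)
print(is_valid_name("class"))        # False (reserved word)
```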
Good Variable Names:
```python
student_age = 25
avg_income = 50000
is_employed = True
gdp_growth_rate = 0.03
```

Bad Variable Names:
```python
a = 25              # Valid, but too short: the meaning is unclear
StudentAge = 25     # Valid, but Python convention (PEP 8) uses lowercase + underscores
# student age = 25  # SyntaxError: contains a space
# 2020_data = 100   # SyntaxError: starts with a number
```

2. Variable Reassignment
```python
# Initial value
income = 50000
print(income)  # 50000

# Modify value
income = 60000
print(income)  # 60000

# Modify based on old value
income = income + 5000
print(income)  # 65000

# Shorthand syntax
income += 5000  # Equivalent to income = income + 5000
print(income)   # 70000
```

Stata Comparison:
```stata
* Stata
gen income = 50000
replace income = 60000
replace income = income + 5000
```

Basic Python Data Types
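This section covers five basic types. The built-in `type()` function reports which type a value has. The minimal sketch below applies it to one example of each type. The list `examples` only serves as a container for the loop.

```python
# One example value for each basic type covered in this section
examples = [25, 3.85, "Alice", True, None]

for value in examples:
    # type(value).__name__ gives the type's name as a plain string
    print(repr(value), "->", type(value).__name__)
# 25 -> int
# 3.85 -> float
# 'Alice' -> str
# True -> bool
# None -> NoneType
```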
1. Numeric Types
(1) Integers (int)
```python
age = 25
population = 1400000000
year = 2024
print(type(age))  # <class 'int'>
```

(2) Floats (float)
```python
gpa = 3.85
income = 50000.5
interest_rate = 0.05
print(type(gpa))  # <class 'float'>
```

(3) Numeric Operations
```python
# Basic operations
x = 10
y = 3
print(x + y)   # 13 (addition)
print(x - y)   # 7 (subtraction)
print(x * y)   # 30 (multiplication)
print(x / y)   # 3.333... (division, result is always a float)
print(x // y)  # 3 (floor division)
print(x % y)   # 1 (modulus: the remainder)
print(x ** y)  # 1000 (exponentiation)
```

Stata/R Comparison:
```stata
* Stata
gen x = 10
gen y = 3
gen sum = x + y
gen division = x / y
gen power = x^y
```

```r
# R
x <- 10
y <- 3
sum <- x + y
division <- x / y
power <- x^y
```

2. Strings (String)
```python
# Create strings (single or double quotes both work)
name = "Alice"
major = 'Economics'
country = "China"

# String concatenation
full_name = "Alice" + " " + "Smith"
print(full_name)  # Alice Smith

# String repetition
laugh = "ha" * 3
print(laugh)  # hahaha

# String length
print(len("Hello"))  # 5

# String methods
text = "hello world"
print(text.upper())                     # HELLO WORLD
print(text.capitalize())                # Hello world
print(text.replace("world", "Python"))  # hello Python
```

Practical Example (Social Science Context):
```python
# Clean up a variable label
variable_label = "respondent_age_in_years"
cleaned_label = variable_label.replace("_", " ").title()
print(cleaned_label)  # Respondent Age In Years

# Extract country code and year with slicing
country_code = "USA_2020_data"
code = country_code[:3]   # USA (characters 0-2)
year = country_code[4:8]  # 2020 (characters 4-7)
print(f"Country: {code}, Year: {year}")  # Country: USA, Year: 2020
```

3. Booleans (Boolean)
```python
# Booleans have only two values: True and False
is_student = True
has_degree = False
is_employed = True
print(type(is_student))  # <class 'bool'>

# Boolean operations
print(True and False)  # False (and)
print(True or False)   # True (or)
print(not True)        # False (not)

# Comparison operations (return a boolean)
age = 25
print(age > 18)   # True
print(age == 25)  # True (note: two equal signs)
print(age != 30)  # True (not equal)
```

Social Science Example:
```python
age = 25
income = 50000

# Check survey eligibility
is_eligible = (age >= 18) and (age <= 65) and (income > 0)
print(f"Eligible for survey: {is_eligible}")  # Eligible for survey: True

# Check if high income
is_high_income = income > 100000
print(f"High income: {is_high_income}")  # High income: False
```

4. None (Null Value)
```python
# None represents "no value" (loosely similar to Stata's . or R's NA)
missing_data = None
print(missing_data)        # None
print(type(missing_data))  # <class 'NoneType'>

# Check for None with `is` (not ==)
if missing_data is None:
    print("Data is missing")
```

Type Conversion
```python
# String to number
age_str = "25"
age_int = int(age_str)
print(age_int + 5)  # 30

income_str = "50000.5"
income_float = float(income_str)
print(income_float)  # 50000.5

# Number to string
age = 25
age_str = str(age)
print("Age: " + age_str)  # Age: 25

# Other types to boolean
print(bool(0))     # False
print(bool(1))     # True
print(bool(""))    # False (empty string)
print(bool("Hi"))  # True
print(bool(None))  # False
```

Common Error:
```python
# Error example
age = "25"
# result = age + 5  # TypeError: can only concatenate str (not "int") to str

# Correct approach: convert the string to a number first
age = int("25")
result = age + 5  # 30
```

Practical Example: Social Science Data Processing
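Raw survey files often store numbers as text, with entries such as "N/A" or blanks for missing answers. `int()` and `float()` raise a `ValueError` on such text. A common first processing step is therefore a small helper that returns `None` instead of failing. The minimal sketch below does this; the helper name `to_float` and the sample answers are hypothetical. It uses a function (`def`) and `try`/`except`, which go slightly beyond this section.

```python
def to_float(raw_text):
    """Convert a raw survey answer to a float; return None if it is not numeric."""
    try:
        # float() accepts surrounding whitespace, e.g. " 72000.5 "
        return float(raw_text)
    except ValueError:
        # Non-numeric answers such as "N/A" or "" end up here
        return None

print(to_float("50000"))      # 50000.0
print(to_float(" 72000.5 "))  # 72000.5
print(to_float("N/A"))        # None
print(to_float(""))           # None
```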
Example 1: Calculate BMI
```python
# Respondent information
name = "Alice"
height_cm = 170  # centimeters
weight_kg = 65   # kilograms

# Calculate BMI = weight (kg) / height (m) squared
height_m = height_cm / 100
bmi = weight_kg / (height_m ** 2)
print(f"{name}'s BMI: {bmi:.2f}")  # Alice's BMI: 22.49

# Determine weight status
if bmi < 18.5:
    status = "Underweight"
elif bmi < 25:
    status = "Normal"
elif bmi < 30:
    status = "Overweight"
else:
    status = "Obese"
print(f"Weight status: {status}")  # Weight status: Normal
```

Example 2: Income Grouping
```python
# Respondent income
respondent_id = 1001
annual_income = 75000  # annual income (USD)

# Income groups based on fixed, illustrative cutoffs
# (true quartiles would be computed from the income distribution in the data)
if annual_income < 30000:
    income_quartile = "Q1 (Low income)"
elif annual_income < 60000:
    income_quartile = "Q2 (Lower-middle income)"
elif annual_income < 100000:
    income_quartile = "Q3 (Upper-middle income)"
else:
    income_quartile = "Q4 (High income)"

print(f"Respondent {respondent_id}: {income_quartile}")
# Output: Respondent 1001: Q3 (Upper-middle income)
```

Example 3: Education Years Encoding
```python
# Education level (text)
education_level = "Bachelor's Degree"

# Convert to years of education (numeric)
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22,
}

# .get() returns the mapped value, or the default 0 if the level is not in the mapping
years_of_education = education_mapping.get(education_level, 0)
print(f"Years of education: {years_of_education} years")  # Years of education: 16 years
```

Stata vs R vs Python Data Type Comparison
| Python Type | Stata Type | R Type | Example |
|---|---|---|---|
| `int` | numeric (integer: `byte`, `int`, `long`) | integer | `age = 25` |
| `float` | numeric (decimal: `float`, `double`) | numeric | `gpa = 3.85` |
| `str` | string (`str#`, `strL`) | character | `name = "Alice"` |
| `bool` | numeric (0/1) | logical | `is_student = True` |
| `None` | `.` (missing) | `NA` | `missing = None` |
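Note:
- Stata doesn't have a true boolean type (it uses 0/1 instead)
- Python's `None` ≈ R's `NA` ≈ Stata's `.`. This is a loose analogy: all three mark "no value", but they behave differently in calculations.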
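You can check both points in the note directly, as the minimal sketch below shows:

- Python's `bool` is a subclass of `int`, so `True`/`False` add up like Stata's 0/1 indicators.
- `None`, unlike R's `NA` or Stata's `.`, raises an error in arithmetic rather than propagating.
- The float value NaN (`float("nan")`), which pandas commonly uses for missing numbers, does propagate.

The variable names are illustrative.

```python
import math

# bool is a subclass of int: True counts as 1, False as 0,
# so adding indicators counts how many are True (like summing a 0/1 Stata variable)
is_eligible_1 = True
is_eligible_2 = False
is_eligible_3 = True
eligible_count = is_eligible_1 + is_eligible_2 + is_eligible_3
print(eligible_count)  # 2

# None does not propagate through arithmetic the way R's NA or Stata's . does
try:
    result = None + 1
except TypeError as err:
    print("TypeError:", err)

# NaN ("not a number") does propagate, and is never equal to itself,
# so test for it with math.isnan() rather than ==
missing = float("nan")
print(missing + 1)          # nan
print(missing == missing)   # False
print(math.isnan(missing))  # True
```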
Formatted Output (f-string)
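Python 3.6+ supports f-strings: string literals prefixed with `f` that insert variables and expressions written inside `{}`, with optional formatting after a colon: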
```python
name = "Alice"
age = 25
gpa = 3.856

# Basic usage
print(f"Name: {name}, Age: {age}")
# Output: Name: Alice, Age: 25

# Format numbers: 2 decimal places
print(f"GPA: {gpa:.2f}")
# Output: GPA: 3.86

# Format large numbers: thousands separator + 2 decimal places
income = 1234567.89
print(f"Income: ${income:,.2f}")
# Output: Income: $1,234,567.89

# Percentages: multiplies by 100 and appends %
growth_rate = 0.0523
print(f"Growth rate: {growth_rate:.2%}")
# Output: Growth rate: 5.23%
```

Practical Template:
```python
# Generate survey report
respondent_id = 1001
age = 35
gender = "Male"
income = 85000
education = "Master's"

report = f"""
Respondent Report
====================
ID: {respondent_id}
Age: {age} years
Gender: {gender}
Income: ${income:,}
Education: {education}
====================
"""
print(report)
```

Common Errors
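Python reports each error below with an exception name (`NameError`, `TypeError`, `SyntaxError`) that identifies the kind of mistake. The minimal sketch below triggers all three deliberately and catches them, so you can read the messages without stopping the script. `try`/`except` appears here only for that purpose and is not part of this section's material. A `SyntaxError` is detected before any code runs, so the sketch compiles the faulty line from a string in order to catch it.

```python
age = 25

# NameError: the name was never assigned (here, a typo for age)
try:
    print(aeg)
except NameError as err:
    name_error = err
    print("NameError:", err)

# TypeError: a string and a number cannot be added
try:
    result = "25" + 5
except TypeError as err:
    type_error = err
    print("TypeError:", err)

# SyntaxError: compile the faulty line from a string so it can be caught here
try:
    compile("if age = 25: pass", "<example>", "exec")
except SyntaxError as err:
    syntax_error = err
    print("SyntaxError:", err.msg)
```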
Error 1: Variable Name Typo
```python
age = 25
# print(aeg)  # NameError: name 'aeg' is not defined (typo for age)
print(age)    # 25
```

Error 2: Mixing Strings and Numbers
```python
# result = "25" + 5     # TypeError: can only concatenate str (not "int") to str
result = int("25") + 5  # 30: convert the string to a number first
```

Error 3: Confusing = and ==
```python
age = 25        # Assignment: = stores a value
# if age = 25:  # SyntaxError: = cannot be used for comparison
if age == 25:   # Comparison: == tests equality
    print("Age is 25")
```
Exercise 1: Basic Variable Operations
Create the following variables and perform calculations:
```python
# 1. Create variables
base_salary = 50000
bonus = 8000
tax_rate = 0.25

# 2. Calculate after-tax income
# Hint: after_tax_income = (base_salary + bonus) * (1 - tax_rate)

# 3. Format and output
# Output format: "After-tax income: $X,XXX.XX"
```

Exercise 2: Type Conversion
```python
# Given strings
age_str = "35"
income_str = "75000.50"

# 1. Convert to numbers
# 2. Check if high income (> 100000)
# 3. Check if middle-aged (30-50 years)
```

Exercise 3: Formatted Output
```python
# Given data
country = "China"
gdp_per_capita = 12000.45
population = 1400000000
growth_rate = 0.062

# Output format:
# Country: China
# GDP per capita: $12,000.45
# Population: 1,400,000,000
# Growth rate: 6.20%
```

Next Steps
In the next section, we'll learn about operators, including arithmetic, comparison, and logical operators, laying the foundation for conditional statements and loops.
Keep going!