Variables and Data Types

From Stata/R to Python — Understanding the Essence of Variables


What are Variables?

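In most programming languages, a variable is a named "container" for storing data. In Python, the name refers to a value held in memory, and the same name can later be pointed at a different value.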

Comparative Understanding

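| Concept         | Stata        | R                      | Python       |
|-----------------|--------------|------------------------|--------------|
| Create variable | gen age = 25 | age <- 25 or age = 25  | age = 25     |
| Variable type   | Auto-inferred | Auto-inferred         | Auto-inferred |
| View variable   | display age  | print(age)             | print(age)   |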

Basic Variable Usage in Python

1. Creating Variables (Assignment)

python
# Create variables
age = 25
name = "Alice"
income = 50000.5
is_student = True

# Print variables
print(age)        # Output: 25
print(name)       # Output: Alice
print(income)     # Output: 50000.5
print(is_student) # Output: True

Python Rules:

  • Variable names can contain letters, digits, and underscores
  • Variable names must start with a letter or underscore
  • Variable names cannot contain spaces or special characters (except underscores)
  • Variable names cannot be Python reserved words (like if, for, class)
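
The rules above can be checked programmatically. The sketch below uses the built-in `str.isidentifier()` method and the standard-library `keyword` module; the helper name `is_valid_name` and the candidate list are made up for illustration. Note that `isidentifier()` also accepts non-ASCII letters, which Python 3 allows in names.

```python
import keyword

# Candidate names to check (illustrative values only)
candidates = ["student_age", "_temp", "2020_data", "student age", "class", "gdp-growth"]

def is_valid_name(name):
    """Hypothetical helper: True if `name` can be used as a variable name."""
    # isidentifier() enforces the character rules; iskeyword() rules out reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(f"{name!r}: {is_valid_name(name)}")
# 'student_age': True
# '_temp': True
# '2020_data': False
# 'student age': False
# 'class': False
# 'gdp-growth': False
```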

Good Variable Names:

python
student_age = 25
avg_income = 50000
is_employed = True
gdp_growth_rate = 0.03

Bad Variable Names:

python
a = 25              # Too short, unclear meaning
StudentAge = 25     # Python convention uses lowercase + underscores
student age = 25    # Syntax error: contains space
2020_data = 100     # Syntax error: starts with number

2. Variable Reassignment

python
# Initial value
income = 50000
print(income)  # 50000

# Modify value
income = 60000
print(income)  # 60000

# Modify based on old value
income = income + 5000
print(income)  # 65000

# Shorthand syntax
income += 5000  # Equivalent to income = income + 5000
print(income)  # 70000

Stata Comparison:

stata
* Stata
gen income = 50000
replace income = 60000
replace income = income + 5000
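
One difference worth seeing in code: a Python name is not tied to a single type, so reassignment can change the type as well as the value. The short sketch below also shows the other shorthand operators that follow the same pattern as `+=`. The variable `response` is made up for illustration.

```python
# A name is not tied to one type: reassignment can change it
response = 42
print(type(response))   # <class 'int'>

response = "forty-two"
print(type(response))   # <class 'str'>

# Other shorthand operators follow the same pattern as +=
income = 50000
income -= 2000    # income = income - 2000
income *= 2       # income = income * 2
print(income)     # 96000
income /= 4       # true division always yields a float
print(income)     # 24000.0
```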

Basic Python Data Types

1. Numeric Types

(1) Integers (int)

python
age = 25
population = 1400000000
year = 2024

print(type(age))  # <class 'int'>

(2) Floats (float)

python
gpa = 3.85
income = 50000.5
interest_rate = 0.05

print(type(gpa))  # <class 'float'>
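
Floats are stored in binary, so some decimal values cannot be represented exactly. The sketch below shows the classic case and two common ways to deal with it: comparing with `math.isclose()` and rounding for display.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: exact comparison fails
print(math.isclose(total, 0.3))   # True: compare within a tolerance
print(round(total, 2))            # 0.3
```

For this reason, avoid testing floats with `==` when the values come from arithmetic.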

(3) Numeric Operations

python
# Basic operations
x = 10
y = 3

print(x + y)   # 13  (addition)
print(x - y)   # 7   (subtraction)
print(x * y)   # 30  (multiplication)
print(x / y)   # 3.333... (division, result is float)
print(x // y)  # 3   (floor division)
print(x % y)   # 1   (modulus)
print(x ** y)  # 1000 (exponentiation)

Stata/R Comparison:

stata
* Stata
gen x = 10
gen y = 3
gen sum = x + y
gen division = x / y
gen power = x^y

r
# R
x <- 10
y <- 3
sum <- x + y
division <- x / y
power <- x^y
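
A few details of Python arithmetic tend to surprise Stata and R users. The sketch below illustrates them with made-up numbers. In particular, `^` is not the power operator in Python (it is bitwise XOR), so `x^y` from Stata or R must be written `x ** y`.

```python
print(type(10 / 2))    # <class 'float'>  (10 / 2 is 5.0, not 5)
print(type(10 // 3))   # <class 'int'>
print(10.0 // 3)       # 3.0  (floor division with a float operand returns a float)
print(-7 // 2)         # -4   (rounds toward negative infinity, not toward zero)
print(-7 % 2)          # 1    (result takes the sign of the divisor)
print(10 ^ 3)          # 9    (bitwise XOR, not exponentiation)

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(10, 3)
print(quotient, remainder)  # 3 1
```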

2. Strings (str)

python
# Create strings (single or double quotes work)
name = "Alice"
major = 'Economics'
country = "China"

# String concatenation
full_name = "Alice" + " " + "Smith"
print(full_name)  # Alice Smith

# String repetition
laugh = "ha" * 3
print(laugh)  # hahaha

# String length
print(len("Hello"))  # 5

# String methods
text = "hello world"
print(text.upper())       # HELLO WORLD
print(text.capitalize())  # Hello world
print(text.replace("world", "Python"))  # hello Python

Practical Example (Social Science Context):

python
# Create variable label
variable_label = "respondent_age_in_years"
cleaned_label = variable_label.replace("_", " ").title()
print(cleaned_label)  # Respondent Age In Years

# Extract country code
country_code = "USA_2020_data"
code = country_code[:3]  # USA (slicing)
year = country_code[4:8]  # 2020
print(f"Country: {code}, Year: {year}")  # Country: USA, Year: 2020
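
Slicing by position works only when every label has the same layout. As a hedged alternative, the sketch below cleans a raw text response with `strip()` and `lower()`, then splits a label on its separator with `split()`. The values `raw_response` and `file_label` are made up for illustration.

```python
# Clean a raw survey response: remove surrounding whitespace, normalize case
raw_response = "  Male \n"
cleaned = raw_response.strip().lower()
print(cleaned)  # male

# Split on the underscore instead of counting character positions
file_label = "USA_2020_data"
parts = file_label.split("_")
print(parts)  # ['USA', '2020', 'data']

country = parts[0]
year = int(parts[1])   # split() returns strings; convert before doing arithmetic
print(year + 1)        # 2021
```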

3. Booleans (bool)

python
# Booleans have only two values: True and False
is_student = True
has_degree = False
is_employed = True

print(type(is_student))  # <class 'bool'>

# Boolean operations
print(True and False)  # False (and)
print(True or False)   # True  (or)
print(not True)        # False (not)

# Comparison operations (return boolean)
age = 25
print(age > 18)        # True
print(age == 25)       # True (note: two equal signs)
print(age != 30)       # True (not equal)

Social Science Example:

python
age = 25
income = 50000

# Check survey eligibility
is_eligible = (age >= 18) and (age <= 65) and (income > 0)
print(f"Eligible for survey: {is_eligible}")  # Eligible for survey: True

# Check if high income
is_high_income = income > 100000
print(f"High income: {is_high_income}")  # High income: False
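
Two boolean idioms are handy in this kind of check. Python allows chained comparisons, so a range test can be written the way it is read aloud. And because `True` counts as 1 and `False` as 0, a list of booleans can be summed to get a count or a share. The `answers` list below is made up for illustration.

```python
age = 25

# Chained comparison: same result as (age >= 18) and (age <= 65)
in_range = 18 <= age <= 65
print(in_range)  # True

# Booleans behave like 1 and 0 in arithmetic
answers = [True, False, True, True]
print(sum(answers))                 # 3
print(sum(answers) / len(answers))  # 0.75
```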

4. None (Null Value)

python
# None represents "no value" (similar to Stata's . or R's NA)
missing_data = None
print(missing_data)  # None
print(type(missing_data))  # <class 'NoneType'>

# Check if None
if missing_data is None:
    print("Data is missing")
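
Arithmetic on `None` raises a `TypeError`, so a value that may be missing is usually checked before it is used. The sketch below also shows why `is None` is preferred over a plain truthiness test: zero is a valid value but counts as "falsy". The variable names and the 1.03 adjustment factor are made up for illustration.

```python
reported_income = None

# Check for None before doing arithmetic
if reported_income is None:
    adjusted_income = None
else:
    adjusted_income = reported_income * 1.03

print(adjusted_income)  # None

# 0 is a real value, not a missing one
zero_income = 0
print(zero_income is None)   # False
print(not zero_income)       # True (a truthiness test would wrongly treat 0 as missing)
```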

Type Conversion

python
# String to number
age_str = "25"
age_int = int(age_str)
print(age_int + 5)  # 30

income_str = "50000.5"
income_float = float(income_str)
print(income_float)  # 50000.5

# Number to string
age = 25
age_str = str(age)
print("Age: " + age_str)  # Age: 25

# Other types to boolean
print(bool(0))      # False
print(bool(1))      # True
print(bool(""))     # False (empty string)
print(bool("Hi"))   # True
print(bool(None))   # False

Common Error:

python
# Error example
age = "25"
result = age + 5  # TypeError: can only concatenate str (not "int") to str

# Correct approach
age = int("25")
result = age + 5  # 30
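
Real survey data often contains entries that cannot be converted, such as "N/A" or an empty string. A minimal sketch of one way to handle this is shown below, using `try`/`except` (covered in detail later). The helper `to_float_or_none` is hypothetical, not a built-in.

```python
def to_float_or_none(text):
    """Hypothetical helper: return float(text), or None if text is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

print(to_float_or_none("50000.5"))  # 50000.5
print(to_float_or_none("N/A"))      # None
print(to_float_or_none(""))         # None

# int() truncates a float toward zero, but cannot parse a decimal string directly
print(int(3.9))            # 3
print(int(float("3.9")))   # 3
# int("3.9") would raise ValueError
```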

Practical Example: Social Science Data Processing

Example 1: Calculate BMI

python
# Respondent information
name = "Alice"
height_cm = 170  # centimeters
weight_kg = 65   # kilograms

# Calculate BMI
height_m = height_cm / 100
bmi = weight_kg / (height_m ** 2)

print(f"{name}'s BMI: {bmi:.2f}")  # Alice's BMI: 22.49

# Determine weight status
if bmi < 18.5:
    status = "Underweight"
elif bmi < 25:
    status = "Normal"
elif bmi < 30:
    status = "Overweight"
else:
    status = "Obese"

print(f"Weight status: {status}")  # Weight status: Normal

Example 2: Income Grouping

python
# Respondent income
respondent_id = 1001
annual_income = 75000  # annual income (USD)

# Income groups (fixed cutoffs for illustration; true quartiles are computed from the sample)
if annual_income < 30000:
    income_quartile = "Q1 (Low income)"
elif annual_income < 60000:
    income_quartile = "Q2 (Lower-middle income)"
elif annual_income < 100000:
    income_quartile = "Q3 (Upper-middle income)"
else:
    income_quartile = "Q4 (High income)"

print(f"Respondent {respondent_id}: {income_quartile}")
# Output: Respondent 1001: Q3 (Upper-middle income)

Example 3: Education Years Encoding

python
# Education level (text)
education_level = "Bachelor's Degree"

# Convert to education years (numeric)
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22
}

years_of_education = education_mapping.get(education_level, 0)
print(f"Years of education: {years_of_education} years")  # Years of education: 16 years
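
When a category is not in the mapping, a default of 0 would be read as "zero years of education". As a hedged alternative, the sketch below lets `.get()` return `None` for unknown categories and skips them when averaging. The shortened mapping and the `responses` list are made up for illustration.

```python
# Abbreviated copy of the mapping, for illustration
education_mapping = {
    "High School": 12,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
}
responses = ["Bachelor's Degree", "High School", "Prefer not to say"]

# .get() with no default returns None for unknown categories
years = []
for level in responses:
    years.append(education_mapping.get(level))
print(years)  # [16, 12, None]

# Average only the known values
known = [y for y in years if y is not None]
print(sum(known) / len(known))  # 14.0
```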

Stata vs R vs Python Data Type Comparison

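| Python Type | Stata Type        | R Type    | Python Example    |
|-------------|-------------------|-----------|-------------------|
| int         | numeric (integer) | integer   | age = 25          |
| float       | numeric (decimal) | numeric   | gpa = 3.85        |
| str         | string (str#)     | character | name = "Alice"    |
| bool        | numeric (0/1)     | logical   | is_student = True |
| None        | . (missing)       | NA        | missing = None    |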

Note:

  • Stata doesn't have a true boolean type (uses 0/1 instead)
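  • Python's None ≈ R's NA ≈ Stata's ., but only roughly: arithmetic on None raises a TypeError, whereas NA and . propagate through calculations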
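
For numeric data, missing values are often stored as the float "not a number" (NaN) rather than `None`; this is the usual convention in libraries such as NumPy and pandas. The sketch below, using only the standard library, shows how the two differ. The variable names are made up for illustration.

```python
import math

missing_obj = None           # generic "no value"
missing_num = float("nan")   # missing value that is still a float

print(missing_obj is None)         # True
print(type(missing_num))           # <class 'float'>
print(missing_num == missing_num)  # False: NaN is not equal to anything, even itself
print(math.isnan(missing_num))     # True: the reliable way to test for NaN
print(missing_num + 1)             # nan (propagates instead of raising an error)
```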

Formatted Output (f-string)

Python 3.6+ supports f-strings, which embed variables and expressions directly in a string and accept an optional format specifier after a colon:

python
name = "Alice"
age = 25
gpa = 3.856

# Basic usage
print(f"Name: {name}, Age: {age}")
# Output: Name: Alice, Age: 25

# Format numbers
print(f"GPA: {gpa:.2f}")  # 2 decimal places
# Output: GPA: 3.86

# Format large numbers
income = 1234567.89
print(f"Income: ${income:,.2f}")
# Output: Income: $1,234,567.89

# Percentages
growth_rate = 0.0523
print(f"Growth rate: {growth_rate:.2%}")
# Output: Growth rate: 5.23%

Practical Template:

python
# Generate survey report
respondent_id = 1001
age = 35
gender = "Male"
income = 85000
education = "Master's"

report = f"""
Respondent Report
====================
ID: {respondent_id}
Age: {age} years
Gender: {gender}
Income: ${income:,}
Education: {education}
====================
"""
print(report)
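
Format specifiers can also set a column width and alignment, which helps when printing several respondents as a table. In the sketch below, `<` left-aligns, `>` right-aligns, and the number is the minimum width. The `rows` data is made up for illustration.

```python
rows = [("Alice", 25, 50000.5), ("Bob", 41, 1234567.89)]

# Header: text columns left-aligned, numeric columns right-aligned
header = f"{'Name':<8}{'Age':>5}{'Income':>16}"
print(header)

lines = []
for name, age, income in rows:
    # Width, thousands separator, and decimals combine in one specifier
    line = f"{name:<8}{age:>5}{income:>16,.2f}"
    lines.append(line)
    print(line)
```

Every printed line has the same length, so the columns line up in a fixed-width font.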

Common Errors

Error 1: Variable Name Typo

python
age = 25
print(aeg)  # NameError: name 'aeg' is not defined

Error 2: Mixing Strings and Numbers

python
result = "25" + 5       # TypeError: can only concatenate str (not "int") to str
result = int("25") + 5  # 30 (convert the string to a number first)

Error 3: Confusing = and ==

python
age = 25          # Assignment: one equals sign
# if age = 25:    # SyntaxError: a single = is not allowed in a condition
if age == 25:     # Comparison: two equals signs
    print("Age is 25")

Practice Exercises

Exercise 1: Basic Variable Operations

Create the following variables and perform calculations:

python
# 1. Create variables
base_salary = 50000
bonus = 8000
tax_rate = 0.25

# 2. Calculate after-tax income
# Hint: after_tax_income = (base_salary + bonus) * (1 - tax_rate)

# 3. Format and output
# Output format: "After-tax income: $X,XXX.XX"

Exercise 2: Type Conversion

python
# Given strings
age_str = "35"
income_str = "75000.50"

# 1. Convert to numbers
# 2. Check if high income (> 100000)
# 3. Check if middle-aged (30-50 years)

Exercise 3: Formatted Output

python
# Given data
country = "China"
gdp_per_capita = 12000.45
population = 1400000000
growth_rate = 0.062

# Output format:
# Country: China
# GDP per capita: $12,000.45
# Population: 1,400,000,000
# Growth rate: 6.20%

Next Steps

In the next section, we'll learn about operators, including arithmetic, comparison, and logical operators, laying the foundation for conditional statements and loops.

Keep going!

Released under the MIT License. Content © Author.