JSON Data Processing
The Standard Format for Modern Web Data
What is JSON?
JSON (JavaScript Object Notation) is a lightweight data interchange format.
Features:
- Human-readable
- Easy for machines to parse
- Widely used for APIs and web data
Its structure closely resembles a Python dictionary:
```json
{
  "name": "Alice",
  "age": 25,
  "major": "Economics"
}
```
Basic Operations
1. Import Module
```python
import json
```
2. Python Object → JSON String
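`json.dumps` converts each Python type to a fixed JSON counterpart:

- dict becomes an object.
- list and tuple become arrays.
- int and float become numbers.
- True/False become true/false.
- None becomes null.

The sketch below runs a small made-up record (field names are illustrative only) through `json.dumps` so the whole mapping is visible in one output line.

```python
import json

# Hypothetical record chosen only to exercise each Python type
record = {
    "id": 1001,          # int   -> number
    "weight": 1.25,      # float -> number
    "name": "Alice",     # str   -> string
    "employed": True,    # True  -> true
    "spouse": None,      # None  -> null
    "scores": (4, 5),    # tuple -> array
    "tags": ["a", "b"],  # list  -> array
}

encoded = json.dumps(record)
print(encoded)
# {"id": 1001, "weight": 1.25, "name": "Alice", "employed": true, "spouse": null, "scores": [4, 5], "tags": ["a", "b"]}
```

Some conversions are one-way. Parsing `encoded` again gives a list for `scores`, not the original tuple.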
```python
import json

# Python dictionary
data = {
    'respondent_id': 1001,
    'age': 30,
    'income': 75000,
    'interests': ['economics', 'data science']
}

# Convert to a JSON string (indent=2 pretty-prints; ensure_ascii=False keeps non-ASCII text readable)
json_str = json.dumps(data, indent=2, ensure_ascii=False)
print(json_str)
```
Output:
```json
{
  "respondent_id": 1001,
  "age": 30,
  "income": 75000,
  "interests": [
    "economics",
    "data science"
  ]
}
```
3. JSON String → Python Object
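`json.loads` converts JSON types back to Python:

- null becomes None.
- true/false become True/False.
- A number written with a decimal point becomes a float.

The sketch below uses made-up values to show this, along with two details worth knowing. First, dictionary keys always come back as strings, even if they were integers before `json.dumps`. Second, Python-style single quotes are not valid JSON, so `json.loads` rejects them with `json.JSONDecodeError`.

```python
import json

# JSON types map back to Python types: false -> False, null -> None, 50000.0 -> float
parsed = json.loads('{"age": 25, "income": 50000.0, "married": false, "spouse": null}')
print(type(parsed["income"]), parsed["married"], parsed["spouse"])
# <class 'float'> False None

# Integer keys are written as strings and come back as strings
round_trip = json.loads(json.dumps({1: "a", 2: "b"}))
print(round_trip)  # {'1': 'a', '2': 'b'}

# Python-style single quotes are not valid JSON
error_message = None
try:
    json.loads("{'name': 'Alice'}")
except json.JSONDecodeError as err:
    error_message = err.msg
print("Invalid JSON:", error_message)
```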
```python
import json

json_str = '{"name": "Alice", "age": 25, "income": 50000}'
data = json.loads(json_str)
print(data['name'])  # Alice
print(data['age'])   # 25
```
4. Reading and Writing JSON Files
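`json.dump` and `json.load` are the file-oriented counterparts of `json.dumps` and `json.loads`. Instead of returning or accepting a string, they write to or read from an open file object. In this illustrative sketch, `io.StringIO` (an in-memory text stream from the standard library) stands in for a real file, so the round trip can be tried without touching the disk.

```python
import io
import json

data = {'id': 1001, 'age': 30, 'income': 75000}

# Write to an in-memory "file" instead of respondent.json
buffer = io.StringIO()
json.dump(data, buffer, indent=2)

# Rewind to the start, then read it back as json.load would read an opened file
buffer.seek(0)
restored = json.load(buffer)

print(restored == data)  # True
```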
```python
import json

# Write to file
data = {'id': 1001, 'age': 30, 'income': 75000}
with open('respondent.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# Read from file
with open('respondent.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
print(data)
```
Practical Cases
Case 1: Save Survey Data
```python
import json

survey_data = {
    'survey_name': '2024 Income Survey',
    'start_date': '2024-01-01',
    'end_date': '2024-12-31',
    'responses': [
        {'id': 1001, 'age': 30, 'income': 75000},
        {'id': 1002, 'age': 35, 'income': 85000},
        {'id': 1003, 'age': 28, 'income': 65000}
    ],
    'metadata': {
        'region': 'National',
        'sample_size': 1000,
        'response_rate': 0.85
    }
}

# Save
with open('survey_2024.json', 'w', encoding='utf-8') as f:
    json.dump(survey_data, f, indent=2, ensure_ascii=False)
```
Case 2: Fetch JSON Data from API
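```python
import requests

# Fetch data (placeholder URL; substitute a real API endpoint)
response = requests.get('https://api.example.com/data', timeout=10)
response.raise_for_status()  # Raise an error for 4xx/5xx responses

# Parse JSON
data = response.json()  # Equivalent to json.loads(response.text)

# Process data (assumes the API returns {"results": [{"name": ..., "value": ...}, ...]})
for item in data['results']:
    print(f"{item['name']}: {item['value']}")
```
Case 3: JSON and Pandas Interconversion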
```python
import pandas as pd

# Pandas → JSON
df = pd.DataFrame({
    'id': [1, 2, 3],
    'age': [25, 30, 35],
    'income': [50000, 75000, 85000]
})

# Method 1: Convert to a JSON string
json_str = df.to_json(orient='records', force_ascii=False)

# Method 2: Save directly to a file
df.to_json('data.json', orient='records', indent=2, force_ascii=False)

# JSON → Pandas
df = pd.read_json('data.json')
print(df)
```
The orient parameter controls how the DataFrame is laid out in JSON:
```python
df.to_json(orient='records')  # [{"col1": val1, "col2": val2}, ...]
df.to_json(orient='index')    # {"0": {"col1": val1, ...}, "1": {...}}
df.to_json(orient='columns')  # {"col1": {"0": val1, "1": val2}, ...}  (default for DataFrames)
```
Complex JSON Processing
Nested JSON
```python
import json

# Complex nested structure
data = {
    'survey': {
        'name': 'Income Survey',
        'metadata': {
            'year': 2024,
            'region': 'Beijing'
        }
    },
    'respondents': [
        {
            'id': 1001,
            'demographics': {
                'age': 30,
                'gender': 'Male'
            },
            'responses': {
                'income': 75000,
                'satisfaction': 4
            }
        }
    ]
}

# Access nested data by chaining keys and list indices
print(data['survey']['metadata']['year'])             # 2024
print(data['respondents'][0]['demographics']['age'])  # 30
```
JSON Lines Format (One JSON per Line)
```python
import json

# Write JSONL: one JSON object per line
respondents = [
    {'id': 1001, 'age': 30},
    {'id': 1002, 'age': 35},
    {'id': 1003, 'age': 28}
]
with open('data.jsonl', 'w', encoding='utf-8') as f:
    for resp in respondents:
        f.write(json.dumps(resp, ensure_ascii=False) + '\n')

# Read JSONL: parse each non-empty line separately
data = []
with open('data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines, e.g. a trailing empty line
            data.append(json.loads(line))
print(f"Read {len(data)} records")
```
Best Practices
1. Handle Chinese Text
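```python
import json

data = {'city': '北京'}

# ensure_ascii=False keeps Chinese characters as-is
print(json.dumps(data, ensure_ascii=False))  # {"city": "北京"}

# ensure_ascii=True (the default) escapes them as \uXXXX
print(json.dumps(data, ensure_ascii=True))   # {"city": "\u5317\u4eac"}
```
2. Pretty-print Output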
```python
# Format with 2-space indentation
json.dumps(data, indent=2, ensure_ascii=False)
```
3. Handle Non-serializable Objects
```python
from datetime import datetime
import json

# datetime objects cannot be serialized directly
data = {'date': datetime.now()}
# json.dumps(data)  # TypeError: Object of type datetime is not JSON serializable

# Custom serialization: json.dumps calls this function for any object it cannot encode
def json_serializer(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj)} not serializable")

json_str = json.dumps(data, default=json_serializer)
```
JSON vs CSV
| Feature | JSON | CSV |
|---|---|---|
| Structure | Nested (objects and arrays) | Flat table (rows and columns) |
| Readability | Good | Better for simple tables |
| File size | Larger (keys repeated in every record) | Smaller |
| Use cases | APIs, configuration files | Tabular data |
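The structural difference in the table is the main obstacle when moving nested JSON into a CSV file. One common approach is `pandas.json_normalize`, which flattens nested objects into dotted column names such as `demographics.age`. The sketch below applies it to respondent records shaped like the nested survey example. The second respondent is made up for illustration.

```python
import pandas as pd

# Respondent records shaped like the nested survey example (values are illustrative)
respondents = [
    {'id': 1001,
     'demographics': {'age': 30, 'gender': 'Male'},
     'responses': {'income': 75000, 'satisfaction': 4}},
    {'id': 1002,
     'demographics': {'age': 35, 'gender': 'Female'},
     'responses': {'income': 85000, 'satisfaction': 3}},
]

# Nested dicts become flat columns such as 'demographics.age'
flat = pd.json_normalize(respondents)
print(flat.columns.tolist())

# The flat result can be written as CSV like any other DataFrame
csv_text = flat.to_csv(index=False)
print(csv_text)
```

Lists nested inside records, such as the earlier `interests` field, stay in a single column this way. `json_normalize` has a `record_path` parameter for expanding them into rows.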
Practice Exercises
```python
# Exercise 1: Configuration File
# Create a configuration JSON file containing:
# - database: {host, port, username}
# - analysis: {min_age, max_age, sample_size}
# Save it, then read it back

# Exercise 2: Data Conversion
# Read a CSV file
# Convert it to JSON format (one object per row)
# Save the result in both .json and .jsonl formats
```
Module 7 Summary
You have now mastered:
- Text file reading and writing
- CSV/Excel processing
- Stata file reading and writing
- JSON data processing
Next module: Exception handling
Keep going!