11.1 Best Practices and Professional Tools
From Writing Code to Writing Good Code — Code Standards, Debugging, Version Control
Module Overview
Writing code is easy; writing good code is hard. This module teaches essential skills for professional developers: code standards (PEP 8), efficient debugging, performance optimization, and Git version control. These skills will make your code more readable, maintainable, and professional—they're also foundational for team collaboration and open-source contributions.
Important Note: This module focuses on engineering practices. Students doing solo data analysis can be selective about what they learn. However, if you plan to collaborate with others, publish code, or care about code quality, this module is essential.
Learning Objectives
After completing this module, you will be able to:
- Follow Python code standards (PEP 8)
- Write readable, maintainable code
- Use advanced debugging techniques
- Perform code profiling and optimization
- Use Git for version control
- Manage projects on GitHub
- Collaborate with others on development
Module Contents
01 - Python Code Style
Core Question: What makes code "good code"?
Core Content:
- PEP 8: Python's Official Style Guide
- Naming conventions:

```python
# Good naming
student_age = 25
total_income = 50000
calculate_mean()

class StudentRecord:
    pass

# Poor naming
s_age = 25        # Too brief
totalIncome = 50  # camelCase (not Python style)
CalculateMean()   # Functions don't use PascalCase
```

- Indentation and spacing: 4 spaces, spaces around operators
- Line length: 79 characters maximum
- Import order: standard library → third-party → local modules
- Docstrings:

```python
def calculate_bmi(weight, height):
    """Calculate BMI index

    Parameters:
        weight (float): Weight in kilograms
        height (float): Height in meters

    Returns:
        float: BMI value

    Examples:
        >>> calculate_bmi(70, 1.75)
        22.86
    """
    return weight / (height ** 2)
```

- Comment Best Practices (see the sketch after this list):
- Explain "why" not "what"
- Comment complex logic
- Avoid obvious comments
- Code Formatting Tools:
- Black: automatic code formatting
- autopep8: automatic PEP 8 compliance
- isort: automatic import organization
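A minimal sketch of the commenting guidelines above; the variable and the surrounding logic are hypothetical:

```python
wave = 0  # hypothetical variable for illustration

# Obvious comment (restates *what* the code does) - avoid
# add 1 to wave
wave += 1

# Useful comment (explains *why*) - prefer
# Raw files number survey waves from 0, but the codebook numbers them
# from 1, so shift before merging on wave id.
wave += 1
```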
Why It Matters
- Improves readability: you'll thank yourself in 6 months
- Facilitates collaboration: unified style reduces friction
- Professional image: demonstrates code literacy
Practical Comparison:
```python
# Poor code
def f(x,y):
    if x>0:
        return x*y
    else:return 0

# Good code
def calculate_product(x, y):
    """Calculate product of two numbers (only when x is positive)"""
    if x > 0:
        return x * y
    else:
        return 0
```

02 - Debugging and Profiling
Core Question: How to make code faster and more stable?
Core Content:
- Advanced Debugging Techniques:
- Breakpoint debugging (IDE integration)
- Conditional breakpoints: pause only under specific conditions
- Watch variables: view values in real-time
- Debugging Pandas operations:

```python
# Chain operation debugging
df_clean = (
    df
    .pipe(lambda x: print(f"Original: {len(x)} rows") or x)
    .dropna()
    .pipe(lambda x: print(f"After dropna: {len(x)} rows") or x)
    .query('age >= 18')
    .pipe(lambda x: print(f"After filter: {len(x)} rows") or x)
)
```
- Performance Profiling:

```python
# Measure execution time
import time

start = time.time()
# Your code
end = time.time()
print(f"Elapsed: {end - start:.2f} seconds")

# Jupyter magic commands
%timeit df.apply(lambda x: x ** 2)
%prun slow_function()  # Detailed profiling
```

- Performance Optimization Techniques:
- Vectorization vs loops:

```python
# Slow (loop)
for i in range(len(df)):
    df.loc[i, 'squared'] = df.loc[i, 'value'] ** 2

# Fast (vectorized)
df['squared'] = df['value'] ** 2
```

- Use NumPy functions
- Avoid repeated calculations
- Use `.values` to convert to NumPy arrays (faster)
- Memory Optimization (see the sketch after this list):
- Choose appropriate data types (`int32` vs `int64`)
- Read large files in chunks (`chunksize`)
- Drop unnecessary columns
- Logging:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info("Starting data cleaning")
logger.warning("Found 100 missing values")
logger.error("File not found")
```
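A minimal sketch of the memory-optimization ideas above, assuming a large hypothetical `survey.csv` with `id`, `age`, and `income` columns; pick dtypes that match your own data:

```python
import pandas as pd

# Hypothetical file and columns; adjust to your dataset
chunks = pd.read_csv(
    "survey.csv",
    usecols=["id", "age", "income"],              # drop unnecessary columns at read time
    dtype={"age": "int32", "income": "float32"},  # smaller dtypes, less memory
    chunksize=100_000,                            # process the file chunk by chunk
)

# Keep only adult respondents from each chunk, then combine
adults = pd.concat(chunk[chunk["age"] >= 18] for chunk in chunks)
adults.info(memory_usage="deep")                  # check the resulting memory footprint
```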
Performance Comparison:
```python
# Slow method (loop): 10 seconds
result = []
for x in data:
    result.append(x ** 2)

# Fast method (NumPy): 0.01 seconds
result = np.array(data) ** 2
```

03 - Git Basics
Core Question: How to manage code versions and collaborate?
Core Content:
- What is Git?
- Version control system: records every code change
- Collaboration tool: lets multiple developers work on the same codebase at the same time
- GitHub: code hosting platform
- Basic Workflow:

```bash
# Initialize repository
git init

# Check status
git status

# Add files to staging area
git add script.py
git add .   # Add all files

# Commit
git commit -m "Add data cleaning script"

# View history
git log
```

- Branch Management:

```bash
# Create and switch to new branch
git checkout -b feature-analysis

# Switch branches
git checkout main

# Merge branches
git merge feature-analysis
```

- Remote Repository (GitHub):

```bash
# Add remote repository
git remote add origin https://github.com/username/repo.git

# Push to remote
git push -u origin main

# Pull updates
git pull

# Clone repository
git clone https://github.com/username/repo.git
```

- Collaboration Workflow:
- Fork project to your account
- Clone to local machine
- Create branch and make changes
- Commit and push
- Create Pull Request
- Ignore Files (.gitignore):
```
# Python
__pycache__/
*.pyc
.ipynb_checkpoints/

# Data files
*.csv
*.dta
data/

# Environment
venv/
.env
```
Why It Matters
- Never lose code again
- Rollback to any historical version
- Essential for team collaboration
- Showcase your projects (academic GitHub)
Practical Scenario:
```bash
# Scenario: Made changes but found errors, want to revert

# View history
git log --oneline
# a1b2c3d Add regression analysis
# e4f5g6h Complete data cleaning
# i7j8k9l Initial commit

# Rollback to data cleaning version
git checkout e4f5g6h

# To permanently rollback
git reset --hard e4f5g6h
```

Amateur vs Professional Code
| Dimension | Amateur Code | Professional Code |
|---|---|---|
| Naming | x, data1 | student_age, clean_survey_df |
| Comments | None or excessive | Moderate, explain "why" |
| Structure | Single file, no functions | Modular, functional |
| Error Handling | Let program crash | try-except graceful handling |
| Version Control | None | Git + GitHub |
| Testing | Manual testing | Automated tests |
| Documentation | None | README + Docstrings |
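To make the "Error Handling" row concrete, a minimal sketch of graceful handling instead of letting the program crash; the file name is hypothetical:

```python
import pandas as pd

try:
    df = pd.read_csv("survey_2024.csv")  # hypothetical input file
except FileNotFoundError:
    print("survey_2024.csv not found - check the data/ folder and rerun.")
    df = pd.DataFrame()  # fall back to an empty frame instead of crashing
```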
How to Learn This Module?
Learning Path
Day 1 (2 hours): Code Style
- Read 01 - Python Code Style
- Install Black formatting tool
- Refactor an old script to comply with PEP 8
Day 2 (3 hours): Debugging and Optimization
- Read 02 - Debugging and Profiling
- Learn profiling tools
- Optimize a slow script
Days 3-4 (6 hours): Git Basics
- Read 03 - Git Basics
- Install and configure Git
- Create GitHub account
- Upload existing project to GitHub
- Practice basic workflow
Total Time: 11 hours (1 week)
Minimal Learning Path
For individual data analysis, priorities are:
Must Learn (basic literacy, 3 hours):
- 01 - Code Style (naming, comments, docstrings)
- Basic debugging techniques
Important (team collaboration, 6 hours):
- 03 - Git Basics (init, add, commit, push)
- GitHub usage
Optional (advanced skills):
- Performance optimization
- Git branch management
- Unit testing
Learning Recommendations
Code Standards are Habits, Not Burdens
- Will feel cumbersome at first
- Use automatic formatting tools (Black)
- You'll thank yourself in 6 months
Git Has a Steep Learning Curve, But Worth It
- First 2 hours are most painful
- Simple once you master basic commands
- The 3 most important commands:

```bash
git add .
git commit -m "message"
git push
```
Start with Existing Projects
- Don't wait for the "perfect moment"
- Choose an existing script, upload to GitHub
- Learn by doing
Practice Project: Build Academic GitHub
```
my-research/
├── README.md              # Project description
├── requirements.txt       # Dependencies
├── .gitignore             # Ignore files
├── data/                  # Data folder (git ignore)
├── scripts/               # Analysis scripts
│   ├── 01_data_cleaning.py
│   ├── 02_descriptive_stats.py
│   └── 03_regression.py
├── outputs/               # Output results
│   ├── tables/
│   └── figures/
└── notebooks/             # Jupyter notebooks
    └── exploratory_analysis.ipynb
```
Common Questions
Q: Why follow code standards? My code works! A:
- In 6 months, you'll forget the logic
- Standard code is like "writing a letter to your future self"
- If collaborating with others, standards are foundational
Q: Git is too complex, can I skip it? A:
- You can learn just the basics (add, commit, push)
- But strongly recommended—benefits are huge:
- Never lose code again
- Rollback to any version
- GitHub is your "academic business card"
Q: Should I upload all code to GitHub? A:
- Upload: cleaned scripts, reproducible analyses
- Don't upload: raw data (privacy), API keys, unfinished code
- Use `.gitignore` to exclude sensitive files
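For keeping API keys out of a public repository, a minimal sketch that reads the key from an environment variable instead of hard-coding it; the variable name is hypothetical:

```python
import os

# Set MY_API_KEY in your shell (or an untracked .env file) rather than
# committing it to GitHub.
api_key = os.environ.get("MY_API_KEY")  # hypothetical variable name
if api_key is None:
    raise RuntimeError("Set MY_API_KEY before running this script.")
```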
Q: Is performance optimization important? My code is fast enough. A:
- Small data (< 100K rows): not important
- Large data (> 1M rows): very important
- Repeatedly run code: worth optimizing
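If you are unsure which case applies, a minimal sketch for timing one step with the standard library before deciding to optimize; the function and data size are placeholders:

```python
import time

def slow_step(data):
    # placeholder for the step you suspect is slow
    return [x ** 2 for x in data]

data = list(range(1_000_000))

start = time.perf_counter()
slow_step(data)
elapsed = time.perf_counter() - start
print(f"slow_step took {elapsed:.3f} s")  # optimize only if this time actually hurts
```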
Q: How to cite GitHub code in papers? A:
Code and data available at:
https://github.com/username/project-name
Or use Zenodo for DOI:
DOI: 10.5281/zenodo.1234567

Next Steps
After completing this module, you'll have mastered:
- Python code standards and best practices
- Efficient debugging and performance optimization
- Git version control and GitHub usage
- Professional developer workflow
Congratulations! You've completed all 11 modules of the Python Fundamentals Tutorial!
Next, you can:
- Deepen Pandas and data analysis skills (practice on real projects)
- Learn statistical modeling (regression analysis, causal inference)
- Explore machine learning and LLM applications
- Contribute to open-source projects, enhance skills
From zero to data analyst—you've taken a solid first step! Keep going!