Online Python Environments
Start Coding with Zero Configuration: The Best Choices for Running Python in the Cloud
Why Use Online Environments?
Advantages
- ✅ Zero Configuration: Ready to use in the browser
- ✅ Free Resources: Some platforms provide free GPU/TPU
- ✅ Anywhere, Anytime: Code on any device, from anywhere
- ✅ Easy Collaboration: Easily share notebooks with colleagues
- ✅ Consistent Environment: Avoid "works on my machine" problems
Disadvantages
- ❌ Requires a stable internet connection
- ❌ Some features are limited (e.g., large file processing)
- ❌ Free resources have time limits
Mainstream Online Python Environment Comparison
| Platform | Free GPU | Storage | Collaboration | Use Cases |
|---|---|---|---|---|
| Google Colab | ✅ (Limited) | Google Drive | ✅ | Deep learning, teaching, prototyping |
| Kaggle Notebooks | ✅ (30h/week) | 20GB | ✅ | Data competitions, machine learning |
| This Website | ❌ | Browser | ❌ | Learning Python basics |
| Deepnote | ❌ | 5GB | ✅ | Team collaboration |
| JupyterLite | ❌ | Browser | ❌ | Lightweight learning |
1️⃣ Google Colab (Most Recommended)
Quick Start
- Visit colab.research.google.com
- Sign in with Google account
- Click "New Notebook"
- Start coding!
Basic Operations
Create First Cell
# This is a code cell
print("Hello from Colab!")
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
df

Run: Click ▶ on the left or press Shift + Enter
Upload Data Files
Method 1: Temporary Upload (deleted after session ends)
from google.colab import files
uploaded = files.upload()
# Then read
import pandas as pd
df = pd.read_csv('your_file.csv')

Method 2: Read from Google Drive (Recommended)
from google.colab import drive
drive.mount('/content/drive')
# Read file from Drive
df = pd.read_csv('/content/drive/MyDrive/data/survey_data.csv')

Colab-Specific Features
1. Use Free GPU
Steps:
- Menu bar: Runtime → Change runtime type
- Hardware accelerator: Select GPU or TPU
- Save
Check if GPU is available:
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"GPU Model: {torch.cuda.get_device_name(0)}")2. Forms Feature
#@title Enter your age { run: "auto" }
age = 25 #@param {type:"slider", min:18, max:100, step:1}
name = "Alice" #@param {type:"string"}
gender = "Female" #@param ["Male", "Female", "Other"]
print(f"{name}, {age} years old, {gender}")After running, generates interactive forms!
3. Extended Magic Commands
# Install new library
!pip install seaborn
# Run shell commands
!ls -la
# View GPU info
!nvidia-smi
# Download file
!wget https://example.com/data.csv

Hands-on Example: Complete Data Analysis Workflow
# ========== 1. Environment Setup ==========
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
# ========== 2. Data Loading ==========
# Method 1: Load directly from URL (Recommended)
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)
# Method 2: Load from Google Drive
# from google.colab import drive
# drive.mount('/content/drive')
# df = pd.read_csv('/content/drive/MyDrive/data.csv')
print("✅ Data loaded successfully!")
df.head()
# ========== 3. Data Exploration ==========
print("\n📊 Basic Data Information:")
print(df.info())
print("\n📈 Descriptive Statistics:")
display(df.describe())
# ========== 4. Data Visualization ==========
# Tip distribution
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(df['tip'], bins=20, edgecolor='black', alpha=0.7)
plt.xlabel('Tip Amount')
plt.ylabel('Frequency')
plt.title('Distribution of Tips')
plt.subplot(1, 2, 2)
df.groupby('day')['tip'].mean().plot(kind='bar')
plt.xlabel('Day of Week')
plt.ylabel('Average Tip')
plt.title('Average Tip by Day')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# ========== 5. Statistical Analysis ==========
from scipy import stats
# Test: Weekend vs weekday tip differences
weekend_tips = df[df['day'].isin(['Sat', 'Sun'])]['tip']
weekday_tips = df[~df['day'].isin(['Sat', 'Sun'])]['tip']
t_stat, p_value = stats.ttest_ind(weekend_tips, weekday_tips)
print(f"\n📊 T-test Results:")
print(f" Weekend average tip: ${weekend_tips.mean():.2f}")
print(f" Weekday average tip: ${weekday_tips.mean():.2f}")
print(f" p-value: {p_value:.4f}")
if p_value < 0.05:
    print(" ✅ Difference is significant (p < 0.05)")
else:
    print(" ❌ Difference is not significant")
# ========== 6. Save Results ==========
# Save to Google Drive
# df.to_csv('/content/drive/MyDrive/analysis_results.csv', index=False)
# Or download to local
# from google.colab import files
# df.to_csv('results.csv', index=False)
# files.download('results.csv')

2️⃣ Kaggle Notebooks
Quick Start
- Visit kaggle.com
- Register/login
- Click "Create" → "New Notebook"
Kaggle's Unique Advantages
1. Rich Datasets
# Directly access Kaggle datasets
import pandas as pd
# Example: Titanic dataset
df = pd.read_csv('/kaggle/input/titanic/train.csv')
df.head()

2. 30 Hours Free GPU Per Week
- GPU: NVIDIA Tesla P100
- TPU: Also available
- Time limit: 30 hours/week
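To confirm that the accelerator was actually attached to your session, a quick check like the one below works on Kaggle just as it does on Colab (a sketch; it assumes PyTorch, which is preinstalled in the default Kaggle image):

```python
# Verify which accelerator the Kaggle session received
import torch

if torch.cuda.is_available():
    print(f"GPU attached: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU attached - enable it under Settings → Accelerator")
```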
3. Community and Competitions
- View others' excellent notebooks
- Participate in data science competitions
- Learn best practices
Kaggle Hands-on Example
# ========== Kaggle Competition Template ==========
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# 1. Load data
train = pd.read_csv('/kaggle/input/titanic/train.csv')
test = pd.read_csv('/kaggle/input/titanic/test.csv')
# 2. Data preprocessing
def preprocess(df):
    df = df.copy()
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
    df['Fare'] = df['Fare'].fillna(df['Fare'].median())
    # Encode categorical variables
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
    df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)
    return df
train_processed = preprocess(train)
test_processed = preprocess(test)
# 3. Feature selection
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
'Embarked_Q', 'Embarked_S']
X = train_processed[features]
y = train_processed['Survived']
# 4. Model training
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# 5. Evaluation
val_pred = model.predict(X_val)
print(f"Validation accuracy: {accuracy_score(y_val, val_pred):.3f}")
# 6. Predict and submit
test_pred = model.predict(test_processed[features])
submission = pd.DataFrame({
'PassengerId': test['PassengerId'],
'Survived': test_pred
})
submission.to_csv('submission.csv', index=False)
print("✅ Submission file generated!")3️⃣ This Website's Python Environment
Features
- Instant Run: No login required
- Learning Friendly: Designed for teaching
- Lightweight: Based on Pyodide (Python in browser)
Limitations
- Doesn't support all libraries (e.g., TensorFlow, PyTorch)
- Limited performance
- Cannot access local files
Use Cases
- Learning Python basic syntax
- Practicing Pandas, NumPy
- Simple data visualization
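As a rough illustration, an exercise like the following stays well within what a Pyodide-based environment handles comfortably (a sketch; which packages are actually bundled depends on this site's build):

```python
# A lightweight exercise that runs fine in an in-browser (Pyodide) environment
import pandas as pd

scores = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D'],
    'score': [72, 85, 90, 66],
})
# Standardize the scores (z-scores)
scores['z_score'] = (scores['score'] - scores['score'].mean()) / scores['score'].std()
print(scores)

# Heavy frameworks such as TensorFlow or PyTorch are not available here
```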
4️⃣ Other Online Platforms
Deepnote (Team Collaboration)
Website: deepnote.com
Advantages:
- Real-time collaboration (like Google Docs)
- Version control
- Integrated database connections
- Beautiful data visualization
Use: Team projects, corporate data analysis
Pricing:
- Free tier: 750 hours/month compute time
- Team: $33/user/month
Replit (General Programming)
Website: replit.com
Advantages:
- Supports multiple programming languages
- Can deploy web applications
- Real-time collaboration
Use: Learning programming, rapid prototyping
Paperspace Gradient
Website: gradient.paperspace.com
Advantages:
- Powerful GPU choices (including A100)
- Persistent storage
- Supports long-running tasks
- Jupyter and VS Code interfaces
Use: Deep learning research, large-scale training
Pricing:
- Free tier: Limited GPU time
- Growth: Starting at $8/month
SageMaker Studio Lab (AWS)
Website: studiolab.sagemaker.aws
Advantages:
- Free Jupyter environment from Amazon
- 12 hours/session, free GPU
- 15GB persistent storage
- No credit card required
Use: Machine learning experiments, teaching
Binder
Website: mybinder.org
Advantages:
- Completely free
- Create environment directly from GitHub repository
- One-click sharing of reproducible analysis
- Supports custom environments (requirements.txt; see the example below)
Use: Sharing research code, teaching demonstrations
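For the custom-environment support noted above, Binder builds the environment from a requirements.txt at the repository root. A minimal example (package names and version pins are purely illustrative):

```text
# requirements.txt (illustrative pins)
pandas==2.1.4
numpy==1.26.4
matplotlib==3.8.2
seaborn==0.13.2
```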
Usage Example:
1. Upload Jupyter Notebook to GitHub
2. Visit mybinder.org
3. Enter GitHub repository URL
4. Generate sharing link

Professional Researchers' Cloud Workflow
Scenario 1: Academic Paper Reproduction
Problem: How to let reviewers and readers reproduce your results?
Solution:
- Put code and data on GitHub
- Add Binder badge in README
- Click badge to run online
Example README.md:
# My Research: Impact of Education on Income
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/yourname/yourrepo/main)
Click the badge above to reproduce my analysis online.
## Environment Requirements
See requirements.txt

Scenario 2: Large-Scale Experiments (GPU Requirements)
Requirements Comparison:
| Task | Recommended Platform | GPU Type | Free Time |
|---|---|---|---|
| BERT Fine-tuning | Colab Pro+ | A100 | Paid $50/month |
| Image Classification (ResNet) | Kaggle | P100 | 30h/week |
| Large-scale Training | Paperspace | A100/V100 | Paid |
| Teaching Demo | SageMaker Studio Lab | T4 | 12h/session |
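Before committing to a platform, a rough back-of-the-envelope memory estimate helps: with full fp32 training and Adam, the weights, gradients, and two optimizer moments cost roughly 16 bytes per parameter (a sketch; activations and batch size add on top of this):

```python
# Rough VRAM estimate: fp32 weights + gradients + Adam moments ≈ 16 bytes per parameter
def estimate_vram_gb(n_params, bytes_per_param=16):
    return n_params * bytes_per_param / 1e9

for name, n_params in [("BERT-base (~110M)", 110e6), ("BERT-large (~340M)", 340e6)]:
    print(f"{name}: ~{estimate_vram_gb(n_params):.1f} GB before activations")
```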
Scenario 3: Team Collaborative Research
Best Practices:
Tool Combination:
Data Storage: Google Drive / Dropbox
Code Version: GitHub
Collaboration Environment: Deepnote / Colab
Communication: Slack / Teams

Deepnote Collaboration Example:
# Code written by Team Member A
import numpy as np

def clean_data(df):
    """Data cleaning function"""
    df = df.dropna()
    df['log_income'] = np.log(df['income'])
    return df
# Team Member B can see and use in real-time
cleaned_df = clean_data(raw_df)
# Can add comments and suggestions
# @member_A: Should we handle outliers here?

In-Depth Cloud Platform Comparison
Computing Resources Comparison
| Platform | CPU | RAM | GPU (Free) | GPU (Paid) | Storage |
|---|---|---|---|---|---|
| Colab | 2-core | 12-13GB | T4 | V100/A100 | Temporary |
| Colab Pro | High priority | 25GB | T4/P100 | V100/A100 | 200GB Drive |
| Kaggle | 4-core | 30GB | P100/T4 (30h/week) | N/A | 20GB persistent |
| Paperspace | On-demand | On-demand | Limited time | A100/V100 | Pay-as-you-go |
| SageMaker Lab | 2-core | 16GB | T4 (12h) | N/A | 15GB persistent |
Features Comparison
| Feature | Colab | Kaggle | Deepnote | Paperspace |
|---|---|---|---|---|
| Real-time Collaboration | Limited | ❌ | ✅ | ✅ |
| Version Control | Manual | Auto | Git integrated | Git integrated |
| Database Connection | Needs config | Needs config | Built-in | Needs config |
| Scheduled Tasks | ❌ | ❌ | ✅ | ✅ |
| API Deployment | Needs 3rd party | ❌ | ✅ | ✅ |
| Private Environment | ❌ | ❌ | ✅ | ✅ |
Academic Research Cloud Best Practices
1. Reproducibility Checklist
Before submitting paper:
# ========== Environment Information ==========
import sys
import pandas as pd
import numpy as np
print(f"Python: {sys.version}")
print(f"Pandas: {pd.__version__}")
print(f"NumPy: {np.__version__}")
# ========== Random Seed ==========
np.random.seed(42)
# ========== Data Source ==========
# Document data source and acquisition method in comments
# Data Source: World Bank Open Data
# URL: https://data.worldbank.org/...
# Download Date: 2024-01-15
# ========== Complete Workflow ==========
# 1. Data cleaning
# 2. Descriptive statistics
# 3. Main regression
# 4. Robustness checks

2. Big Data Processing Strategies
# ========== Strategy 1: Chunking ==========
import pandas as pd
# For large files (>2GB)
chunksize = 10000
chunks = []
for chunk in pd.read_csv('large_file.csv', chunksize=chunksize):
    # Process each chunk
    processed = chunk.groupby('category')['value'].mean()
    chunks.append(processed)
# Merge per-chunk results (for exact overall means, weight by chunk sizes)
result = pd.concat(chunks)
# ========== Strategy 2: Use Dask (Distributed Computing) ==========
import dask.dataframe as dd
# Dask can handle data exceeding memory
ddf = dd.read_csv('huge_file.csv')
result = ddf.groupby('category')['value'].mean().compute()
# ========== Strategy 3: Sampling Analysis ==========
# First develop the code on a small sample (here: the first 10,000 rows)
sample = pd.read_csv('large_file.csv', nrows=10000)
# Develop and test code...
# After confirmation, use complete data

3. GPU Usage Best Practices
# ========== Check GPU Availability ==========
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    device = torch.device("cpu")
    print("Using CPU")
# ========== Free GPU Memory ==========
import gc
import torch
# Delete unused models
del model
gc.collect()
torch.cuda.empty_cache()
# ========== Monitor GPU Usage ==========
# In Colab
!nvidia-smi
# Or use Python
from pynvml import *
nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(handle)
print(f"Used: {info.used / 1e9:.2f} GB")
print(f"Free: {info.free / 1e9:.2f} GB")4. Data Security and Privacy
Sensitive Data Handling:
# ❌ Don't hardcode sensitive data
password = "my_secret_password" # Dangerous!
# ✅ Use environment variables
import os
password = os.environ.get('MY_PASSWORD')
# ✅ Use Colab Secrets (Recommended)
# Left sidebar → 🔑 Secrets → Add key
# Then in code:
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')

Data Anonymization:
import hashlib
# Hash sensitive IDs
def anonymize_id(id_string):
    return hashlib.sha256(str(id_string).encode()).hexdigest()[:10]

df['anonymous_id'] = df['user_id'].apply(anonymize_id)

Cost Optimization Strategies
Maximize Free Resources
Strategy 1: Multi-platform Combination
Monday-Wednesday: Use Colab (free)
Thursday-Friday: Use Kaggle (free 30h GPU)
Weekend: Local computing or SageMaker Studio Lab

Strategy 2: Priority Tasks
Lightweight tasks (data cleaning, EDA): Free platforms
GPU-intensive tasks (deep learning): Paid platforms or local
Long-running tasks: Local server

Strategy 3: Code Optimization to Reduce Runtime
# ❌ Inefficient: row-by-row loop
for i in range(len(df)):
    df.loc[i, 'new_col'] = df.loc[i, 'old_col'] * 2

# ✅ Vectorized operation (often ~100x faster)
df['new_col'] = df['old_col'] * 2

Platform Selection Guide
By Scenario
| Scenario | Recommended Platform |
|---|---|
| Learning Python Basics | This website |
| Data Analysis | Google Colab |
| Machine Learning | Google Colab / Kaggle |
| Deep Learning | Google Colab (GPU) / Kaggle |
| Team Collaboration | Deepnote |
| Competitions | Kaggle |
| Rapid Prototyping | Replit |
By Requirements
| Requirement | Recommended Platform |
|---|---|
| Need GPU | Colab / Kaggle |
| Need Large Datasets | Kaggle |
| Need Long Runtime | Local environment (Jupyter) |
| Need Team Collaboration | Deepnote / Colab |
| Need Version Control | Deepnote / Kaggle |
Best Practices
1. Data Management
Colab:
# Use Google Drive for data storage
from google.colab import drive
drive.mount('/content/drive')
# Project structure
# /content/drive/MyDrive/
# ├── projects/
# │   └── my_analysis/
# │       ├── data/
# │       ├── notebooks/
# │       └── outputs/

Kaggle:
# Use Kaggle datasets
# Or upload private datasets (Settings → Data)

2. Environment Management
# List all dependencies at notebook beginning
!pip install pandas==1.5.3
!pip install scikit-learn==1.2.2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(f"Pandas version: {pd.__version__}")3. Version Saving
Colab:
- File → Save a copy in Drive
- File → Revision history (view version history)
Kaggle:
- Auto-saves version after each run
- Can restore to any historical version
Hands-on Exercises
Exercise 1: Complete Data Analysis in Colab
- Open Google Colab
- Load data from URL:
url = "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv"
df = pd.read_csv(url)

- Analyze COVID-19 trends for different countries (see the sketch below)
- Save notebook to Google Drive
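One possible starting point for the trend-analysis step above (a sketch; it assumes the CSV keeps its Date, Country, and Confirmed columns):

```python
# Plot confirmed-case trends for a few countries (column names assumed from the dataset)
import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv"
df = pd.read_csv(url, parse_dates=['Date'])

for country in ['Germany', 'Japan', 'Brazil']:
    subset = df[df['Country'] == country]
    plt.plot(subset['Date'], subset['Confirmed'], label=country)

plt.xlabel('Date')
plt.ylabel('Confirmed cases')
plt.legend()
plt.title('COVID-19 Confirmed Cases by Country')
plt.show()
```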
Exercise 2: Explore Datasets on Kaggle
- Visit Kaggle
- Search for "House Prices" dataset
- Create new notebook
- Conduct exploratory data analysis (EDA)
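If you are unsure where to begin, a generic first pass might look like this (a sketch; the input path is an example and depends on which House Prices dataset you attach to the notebook):

```python
# Generic first-pass EDA; adjust the path to the dataset you attached
import pandas as pd

path = '/kaggle/input/house-prices-advanced-regression-techniques/train.csv'  # example path
df = pd.read_csv(path)

print(df.shape)                     # rows and columns
print(df.dtypes.value_counts())     # numeric vs. object columns
print(df.isna().sum().sort_values(ascending=False).head(10))  # worst missing values

# If the target column exists, inspect it and its strongest numeric correlations
if 'SalePrice' in df.columns:
    print(df['SalePrice'].describe())
    print(df.corr(numeric_only=True)['SalePrice'].sort_values(ascending=False).head(10))
```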
Exercise 3: Compare Local vs Cloud
Run same code on:
- This website environment
- Google Colab
- Local Jupyter
Compare speed and functionality differences
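One simple way to make the comparison concrete is to time an identical workload in each environment (a sketch using only NumPy and the standard library):

```python
# Run this identical snippet in each environment and compare the elapsed time
import time
import numpy as np

start = time.perf_counter()
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b                      # a reasonably heavy matrix multiplication
elapsed = time.perf_counter() - start

print(f"Matrix multiply took {elapsed:.2f} s (checksum: {c.sum():.2f})")
```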
Next Steps
Now you have mastered:
- Jupyter Notebook (local interactive environment)
- VS Code (professional development environment)
- Online environments (cloud execution)
In the next module, we will start learning Python Basic Syntax and officially enter the programming world!
Ready? Let's continue!