
Online Python Environments

Start Coding with Zero Configuration: The Best Choices for Running Python in the Cloud


Why Use Online Environments?

Advantages

✅ Zero Configuration: Ready to use in the browser
✅ Free Resources: Some platforms provide a free GPU/TPU
✅ Anywhere, Anytime: Code on any device, from anywhere
✅ Easy Collaboration: Easily share notebooks with colleagues
✅ Consistent Environment: Avoid "works on my machine" problems

Disadvantages

❌ Requires a stable internet connection
❌ Some features are limited (e.g., large file processing)
❌ Free resources have time limits


Mainstream Online Python Environment Comparison

| Platform | Free GPU | Storage | Collaboration | Use Cases |
|---|---|---|---|---|
| Google Colab | ✅ (Limited) | Google Drive | ✅ | Deep learning, teaching, prototyping |
| Kaggle Notebooks | ✅ (30h/week) | 20GB | ✅ | Data competitions, machine learning |
| This Website | ❌ | Browser | ❌ | Learning Python basics |
| Deepnote | ❌ | 5GB | ✅ | Team collaboration |
| JupyterLite | ❌ | Browser | ❌ | Lightweight learning |

1️⃣ Google Colab

Quick Start

  1. Visit colab.research.google.com
  2. Sign in with Google account
  3. Click "New Notebook"
  4. Start coding!

Basic Operations

Create First Cell

python
# This is a code cell
print("Hello from Colab!")

import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
df

Run: click the ▶ button to the left of the cell, or press Shift + Enter

Upload Data Files

Method 1: Temporary Upload (deleted after session ends)

python
from google.colab import files
uploaded = files.upload()

# Then read
import pandas as pd
df = pd.read_csv('your_file.csv')

Method 2: Read from Google Drive (Recommended)

python
from google.colab import drive
drive.mount('/content/drive')

# Read file from Drive
df = pd.read_csv('/content/drive/MyDrive/data/survey_data.csv')

Colab-Specific Features

1. Use Free GPU

Steps:

  1. Menu bar: Runtime → Change runtime type
  2. Hardware accelerator: Select GPU or TPU
  3. Save

Check if GPU is available:

python
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Model: {torch.cuda.get_device_name(0)}")

2. Forms Feature

python
#@title Enter your age { run: "auto" }
age = 25 #@param {type:"slider", min:18, max:100, step:1}
name = "Alice" #@param {type:"string"}
gender = "Female" #@param ["Male", "Female", "Other"]

print(f"{name}, {age} years old, {gender}")

Running the cell turns these parameters into an interactive form!

3. Extended Magic Commands

python
# Install new library
!pip install seaborn

# Run shell commands
!ls -la

# View GPU info
!nvidia-smi

# Download file
!wget https://example.com/data.csv

Hands-on Example: Complete Data Analysis Workflow

python
# ========== 1. Environment Setup ==========
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

# ========== 2. Data Loading ==========
# Method 1: Load directly from URL (Recommended)
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)

# Method 2: Load from Google Drive
# from google.colab import drive
# drive.mount('/content/drive')
# df = pd.read_csv('/content/drive/MyDrive/data.csv')

print("✅ Data loaded successfully!")
df.head()

# ========== 3. Data Exploration ==========
print("\n📊 Basic Data Information:")
df.info()  # info() prints directly; wrapping it in print() would also output None

print("\n📈 Descriptive Statistics:")
display(df.describe())

# ========== 4. Data Visualization ==========
# Tip distribution
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(df['tip'], bins=20, edgecolor='black', alpha=0.7)
plt.xlabel('Tip Amount')
plt.ylabel('Frequency')
plt.title('Distribution of Tips')

plt.subplot(1, 2, 2)
df.groupby('day')['tip'].mean().plot(kind='bar')
plt.xlabel('Day of Week')
plt.ylabel('Average Tip')
plt.title('Average Tip by Day')
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

# ========== 5. Statistical Analysis ==========
from scipy import stats

# Test: Weekend vs weekday tip differences
weekend_tips = df[df['day'].isin(['Sat', 'Sun'])]['tip']
weekday_tips = df[~df['day'].isin(['Sat', 'Sun'])]['tip']

t_stat, p_value = stats.ttest_ind(weekend_tips, weekday_tips)
print(f"\n📊 T-test Results:")
print(f"   Weekend average tip: ${weekend_tips.mean():.2f}")
print(f"   Weekday average tip: ${weekday_tips.mean():.2f}")
print(f"   p-value: {p_value:.4f}")

if p_value < 0.05:
    print("   ✅ Difference is significant (p < 0.05)")
else:
    print("   ❌ Difference is not significant")

# ========== 6. Save Results ==========
# Save to Google Drive
# df.to_csv('/content/drive/MyDrive/analysis_results.csv', index=False)

# Or download to local
# from google.colab import files
# df.to_csv('results.csv', index=False)
# files.download('results.csv')

2️⃣ Kaggle Notebooks

Quick Start

  1. Visit kaggle.com
  2. Register/login
  3. Click "Create""New Notebook"

Kaggle's Unique Advantages

1. Rich Datasets

python
# Directly access Kaggle datasets
import pandas as pd

# Example: Titanic dataset
df = pd.read_csv('/kaggle/input/titanic/train.csv')
df.head()

2. 30 Hours Free GPU Per Week

  • GPU: NVIDIA Tesla P100
  • TPU: Also available
  • Time limit: 30 hours/week

3. Community and Competitions

  • View others' excellent notebooks
  • Participate in data science competitions
  • Learn best practices

Kaggle Hands-on Example

python
# ========== Kaggle Competition Template ==========
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1. Load data
train = pd.read_csv('/kaggle/input/titanic/train.csv')
test = pd.read_csv('/kaggle/input/titanic/test.csv')

# 2. Data preprocessing
def preprocess(df):
    df = df.copy()
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
    df['Fare'] = df['Fare'].fillna(df['Fare'].median())

    # Encode categorical variables
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
    df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)

    return df

train_processed = preprocess(train)
test_processed = preprocess(test)

# 3. Feature selection
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
            'Embarked_Q', 'Embarked_S']
X = train_processed[features]
y = train_processed['Survived']

# 4. Model training
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 5. Evaluation
val_pred = model.predict(X_val)
print(f"Validation accuracy: {accuracy_score(y_val, val_pred):.3f}")

# 6. Predict and submit
test_pred = model.predict(test_processed[features])
submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': test_pred
})
submission.to_csv('submission.csv', index=False)
print("✅ Submission file generated!")

3️⃣ This Website's Python Environment

Features

  • Instant Run: No login required
  • Learning Friendly: Designed for teaching
  • Lightweight: Based on Pyodide (Python in browser)

Limitations

  • Doesn't support all libraries (e.g., TensorFlow, PyTorch)
  • Limited performance
  • Cannot access local files

Use Cases

  • Learning Python basic syntax
  • Practicing Pandas, NumPy
  • Simple data visualization
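
For example, a short pandas/NumPy exercise like the one below runs entirely in a Pyodide-based browser environment (the data is made up for illustration):

python
# A small pandas/NumPy exercise that a Pyodide-based environment can run
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D'],
    'score': [72, 88, 95, 61],
})
# Standardize scores and show summary statistics
scores['z_score'] = (scores['score'] - scores['score'].mean()) / scores['score'].std()
print(scores)
print("Mean score:", np.round(scores['score'].mean(), 2))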

4️⃣ Other Online Platforms

Deepnote (Team Collaboration)

Website: deepnote.com

Advantages:

  • Real-time collaboration (like Google Docs)
  • Version control
  • Integrated database connections
  • Beautiful data visualization

Use: Team projects, corporate data analysis

Pricing:

  • Free tier: 750 hours/month compute time
  • Team: $33/user/month

Replit (General Programming)

Website: replit.com

Advantages:

  • Supports multiple programming languages
  • Can deploy web applications
  • Real-time collaboration

Use: Learning programming, rapid prototyping

Paperspace Gradient

Website: gradient.paperspace.com

Advantages:

  • Powerful GPU choices (including A100)
  • Persistent storage
  • Supports long-running tasks
  • Jupyter and VS Code interfaces

Use: Deep learning research, large-scale training

Pricing:

  • Free tier: Limited GPU time
  • Growth: Starting at $8/month

SageMaker Studio Lab (AWS)

Website: studiolab.sagemaker.aws

Advantages:

  • Free Jupyter environment from Amazon
  • 12 hours/session, free GPU
  • 15GB persistent storage
  • No credit card required

Use: Machine learning experiments, teaching

Binder

Website: mybinder.org

Advantages:

  • Completely free
  • Create environment directly from GitHub repository
  • One-click sharing of reproducible analysis
  • Supports custom environments (requirements.txt)

Use: Sharing research code, teaching demonstrations

Usage Example:

  1. Upload a Jupyter Notebook to GitHub
  2. Visit mybinder.org
  3. Enter the GitHub repository URL
  4. Generate a sharing link
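
Binder builds the environment from a requirements.txt at the repository root. Below is a minimal sketch that writes one from inside the notebook you plan to share; the package pins are illustrative assumptions, not the exact versions you must use:

python
# Write the dependency pins Binder installs when it builds the environment
# (versions below are illustrative; pin the versions your analysis actually uses)
pins = [
    "pandas==2.1.4",
    "numpy==1.26.2",
    "matplotlib==3.8.2",
]
with open("requirements.txt", "w") as f:
    f.write("\n".join(pins) + "\n")

print(open("requirements.txt").read())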

Professional Researchers' Cloud Workflow

Scenario 1: Academic Paper Reproduction

Problem: How to let reviewers and readers reproduce your results?

Solution:

  1. Put code and data on GitHub
  2. Add Binder badge in README
  3. Click badge to run online

Example README.md:

markdown
# My Research: Impact of Education on Income

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/yourname/yourrepo/main)

Click the badge above to reproduce my analysis online.

## Environment Requirements
See requirements.txt

Scenario 2: Large-Scale Experiments (GPU Requirements)

Requirements Comparison:

| Task | Recommended Platform | GPU Type | Free Time |
|---|---|---|---|
| BERT Fine-tuning | Colab Pro+ | A100 | Paid ($50/month) |
| Image Classification (ResNet) | Kaggle | P100 | 30h/week |
| Large-scale Training | Paperspace | A100/V100 | Paid |
| Teaching Demo | SageMaker Studio Lab | T4 | 12h/session |

Scenario 3: Team Collaborative Research

Best Practices: Recommended Tool Combination

  • Data Storage: Google Drive / Dropbox
  • Code Versioning: GitHub
  • Collaboration Environment: Deepnote / Colab
  • Communication: Slack / Teams

Deepnote Collaboration Example:

python
# Code written by Team Member A
import numpy as np

def clean_data(df):
    """Data cleaning function"""
    df = df.dropna()
    df['log_income'] = np.log(df['income'])
    return df

# Team Member B can see and use in real-time
cleaned_df = clean_data(raw_df)

# Can add comments and suggestions
# @member_A: Should we handle outliers here?

In-Depth Cloud Platform Comparison

Computing Resources Comparison

| Platform | CPU | RAM | GPU (Free) | GPU (Paid) | Storage |
|---|---|---|---|---|---|
| Colab | 2-core | 12-13GB | T4 | V100/A100 | Temporary |
| Colab Pro | High priority | 25GB | T4/P100 | V100/A100 | 200GB Drive |
| Kaggle | 4-core | 30GB | P100/T4 (30h/week) | N/A | 20GB persistent |
| Paperspace | On-demand | On-demand | Limited time | A100/V100 | Pay-as-you-go |
| SageMaker Lab | 2-core | 16GB | T4 (12h) | N/A | 15GB persistent |

Features Comparison

| Feature | Colab | Kaggle | Deepnote | Paperspace |
|---|---|---|---|---|
| Real-time Collaboration | Limited | ❌ | ✅ | ❌ |
| Version Control | Manual | Auto | Git integrated | Git integrated |
| Database Connection | Needs config | Needs config | Built-in | Needs config |
| Scheduled Tasks | ❌ | ✅ | ✅ | ✅ |
| API Deployment | Needs 3rd party | ❌ | ❌ | ✅ |
| Private Environment | ✅ | ✅ | ✅ | ✅ |

Academic Research Cloud Best Practices

1. Reproducibility Checklist

Before submitting paper:

python
# ========== Environment Information ==========
import sys
import pandas as pd
import numpy as np

print(f"Python: {sys.version}")
print(f"Pandas: {pd.__version__}")
print(f"NumPy: {np.__version__}")

# ========== Random Seed ==========
np.random.seed(42)

# ========== Data Source ==========
# Document data source and acquisition method in comments
# Data Source: World Bank Open Data
# URL: https://data.worldbank.org/...
# Download Date: 2024-01-15

# ========== Complete Workflow ==========
# 1. Data cleaning
# 2. Descriptive statistics
# 3. Main regression
# 4. Robustness checks
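
To capture the exact package versions alongside the notebook, you can also export them to a file; a minimal sketch (the output filename is just an example):

python
# Freeze the installed package versions so reviewers can recreate the environment
# (the leading "!" runs a shell command inside the notebook)
!pip freeze > requirements_frozen.txt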

2. Big Data Processing Strategies

python
# ========== Strategy 1: Chunking ==========
import pandas as pd

# For large files (>2GB)
chunksize = 10000
chunks = []

for chunk in pd.read_csv('large_file.csv', chunksize=chunksize):
    # Process each chunk
    processed = chunk.groupby('category')['value'].mean()
    chunks.append(processed)

# Merge results
result = pd.concat(chunks)

# ========== Strategy 2: Use Dask (Distributed Computing) ==========
import dask.dataframe as dd

# Dask can handle data exceeding memory
ddf = dd.read_csv('huge_file.csv')
result = ddf.groupby('category')['value'].mean().compute()

# ========== Strategy 3: Sampling Analysis ==========
# First develop code with 10% of data
sample = pd.read_csv('large_file.csv', nrows=10000)
# Develop and test code...
# After confirmation, use complete data

3. GPU Usage Best Practices

python
# ========== Check GPU Availability ==========
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    device = torch.device("cpu")
    print("Using CPU")

# ========== Free GPU Memory ==========
import gc
import torch

# Delete unused models
del model
gc.collect()
torch.cuda.empty_cache()

# ========== Monitor GPU Usage ==========
# In Colab
!nvidia-smi

# Or use Python (requires: !pip install pynvml)
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(handle)
print(f"Used: {info.used / 1e9:.2f} GB")
print(f"Free: {info.free / 1e9:.2f} GB")

4. Data Security and Privacy

Sensitive Data Handling:

python
# ❌ Don't hardcode sensitive data
password = "my_secret_password"  # Dangerous!

# ✅ Use environment variables
import os
password = os.environ.get('MY_PASSWORD')

# ✅ Use Colab Secrets (Recommended)
# Left sidebar → 🔑 Secrets → Add key
# Then in code:
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')

Data Anonymization:

python
import hashlib

# Hash sensitive IDs
def anonymize_id(id_string):
    return hashlib.sha256(str(id_string).encode()).hexdigest()[:10]

df['anonymous_id'] = df['user_id'].apply(anonymize_id)

Cost Optimization Strategies

Maximize Free Resources

Strategy 1: Multi-platform Combination

Monday-Wednesday: Use Colab (free)
Thursday-Friday: Use Kaggle (free 30h GPU)
Weekend: Local computing or SageMaker Studio Lab

Strategy 2: Priority Tasks

Lightweight tasks (data cleaning, EDA): Free platforms
GPU-intensive tasks (deep learning): Paid platforms or local
Long-running tasks: Local server

Strategy 3: Code Optimization to Reduce Runtime

python
# ❌ Inefficient code (multiple loops)
for i in range(len(df)):
    df.loc[i, 'new_col'] = df.loc[i, 'old_col'] * 2

# ✅ Vectorized operation (typically orders of magnitude faster)
df['new_col'] = df['old_col'] * 2

Platform Selection Guide

By Scenario

| Scenario | Recommended Platform |
|---|---|
| Learning Python Basics | This website |
| Data Analysis | Google Colab |
| Machine Learning | Google Colab / Kaggle |
| Deep Learning | Google Colab (GPU) / Kaggle |
| Team Collaboration | Deepnote |
| Competitions | Kaggle |
| Rapid Prototyping | Replit |

By Requirements

| Requirement | Recommended Platform |
|---|---|
| Need GPU | Colab / Kaggle |
| Need Large Datasets | Kaggle |
| Need Long Runtime | Local environment (Jupyter) |
| Need Team Collaboration | Deepnote / Colab |
| Need Version Control | Deepnote / Kaggle |

Best Practices

1. Data Management

Colab:

python
# Use Google Drive for data storage
from google.colab import drive
drive.mount('/content/drive')

# Project structure
# /content/drive/MyDrive/
# ├── projects/
# │   └── my_analysis/
# │       ├── data/
# │       ├── notebooks/
# │       └── outputs/

Kaggle:

python
# Attach datasets via "Add Data"; they appear under /kaggle/input
import pandas as pd
df = pd.read_csv('/kaggle/input/your-dataset/your_file.csv')  # hypothetical dataset path
# Or upload private datasets (Settings → Data)

2. Environment Management

python
# List all dependencies at notebook beginning
!pip install pandas==1.5.3
!pip install scikit-learn==1.2.2

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(f"Pandas version: {pd.__version__}")

3. Version Saving

Colab:

  • File → Save a copy in Drive
  • File → Revision history (view version history)

Kaggle:

  • Auto-saves version after each run
  • Can restore to any historical version

Hands-on Exercises

Exercise 1: Complete Data Analysis in Colab

  1. Open Google Colab
  2. Load data from URL:
python
import pandas as pd

url = "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv"
df = pd.read_csv(url)

  3. Analyze COVID-19 trends for different countries
  4. Save the notebook to Google Drive

Exercise 2: Explore Datasets on Kaggle

  1. Visit Kaggle
  2. Search for "House Prices" dataset
  3. Create new notebook
  4. Conduct exploratory data analysis (EDA)

Exercise 3: Compare Local vs Cloud

Run same code on:

  • This website environment
  • Google Colab
  • Local Jupyter

Compare speed and functionality differences
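
One simple way to make the comparison concrete is to time an identical, moderately heavy cell in each environment; a minimal sketch:

python
# Time the same workload in every environment and compare the results
import time
import numpy as np

start = time.perf_counter()
a = np.random.rand(2000, 2000)
b = a @ a  # a moderately heavy matrix multiplication
elapsed = time.perf_counter() - start
print(f"Matrix multiply took {elapsed:.2f} seconds")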


Next Steps

Now you have mastered:

  • Jupyter Notebook (local interactive environment)
  • VS Code (professional development environment)
  • Online environments (cloud execution)

In the next module, we will start learning Python Basic Syntax and officially enter the programming world!

Ready? Let's continue!
