The Right Tools Matter
After years of building AI systems, I've refined my toolkit to maximize productivity while minimizing friction. These are the tools I use daily, from development to deployment.
Development Environment
Core Setup
- VS Code: Primary IDE with AI-powered extensions
- PyCharm: For complex Python projects requiring deep debugging
- Jupyter Lab: Interactive experimentation and visualization
- Neovim: Quick edits and remote work
VS Code Extensions
{
  "essential_extensions": [
    "ms-python.python",              // Python language support
    "ms-toolsai.jupyter",            // Jupyter notebooks
    "GitHub.copilot",                // AI pair programming
    "eamodio.gitlens",               // Git supercharged
    "ms-vscode-remote.remote-ssh",   // Remote development
    "ms-azuretools.vscode-docker",   // Docker integration
    "redhat.vscode-yaml",            // YAML support
    "tamasfe.even-better-toml"       // TOML support
  ]
}
Machine Learning Frameworks
💡 Framework Selection
- PyTorch: Primary choice for its flexibility and research-friendly APIs
- TensorFlow: Production deployments with TF Serving
- JAX: High-performance numerical computing
- Hugging Face: Transformers and NLP tasks
- FastAI: Rapid prototyping and education
My Base Environment
# Create conda environment
conda create -n ml python=3.10
conda activate ml
# Core ML stack
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers datasets accelerate
pip install lightning tensorboard wandb
# Computer Vision
pip install opencv-python pillow albumentations timm
# NLP
pip install tokenizers sentencepiece sacremoses
# Data & Viz
pip install pandas numpy scikit-learn matplotlib seaborn plotly
# Experiment tracking
pip install mlflow wandb tensorboard
# Development
pip install black isort flake8 mypy pytest ipython
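After building the environment, I like to run a quick sanity check to catch CUDA or driver mismatches early. A minimal sketch, assuming only the packages installed above:

# sanity_check.py: verify the core stack after installation
import torch
import transformers

print(f"PyTorch: {torch.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # Tiny matmul on the GPU to confirm the kernels actually run
    x = torch.randn(1024, 1024, device="cuda")
    torch.cuda.synchronize()
    print("GPU matmul OK:", (x @ x).shape)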
Experiment Tracking & MLOps
Weights & Biases Setup
import wandb

# Initialize experiment
wandb.init(
    project="my-project",
    config={
        "learning_rate": 0.001,
        "architecture": "ResNet50",
        "dataset": "ImageNet",
        "epochs": 100,
    }
)

# Log metrics during training (epoch count comes from the config above)
for epoch in range(wandb.config.epochs):
    train_loss = train()
    val_loss, val_acc = validate()
    wandb.log({
        "epoch": epoch,
        "train_loss": train_loss,
        "val_loss": val_loss,
        "val_accuracy": val_acc
    })

# Log model
wandb.save("model.pth")
Data Management
Data Tools
- DVC: Version control for datasets and models
- Label Studio: Data annotation
- Pandera: Data validation (see the sketch after this list)
- Great Expectations: Data quality testing
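Pandera is the one I reach for most often on tabular data. A minimal sketch of a schema check; the column names and bounds are illustrative, not from a real project:

import pandas as pd
import pandera as pa

# Hypothetical schema for a training CSV; columns and checks are examples
schema = pa.DataFrameSchema({
    "feature_1": pa.Column(float, pa.Check.in_range(0.0, 1.0)),
    "feature_2": pa.Column(float, nullable=False),
    "label": pa.Column(int, pa.Check.isin([0, 1])),
})

df = pd.read_csv("data/train.csv")
validated = schema.validate(df)  # raises SchemaError if any check fails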
DVC Workflow
# Initialize DVC
dvc init
# Add remote storage (S3)
dvc remote add -d storage s3://my-bucket/dvc-store
# Track dataset
dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore
# Track model
dvc add models/best_model.pth
# Push to remote
dvc push
# Pull on another machine
dvc pull
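Tracked files can also be read straight from Python with dvc.api, which keeps training scripts storage-agnostic. A minimal sketch, assuming the layout from the commands above:

import pandas as pd
import dvc.api

# Stream the tracked CSV from the DVC remote (or the local cache)
with dvc.api.open("data/train.csv") as f:
    train_df = pd.read_csv(f)

print(train_df.shape)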
Deployment & Infrastructure
Docker Template
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Run server
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
FastAPI for Model Serving
from fastapi import FastAPI
from pydantic import BaseModel
import torch

app = FastAPI()

# Load model (assumes the full model object was saved with torch.save(model, "model.pth"))
model = torch.load("model.pth")
model.eval()

class PredictionRequest(BaseModel):
    data: list

@app.post("/predict")
async def predict(request: PredictionRequest):
    """Make a prediction."""
    input_tensor = torch.tensor(request.data)
    with torch.no_grad():
        prediction = model(input_tensor)
        probabilities = torch.softmax(prediction, dim=-1)
    return {
        "prediction": prediction.tolist(),
        "confidence": torch.max(probabilities).item()
    }

@app.get("/health")
async def health():
    return {"status": "healthy"}
Monitoring & Observability
💡 Monitoring Stack
- Prometheus: Metrics collection (instrumentation sketch after this list)
- Grafana: Visualization dashboards
- Sentry: Error tracking
- Evidently: ML model monitoring
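For the serving layer above, the official prometheus_client library covers most of my instrumentation needs. A minimal sketch; the metric names and scrape port are just examples:

import time
from prometheus_client import Counter, Histogram, start_http_server

# Example metrics; names are illustrative
PREDICTIONS = Counter("predictions_total", "Number of predictions served")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

@LATENCY.time()
def predict(x):
    """Stand-in for model inference; records latency and a counter."""
    PREDICTIONS.inc()
    time.sleep(0.02)
    return 0

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for Prometheus to scrape
    while True:
        predict(None)

In a real FastAPI service I'd record these metrics inside the endpoint itself; the standalone loop here is only to show the scrape endpoint in isolation.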
Productivity Tools
Daily Drivers
- tmux: Terminal multiplexing for remote sessions
- fzf: Fuzzy finder for commands and files
- ripgrep: Fast code search
- httpie: User-friendly HTTP client
- jq: JSON processor for API testing
- Notion: Documentation and notes
- Obsidian: Technical knowledge base
My .zshrc Essentials
# Aliases
alias ll='ls -lah'
alias g='git'
alias d='docker'
alias dc='docker-compose'
alias k='kubectl'
alias py='python'
alias ipy='ipython'
alias jl='jupyter lab'
# Quick conda activation
alias mla='conda activate ml'
# GPU monitoring
alias gpus='watch -n 0.5 nvidia-smi'
# Port forwarding for Jupyter
alias jport='ssh -N -L 8888:localhost:8888'
# Quick experiment logging
exp() {
    echo "$(date): $*" >> experiments.log
}
# Find Python processes
alias pyproc='ps aux | grep python'
Hardware Setup
My Workstation
- Local: RTX 4090 (24GB) for development
- Cloud: AWS p3.2xlarge (V100) for training
- Edge Testing: NVIDIA Jetson Orin
- Storage: 2TB NVMe SSD + Network NAS
Learning Resources
Tools I use to stay current:
- Papers: arXiv daily digest + Papers With Code
- News: Hacker News, r/MachineLearning
- Podcasts: TWIML AI, Gradient Dissent
- Blogs: Distill, Jay Alammar, Lil'Log
- Communities: Discord servers, Twitter AI community
Conclusion
The right toolkit amplifies your capabilities as an AI engineer. These tools have been battle-tested across dozens of projects and continue to evolve with the field.
Remember: tools are means, not ends. Start simple, add complexity only when needed, and always optimize for your workflow, not someone else's.