ONNX: The Universal Intermediate Format
How ONNX Works
An ONNX file contains a computation graph: nodes (operations like Conv, MatMul, ReLU), edges (tensor connections), and initializers (trained weights). The format is framework-agnostic; it represents what the model computes, not how PyTorch or TensorFlow implemented it. Export: torch.onnx.export(model, sample_input, "model.onnx"). The exporter traces your model on the sample input, recording all operations.
ONNX Runtime
ONNX Runtime is an inference engine that runs ONNX models with optimizations. It applies graph optimizations (constant folding, operation fusion), selects efficient kernel implementations, and supports multiple backends (CPU, CUDA, DirectML, TensorRT). Typical speedups over PyTorch: 1.5-3x. Not as fast as pure TensorRT but works on more hardware and requires less tuning.
Common Export Issues
Dynamic control flow (if statements depending on tensor values) doesn't export cleanly. The exporter traces one path and bakes it in. Custom operators need explicit ONNX registration. Some operations (like torch.unique) lack ONNX equivalents. Fix: rewrite unsupported ops, use opset version 14+ for better coverage, or fall back to TorchScript for dynamic models.
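A sketch of the control-flow problem and one common rewrite (module names here are hypothetical): the Python if runs once during tracing, so only the branch taken on the sample input ends up in the graph. Replacing it with torch.where puts both branches into the graph, with the condition evaluated at runtime:

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    # Data-dependent branch: tracing evaluates the condition once on the
    # sample input and bakes that single path into the exported graph.
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x * -1

class TraceableGate(nn.Module):
    # Rewritten with torch.where: both branches exist in the graph and the
    # condition is a runtime tensor, so the export is faithful.
    def forward(self, x):
        return torch.where(x.sum() > 0, x * 2, x * -1)
```

Note that torch.where computes both branches, which is fine for cheap expressions but wasteful if either branch is expensive; for genuinely dynamic models, TorchScript (scripting rather than tracing) preserves the control flow itself.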