- LLM Inference (HuggingFace Transformers)
This is a minimal example of running a causal language model using HuggingFace.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "Explain transformer attention in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
output = model.generate(
**inputs,
max_new_tokens=100,
temperature=0.7,
top_p=0.9,
do_sample=True
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Key idea:
Autoregressive decoding
Next-token probability sampling
Transformer decoder stack
- Minimal LLM Forward Pass (PyTorch Concept)
This shows the core idea behind next-token prediction.
import torch
import torch.nn.functional as F
logits = torch.randn(1, 10000) # vocabulary size
convert to probabilities
probs = F.softmax(logits, dim=-1)
sample next token
next_token = torch.multinomial(probs, num_samples=1)
print(next_token)
Key idea:
LLM output = probability distribution over tokens
Sampling strategy defines creativity vs determinism
- Fine-Tuning / LoRA-style Concept (Simplified)
In production systems, full fine-tuning is often replaced with adapter-based methods like LoRA.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
pseudo representation of adapter injection
for param in model.parameters():
param.requires_grad = False # freeze base model
train small adapter layers (conceptual)
Key idea:
Freeze base model
Train lightweight adaptation layers
Efficient for domain specialization
- Diffusion Model (PyTorch Core Idea)
Diffusion models generate images by reversing a noise process.
Forward noising step:
import torch
def add_noise(x, t, noise):
return x * (1 - t) + noise * t
Reverse denoising step (conceptual UNet pass):
class SimpleDenoiser(torch.nn.Module):
def init(self):
super().init()
self.net = torch.nn.Sequential(
torch.nn.Linear(1000, 512),
torch.nn.ReLU(),
torch.nn.Linear(512, 1000)
)
def forward(self, x, t):
return self.net(x)
model = SimpleDenoiser()
Sampling loop (core diffusion idea):
x = torch.randn(1, 1000) # pure noise
for t in reversed(range(10)):
x = model(x, t)
final x approximates generated sample
Key idea:
Start from Gaussian noise
Iteratively denoise toward data distribution
- Real Diffusion Pipeline (Stable Diffusion via HuggingFace)
This is how production-grade image generation works.
from diffusers import StableDiffusionPipeline
import torch
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
prompt = "a futuristic AI developer workspace, cinematic lighting"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("output.png")
Key idea:
Text encoder (CLIP / Transformer)
Latent diffusion model
Decoder reconstructs image from latent space
- Comparison at Code Level System Core Mechanism Output Type LLM Autoregressive Transformer Tokens (text) Diffusion Model Iterative denoising Pixels (image/video) GAN Adversarial training Synthetic data
- Unified Mental Model (Important Insight)
At implementation level:
LLMs = single-step probabilistic sampling over tokens
Diffusion = multi-step refinement in latent space
GANs = min-max optimization between generator/discriminator
- Closing Technical Insight
Modern generative systems are converging toward:
Transformer backbones everywhere
Shared latent representations
Unified multimodal generation pipelines
So the distinction is not disappearing—but architectural boundaries are becoming compositional rather than isolated.