LLM Inference (HuggingFace Transformers)

This is a minimal example of running a causal language model using HuggingFace.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain transformer attention in simple terms"

inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
output = model.generate(
**inputs,
max_new_tokens=100,
temperature=0.7,
top_p=0.9,
do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
Key idea:
Autoregressive decoding
Next-token probability sampling
Transformer decoder stack

Minimal LLM Forward Pass (PyTorch Concept)

This shows the core idea behind next-token prediction.

import torch
import torch.nn.functional as F

logits = torch.randn(1, 10000) # vocabulary size

convert to probabilities

probs = F.softmax(logits, dim=-1)

sample next token

next_token = torch.multinomial(probs, num_samples=1)

print(next_token)
Key idea:
LLM output = probability distribution over tokens
Sampling strategy defines creativity vs determinism

Fine-Tuning / LoRA-style Concept (Simplified)

In production systems, full fine-tuning is often replaced with adapter-based methods like LoRA.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

pseudo representation of adapter injection

for param in model.parameters():
param.requires_grad = False # freeze base model

train small adapter layers (conceptual)

Key idea:
Freeze base model
Train lightweight adaptation layers
Efficient for domain specialization

Diffusion Model (PyTorch Core Idea)

Diffusion models generate images by reversing a noise process.

Forward noising step:
import torch

def add_noise(x, t, noise):
return x * (1 - t) + noise * t
Reverse denoising step (conceptual UNet pass):
class SimpleDenoiser(torch.nn.Module):
def init(self):
super().init()
self.net = torch.nn.Sequential(
torch.nn.Linear(1000, 512),
torch.nn.ReLU(),
torch.nn.Linear(512, 1000)
)

def forward(self, x, t):
    return self.net(x)

model = SimpleDenoiser()
Sampling loop (core diffusion idea):
x = torch.randn(1, 1000) # pure noise

for t in reversed(range(10)):
x = model(x, t)

final x approximates generated sample

Key idea:
Start from Gaussian noise
Iteratively denoise toward data distribution

Real Diffusion Pipeline (Stable Diffusion via HuggingFace)

This is how production-grade image generation works.

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)

pipe = pipe.to("cuda")

prompt = "a futuristic AI developer workspace, cinematic lighting"

image = pipe(prompt, num_inference_steps=30).images[0]

image.save("output.png")
Key idea:
Text encoder (CLIP / Transformer)
Latent diffusion model
Decoder reconstructs image from latent space

Comparison at Code Level System Core Mechanism Output Type LLM Autoregressive Transformer Tokens (text) Diffusion Model Iterative denoising Pixels (image/video) GAN Adversarial training Synthetic data
Unified Mental Model (Important Insight)

At implementation level:

LLMs = single-step probabilistic sampling over tokens
Diffusion = multi-step refinement in latent space
GANs = min-max optimization between generator/discriminator

Closing Technical Insight

Modern generative systems are converging toward:

Transformer backbones everywhere
Shared latent representations
Unified multimodal generation pipelines

So the distinction is not disappearing—but architectural boundaries are becoming compositional rather than isolated.

LLMs vs Generative AI

convert to probabilities

sample next token

pseudo representation of adapter injection

train small adapter layers (conceptual)

final x approximates generated sample