PII Protection in PHP without a framework holding the leash

Every app that touches personal data eventually hits the same wall. You've got a national_id column, an email, a phone number, maybe a credit card. Compliance (PDPA here in Malaysia, GDPR, SOC 2 — pick your acronym) says you can't just store them in plaintext, and you definitely can't dump them into your audit log when someone edits a record.

The usual answer is "use the framework's encryption." And that works — until you're in a queue worker, a standalone CLI importer, a Symfony service, or a plain PHP webhook handler that doesn't have the framework's container booted. Suddenly your PII handling is coupled to config(), env(), and a service provider that isn't there.

I kept hitting this across different codebases, so I extracted the primitives into a small library: cleaniquecoders/pii-protection. This post walks through what's in it, but more usefully, why it's shaped the way it is — because the design decisions are the actual lesson here.

The one constraint that drove everything

The package has a single hard rule that everything else falls out of:

No framework. No global state. No static facades. No reading from env/config.

Every class is constructor-injected with explicit inputs and returns explicit outputs. That's it. The OpenSslEncrypter doesn't go looking for APP_KEY — you hand it a key. The masking strategies don't read a config file — you instantiate them with the options you want.

use CleaniqueCoders\PiiProtection\PiiManager;
use CleaniqueCoders\PiiProtection\Encryption\OpenSslEncrypter;
use CleaniqueCoders\PiiProtection\Masking\TailStrategy;

$manager = new PiiManager(
    new OpenSslEncrypter(key: $appKey),
    new TailStrategy(visible: 4),
);

$cipher = $manager->encrypt('0123456789');  // store at rest
$plain  = $manager->decrypt($cipher);       // "0123456789"
$masked = $manager->mask('0123456789');      // "******6789" for display
$audit  = $manager->redact($payload, ['phone', 'national_id']);

Why bother being this strict? Because PII protection is exactly the kind of cross-cutting concern that shows up in places your framework doesn't reach. The moment your encryption helper assumes a booted Laravel container, it's useless in the standalone importer that's actually processing the sensitive batch file at 2am. Keeping the primitives portable means the same code protects data everywhere — web request, artisan command, raw worker, test harness — with zero conditional "are we in a framework right now" branching.

There's a real trade-off here, and I want to name it honestly: you give up convenience. No auto-resolved facade, no Crypt::encrypt() one-liner. You have to wire the key in yourself. The package's position is that key handling is the caller's job — it'll rotate keys for you (more on that below), but loading and storing them is your responsibility. For a library whose entire reason to exist is correctness around sensitive data, I'd rather make the dependency explicit than hide it.

Requirements are PHP ^8.4, ext-openssl, ext-mbstring.

composer require cleaniquecoders/pii-protection

Three jobs, kept separate

The library splits into three distinct responsibilities:

Encryption — reversible. You need the value back later. (AES-256-GCM)
Masking — one-way display transformation. ******6789 for the UI or a log.
Redaction — walk a payload and mask the listed fields before persisting.

Conflating these is where bugs come from. Masking is not security — it's a display concern. If you "mask" a value and store the masked version thinking it's protected, you've lost the data. If you encrypt a value you only ever need to show partially, you've added decryption surface for no reason. Knowing which job you're doing is half the battle.

Masking strategies

Each strategy implements one tiny contract — MaskStrategy::mask(string $value): string — and there's one per common PII shape:

Strategy	Behaviour	Example
`TailStrategy`	Keep last N chars	`******6789`
`FullStrategy`	Mask everything	`**********`
`EmailStrategy`	Mask local-part, keep domain	`****@acme.com`
`HashStrategy`	One-way `sha256` digest	`f4b0...e21`
`CreditCardStrategy`	Keep last 4, preserve grouping	`**** **** **** 1111`
`IpStrategy`	Mask the last octet/group	`192.168.1.**`
`NameStrategy`	Keep each word's initial	`J*** D**`
`NricStrategy`	Mask MyKad digits, keep dashes	`******-**-****`

(new EmailStrategy())->mask('john.doe@acme.com');     // "****@acme.com"
(new CreditCardStrategy())->mask('4111 1111 1111 1111'); // "**** **** **** 1111"
(new NricStrategy())->mask('900101-01-1234');         // "******-**-****"

Every strategy takes an optional maskChar if * doesn't suit your UI:

(new TailStrategy(visible: 4, maskChar: '•'))->mask('0123456789'); // "••••••6789"

The NricStrategy is the local touch — masking Malaysian MyKad numbers while keeping the dash grouping intact, which is what you actually want on screen.
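
To make the one-method contract concrete, here's a minimal sketch of what a tail-style strategy could look like. The interface and class names below are local stand-ins I made up for illustration, not the package's own classes, and the real implementation may handle edge cases differently.

```php
<?php
declare(strict_types=1);

// Local stand-in for the one-method contract described above (hypothetical name).
interface MaskStrategySketch
{
    public function mask(string $value): string;
}

// Hypothetical strategy: keep the last N characters, mask everything before them.
final class KeepTailSketch implements MaskStrategySketch
{
    public function __construct(
        private int $visible = 4,
        private string $maskChar = '*',
    ) {}

    public function mask(string $value): string
    {
        $length = mb_strlen($value);
        $keep   = max(0, min($this->visible, $length));

        // mb_* functions count characters, not bytes, so multibyte input slices cleanly.
        $tail = $keep > 0 ? mb_substr($value, $length - $keep) : '';

        return str_repeat($this->maskChar, $length - $keep) . $tail;
    }
}

echo (new KeepTailSketch())->mask('0123456789'), PHP_EOL;        // ******6789
echo (new KeepTailSketch(2, '•'))->mask('Zoë Müller'), PHP_EOL;  // ••••••••er
```

Note that this sketch shows a value in full when it's shorter than the visible count. Whether that's acceptable depends on the field, so treat it as a decision to make, not a default to copy.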

Encryption at rest, done properly

OpenSslEncrypter uses AES-256-GCM, and a few details matter:

use CleaniqueCoders\PiiProtection\Encryption\OpenSslEncrypter;

$encrypter = new OpenSslEncrypter(key: $appKey);

$cipher = $encrypter->encrypt('012345678'); // store this
$plain  = $encrypter->decrypt($cipher);     // "012345678"

Per message, it generates a random 16-byte salt and a 12-byte IV, then derives a fresh 256-bit data key with HKDF-SHA256 from the ring key plus that salt. The plaintext is encrypted under that derived key, with the caller context bound as GCM AAD. So encrypting the same value twice produces different ciphertext. That's correct behaviour for at-rest encryption, but it has a consequence people tend to forget, which I'll come back to.

The per-message derived key isn't decoration. Deriving a unique key per message means a single key/nonce mishap can't cascade — the blast radius is one record, not the whole column. You're not encrypting a million rows under literally the same AES key.
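
If you want to see the moving parts, the steps above can be sketched with nothing but PHP's built-in `hash_hkdf()` and `openssl_encrypt()`. This is my own approximation of the idea, not the package's source: the function names, the `'pii-sketch'` info label, and the array layout are all assumptions.

```php
<?php
declare(strict_types=1);

// Sketch only: derive a one-off data key per message, then encrypt with AES-256-GCM.
function sketchEncrypt(string $plaintext, string $masterKey, string $context = ''): array
{
    $salt = random_bytes(16); // fresh per message
    $iv   = random_bytes(12); // standard GCM nonce size

    // HKDF-SHA256: master key + salt -> 32-byte (256-bit) data key.
    $dataKey = hash_hkdf('sha256', $masterKey, 32, 'pii-sketch', $salt);

    $tag = '';
    $ciphertext = openssl_encrypt(
        $plaintext, 'aes-256-gcm', $dataKey, OPENSSL_RAW_DATA, $iv, $tag, $context, 16
    );
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }

    return ['salt' => $salt, 'iv' => $iv, 'tag' => $tag, 'ciphertext' => $ciphertext];
}

function sketchDecrypt(array $parts, string $masterKey, string $context = ''): string
{
    // Re-derive the same data key from the salt that travelled with the message.
    $dataKey = hash_hkdf('sha256', $masterKey, 32, 'pii-sketch', $parts['salt']);

    $plaintext = openssl_decrypt(
        $parts['ciphertext'], 'aes-256-gcm', $dataKey, OPENSSL_RAW_DATA,
        $parts['iv'], $parts['tag'], $context
    );
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed: wrong key, wrong context, or tampered data');
    }

    return $plaintext;
}

$masterKey = random_bytes(32);
$first     = sketchEncrypt('0123456789', $masterKey);
$second    = sketchEncrypt('0123456789', $masterKey);

var_dump($first['ciphertext'] === $second['ciphertext']); // bool(false): new salt and IV each call
echo sketchDecrypt($first, $masterKey), PHP_EOL;          // 0123456789
```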

Ciphertext is written in a self-describing, versioned format:

v2.<keyId>.<base64( salt(16) || iv(12) || tag(16) || ciphertext )>

Everything decrypt needs travels with the ciphertext — which key id wrote it, the salt, the IV, the GCM tag. That's what makes seamless upgrades possible. The clever bit is backward compatibility: older 1.x releases (pre-1.2) wrote an unversioned base64(iv || tag || ciphertext) blob with a sha256-derived key. On decrypt, the code checks for the v2. prefix — present means versioned path, absent means legacy path. And the prefix is unambiguous because . isn't a base64 character, so legacy ciphertext can never accidentally look versioned. No flag column, no migration, no guessing. Old data keeps decrypting unchanged. Versioning your serialized formats from day one — and choosing a delimiter that can't appear in the payload — is the kind of small decision that saves a brutal migration later.
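
Here's roughly how that prefix check can work, as a standalone sketch. The function names are hypothetical, and I'm assuming a 12-byte IV and 16-byte tag for the legacy blob and key ids that never contain a dot; the package's actual parser may differ.

```php
<?php
declare(strict_types=1);

// Sketch: pack the "v2.<keyId>.<base64>" layout described above.
function packV2(string $keyId, string $salt, string $iv, string $tag, string $ciphertext): string
{
    return 'v2.' . $keyId . '.' . base64_encode($salt . $iv . $tag . $ciphertext);
}

// Sketch: split a stored value into its parts, choosing the path by prefix.
function parseEnvelope(string $stored): array
{
    if (!str_starts_with($stored, 'v2.')) {
        // "." is not in the standard base64 alphabet, so this can only be the legacy layout.
        $raw = base64_decode($stored, true);
        if ($raw === false || strlen($raw) < 28) {
            throw new InvalidArgumentException('Malformed legacy ciphertext');
        }

        return [
            'version'    => 'legacy',
            'iv'         => substr($raw, 0, 12),  // assumed legacy IV size
            'tag'        => substr($raw, 12, 16),
            'ciphertext' => substr($raw, 28),
        ];
    }

    $parts = explode('.', $stored, 3);
    if (count($parts) !== 3) {
        throw new InvalidArgumentException('Malformed versioned ciphertext');
    }
    [, $keyId, $encoded] = $parts;

    $raw = base64_decode($encoded, true);
    if ($raw === false || strlen($raw) < 44) {
        throw new InvalidArgumentException('Malformed versioned ciphertext');
    }

    return [
        'version'    => 'v2',
        'keyId'      => $keyId,
        'salt'       => substr($raw, 0, 16),
        'iv'         => substr($raw, 16, 12),
        'tag'        => substr($raw, 28, 16),
        'ciphertext' => substr($raw, 44),
    ];
}

$packed = packV2('2025', str_repeat('s', 16), str_repeat('i', 12), str_repeat('t', 16), 'payload');
$parsed = parseEnvelope($packed);
echo $parsed['version'], ' ', $parsed['keyId'], PHP_EOL; // v2 2025
```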

Context binding (AAD)

AES-GCM supports Additional Authenticated Data, and the package exposes it as context binding. You bind ciphertext to a context — a user id, a column name — and that same context is required to decrypt:

$cipher = $encrypter->encryptWithContext('012345678', 'user:123');
$plain  = $encrypter->decryptWithContext($cipher, 'user:123'); // wrong context throws

Why care? This defends against a sneaky class of attack: moving valid ciphertext from one row or column to another. Without AAD, an attacker (or a buggy migration) could copy user A's encrypted SSN into user B's row and it would decrypt fine. Bind the context to user:123 and that ciphertext is worthless anywhere else. It's cheap insurance against data being shuffled around.
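
You can watch the same mechanism at work with raw `openssl_*` calls. This sketch skips the package entirely and passes the context straight in as the AAD argument; the values are made up for the demo.

```php
<?php
declare(strict_types=1);

$key = random_bytes(32);
$iv  = random_bytes(12);
$tag = '';

// Encrypt with "user:123" bound as Additional Authenticated Data.
$cipher = openssl_encrypt(
    '900101-01-1234', 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag, 'user:123'
);

// Same context: the GCM tag verifies and the plaintext comes back.
$sameContext = openssl_decrypt(
    $cipher, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag, 'user:123'
);

// Ciphertext "moved" to another user: tag verification fails and false is returned.
$movedContext = openssl_decrypt(
    $cipher, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag, 'user:456'
);

echo $sameContext, PHP_EOL; // 900101-01-1234
var_dump($movedContext);    // bool(false)
```

The raw function signals failure by returning false. The package, as described above, turns that into an exception, which is harder to ignore.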

Key rotation

Hand the encrypter a KeyRing instead of a single key, and rotation becomes a non-event:

use CleaniqueCoders\PiiProtection\Encryption\KeyRing;

$encrypter = new OpenSslEncrypter(new KeyRing(
    ['2024' => $oldKey, '2025' => $newKey],
    currentId: '2025', // encrypt with this; '2024' ciphertext still decrypts
));

New writes use the current key. Old ciphertext keeps decrypting with whichever key its embedded id points to (remember the versioned format: the key id rides along). No big-bang re-encryption migration. You rotate forward and re-encrypt old rows lazily as they're touched, or never; either way they keep decrypting. Anyone who's tried to re-encrypt a 50-million-row table in one shot knows why this design exists.
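
The mechanics are simple enough to sketch in a few lines. Everything here is hypothetical (class name, function names, and a simplified `<keyId>.<base64>` layout without the salt), but it shows why an embedded key id makes rotation painless.

```php
<?php
declare(strict_types=1);

// Hypothetical key ring: a map of id => key plus the id used for new writes.
final class KeyRingSketch
{
    /** @param array<string, string> $keys */
    public function __construct(private array $keys, private string $currentId)
    {
        if (!isset($keys[$currentId])) {
            throw new InvalidArgumentException("Unknown current key id: {$currentId}");
        }
    }

    public function currentId(): string
    {
        return $this->currentId;
    }

    public function key(string $id): string
    {
        return $this->keys[$id] ?? throw new InvalidArgumentException("Unknown key id: {$id}");
    }
}

function ringEncrypt(KeyRingSketch $ring, string $plaintext): string
{
    $id  = $ring->currentId();
    $iv  = random_bytes(12);
    $tag = '';
    $ct  = openssl_encrypt($plaintext, 'aes-256-gcm', $ring->key($id), OPENSSL_RAW_DATA, $iv, $tag);

    return $id . '.' . base64_encode($iv . $tag . $ct); // the key id rides along
}

function ringDecrypt(KeyRingSketch $ring, string $stored): string
{
    $parts = explode('.', $stored, 2);
    $raw   = count($parts) === 2 ? base64_decode($parts[1], true) : false;
    if ($raw === false || strlen($raw) < 28) {
        throw new RuntimeException('Malformed ciphertext');
    }

    // Pick the key by the id embedded in the ciphertext, not by "current".
    $plain = openssl_decrypt(
        substr($raw, 28), 'aes-256-gcm', $ring->key($parts[0]), OPENSSL_RAW_DATA,
        substr($raw, 0, 12), substr($raw, 12, 16)
    );
    if ($plain === false) {
        throw new RuntimeException('Decryption failed');
    }

    return $plain;
}

$oldKey = random_bytes(32);
$newKey = random_bytes(32);

$beforeRotation = new KeyRingSketch(['2024' => $oldKey], '2024');
$afterRotation  = new KeyRingSketch(['2024' => $oldKey, '2025' => $newKey], '2025');

$oldRow = ringEncrypt($beforeRotation, 'secret'); // written before rotation
$newRow = ringEncrypt($afterRotation, 'secret');  // written after rotation

echo ringDecrypt($afterRotation, $oldRow), PHP_EOL; // secret
echo ringDecrypt($afterRotation, $newRow), PHP_EOL; // secret
```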

The trap: never query an encrypted column

Here's the consequence I deferred earlier. Because encryption is non-deterministic — random IV and salt every call — the same value encrypts to different ciphertext each time. Which means:

-- This will NEVER match. Don't do it.
SELECT * FROM users WHERE email_cipher = ?

This is the single most common mistake with at-rest encryption, and the package treats it as a guardrail worth shouting about. The fix is a blind index — a deterministic, one-way HMAC you store alongside the ciphertext and query instead:

use CleaniqueCoders\PiiProtection\Encryption\HmacBlindIndex;

$blind = new HmacBlindIndex(key: $indexKey);

$row = [
    'email_cipher' => $encrypter->encrypt($email),  // for retrieval/display
    'email_index'  => $blind->index($email),        // for WHERE email_index = ?
];

$blind->matches($email, $row['email_index']); // true

The index confirms a match — it never reveals the value, and it's not reversible. matches() compares in constant time (hash_equals), so you're not leaking timing information either. You get equality lookups on encrypted data without compromising the encryption. (If you don't need the value back at all, just hash or mask it and skip the cipher column entirely.)
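
A quick way to convince yourself is to compare the two side by side using plain PHP. This isn't the package's code, just the underlying primitives: a randomised AES-GCM encryption on one hand and a keyed HMAC-SHA256 on the other. The helper name is made up.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: randomised encryption, new IV on every call.
function encryptOnce(string $value, string $key): string
{
    $iv  = random_bytes(12);
    $tag = '';
    $ct  = openssl_encrypt($value, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);

    return base64_encode($iv . $tag . $ct);
}

$encryptionKey = random_bytes(32);
$indexKey      = random_bytes(32); // keep this separate from the encryption key
$email         = 'john@acme.com';

$cipherA = encryptOnce($email, $encryptionKey);
$cipherB = encryptOnce($email, $encryptionKey);
var_dump($cipherA === $cipherB); // bool(false): WHERE email_cipher = ? can't match

$indexA = hash_hmac('sha256', $email, $indexKey);
$indexB = hash_hmac('sha256', $email, $indexKey);
var_dump(hash_equals($indexA, $indexB)); // bool(true): deterministic, so it's queryable
```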

There's one more trap worth calling out, because it's the one people actually fall into: the index is computed on the exact bytes you pass. John@Acme.com and john@acme.com produce different indexes, so a naive email lookup silently misses. The fix is a normaliser, applied at both write and query time:

$blind = new HmacBlindIndex(
    key: $indexKey,
    length: 32, // hex chars kept; shorter saves storage at the cost of collision-resistance
    normaliser: fn (string $v) => strtolower(trim($v)),
);

Lowercase-and-trim the value before hashing and your lookups behave the way users expect. The length knob lets you shorten the stored index (down from the full 64 hex chars) when collision-resistance matters less than storage — a deliberate trade-off you get to make per column.
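
Stripped down to built-ins, normalise-then-truncate looks something like this. The function name is mine, and I'm assuming the index is simply the leading hex characters of an HMAC-SHA256 digest, which may not match the package byte for byte.

```php
<?php
declare(strict_types=1);

// Sketch: normalise, HMAC, then keep only the first $length hex characters.
function blindIndexSketch(string $value, string $key, int $length = 64): string
{
    $normalised = strtolower(trim($value));
    $digest     = hash_hmac('sha256', $normalised, $key); // 64 hex chars

    return substr($digest, 0, $length);
}

$indexKey = random_bytes(32);

$stored = blindIndexSketch('John@Acme.com', $indexKey, 32);    // written at signup
$lookup = blindIndexSketch('  john@acme.com ', $indexKey, 32); // typed at login

var_dump(hash_equals($stored, $lookup)); // bool(true)
var_dump(strlen($stored));               // int(32)

// Without the normaliser the same two inputs miss each other.
$rawA = hash_hmac('sha256', 'John@Acme.com', $indexKey);
$rawB = hash_hmac('sha256', 'john@acme.com', $indexKey);
var_dump($rawA === $rawB);               // bool(false)
```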

This whole area is the kind of thing that's obvious in hindsight and a production incident in foresight. Worth internalizing.

Redaction — where audit logs get cleaned up

This is the part I reach for most. You've got a change-log payload (old_values / new_values, or any nested map) about to be written to an audit table, and it's full of PII. ArrayRedactor walks it and masks only the fields you name — recursing into nested arrays and JSON-decoded structures — leaving everything else untouched.

use CleaniqueCoders\PiiProtection\ArrayRedactor;
use CleaniqueCoders\PiiProtection\Masking\TailStrategy;

$redactor = new ArrayRedactor(new TailStrategy(visible: 4));
$clean = $redactor->redact($payload, ['phone', 'national_id']);

Two features make this genuinely useful in real schemas rather than toy examples.

Per-field strategies — each field gets the right masking in a single pass:

use CleaniqueCoders\PiiProtection\Masking\{EmailStrategy, HashStrategy, TailStrategy};

$clean = $redactor->redact($payload, [
    'email' => new EmailStrategy,
    'phone' => new TailStrategy(visible: 4),
    'nric'  => new HashStrategy,
    'name',  // bare name → uses the redactor's default strategy
]);

Dot-path and wildcard targeting — so you mask a precise location, not any key that happens to share a name:

$clean = $redactor->redact($payload, [
    'user.phone',        // only user.phone, not a top-level "phone"
    'users.*.phone',     // every users[].phone
    'contact.email' => new EmailStrategy,
]);

That users.*.phone wildcard is the difference between this being a demo and being something you can point at a real nested API payload.
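
If you're curious how dot-path and wildcard targeting can work under the hood, a small recursive walk covers it. This is a sketch with a made-up function name, not the ArrayRedactor source, and it only masks string leaves.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: apply $mask to every string reachable by the path, where "*" matches any key.
 *
 * @param list<string> $segments e.g. ['users', '*', 'phone']
 */
function redactPathSketch(array $data, array $segments, callable $mask): array
{
    if ($segments === []) {
        return $data;
    }

    $head = array_shift($segments);
    $keys = $head === '*' ? array_keys($data) : [$head];

    foreach ($keys as $key) {
        if (!array_key_exists($key, $data)) {
            continue;
        }

        if ($segments === [] && is_string($data[$key])) {
            $data[$key] = $mask($data[$key]);                                   // end of path: mask the leaf
        } elseif ($segments !== [] && is_array($data[$key])) {
            $data[$key] = redactPathSketch($data[$key], $segments, $mask);      // keep descending
        }
    }

    return $data;
}

$payload = [
    'phone' => '0123456789',
    'user'  => ['phone' => '0198765432'],
    'users' => [['phone' => '0111111111'], ['phone' => '0122222222']],
];

// Simple tail mask for the demo: keep the last 4 characters.
$tail = fn (string $v): string => str_repeat('*', max(0, strlen($v) - 4)) . substr($v, -4);

$clean = redactPathSketch($payload, explode('.', 'users.*.phone'), $tail);

echo $clean['users'][0]['phone'], PHP_EOL; // ******1111
echo $clean['users'][1]['phone'], PHP_EOL; // ******2222
echo $clean['phone'], PHP_EOL;             // 0123456789 (top-level key untouched)
```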

Redacting objects and DTOs

If you're working with typed objects instead of arrays, tag the properties with an attribute and let ObjectRedactor handle it:

use CleaniqueCoders\PiiProtection\Attributes\Pii;
use CleaniqueCoders\PiiProtection\ObjectRedactor;
use CleaniqueCoders\PiiProtection\Masking\{EmailStrategy, FullStrategy};

class User
{
    public function __construct(
        #[Pii] public string $name,
        #[Pii(strategy: EmailStrategy::class)] public string $email,
        public int $age, // untagged — copied through untouched
    ) {}
}

$user  = new User('John Doe', 'john.doe@acme.com', 30); // example input
$clean = (new ObjectRedactor(new FullStrategy))->redact($user);
// ['name' => '********', 'email' => '****@acme.com', 'age' => 30]

Declaring sensitivity at the property is a nice pattern — the DTO becomes self-documenting about what's PII, and the redaction logic doesn't live in some far-away config list that drifts out of sync.
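
The pattern itself is plain PHP 8 attributes plus reflection. Here's a minimal sketch of the idea with a hypothetical marker attribute and helper function; the package's ObjectRedactor does more (per-property strategies, for one), so read this as an illustration of the technique only.

```php
<?php
declare(strict_types=1);

// Hypothetical marker attribute, standing in for the package's #[Pii].
#[Attribute(Attribute::TARGET_PROPERTY | Attribute::TARGET_PARAMETER)]
final class SensitiveSketch
{
}

final class CustomerSketch
{
    public function __construct(
        #[SensitiveSketch] public string $name,
        public int $age,
    ) {}
}

// Sketch: copy public properties into an array, masking the tagged string ones.
function redactObjectSketch(object $subject, callable $mask): array
{
    $out = [];

    $properties = (new ReflectionObject($subject))->getProperties(ReflectionProperty::IS_PUBLIC);
    foreach ($properties as $property) {
        $value  = $property->getValue($subject);
        $tagged = $property->getAttributes(SensitiveSketch::class) !== [];

        $out[$property->getName()] = ($tagged && is_string($value)) ? $mask($value) : $value;
    }

    return $out;
}

$maskAll = fn (string $v): string => str_repeat('*', mb_strlen($v));
$clean   = redactObjectSketch(new CustomerSketch('John Doe', 30), $maskAll);

var_dump($clean === ['name' => '********', 'age' => 30]); // bool(true)
```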

Scrubbing free text

Named fields are easy. The harder problem is PII buried in unstructured text — log lines, exception messages, user comments — where there's no "field" to target. PiiScrubber runs pattern detectors over free text:

use CleaniqueCoders\PiiProtection\Detection\PiiScrubber;

(new PiiScrubber)->scrub('contact john@acme.com from 192.168.1.42');
// "contact ************* from ************"

(new PiiScrubber)->detect($logLine);
// [['type' => 'email', 'value' => ..., 'offset' => ...], ...]

Built-in detectors cover email, credit card, Malaysian NRIC, IPv4, and phone numbers (the phone and NRIC patterns are tuned for Malaysian formats). There's a subtle design decision in here worth stealing: the detectors run in a fixed order, longest-matching pattern first. Credit cards get masked before the phone-number detector runs, because a 16-digit card number can otherwise partially match a phone pattern and you'd end up with half-masked garbage. When you're chaining regex replacements over the same string, order isn't cosmetic; it's correctness.
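
The ordering problem is easy to reproduce. The two patterns below are deliberately simplified stand-ins I wrote for the demo (the package's real detectors are certainly stricter), but they show what happens when the phone pattern gets first pick.

```php
<?php
declare(strict_types=1);

// Simplified, hypothetical patterns: a 16-digit card and a loose 10-11 digit phone.
$detectors = [
    'card'  => '/\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/',
    'phone' => '/\d{10,11}/',
];

// Run each pattern in the given order, replacing every hit with asterisks.
function scrubInOrder(string $text, array $patterns): string
{
    foreach ($patterns as $pattern) {
        $text = preg_replace_callback(
            $pattern,
            fn (array $m): string => str_repeat('*', strlen($m[0])),
            $text
        ) ?? $text;
    }

    return $text;
}

$line = 'card 4111111111111111 phone 0123456789';

echo scrubInOrder($line, $detectors), PHP_EOL;
// card **************** phone **********

echo scrubInOrder($line, array_reverse($detectors)), PHP_EOL;
// card ***********11111 phone **********
```

In the second run the phone pattern eats the first 11 digits of the card, the card pattern no longer sees 16 digits, and five real digits survive into the output.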

detect() is the non-destructive sibling: it returns each hit's type, value, and byte offset without touching the text, which is handy when you want to log that PII was present without logging the PII itself. You can also restrict which detectors run by passing types: ['email', 'nric'] to the constructor.

For anything bespoke, wrap a pattern in RegexStrategy:

use CleaniqueCoders\PiiProtection\Masking\{RegexStrategy, TailStrategy};

(new RegexStrategy('/\d{10}/', new TailStrategy(visible: 4)))->mask('ref 0123456789');
// "ref ******6789"

Pipe this into a log processor and your log files stop being a compliance liability.

Tokenization

Sometimes you don't want the value or a reversible cipher floating around your main system — you want an opaque placeholder, with the real value parked somewhere isolated. That's tokenization:

use CleaniqueCoders\PiiProtection\Tokenization\{Tokenizer, ArrayVault};

$tokenizer = new Tokenizer(new ArrayVault);

$token = $tokenizer->tokenize('012345678'); // "tok_9f3a..." — reveals nothing
$tokenizer->detokenize($token);             // "012345678"
$tokenizer->forget($token);                 // drop the mapping

An in-memory ArrayVault ships for tests and simple cases. For production you implement the Vault contract against wherever you actually want the mapping to live — a separate secured datastore, an external tokenization service, whatever. The token itself reveals nothing, so your primary database only ever sees tok_....
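
To give a feel for how small the moving parts are, here's a self-contained sketch of a vault and tokenizer pair. The interface and its method names are my guesses for illustration; check the package's Vault contract for the real signatures before implementing against it.

```php
<?php
declare(strict_types=1);

// Hypothetical vault contract: the real one may use different method names.
interface VaultSketch
{
    public function put(string $token, string $value): void;
    public function get(string $token): ?string;
    public function forget(string $token): void;
}

// In-memory vault: fine for tests, gone when the process ends.
final class InMemoryVaultSketch implements VaultSketch
{
    /** @var array<string, string> */
    private array $items = [];

    public function put(string $token, string $value): void
    {
        $this->items[$token] = $value;
    }

    public function get(string $token): ?string
    {
        return $this->items[$token] ?? null;
    }

    public function forget(string $token): void
    {
        unset($this->items[$token]);
    }
}

final class TokenizerSketch
{
    public function __construct(private VaultSketch $vault) {}

    public function tokenize(string $value): string
    {
        // The token is pure randomness, so it carries no information about the value.
        $token = 'tok_' . bin2hex(random_bytes(16));
        $this->vault->put($token, $value);

        return $token;
    }

    public function detokenize(string $token): string
    {
        return $this->vault->get($token) ?? throw new OutOfBoundsException('Unknown token');
    }

    public function forget(string $token): void
    {
        $this->vault->forget($token);
    }
}

$tokenizer = new TokenizerSketch(new InMemoryVaultSketch());
$token     = $tokenizer->tokenize('012345678');

echo $tokenizer->detokenize($token), PHP_EOL; // 012345678
$tokenizer->forget($token);                   // detokenize($token) now throws
```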

The shape of the thing

The whole architecture is contracts plus small, swappable implementations:

Contracts/        Encrypter, ContextualEncrypter, MaskStrategy, Redactor, Vault
Masking/          Tail, Full, Email, Hash, CreditCard, Ip, Name, Nric, Regex
Encryption/       OpenSslEncrypter, KeyRing, HmacBlindIndex
Detection/        PiiScrubber
Tokenization/     Tokenizer, ArrayVault
Attributes/       #[Pii]
Exceptions/       PiiException → EncryptionException, DecryptionException
ArrayRedactor · ObjectRedactor · PiiManager

Consumers depend on the contracts, not the concretions — so every piece is swappable. Don't like the OpenSSL implementation? Implement Encrypter yourself and the rest of the library doesn't notice. Want a masking shape that isn't in the box? One method, mask(string): string, and you're done.

Failures throw typed exceptions: EncryptionException and DecryptionException, both extending PiiException, which extends RuntimeException — so existing catch blocks keep working while you can catch precisely when you want to.
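
In practice that hierarchy gives you two ways to catch. The classes below are local stand-ins with made-up names that mirror the shape described above, so the snippet runs without the package installed.

```php
<?php
declare(strict_types=1);

// Local stand-ins mirroring the described hierarchy (hypothetical names).
class PiiExceptionSketch extends RuntimeException
{
}

class DecryptionExceptionSketch extends PiiExceptionSketch
{
}

// Stand-in for a decrypt call that fails.
function decryptOrFail(string $stored): string
{
    throw new DecryptionExceptionSketch('Authentication tag mismatch');
}

// Precise: react to decryption failures specifically.
try {
    decryptOrFail('not-a-real-ciphertext');
} catch (DecryptionExceptionSketch $e) {
    $precise = 'caught as ' . $e::class;
}

// Generic: an older catch block written against RuntimeException still works.
try {
    decryptOrFail('not-a-real-ciphertext');
} catch (RuntimeException $e) {
    $generic = 'caught as ' . $e::class;
}

echo $precise, PHP_EOL; // caught as DecryptionExceptionSketch
echo $generic, PHP_EOL; // caught as DecryptionExceptionSketch
```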

A few things that matter when you're deciding whether to trust a security-adjacent dependency: it has zero runtime dependencies (just PHP 8.4, ext-openssl, ext-mbstring — Pest, PHPStan and Pint are dev-only), every masking strategy is multibyte-safe (mb_* throughout, so non-ASCII names slice by character not byte), PHPStan runs at level max, and the suite includes mutation testing plus a test that fails the build if a stray dd(), dump(), or ray() is left in src/. For a library whose whole job is handling sensitive data, that level of paranoia about its own correctness is the point.

When you'd reach for this

It earns its place when:

you handle PII in places the framework doesn't boot — workers, CLI tools, microservices, webhook handlers;
you need audit/change logs that don't leak personal data;
you want encryption at rest with a sane rotation story and searchable lookups via blind index;
you're under PDPA / GDPR / SOC 2 and need to show your PII handling is deliberate, not incidental.

If all you ever do is Crypt::encrypt() inside a single Laravel monolith and nothing else, the framework's built-in is fine. The value of going pure-PHP shows up the moment your data crosses a boundary the framework doesn't own — and in any non-trivial system, it always does.

It's MIT-licensed and on Packagist:

composer require cleaniquecoders/pii-protection

Repo and full docs (architecture, usage, API reference): github.com/cleaniquecoders/pii-protection.

If you've got a PII shape that isn't covered yet, the strategy contract is one method — PRs welcome.