Built a distributed authentication system for my portfolio.
Here is an in-depth breakdown of the architecture, features, database configuration, and performance profiles.
Technology Stack
- Golang & gRPC: Enforces a strict API contract via Protocol Buffers (buf.build & protovalidate).
- PostgreSQL (SQLc): Database queries are written in raw SQL and compiled into type-safe Go code using SQLc.
- Valkey: An open-source Redis alternative used for caching.
- Asynq: Background worker daemon that sends email using the SMTP2Go API.
- OpenTelemetry: Used for tracking the authentication system.
- Docker & GitHub Actions: Used for testing, with testcontainers for integration, E2E, and benchmark tests.
Core System Features
- Registration: For registration, users must use a domain-specific email, and that domain must be verified before the server starts.
- Login: Through this endpoint, users can log in or request account or email verification. The system checks if it is a new, unverified account; if the account is verified but the email is not, it dispatches a verification code based on whether 2FA is enabled or not. Once all state validations pass, the system evaluates the user's 2FA settings: if 2FA is enabled, it issues a temporary 2FA session; otherwise, it issues the final authentication Token.
- Forgot Password: This endpoint functions similarly to the login flow, but it bypasses checks for whether 2FA is enabled. If all checks pass, it will issue a forgot password session.
- Account Verification: This endpoint validates the active session based on the originating action (Registration, Login, or Forgot Password). If the request originates from Registration, or a Login flow that does not have 2FA enabled, the system immediately issues the final authentication Token. Otherwise, it transitions the user into a 2FA verification session. For Forgot Password flows, it issues a dedicated forgot password session once the account or email verification check passes.
- Resend Account Verification: This endpoint checks the user's active session state, generates a new account verification code, and subsequently issues an updated session token.
- Verification: This endpoint verifies the forgot password email code and session. Once verified, the system issues a new session token specifically authorized for changing the password.
- Reset Password: Through this endpoint, users can reset their password, but they cannot reuse any of their previous 5 passwords.
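The reuse rule in Reset Password could be enforced roughly as below. This is a minimal sketch, not the project's actual code: `checkReuse`, `compareFunc`, `demoHash`, and `demoCompare` are hypothetical names, and the SHA-256 stand-in exists only so the example runs with the standard library. A Bcrypt-based service could pass `bcrypt.CompareHashAndPassword`, which has the same signature, as the comparer.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
)

// maxHistory mirrors the "previous 5 passwords" rule from the text.
const maxHistory = 5

var errPasswordReused = errors.New("password matches a recent password")

// compareFunc returns nil when password matches hash. It is a hypothetical
// seam; a Bcrypt-based service could pass bcrypt.CompareHashAndPassword here.
type compareFunc func(hash, password []byte) error

// checkReuse compares candidate against at most maxHistory stored hashes
// (newest first). It returns how many comparisons ran, plus an error on reuse.
func checkReuse(history [][]byte, candidate []byte, compare compareFunc) (int, error) {
	if len(history) > maxHistory {
		history = history[:maxHistory]
	}
	passes := 0
	for _, h := range history {
		passes++
		if compare(h, candidate) == nil {
			return passes, errPasswordReused
		}
	}
	return passes, nil
}

// demoHash and demoCompare use plain SHA-256 only so the sketch runs with the
// standard library. They are NOT suitable for storing real passwords.
func demoHash(p string) []byte {
	sum := sha256.Sum256([]byte(p))
	return sum[:]
}

func demoCompare(hash, password []byte) error {
	sum := sha256.Sum256(password)
	if subtle.ConstantTimeCompare(hash, sum[:]) == 1 {
		return nil
	}
	return errors.New("mismatch")
}

func main() {
	history := [][]byte{demoHash("pw5"), demoHash("pw4"), demoHash("pw3")}

	// A password that is still in the history is rejected.
	passes, err := checkReuse(history, []byte("pw4"), demoCompare)
	fmt.Println(passes, err)

	// A new password is compared against every stored hash and accepted.
	passes, err = checkReuse(history, []byte("brand-new"), demoCompare)
	fmt.Println(passes, err)
}
```

Returning the number of comparisons makes the cost visible: with a slow hash like Bcrypt, each extra history entry adds one full hash verification to the request.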
Note: All sessions are valid for only 5 minutes. The Access Token is valid for 15 minutes, and the Refresh Token is valid for 30 days. Every time a new Access Token is requested, the Refresh Token is also rotated. The Access Token's JTI (JWT ID) is stored in Valkey for fast lookup, so revoked or blacklisted users are blocked immediately from logging in or making requests.
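The token lifetimes and rotation rule in this note can be sketched with an in-memory map standing in for Valkey. Everything here is an assumption made for illustration: the `tokenStore` type, its method names, and the choice to treat the stored JTI as an allow-list entry that is deleted on revocation. The real service may track revocation differently.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Lifetimes taken from the note above.
const (
	accessTTL  = 15 * time.Minute
	refreshTTL = 30 * 24 * time.Hour
)

var (
	errInvalidRefresh = errors.New("refresh token unknown, expired, or already used")
	epoch             = time.Unix(0, 0) // fixed start time for the demo
)

// tokenStore is an in-memory stand-in for Valkey: token ID -> expiry time.
type tokenStore struct {
	mu      sync.Mutex
	access  map[string]time.Time
	refresh map[string]time.Time
}

func newTokenStore() *tokenStore {
	return &tokenStore{access: map[string]time.Time{}, refresh: map[string]time.Time{}}
}

// newID returns a random 128-bit identifier encoded as hex.
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// issueLocked registers a new access JTI and refresh ID; caller holds s.mu.
func (s *tokenStore) issueLocked(now time.Time) (jti, refreshID string) {
	jti, refreshID = newID(), newID()
	s.access[jti] = now.Add(accessTTL)
	s.refresh[refreshID] = now.Add(refreshTTL)
	return jti, refreshID
}

// Issue creates a token pair, for example after a successful login.
func (s *tokenStore) Issue(now time.Time) (jti, refreshID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.issueLocked(now)
}

// Rotate consumes a refresh token (single use) and issues a fresh pair.
func (s *tokenStore) Rotate(oldRefresh string, now time.Time) (string, string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.refresh[oldRefresh]
	delete(s.refresh, oldRefresh)
	if !ok || !now.Before(exp) {
		return "", "", errInvalidRefresh
	}
	jti, refreshID := s.issueLocked(now)
	return jti, refreshID, nil
}

// Allowed reports whether the JTI is registered and not yet expired.
func (s *tokenStore) Allowed(jti string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.access[jti]
	return ok && now.Before(exp)
}

// Revoke removes a JTI so the next lookup fails immediately.
func (s *tokenStore) Revoke(jti string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.access, jti)
}

func main() {
	s := newTokenStore()
	jti, refresh := s.Issue(epoch)
	fmt.Println("valid before expiry:", s.Allowed(jti, epoch.Add(accessTTL-time.Second)))
	fmt.Println("valid at expiry:", s.Allowed(jti, epoch.Add(accessTTL)))

	_, _, err := s.Rotate(refresh, epoch.Add(time.Hour))
	fmt.Println("first rotation error:", err)
	_, _, err = s.Rotate(refresh, epoch.Add(time.Hour))
	fmt.Println("second rotation error:", err)
}
```

Deleting the refresh ID before validating it means a replayed refresh token can never be redeemed twice, which is the point of rotation.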
The system uses the official valkey-go library and its om (Object Mapping) tool to save and load data easily without breaking any Go types.
Flowcharts
Login
Forget Password
Application-Layer Rate Limiter Matrix
Note: IP-based rate limiting is offloaded to Envoy in the production environment.
The application layer enforces strict rate limiting across all endpoints, with a strategic exception for the registration route. The system splits constraints into distinct structural layers:
- Login & Forgot Password: Restricted exclusively by Layer 1, which tracks limits per email to prevent targeted brute-force attacks.
- All Other Endpoints: Limits requests based on the user's active Session token and User ID.
- Registration: There are no internal application-layer rate limits; this route relies entirely on upstream infrastructure such as Envoy.
Note: Layer 1 allows 5 requests per 5 minutes, while Layer 2 allows 5 requests per 30 minutes. If a user exceeds Layer 1 and tries to bypass it by starting a new session, Layer 2 will catch and block them.
To track these limits, the system uses the official valkey-go/valkeylimiter tool.
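To make the layering concrete, here is a minimal in-memory sketch of the two limits. It does not use the valkeylimiter API. The fixed-window algorithm, the type names, and the way both layers are charged on every request are assumptions for illustration. Layer 1 is keyed by session token and Layer 2 by User ID, which is one reading of the description above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limits taken from the note above.
const (
	layerLimit   = 5
	layer1Period = 5 * time.Minute
	layer2Period = 30 * time.Minute
)

var epoch = time.Unix(0, 0) // fixed start time for the demo

type window struct {
	start time.Time
	count int
}

// fixedWindow is a minimal in-memory fixed-window counter per key.
type fixedWindow struct {
	mu     sync.Mutex
	limit  int
	period time.Duration
	hits   map[string]window
}

func newFixedWindow(limit int, period time.Duration) *fixedWindow {
	return &fixedWindow{limit: limit, period: period, hits: map[string]window{}}
}

// Allow records one hit for key and reports whether it is within the limit.
func (f *fixedWindow) Allow(key string, now time.Time) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	w := f.hits[key]
	if w.count == 0 || now.Sub(w.start) >= f.period {
		w = window{start: now} // first hit, or the previous window expired
	}
	if w.count >= f.limit {
		return false
	}
	w.count++
	f.hits[key] = w
	return true
}

// limiter combines Layer 1 (per session) and Layer 2 (per user).
type limiter struct {
	layer1, layer2 *fixedWindow
}

func newLimiter() *limiter {
	return &limiter{
		layer1: newFixedWindow(layerLimit, layer1Period),
		layer2: newFixedWindow(layerLimit, layer2Period),
	}
}

// Allow passes only if both the session and the user are under their limits.
func (l *limiter) Allow(session, userID string, now time.Time) bool {
	if !l.layer1.Allow("session:"+session, now) {
		return false
	}
	return l.layer2.Allow("user:"+userID, now)
}

func main() {
	l := newLimiter()
	for i := 1; i <= 6; i++ {
		fmt.Printf("session A, request %d: %v\n", i, l.Allow("A", "user-1", epoch))
	}
	// A new session resets Layer 1, but Layer 2 still remembers the user.
	fmt.Println("session B, same user:", l.Allow("B", "user-1", epoch))
	fmt.Println("session C, other user:", l.Allow("C", "user-2", epoch))
}
```

In a multi-instance deployment the counters would have to live in a shared store such as Valkey, since per-process maps like these are not visible across replicas.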
Testing Architecture
This project relies on a completely containerized testing environment to ensure production readiness without using mocks:
- Real Infrastructure (testcontainers): Integration, Benchmark, and E2E tests run against real, live instances of PostgreSQL, Valkey, and OpenTelemetry, started dynamically inside Docker containers.
- In-Memory Networking (bufconn): For all non-unit tests (Integration, E2E, and Benchmarks), the system handles gRPC traffic completely in-memory using bufconn. This removes local network overhead, prevents port collision errors on the host machine, and keeps network noise out of the performance metrics.
Code Coverage: 84.6% (Race Detector Active)
Note: This metric reflects the core application logic. It filters out main.go, generated protobuf files, database repository files, and test helpers to show real test health.
Note: Terminal output
Tests run against real, live infrastructure (PostgreSQL, Valkey, and OpenTelemetry) using testcontainers. The test suite uses zero mocks.
Setup, Execution & Testing
# 1. Clone the core framework engine
git clone https://github.com/neupaneanish/authentication.git
cd authentication
# 2. Initialize Git submodules
# (Note: to fetch the submodules over HTTPS instead of SSH, first run: git config --global url."https://github.com/".insteadOf "git@github.com:")
git submodule update --init
# 3. Generate Go code from protobuf definitions (Requires Buf CLI)
buf generate
# 4. Generate Go code from SQL queries using SQLc (Requires SQLc CLI)
sqlc generate
# 5. Execute the tests
go test -v -race -tags=unit,integration,benchmark,e2e -coverprofile=coverage.out ./...
# 6. Filter out external boundaries, generated code, and tooling
grep -v -E "cmd/|/internal/protobuf/|/internal/repository/|/tests/|/protobuf/|/database/" coverage.out > coverage_clean.out
# 7. Export to interactive HTML for local branch analysis
go tool cover -html=coverage_clean.out -o coverage_clean.html
# 8. Output statement breakdown to CLI
go tool cover -func=coverage_clean.out
# 9. Launch the asynchronous background worker daemon
go run cmd/worker/main.go
# 10. Launch the local microservice API server
# (Note: Requires an active OpenTelemetry collector instance, like SigNoz)
go run cmd/server/main.go
Performance & Profiling
Benchmark Environment
- OS: Ubuntu Linux (WSL2)
- Architecture: amd64
- CPU: Intel® Core™ i7-10750H @ 2.60GHz (12 Execution Threads)
High-Concurrency Benchmarks (Parallel)
The system uses Bcrypt (Default Cost of 10) to secure sensitive fields. To capture pure business logic and processing overhead, test data was seeded beforehand and b.ResetTimer() was utilized before executing requests across the in-memory bufconn layer.
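The benchmark shape described here (seed first, reset the timer, then run in parallel) looks roughly like the sketch below. It is an illustration only. `handle` is a hypothetical stand-in that hashes with SHA-256 instead of calling the real gRPC endpoint over bufconn, and `runBenchmark` is a helper invented so the example runs as a plain program rather than under `go test`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// handle is a stand-in for one request to the service under test.
func handle(req []byte) [32]byte {
	return sha256.Sum256(req)
}

// benchmarkParallel seeds data, resets the timer, then measures parallel calls.
func benchmarkParallel(b *testing.B) {
	// Seed test data before timing starts.
	reqs := make([][]byte, 1024)
	for i := range reqs {
		reqs[i] = []byte(fmt.Sprintf("user-%d@example.com", i))
	}

	b.ReportAllocs() // include B/op and allocs/op in the result
	b.ResetTimer()   // exclude the seeding above from the measurement

	b.RunParallel(func(pb *testing.PB) {
		i := 0 // per-goroutine index, so no shared state is contended
		for pb.Next() {
			_ = handle(reqs[i%len(reqs)])
			i++
		}
	})
}

// runBenchmark executes the benchmark outside of "go test".
func runBenchmark() testing.BenchmarkResult {
	testing.Init() // registers testing flags; no effect if already called
	return testing.Benchmark(benchmarkParallel)
}

func main() {
	res := runBenchmark()
	fmt.Println(res.String(), res.MemString())
}
```

In a real suite the same function would be named `BenchmarkXxx` in a `_test.go` file and run with `go test -bench`. `RunParallel` spreads iterations across GOMAXPROCS goroutines by default.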
| Endpoint | Size (B) | Latency | Memory (B/op) | Heap Allocations (allocs/op) | Cryptographic Passes |
|---|---|---|---|---|---|
| Register | 166 | ~7.45 ms | 64068 | 908 | 1 |
| Account verification | N/A | N/A | N/A | N/A | 0 |
| Resend Account Verification | N/A | N/A | N/A | N/A | 0 |
| Login | 132 | ~7.67 ms | 73483 | 618 | 1 |
| Login Two Factor | N/A | N/A | N/A | N/A | 0 (TOTP) / Max 10 (Recovery) |
| Forgot Password | N/A | N/A | N/A | N/A | 0 |
| Verification | N/A | N/A | N/A | N/A | 0 |
| Reset Password | 81 | ~14.11 ms | 62284 | 597 | 2 (Max 6) |
Cryptographic Breakdown & Security Costs
To prevent timing attacks and keep the authentication engine completely transparent, here is exactly what happens under the hood during these benchmarked workflows:
- Register: 1 Bcrypt operation using GenerateFromPassword to hash the raw password before storing it in the database.
- Login: 1 Bcrypt operation baseline (uses CompareHashAndPassword to verify the incoming credentials against the database record).
- Login Two Factor: Execution cost depends on the validation type:
  - TOTP: Uses 0 Bcrypt operations, relying strictly on fast, time-based HMAC-SHA-1.
  - Recovery Code: 1 Bcrypt operation baseline (CompareHashAndPassword), and up to 10 operations depending on the number of stored recovery codes.
- Reset Password: 2 Bcrypt operations baseline (1 CompareHashAndPassword to verify the active identity context + 1 GenerateFromPassword to securely hash the new replacement credentials). If a user has a fully populated password history, the endpoint invokes up to 4 additional historical comparisons to prevent credential reuse, scaling total passes to a maximum of 6.
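For the TOTP path mentioned above, the reason it needs no Bcrypt pass is that a code is a single HMAC-SHA-1 over a time counter. The sketch below follows RFC 4226 (HOTP) and RFC 6238 (TOTP) using only the standard library. The 6-digit length, the 30-second step, and the function names are assumptions, not necessarily what this project uses.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"crypto/subtle"
	"encoding/binary"
	"fmt"
)

// hotp computes a 6-digit RFC 4226 code with HMAC-SHA-1.
func hotp(secret []byte, counter uint64) string {
	var msg [8]byte
	binary.BigEndian.PutUint64(msg[:], counter)

	mac := hmac.New(sha1.New, secret)
	mac.Write(msg[:])
	sum := mac.Sum(nil)

	// Dynamic truncation: the low 4 bits of the last byte select an offset.
	offset := int(sum[len(sum)-1] & 0x0f)
	code := binary.BigEndian.Uint32(sum[offset:offset+4]) & 0x7fffffff
	return fmt.Sprintf("%06d", code%1_000_000)
}

// totp derives the counter from Unix time using a 30-second step (RFC 6238).
func totp(secret []byte, unixSeconds int64) string {
	return hotp(secret, uint64(unixSeconds/30))
}

// verify compares codes in constant time to avoid leaking matching prefixes.
func verify(secret []byte, unixSeconds int64, candidate string) bool {
	want := totp(secret, unixSeconds)
	return subtle.ConstantTimeCompare([]byte(want), []byte(candidate)) == 1
}

func main() {
	secret := []byte("12345678901234567890") // test secret from the RFCs

	fmt.Println(hotp(secret, 0))  // 755224 (RFC 4226 test vector)
	fmt.Println(totp(secret, 59)) // 287082 (last 6 digits of the RFC 6238 vector)
	fmt.Println(verify(secret, 59, "287082"))
}
```

A production verifier would usually also accept the adjacent time step to tolerate clock drift and reject codes that were already used; both are left out here to keep the sketch short.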
CPU & Memory Profiling (pprof)
These execution charts were exported using go tool pprof during a standard benchmark run:
Register CPU
Register Memory
Performance Note: The CPU graph shows that 95.58% of CPU time is spent on the Bcrypt hashing needed to securely hash user passwords. The memory profile shows no sign of memory leaks under high parallel load.
Login CPU
Login Memory
Performance Note: The CPU profile confirms that over 56% of CPU time goes into the Bcrypt comparison that checks user credentials. Because the surrounding gRPC server and router paths stay thin, the setup adds very little framework overhead. Memory metrics show steady utilization under parallel load, with no sign of memory leaks.
Reset Password CPU
Reset Password Memory
Performance Note: Even with the extra work of checking the password history (Base 1), 69.32% of the CPU time is still spent on core Bcrypt operations. The rest of the gRPC framework stays thin, with very little resource overhead. The memory allocation graph scales steadily and is reclaimed after execution, with no sign of memory leaks.
Production Artifacts & Image Footprint
Using Google's Distroless base images removes all unnecessary programs, shells, and packages from the final container. This leaves behind only the compiled application binary, which significantly hardens security by removing things that attackers could exploit.
Because the containers are stripped of this extra bloat, the deployment footprint is extremely small. The worker image compresses down to 5.86 MB and the main service to 13.4 MB, allowing for near-instant network transfers and fast scaling in production environments.
Note for Engineering Managers & Teams
Don't just take my word for these profiling charts! This entire environment is fully containerized and reproducible. If you want to audit the performance metrics or verify the zero-leak memory footprint on your own machine, simply follow the setup steps above to clone the repository and run the pipeline yourself.