Distributed Authentication Microservice with Go, gRPC, PostgreSQL (SQLc), and Valkey


I built a distributed authentication system for my portfolio.

Here is an in-depth breakdown of the architecture, features, database configuration, and performance profiles.

Technology Stack

  • Golang & gRPC: Enforce a strict API contract via Protocol Buffers (buf.build & protovalidate).
  • PostgreSQL (SQLc): Database queries are written in raw SQL and compiled into type-safe Go code by SQLc.
  • Valkey: An open-source Redis alternative used for caching.
  • Asynq: Background worker daemon that sends email through the SMTP2Go API.
  • OpenTelemetry: Used for tracking requests through the authentication system.
  • Docker & GitHub Actions: Used to run the integration, E2E, and benchmark tests with testcontainers.

Core System Features

  • Registration: Users must register with a domain-specific email, and that domain must be verified before the server starts.
  • Login: Through this endpoint, users can log in or request account or email verification. The system first checks whether this is a new, unverified account. If the account is verified but the email is not, it dispatches a verification code according to whether 2FA is enabled. Once all state validations pass, the system evaluates the user's 2FA settings: if 2FA is enabled, it issues a temporary 2FA session; otherwise, it issues the final authentication token.
  • Forgot Password: This endpoint works like the login flow but skips the check for whether 2FA is enabled. If all checks pass, it issues a forgot password session.
  • Account Verification: This endpoint validates the active session based on the originating action (Registration, Login, or Forgot Password). If the request originates from Registration, or from a Login flow that does not have 2FA enabled, the system immediately issues the final authentication token. A Login flow with 2FA enabled transitions the user into a 2FA verification session. For Forgot Password flows, it issues a dedicated forgot password session once the account or email verification check passes.
  • Resend Account Verification: This endpoint checks the user's active session state, generates a new account verification code, and issues an updated session token.
  • Verification: This endpoint verifies the forgot password email code and session. Once verified, the system issues a new session token authorized only for changing the password.
  • Reset Password: Through this endpoint, users can reset their password, but they cannot reuse any of their previous 5 passwords.
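
The Account Verification branching described above can be sketched as a small decision function. This is only an illustration of the rules as stated in the list: the type and function names (Origin, Outcome, nextOutcome) are hypothetical and are not taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// Origin identifies which flow opened the verification session (hypothetical type).
type Origin int

const (
	OriginRegistration Origin = iota
	OriginLogin
	OriginForgotPassword
)

// Outcome names what the endpoint issues once the code is accepted (hypothetical type).
type Outcome string

const (
	OutcomeAuthToken     Outcome = "final authentication token"
	OutcomeTwoFASession  Outcome = "2FA verification session"
	OutcomeForgotSession Outcome = "forgot password session"
)

var errUnknownOrigin = errors.New("unknown verification origin")

// nextOutcome maps the originating flow and the user's 2FA setting to the
// artifact issued after a successful account verification.
func nextOutcome(origin Origin, twoFAEnabled bool) (Outcome, error) {
	switch origin {
	case OriginRegistration:
		return OutcomeAuthToken, nil
	case OriginLogin:
		if twoFAEnabled {
			return OutcomeTwoFASession, nil
		}
		return OutcomeAuthToken, nil
	case OriginForgotPassword:
		return OutcomeForgotSession, nil
	default:
		return "", errUnknownOrigin
	}
}

func main() {
	cases := []struct {
		origin Origin
		twoFA  bool
	}{
		{OriginRegistration, false},
		{OriginLogin, false},
		{OriginLogin, true},
		{OriginForgotPassword, true},
	}
	for _, c := range cases {
		out, err := nextOutcome(c.origin, c.twoFA)
		fmt.Println(out, err)
	}
}
```

Keeping the decision in one pure function makes each branch easy to cover with a table-driven test, independent of the database and cache.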

Note: All sessions are valid for only 5 minutes. The Access Token is valid for 15 minutes, and the Refresh Token is valid for 30 days. Every time a new Access Token is requested, the Refresh Token is also rotated. The Access Token's JTI (JWT ID) is stored in Valkey for fast lookup, so revoked or blacklisted users are blocked immediately from logging in or making requests.

The system uses the official valkey-go library and its om (object mapping) package to save and load data while preserving Go types.

Flowcharts

[Flowchart: Login]

[Flowchart: Forgot Password]

Application-Layer Rate Limiter Matrix

Note: IP-based rate limiting is offloaded to Envoy in the production environment.

The application layer enforces strict rate limiting on every endpoint except the registration route. The limits are split into distinct layers:

  • Login & Forgot Password: Restricted exclusively by Layer 1, which tracks limits per email to prevent targeted brute-force attacks.
  • All Other Endpoints: Requests are limited by both the user's active session token and the User ID.
  • Registration: There are no application-layer rate limits; this route relies entirely on upstream infrastructure such as Envoy.

Note: Layer 1 allows 5 requests per 5 minutes, while Layer 2 allows 5 requests per 30 minutes. If a user exhausts Layer 1 and tries to bypass it by starting a new session, Layer 2 catches and blocks them.

To track these limits, the system uses the official valkey-go/valkeylimiter package.

Testing Architecture

This project relies on a completely containerized testing environment to ensure production readiness without using mocks:

  • Real Infrastructure (testcontainers): Integration, Benchmark, and E2E tests run against real, live instances of PostgreSQL, Valkey, and OpenTelemetry that are started dynamically inside Docker containers.
  • In-Memory Networking (bufconn): For all non-unit tests (Integration, E2E, and Benchmarks), the system handles gRPC traffic completely in-memory using bufconn. This removes local network overhead, prevents port collisions on the host machine, and keeps network noise out of the performance measurements.

Code Coverage: 84.6% (Race Detector Active)

Note: This metric reflects the core application logic. It filters out main.go, generated protobuf files, database repository files, and test helpers to show real test health.

[Figure: terminal output of the test run]

Tests run against real, live infrastructure (PostgreSQL, Valkey, and OpenTelemetry) using testcontainers. The test suite uses zero mocks.

Setup, Execution & Testing

# 1. Clone the core framework engine
git clone https://github.com/neupaneanish/authentication.git
cd authentication

# 2. Initialize Git submodules
# (Note: if you cloned over HTTPS, first run: git config --global url."https://github.com/".insteadOf "git@github.com:")
git submodule update --init

# 3. Generate Go code from protobuf definitions (Requires Buf CLI)
buf generate

# 4. Generate Go code from SQL queries using SQLc (Requires SQLc CLI)
sqlc generate

# 5. Execute the tests
go test -v -race -tags=unit,integration,benchmark,e2e -coverprofile=coverage.out ./...

# 6. Filter out external boundaries, generated code, and tooling 
grep -v -E "cmd/|/internal/protobuf/|/internal/repository/|/tests/|/protobuf/|/database/" coverage.out > coverage_clean.out

# 7. Export to interactive HTML for local branch analysis
go tool cover -html=coverage_clean.out -o coverage_clean.html 

# 8. Output statement breakdown to CLI
go tool cover -func=coverage_clean.out 

# 9. Launch the asynchronous background worker daemon
go run cmd/worker/main.go

# 10. Launch the local microservice API server 
# (Note: Requires an active OpenTelemetry collector instance, like SigNoz)
go run cmd/server/main.go

Performance & Profiling

Benchmark Environment

  • OS: Ubuntu Linux (WSL2)
  • Architecture: amd64
  • CPU: Intel® Core™ i7-10750H @ 2.60GHz (12 Execution Threads)

High-Concurrency Benchmarks (Parallel)

The system uses Bcrypt (default cost of 10) to secure sensitive fields. To isolate business logic and processing overhead, test data was seeded beforehand and b.ResetTimer() was called before the requests were executed in parallel over the in-memory bufconn layer.

| Endpoint | Size (B) | Latency | Memory (B/op) | Heap (allocs/op) | Cryptographic Passes |
|---|---|---|---|---|---|
| Register | 166 | ~7.45 ms | 64068 | 908 | 1 |
| Account Verification | N/A | N/A | N/A | N/A | 0 |
| Resend Account Verification | N/A | N/A | N/A | N/A | 0 |
| Login | 132 | ~7.67 ms | 73483 | 618 | 1 |
| Login Two Factor | N/A | N/A | N/A | N/A | 0 (TOTP) / Max 10 (Recovery) |
| Forgot Password | N/A | N/A | N/A | N/A | 0 |
| Verification | N/A | N/A | N/A | N/A | 0 |
| Reset Password | 81 | ~14.11 ms | 62284 | 597 | 2 (Max 6) |

Cryptographic Breakdown & Security Costs

To keep the authentication engine's cryptographic cost transparent, here is exactly what happens under the hood during these benchmarked workflows:

  • Register: 1 Bcrypt operation (GenerateFromPassword hashes the raw password before it is stored in the database).
  • Login: 1 Bcrypt operation baseline (CompareHashAndPassword verifies the incoming credentials against the database record).
  • Login Two Factor: Execution cost depends on the validation type:
    • TOTP: Uses 0 Bcrypt operations, relying strictly on fast, time-based HMAC-SHA-1.
    • Recovery Code: 1 Bcrypt operation baseline (CompareHashAndPassword), rising to a maximum of 10 operations depending on the number of stored recovery codes.
  • Reset Password: 2 Bcrypt operations baseline (1 CompareHashAndPassword to verify the active identity context + 1 GenerateFromPassword to securely hash the new replacement credentials). If a user has a fully populated password history, the endpoint invokes up to 4 additional historical comparisons to prevent credential reuse, bringing the total to a maximum of 6 passes.
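
The worst-case pass counts quoted above follow from simple arithmetic, sketched here. The function names are hypothetical, and the sketch assumes that historyLen counts only the older stored hashes (beyond the current password) and that a recovery code is compared against each stored code in turn.

```go
package main

import "fmt"

const (
	maxHistoryChecks = 4  // additional historical comparisons stated above
	maxRecoveryCodes = 10 // upper bound on recovery code comparisons stated above
)

// clamp restricts v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// resetPasswordPasses returns the worst-case number of Bcrypt operations for
// Reset Password: 1 baseline comparison, up to 4 historical comparisons,
// and 1 hash of the new password.
func resetPasswordPasses(historyLen int) int {
	return 1 + clamp(historyLen, 0, maxHistoryChecks) + 1
}

// recoveryCodePasses returns the worst-case number of Bcrypt comparisons when
// a recovery code is checked against each stored code (at least 1, at most 10).
func recoveryCodePasses(storedCodes int) int {
	return clamp(storedCodes, 1, maxRecoveryCodes)
}

func main() {
	fmt.Println(resetPasswordPasses(0)) // baseline: compare + hash
	fmt.Println(resetPasswordPasses(4)) // fully populated history
	fmt.Println(recoveryCodePasses(10)) // all recovery codes still stored
}
```

At roughly 7 ms per Bcrypt pass in the table above, these counts are what dominate the latency of each endpoint.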

CPU & Memory Profiling (pprof)

These charts were exported using go tool pprof during a standard benchmark run:

Register CPU

Register Memory

Performance Note: The CPU graph shows that 95.58% of CPU time is spent on the Bcrypt hashing needed to secure user passwords. The memory profile shows the application running cleanly under high parallel pressure, with no sign of memory leaks.

Login CPU

Login Memory

Performance Note: The CPU profile shows that over 56% of CPU time goes into the Bcrypt comparison that checks user credentials. Because the surrounding gRPC server and routing paths stay thin, the framework adds very little overhead. The memory profile shows clean, steady utilization under parallel load, with no sign of memory leaks.

Reset Password CPU

Reset Password Memory

Performance Note: Even with the extra work of checking the password history (baseline of 1 comparison), 69.32% of CPU time is still spent in Bcrypt. The rest of the gRPC stack stays thin, with very little resource overhead. The memory allocation graph scales steadily and is released promptly after execution, with no sign of memory leaks.

Production Artifacts & Image Footprint

Using Google's Distroless base images removes unnecessary programs, shells, and packages from the final container. What remains is essentially the compiled application binary and its minimal runtime files, which reduces the attack surface.

Because the containers are stripped of this extra bloat, the deployment footprint is very small. The worker image compresses down to 5.86 MB and the main service to 13.4 MB, allowing for fast network transfers and fast scaling in production environments.

Note for Engineering Managers & Teams

Don't just take my word for these profiling charts! This entire environment is fully containerized and reproducible. If you want to audit the performance metrics or verify the zero-leak memory footprint on your own machine, simply follow the setup steps above to clone the repository and run the pipeline yourself.

Source: dev.to
