Building an audio search engine with Quarkus and pgvector

Hey everyone!

This is my first post here, and I'd like to share a project I've been working on.

I've been experimenting with audio embeddings recently to see if I could build a self-hosted search tool for music.

The result is a prototype called, for now, Agnostic Intelligence Layer: a semantic audio search engine designed to run entirely offline, with no reliance on external cloud APIs.

The Stack & Architecture
I wanted something fast and resource-efficient, so I split the work between Java and Python:

Java (Quarkus): Runs the core engine pipeline, with Quarkus keeping startup time and memory usage low inside containers.
Python: Does the AI heavy lifting, using a CLAP (Contrastive Language-Audio Pretraining) model to turn each track into a 512-dimensional embedding vector.
PostgreSQL + pgvector: Stores the vectors and finds acoustically similar tracks using cosine similarity.
MinIO: Handles local, S3-compatible object storage.
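To make the similarity step concrete, here is a minimal Python sketch of what the search boils down to, assuming a brute-force scan: compute the cosine distance (1 minus cosine similarity, which is what pgvector's `<=>` operator returns) between a query embedding and every stored track, then keep the k smallest. The table name `tracks`, the column `embedding`, and the functions `cosine_distance` and `top_k` are hypothetical names for illustration, not taken from the repository.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical pgvector query: table "tracks" and column "embedding" are
# illustrative names. The <=> operator returns cosine distance, so ordering
# by it ascending puts the most similar tracks first.
PGVECTOR_QUERY = """
SELECT id, 1 - (embedding <=> %(q)s::vector) AS similarity
FROM tracks
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s;
"""


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Brute-force search: score every track, return the k closest."""
    scored = [(track_id, cosine_distance(query, vec)) for track_id, vec in library.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance = most similar
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors for readability; CLAP embeddings have 512 entries.
    library = {
        "track_a": [1.0, 0.0, 0.0],
        "track_b": [0.9, 0.1, 0.0],
        "track_c": [0.0, 1.0, 0.0],
    }
    for track_id, distance in top_k([1.0, 0.0, 0.0], library, k=2):
        print(f"{track_id}: distance={distance:.4f}")
```

In the actual stack PostgreSQL does this ranking instead of Python. Ordering by the distance expression itself (rather than by the computed similarity) is what lets pgvector use an HNSW or IVFFlat index built with `vector_cosine_ops`, replacing the full scan above with approximate nearest-neighbour search.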

Everything starts up from a single docker-compose file, so you don't have to waste time configuring external services.

Why I'm sharing this
It's still a prototype, but the core pipeline works and I plan to keep developing it. I'm sharing it early to get more eyes on the code and to see whether the architecture makes sense to other devs.

The repository has a quick start guide if you want to check out the code or test it locally:
👉 https://github.com/BothBasilisk/agnostic-audio-engine.git

Feel free to leave your thoughts, critiques, or any tips on how to improve the pipeline!