How We Built a Multilingual Cyber Threat Intelligence Platform From Turkey


By Baris Kececi, Founder & CTO at GNSAC Information Technologies

In 2021, I founded GNSAC with a simple observation: most organisations learn about data breaches affecting them from the news — not from their own security tools.

Five years later, GNSAC Vigil monitors over 150 dark web forums, 500+ Telegram channels, 30+ paste sites, 40+ ransomware blogs, and 100+ OSINT feeds in real time. Its continuously updated exposure intelligence corpus includes over 8.2 billion processed, normalised, and deduplicated credential-related intelligence signals — used strictly for authorised defensive monitoring, correlation, and alerting. We process, classify, and correlate threats across four languages — Turkish, English, Russian, and Arabic — using a proprietary multilingual NLP engine.

To avoid ambiguity, this corpus refers to processed exposure intelligence signals collected, normalised, deduplicated, and indexed for authorised defensive monitoring. It does not refer to customer-owned identity data or a commercial personal identity database.

This is the story of how we built it, what we learned, and why we believe threat intelligence needs to be more accessible.

The Problem We Set Out to Solve
When I was leading security operations at enterprise clients — port infrastructure, hospitals, manufacturing plants — I kept running into the same gap. Organisations would invest heavily in perimeter defences, SIEM platforms, and endpoint protection. But when their credentials appeared on a dark web forum or a Telegram channel, they had no visibility.

The existing CTI platforms were either too expensive for mid-market organisations, too generic in their coverage, or completely blind to non-English sources. Turkish-language dark web forums, Russian-speaking marketplaces, and Arabic paste sites were largely unmonitored by Western CTI vendors.

We decided to build our own.

Architecture Decisions That Shaped Everything

  1. Go as the Core Language
We chose Go for the entire backend, not Python, not Java. The reason was simple: we needed a single compiled binary that could be deployed in restricted-network and on-premises environments without dependency headaches. Our phishing simulation platform (GNSAC Phishing, 50,000+ lines of Go) had already validated this approach.

Go’s concurrency model also gave us the throughput we needed. Our crawler engines process large volumes of external threat intelligence in parallel across dark web forums, paste sites, Telegram channels, code repositories, other forums, and OSINT feeds.
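
The parallel processing described above can be pictured as a bounded worker pool. The following is a minimal sketch, not the Vigil crawler: `fetch`, `crawlAll`, the `result` type, and the source names are all hypothetical, and `fetch` does no network I/O so the example stays offline.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a source with the (hypothetical) outcome of crawling it.
type result struct {
	Source string
	Bytes  int
}

// fetch is a stand-in for a real crawler. It only reports the length of
// the source name, so no network access is needed.
func fetch(source string) result {
	return result{Source: source, Bytes: len(source)}
}

// crawlAll fans the sources out to a fixed number of workers and collects
// every result. The order of the returned results is not guaranteed.
func crawlAll(sources []string, workers int) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range jobs {
				out <- fetch(s)
			}
		}()
	}

	// Feed the jobs, then close the output once all workers have finished.
	go func() {
		for _, s := range sources {
			jobs <- s
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()

	var results []result
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	rs := crawlAll([]string{"forum-a", "paste-b", "feed-c"}, 2)
	fmt.Println(len(rs), "sources crawled")
}
```

Capping the number of workers keeps memory and outbound connections predictable, which matters more on a constrained on-premises host than raw peak throughput.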

  2. Multi-Source Crawling Pipeline
We built a 7-stage pipeline:

Sources → Gate (quality filter) → Parser (credential indicator extraction) → Deduplication → Database → Indexing → API & Delivery

Each stage is independently scalable. The Gate stage filters noise before it reaches the parser — rejecting deceptive, duplicate, irrelevant, and low-confidence content. The Deduplication engine ensures the same credential pair appearing across twelve different breach compilations is counted once, not twelve times.
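
To make the Gate and Deduplication stages concrete, here is a minimal sketch under stated assumptions: the `finding` type, the confidence threshold, and the SHA-256 key over a normalised identifier are illustrative choices, not a description of how Vigil implements these stages.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// finding is a hypothetical record emitted by the parser stage.
type finding struct {
	Identifier string  // e.g. an email address
	Indicator  string  // placeholder for the exposed credential indicator
	Confidence float64 // 0.0 to 1.0, assumed to be assigned upstream
}

// gate drops findings whose confidence is below min.
func gate(in []finding, min float64) []finding {
	var out []finding
	for _, f := range in {
		if f.Confidence >= min {
			out = append(out, f)
		}
	}
	return out
}

// key normalises the identifier and hashes the pair, so that the same
// exposure reposted with different casing or spacing maps to one key.
func key(f finding) string {
	id := strings.ToLower(strings.TrimSpace(f.Identifier))
	sum := sha256.Sum256([]byte(id + "\x00" + f.Indicator))
	return hex.EncodeToString(sum[:])
}

// dedupe keeps the first finding seen for each key.
func dedupe(in []finding) []finding {
	seen := make(map[string]struct{})
	var out []finding
	for _, f := range in {
		k := key(f)
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []finding{
		{Identifier: "User@Example.com", Indicator: "x1", Confidence: 0.9},
		{Identifier: " user@example.com", Indicator: "x1", Confidence: 0.8},
		{Identifier: "other@example.com", Indicator: "x2", Confidence: 0.1},
	}
	fmt.Println(len(dedupe(gate(in, 0.5))), "unique finding(s)")
}
```

Running the gate before deduplication means low-confidence noise never occupies space in the seen-set.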

We also built scalable source discovery and validation workflows that continuously expand monitored intelligence coverage while maintaining source quality and reducing duplicate or low-confidence findings. The goal was not simply to add more sources, but to build a defensible intelligence pipeline that prioritises relevance, reliability, and customer-specific risk.

  3. Multilingual NLP: The Hardest Part
Building NLP for English threat intelligence is well-documented. Building it for Turkish, Russian, and Arabic simultaneously is a different challenge entirely.

Turkish is an agglutinative language — a single word can carry meaning that takes an entire English sentence to express. Russian uses Cyrillic with domain-specific slang that changes monthly. Arabic has right-to-left text mixed with Latin-character technical terms.
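
A cheap first step before any real language model is to guess the dominant script of a post. The sketch below uses Go's standard `unicode` range tables. `guessScript` is a hypothetical helper, and script detection is only a coarse proxy: it cannot tell Turkish from English, since both use Latin script, so it is not a substitute for the language identification described in this section.

```go
package main

import (
	"fmt"
	"unicode"
)

// guessScript returns the dominant script among "arabic", "cyrillic" and
// "latin", or "unknown" if none of them occurs. Ties go to the script
// that comes first in the fixed order used below.
func guessScript(text string) string {
	counts := map[string]int{}
	for _, r := range text {
		switch {
		case unicode.Is(unicode.Arabic, r):
			counts["arabic"]++
		case unicode.Is(unicode.Cyrillic, r):
			counts["cyrillic"]++
		case unicode.Is(unicode.Latin, r):
			counts["latin"]++
		}
	}
	best, top := "unknown", 0
	for _, s := range []string{"arabic", "cyrillic", "latin"} {
		if counts[s] > top {
			best, top = s, counts[s]
		}
	}
	return best
}

func main() {
	for _, t := range []string{"продам доступ к admin", "şifre sızıntısı", "12345"} {
		fmt.Println(guessScript(t))
	}
}
```

Mixed posts, such as Cyrillic or Arabic text with Latin technical terms, are resolved here by simple majority. A production system would more likely segment the text and classify each run separately.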

Our classification engine processes raw text from dark web posts, identifies the language, extracts entities (email addresses, domains, IP addresses, cryptocurrency wallets, credential pairs), classifies the threat type, and assigns a confidence score — all within milliseconds. In internal benchmarks across supported languages and source categories, this classification engine has reached 99.7% accuracy.
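
The entity extraction step can be illustrated with plain regular expressions. This is a simplified sketch covering only email addresses and IPv4 addresses. The patterns, the `entities` type, and `extract` are assumptions made for illustration and are deliberately looser than a production extractor would be.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

var (
	// Simplified patterns: good enough for a demo, not RFC-complete.
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	ipv4Re  = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
)

// entities holds the indicators pulled out of one piece of text.
type entities struct {
	Emails []string
	IPv4   []string
}

// extract finds email addresses and IPv4 addresses in text. IPv4
// candidates are validated with net.ParseIP so that strings such as
// 999.1.1.1 are rejected.
func extract(text string) entities {
	e := entities{Emails: emailRe.FindAllString(text, -1)}
	for _, cand := range ipv4Re.FindAllString(text, -1) {
		if net.ParseIP(cand) != nil {
			e.IPv4 = append(e.IPv4, cand)
		}
	}
	return e
}

func main() {
	// 192.0.2.10 is from a range reserved for documentation.
	e := extract("Продам доступ admin@example.com с 192.0.2.10 и 999.1.1.1")
	fmt.Println(e.Emails, e.IPv4)
}
```

Because the patterns only match ASCII characters, they work unchanged on posts whose surrounding text is Cyrillic, Arabic, or Turkish.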

  4. The 132-Module CTI Scan Engine
Beyond passive monitoring, we built an authorised domain-scoped assessment engine with 132 scanning modules organised into 12 phases. All assessment capabilities are used for authorised, domain-scoped defensive security evaluation. Customers can only assess assets and domains approved within their licence scope.

The modules cover:

DNS intelligence, delegation analysis, and CAA/DANE/TLSA verification
SSL/TLS certificate transparency and chain validation
Web application fingerprinting, vulnerability correlation, and remediation priority scoring
Cloud misconfiguration detection (S3 buckets, Azure Blob, Kubernetes dashboards, Docker registries)
Authentication, OAuth, SAML, and SSO misconfiguration analysis
Mobile OSINT — app store presence, API discovery, and deep link analysis
Dark web exposure correlation and IOC confidence scoring
Geopolitical risk indicators — sanctions proximity, data residency, and sanctioned ASN analysis
Compliance mapping (ISO 27001, NIS2, KVKK, PCI-DSS)
Business impact and risk amplification scoring

A full scan takes 4-5 minutes and produces a comprehensive threat intelligence report covering 22 sections — from infrastructure vulnerabilities to geopolitical risk indicators — delivered as a server-generated PDF with executive summary, risk matrices, and prioritised remediation steps.

A Million Reasons to Block
Our Phishing Intelligence module aggregates over 1 million malicious URLs from seven independent sources — PhishTank, URLhaus, OpenPhish, PhishStats, Phishing Army, Phishing Database, and ThreatFox — with approximately 2,400 new URLs detected daily.

But aggregation is the easy part. The value is in the export: security teams can download filtered blocklists in CSV or TXT format and feed them directly into web proxies, DNS sinkholes, or email gateways. No API integration required. No vendor lock-in.

We made this decision deliberately. Phishing protection should not require a six-month integration project. A security analyst should be able to download a blocklist and deploy it in minutes.
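
As an illustration of how little glue a plain-text blocklist needs, the sketch below turns a list of malicious URLs into a sorted, de-duplicated hostname list, one per line, of the kind a DNS sinkhole or proxy can import. The function name and the input URLs are hypothetical. The actual export format of the Phishing Intelligence module is not specified here.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// hostBlocklist extracts the hostname from each URL, lowercases it,
// removes duplicates, and returns the hosts sorted, one per line.
// Entries that do not parse to a URL with a host are skipped.
func hostBlocklist(urls []string) string {
	seen := make(map[string]struct{})
	var hosts []string
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue
		}
		h := strings.ToLower(u.Hostname())
		if _, ok := seen[h]; ok {
			continue
		}
		seen[h] = struct{}{}
		hosts = append(hosts, h)
	}
	sort.Strings(hosts)
	return strings.Join(hosts, "\n")
}

func main() {
	fmt.Println(hostBlocklist([]string{
		"http://Bad.example/login",
		"https://bad.example/other",
		"http://evil.test:8080/x",
		"not a url",
	}))
}
```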

Brand Protection at Scale
Credential monitoring tells you what has already leaked. Brand protection tells you what is being prepared against you.

Vigil runs six parallel brand scans for every monitored domain:

Typosquatting detection — algorithmically generated lookalike domains registered to impersonate your brand
Dark web mention tracking — your brand name appearing in marketplace listings or forum discussions
Certificate Transparency monitoring — SSL certificates issued for domains resembling yours
Social media impersonation — fake accounts using your brand identity
Fake mobile applications — fraudulent apps on third-party stores using your brand name and logos
Credential exposure correlation — leaked credentials tied to your corporate domain
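
The typosquatting scan in the list above depends on generating lookalike candidates to check against registrations. A minimal sketch of that generation step follows, covering only three common variant types (character omission, duplication, and adjacent swap) and assuming ASCII domain labels. `typoCandidates` is a hypothetical helper. Real tools use many more variant types, such as homoglyphs and alternate TLDs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// typoCandidates returns sorted, unique lookalike domains derived from
// the first label of domain. It assumes an ASCII label and never
// returns the original domain itself.
func typoCandidates(domain string) []string {
	i := strings.Index(domain, ".")
	if i <= 0 {
		return nil
	}
	label, rest := domain[:i], domain[i:]
	seen := map[string]struct{}{domain: {}}
	var out []string
	add := func(l string) {
		if l == "" {
			return
		}
		d := l + rest
		if _, ok := seen[d]; ok {
			return
		}
		seen[d] = struct{}{}
		out = append(out, d)
	}
	for j := 0; j < len(label); j++ {
		add(label[:j] + label[j+1:])                  // omission
		add(label[:j] + string(label[j]) + label[j:]) // duplication
		if j+1 < len(label) {
			// swap of adjacent characters
			add(label[:j] + string(label[j+1]) + string(label[j]) + label[j+2:])
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	for _, d := range typoCandidates("example.com") {
		fmt.Println(d)
	}
}
```

Each candidate would then be checked against DNS, WHOIS, or Certificate Transparency data to see whether someone has actually registered it.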

When a threat is confirmed, the platform supports structured takedown workflows by preparing the necessary evidence, registrar or hosting-provider details, and abuse report templates for security teams to review and submit. Threats are classified, risk-scored, and tracked through their entire lifecycle: from detection to takedown confirmation.

Protecting People, Not Just Infrastructure
One of the less discussed aspects of threat intelligence is executive exposure. C-level executives, board members, and public-facing leadership are disproportionately targeted — yet most CTI platforms treat them as just another email address.

Vigil’s VIP Protection module monitors individual executives across four threat surfaces simultaneously:

Email exposure — corporate and personal email addresses appearing in breach databases
Social media impersonation — fake LinkedIn, Twitter, and Instagram profiles using executive identities
Dark web mentions — executive names appearing in targeting discussions or doxing threads
Deepfake risk indicators — public media assets that could be used to generate synthetic impersonation content

This is not theoretical. In one observed case, attackers used a forged display name and externally hosted email account to imitate executive communication. Weak email authentication enforcement increased the risk of delivery, but the behaviour was detected and escalated before it could progress further.
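
The pattern in that case, an executive's display name paired with an external mailbox, lends itself to a simple header check. The sketch below uses Go's standard `net/mail` parser. `looksSpoofed`, the executive name list, and the domains are hypothetical, and this is only one signal, not the detection logic used in the case described.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// looksSpoofed reports whether the From header carries the display name
// of a known executive while the address is outside the corporate
// domain. Headers that fail to parse are not flagged by this check.
func looksSpoofed(from string, execNames []string, corpDomain string) bool {
	addr, err := mail.ParseAddress(from)
	if err != nil {
		return false
	}
	name := strings.ToLower(strings.TrimSpace(addr.Name))
	at := strings.LastIndex(addr.Address, "@")
	domain := strings.ToLower(addr.Address[at+1:])
	if domain == strings.ToLower(corpDomain) {
		return false
	}
	for _, n := range execNames {
		if name == strings.ToLower(n) {
			return true
		}
	}
	return false
}

func main() {
	execs := []string{"Jane Doe"}
	fmt.Println(looksSpoofed(`"Jane Doe" <jane.doe@freemail.example>`, execs, "corp.example"))
	fmt.Println(looksSpoofed(`"Jane Doe" <jane.doe@corp.example>`, execs, "corp.example"))
}
```

A display-name check like this complements, rather than replaces, enforced email authentication such as SPF, DKIM, and DMARC.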

From Detection to Response in Seconds
Detection without response is just expensive logging.

Vigil includes an automated playbook engine with six pre-built response workflows:

Credential Leak Response — SOC notification, ticket creation, password reset workflow initiation, threat feed update, and incident documentation — triggered automatically or semi-automatically depending on customer policy.
Phishing Campaign Detection — malicious URL blocklist preparation, employee warning distribution, and forensic evidence collection from detected phishing pages.
VIP Exposure Mitigation — priority executive or security-team notification, takedown request preparation, enhanced monitoring frequency, and incident documentation.
Brand Impersonation Response — automated WHOIS and DNS analysis, abuse report preparation, and customer advisory generation.
Third-Party Vendor Breach — vendor risk assessment, API key rotation, network segment isolation, and legal/compliance notification.
Manual Threat Investigation — structured IOC gathering, cross-referencing against internal and external intelligence, and threat assessment workflow.

When a high-confidence credential exposure is detected outside working hours, the playbook can immediately trigger predefined response actions such as SOC notification, ticket creation, password reset workflow initiation, and incident documentation. Analysts can then review the completed workflow with full evidence context.

We designed this after observing that the time between detection and first response action was often measured in hours. With predefined playbooks, the first response workflow can be initiated in under 30 seconds for high-confidence alerts.
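
The core of such a playbook engine is a lookup from alert type to a predefined action list, guarded by a confidence threshold. The sketch below is a simplified illustration. The type names, the action identifiers, and the 0.9 threshold are assumptions rather than Vigil's actual configuration.

```go
package main

import "fmt"

// alert is a hypothetical detection event.
type alert struct {
	Kind       string
	Confidence float64
}

// playbook lists the actions to run when an alert of a given kind meets
// the minimum confidence.
type playbook struct {
	MinConfidence float64
	Actions       []string
}

// playbooks is an illustrative registry with a single entry.
var playbooks = map[string]playbook{
	"credential_leak": {
		MinConfidence: 0.9,
		Actions: []string{
			"notify_soc",
			"create_ticket",
			"start_password_reset",
			"document_incident",
		},
	},
}

// plan returns the actions to run automatically, or nil when the alert
// has no playbook or is below the threshold and should go to an analyst.
func plan(a alert) []string {
	pb, ok := playbooks[a.Kind]
	if !ok || a.Confidence < pb.MinConfidence {
		return nil
	}
	// Copy so callers cannot modify the registry.
	return append([]string(nil), pb.Actions...)
}

func main() {
	fmt.Println(plan(alert{Kind: "credential_leak", Confidence: 0.95}))
	fmt.Println(plan(alert{Kind: "credential_leak", Confidence: 0.40}))
}
```

Keeping the action list as data rather than code is what allows the same alert to be handled automatically or semi-automatically depending on customer policy.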

Scale and What It Teaches You
Processing billions of credential-related exposure signals teaches you things that no amount of planning can anticipate:

Storage architecture matters as much as application logic. At this scale, separating raw intelligence storage, indexes, and frequently queried enrichment data is essential for keeping investigation workflows fast and usable. We learned that I/O locality and tiered storage are not optimisation luxuries — they are operational requirements.
Deduplication is a product feature, not a technical detail. The same credential-related exposure can appear across dozens of breach compilations and source reposts. Showing a client 50,000 “new” alerts that are actually 3,000 unique exposures destroys trust. We built a 186-point automated assessment workflow specifically to deduplicate, enrich, and prioritise every alert.
At scale, the real challenge is not collecting more data but reducing noise. Our AI-assisted analysis engine has processed a large volume of internal threat assessments, helping reduce false positives by 60% compared with rule-based baselines. More importantly, the platform correlates indicators across multiple sources — credential leaks, paste sites, Telegram channels, ransomware blogs, and OSINT feeds — so analysts can identify patterns that would be invisible in single-source monitoring.

Deployment Reality: Why On-Premises Still Matters
One of our earliest design decisions was to support three deployment models: SaaS, private cloud, and on-premises / restricted-network deployment.

Many organisations we work with — healthcare groups handling patient data, port operators managing critical infrastructure, manufacturing companies with OT/ICS environments — cannot send their data to an external cloud. Regulatory requirements (KVKK in Turkey, GDPR in Europe, sector-specific regulations in Qatar) demand that sensitive threat intelligence stays within the organisation’s own infrastructure.

This constraint shaped our entire architecture. Core platform components must be capable of operating in restricted-network environments, with controlled and scheduled intelligence synchronisation where required. Updates are delivered as signed packages. The system operates in environments where external communication is limited to controlled, encrypted threat feed synchronisation with our central intelligence hub.
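
Verifying a signed update package before installing it can be sketched with Go's standard `crypto/ed25519` package. The signature scheme is an assumption made for illustration, since the article does not say which algorithm the signed packages use, and `demoKeypair`, `signPackage`, and `verifyPackage` are hypothetical helpers.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// demoKeypair generates a throwaway key pair for the example. In a real
// deployment only the public key would be present on the customer side.
func demoKeypair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signPackage signs the package bytes (vendor side).
func signPackage(priv ed25519.PrivateKey, pkg []byte) []byte {
	return ed25519.Sign(priv, pkg)
}

// verifyPackage reports whether sig is a valid signature of pkg under
// pub. The length check avoids the panic ed25519.Verify raises for a
// malformed public key.
func verifyPackage(pub ed25519.PublicKey, pkg, sig []byte) bool {
	return len(pub) == ed25519.PublicKeySize && ed25519.Verify(pub, pkg, sig)
}

func main() {
	pub, priv := demoKeypair()
	pkg := []byte("update-package-contents")
	sig := signPackage(priv, pkg)

	fmt.Println("original verifies:", verifyPackage(pub, pkg, sig))
	fmt.Println("tampered verifies:", verifyPackage(pub, []byte("tampered"), sig))
}
```

In a restricted-network deployment the verification step runs entirely offline, which is why a signature check suits this model better than an online licence or integrity service.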

We built a controlled licensing and deployment governance system to manage customer activations, tiered licensing, usage visibility, and health monitoring across deployment models.

Supply Chain Risk: Your Security Is Only as Strong as Your Weakest Vendor
The average enterprise shares sensitive data with dozens of third-party vendors. When one of those vendors suffers a breach, the blast radius extends to every organisation in their supply chain.

Vigil’s Supply Chain module maps third-party vendors, tracks their compliance status across multiple frameworks, monitors for breach exposure in their domains, and calculates a composite risk score based on data access level, compliance posture, and observed threat indicators.
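
One simple way to combine the three inputs named above is a weighted sum. The sketch below is illustrative only: the weights (0.4, 0.3, 0.3), the 0 to 1 input scale, and the 0 to 100 output scale are assumptions, not the formula used by the Supply Chain module.

```go
package main

import (
	"fmt"
	"math"
)

// vendor holds three hypothetical inputs, each expected in [0, 1],
// where higher means riskier.
type vendor struct {
	DataAccess    float64 // how much sensitive data the vendor can reach
	ComplianceGap float64 // share of required controls not evidenced
	ThreatSignal  float64 // strength of observed threat indicators
}

// clamp01 limits x to the range [0, 1].
func clamp01(x float64) float64 {
	return math.Max(0, math.Min(1, x))
}

// riskScore returns a weighted composite on a 0 to 100 scale, rounded
// to one decimal place. The weights are illustrative.
func riskScore(v vendor) float64 {
	s := 0.4*clamp01(v.DataAccess) +
		0.3*clamp01(v.ComplianceGap) +
		0.3*clamp01(v.ThreatSignal)
	return math.Round(s*1000) / 10
}

func main() {
	v := vendor{DataAccess: 0.8, ComplianceGap: 0.5, ThreatSignal: 0.2}
	fmt.Println("risk score:", riskScore(v))
}
```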

When a vendor breach is detected, affected clients can be notified with contextual risk information, including what data access that vendor had and recommended isolation steps. This is not a generic alert. It is a contextualised, actionable notification that tells a security team exactly what to do next.

From Product to Impact
GNSAC Vigil is now deployed across multiple sectors — critical port infrastructure, healthcare groups, manufacturing companies, and financial institutions across Turkey, Germany, and the Middle East. Our broader company revenue has grown from TRY 366,625 in our first year (2021) to TRY 2,705,656 in 2025 — a 7.4x increase over five consecutive years.

But the metric I care about most is not revenue. It is the moment a client receives an alert that their employee credentials have appeared on a dark web forum — and they can act on it within minutes, not months.

We built GNSAC Vigil because we believed that threat intelligence should not be a luxury reserved for Fortune 500 companies with seven-figure security budgets. Organisations in Turkey, in the Middle East, in emerging markets — they face the same threats. They deserve the same visibility.

What’s Next
Predictive Threat Intelligence
Our early predictive models are designed to identify coordinated campaign patterns by correlating surges across breach databases, IOC feeds, and threat actor activity. These capabilities are still evolving, but they already help analysts focus on emerging risks before they become fully developed incidents.
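
Surge detection of this kind is often prototyped with a z-score over recent daily counts. The sketch below shows that baseline technique only. It is not the predictive model described above, and `isSurge`, the sample counts, and the threshold of 3 are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// isSurge reports whether today's count is at least threshold sample
// standard deviations above the mean of history. With fewer than two
// history points it returns false. If history has zero variance, any
// value above the mean counts as a surge.
func isSurge(history []float64, today, threshold float64) bool {
	n := float64(len(history))
	if n < 2 {
		return false
	}
	var sum float64
	for _, x := range history {
		sum += x
	}
	mean := sum / n

	var ss float64
	for _, x := range history {
		ss += (x - mean) * (x - mean)
	}
	sd := math.Sqrt(ss / (n - 1))
	if sd == 0 {
		return today > mean
	}
	return (today-mean)/sd >= threshold
}

func main() {
	// Hypothetical daily mention counts for one indicator.
	history := []float64{10, 12, 11, 9, 10, 12, 11}
	fmt.Println(isSurge(history, 30, 3))
	fmt.Println(isSurge(history, 12, 3))
}
```

Correlating such per-source surges across breach data, IOC feeds, and actor activity is the harder part, and it is where a single-series threshold like this stops being sufficient.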

We also track active threat actor activity patterns and correlate them across monitored sources to identify emerging campaign behaviour.

Deeper Integrations
We are continuing to expand our coverage, improve our NLP models, and build deeper integrations with SOC platforms (Splunk, QRadar, Microsoft Sentinel). We are also exploring how large language models can enhance threat classification, automate analyst workflows, and generate human-readable threat briefings from raw intelligence data.

Regional Intelligence Expansion
Vigil currently provides region-specific intelligence modules for Turkey, the United Kingdom, the United States, and Germany, with localised compliance indicators, regional threat actor profiles, and jurisdiction-specific source coverage. We are expanding this regional model to improve visibility for organisations operating across Europe, the Middle East, and the Gulf region.

If you work in cybersecurity, threat intelligence, or security operations — I would love to hear how your organisation approaches dark web monitoring. What gaps do you see in the current landscape?

Baris Kececi is the Founder and CTO of GNSAC Information Technologies (gnsac.com.tr), a cybersecurity product company based in Turkey developing proprietary threat intelligence and security awareness platforms.

Source: dev.to
