How do you choose the right technologies when scaling?
The latest trendy tech, or what you’re already familiar with?
Early on at BforeAI, like many startups, we prioritized speed, flexibility, and quick iteration. When you’re in discovery mode — validating ideas, exploring customer needs, and adjusting your roadmap every few weeks — the tech stack needs to get out of the way. We started with a single relational database that acted as a central source of truth for everything: telemetry, metadata, customer logic, internal configurations, all bundled together in a flexible but increasingly overloaded schema.
This decision made sense back then. It helped us move fast, pivot quickly, and onboard early adopters without getting bogged down by overly rigid data structures. But as the product matured and our customer base expanded, so did our operational complexity and, more critically, our data volumes.
Today, we’re a data-first company
Our PreCrime platform ingests real-time signals from dozens of sources: DNS traffic, brand impersonation attempts, malware delivery infrastructures, and countless other threat indicators. The scale is intense: an always-growing number of events every minute, each with its own format, structure, and lifecycle. It quickly became clear that our original approach — a single relational system trying to serve every purpose — was not just under strain, but also limiting our ability to scale, innovate, and operate efficiently.
This is why we’re actively transitioning toward a Data Mesh architecture, a foundational change in how we organize our data systems and how our teams work with data.
Instead of treating data as a monolithic backend concern, we’re evolving to treat it as a product owned by specific teams, designed with clear contracts, and optimized for usability across the organization. Each team that owns a product or function also owns the data it generates and consumes. This includes defining schemas, managing quality, setting access controls, and publishing documentation for downstream consumers. Ownership is distributed, not centralized.
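To make "data as a product" a bit more tangible, here is a rough sketch of what a published data contract might look like. The product name, fields, teams, and URL below are invented for illustration; they are not our actual schemas or tooling.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a data contract a domain team might publish.
# The "dns_telemetry" product, its fields, and the owning team are
# illustrative only.

@dataclass
class FieldSpec:
    name: str
    dtype: str           # e.g. "string", "timestamp", "number"
    required: bool = True
    description: str = ""

@dataclass
class DataContract:
    product: str                  # name of the data product
    owner_team: str               # team accountable for quality and support
    version: str                  # contracts are versioned like any API
    fields: list[FieldSpec] = field(default_factory=list)
    access: str = "internal"      # coarse access policy for consumers
    docs_url: str = ""            # where downstream consumers find documentation

    def missing_fields(self, record: dict) -> list[str]:
        """List required fields absent from a single record."""
        return [f.name for f in self.fields if f.required and f.name not in record]

dns_telemetry = DataContract(
    product="dns_telemetry",
    owner_team="detection-platform",
    version="1.2.0",
    fields=[
        FieldSpec("observed_at", "timestamp", description="UTC observation time"),
        FieldSpec("domain", "string", description="fully qualified domain name"),
        FieldSpec("verdict", "string", required=False, description="optional risk verdict"),
    ],
    access="org-wide",
    docs_url="https://wiki.example.internal/data-products/dns_telemetry",
)

print(dns_telemetry.missing_fields({"domain": "example.com"}))
# -> ['observed_at']
```

The point is less the specific fields and more that ownership, versioning, access, and documentation travel with the data itself.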
This also means we are no longer relying on a “one-size-fits-all” database. We’re implementing a polyglot persistence model, where we choose the storage technology based on the characteristics of each dataset:
- Time-series databases power our high-resolution telemetry ingestion pipelines.
- Graph databases help us represent and analyze relationships between malicious infrastructure.
- Document stores support enrichment metadata and dynamic configurations.
- Columnar stores are used for fast analytical queries and reporting pipelines.
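As a simplified illustration of that routing decision, the sketch below maps each hypothetical dataset to the kind of store that suits it. The in-memory "sinks" stand in for real database clients; none of this is production code.

```python
# Hypothetical illustration of polyglot persistence: route each dataset to a
# store suited to its shape and query pattern. The store clients here are
# stand-ins (simple in-memory lists), not real database drivers.

from collections import defaultdict
from enum import Enum

class StoreKind(Enum):
    TIME_SERIES = "time_series"   # high-resolution telemetry
    GRAPH = "graph"               # relationships between malicious infrastructure
    DOCUMENT = "document"         # enrichment metadata, dynamic configurations
    COLUMNAR = "columnar"         # analytical queries and reporting

# Which store backs which dataset is a deliberate, per-dataset decision.
DATASET_TO_STORE = {
    "dns_telemetry": StoreKind.TIME_SERIES,
    "infrastructure_links": StoreKind.GRAPH,
    "enrichment_metadata": StoreKind.DOCUMENT,
    "daily_reporting": StoreKind.COLUMNAR,
}

# Stand-in sinks keyed by store kind; in practice these would be real clients.
sinks: dict[StoreKind, list] = defaultdict(list)

def route_event(dataset: str, event: dict) -> StoreKind:
    """Send an event to the store chosen for its dataset."""
    kind = DATASET_TO_STORE[dataset]
    sinks[kind].append(event)
    return kind

route_event("dns_telemetry", {"domain": "example.com", "qps": 120})
route_event("infrastructure_links", {"src": "a.example", "dst": "b.example"})
print({kind.value: len(events) for kind, events in sinks.items()})
```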
We are building domain-oriented data pipelines, with decoupled ownership and a common governance layer to ensure consistency across the mesh. This shift lets us scale without bottlenecks — and more importantly, it empowers our teams to move faster, ship independently, and innovate on their own terms.
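One way to picture that common governance layer, purely as a sketch: domain teams register their data products in a lightweight catalog, and a small set of org-wide rules is checked at registration time. The rules and the catalog below are hypothetical, not a description of our internal tooling.

```python
# Hypothetical sketch of a light governance layer: a catalog that every
# domain-owned data product registers with, plus a few org-wide checks.
# The rule set, product names, and URL are illustrative only.

products: dict[str, dict] = {}   # in-memory stand-in for a data catalog

GOVERNANCE_RULES = [
    ("has_owner", lambda p: bool(p.get("owner_team"))),
    ("has_docs", lambda p: bool(p.get("docs_url"))),
    ("has_retention", lambda p: p.get("retention_days", 0) > 0),
]

def register_product(product: dict) -> None:
    """Admit a data product to the mesh only if it passes shared governance checks."""
    failures = [name for name, check in GOVERNANCE_RULES if not check(product)]
    if failures:
        raise ValueError(f"{product.get('name')}: failed governance checks {failures}")
    products[product["name"]] = product

register_product({
    "name": "infrastructure_links",
    "owner_team": "threat-graph",
    "docs_url": "https://wiki.example.internal/data-products/infrastructure_links",
    "retention_days": 365,
})
print(sorted(products))
```

The governance layer stays thin on purpose: it enforces a few shared expectations and otherwise leaves the domain teams free to choose their own storage and pipelines.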
Best practices to balance innovation with long-term scalability
The shift isn’t just technical; it’s also cultural. We’re establishing best practices and principles for data product thinking: treating data with the same care as any customer-facing feature. We’re also building shared tooling to reduce friction: schema validation tools, lineage tracking, cataloging services, access auditing, and observability layers that span technologies and domains.
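As one concrete, simplified example of that shared tooling, a thin wrapper around an off-the-shelf validator can enforce a published schema at pipeline boundaries. This sketch assumes the open-source jsonschema package; the schema itself is made up for illustration.

```python
# Minimal sketch of a shared schema-validation helper, assuming the
# open-source "jsonschema" package (pip install jsonschema). The schema
# below is illustrative, not one of our published contracts.

from jsonschema import ValidationError, validate

BRAND_ALERT_SCHEMA = {
    "type": "object",
    "properties": {
        "domain": {"type": "string"},
        "first_seen": {"type": "string", "format": "date-time"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["domain", "first_seen"],
}

def check_record(record: dict) -> list[str]:
    """Return human-readable violations so producers can fix data at the source."""
    try:
        validate(instance=record, schema=BRAND_ALERT_SCHEMA)
        return []
    except ValidationError as err:
        return [err.message]

print(check_record({"domain": "examp1e-brand.com"}))
# -> ["'first_seen' is a required property"]
```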
Some of the best practices we’ve learned (and are still refining) during this journey include:
- Embrace evolving needs: Startups need flexibility early on. Don’t over-engineer too soon — but do build with an eye toward modularity and change.
- Align architecture with team topology: Organize teams around product domains, and let the data architecture follow. Avoid creating central bottlenecks that slow everyone down.
- Standardize interfaces, not implementations: Different data systems are fine, as long as how you expose and consume data is predictable, well-documented, and monitored.
- Invest early in data observability: As data products multiply, visibility becomes critical. Build in logging, monitoring, and quality checks as first-class citizens (see the sketch after this list).
- Govern with a light touch: Set up shared principles and frameworks, but avoid heavy central control. The value of Data Mesh lies in autonomy. Don’t kill it with bureaucracy.
- Make cost a first-order consideration: Scaling data infrastructure is not just an engineering challenge; it’s a financial one. Optimize for cost, and keep usage aligned with business value.
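To illustrate the observability point above, here is a deliberately small sketch of the kind of built-in quality check we mean: compute batch freshness and null rates, then log them as metrics. The field names and metric choices are illustrative, not our production checks.

```python
# Hypothetical data-quality check: report freshness and null rates for one
# batch of records and log them as metrics. Logging setup, field names, and
# the metric set are illustrative only.

import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data_quality")

def batch_quality_metrics(rows: list[dict], ts_field: str, required: list[str]) -> dict:
    """Return simple quality metrics for one batch of records."""
    now = datetime.now(timezone.utc)
    newest = max(datetime.fromisoformat(row[ts_field]) for row in rows)
    metrics = {
        "row_count": len(rows),
        "freshness_seconds": round((now - newest).total_seconds(), 1),
        "null_rate": sum(
            1 for row in rows for name in required if row.get(name) in (None, "")
        ) / (len(rows) * len(required)),
    }
    log.info("batch quality: %s", metrics)
    return metrics

batch_quality_metrics(
    rows=[
        {"observed_at": "2024-01-01T00:00:00+00:00", "domain": "example.com"},
        {"observed_at": "2024-01-01T00:05:00+00:00", "domain": None},
    ],
    ts_field="observed_at",
    required=["domain"],
)
```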
Ultimately, our journey from a monolithic relational database to a distributed, domain-driven architecture is about enabling long-term agility at scale. We’re not chasing shiny objects; we’re building a stack that lets us adapt to change, empower our teams, and continue to respond quickly to an evolving threat landscape.
As CTO, I believe the tech stack isn’t just a set of tools — it’s a strategic lever. It shapes how fast we can move, how resilient we are under pressure, and how easily we can grow. And sometimes, the most important decision isn’t picking a new technology — it’s knowing when to let go of an old one.
We’re still learning. We’re still evolving. But we’re doing it with intention, and a clear view of where we want to go.