Data Intelligence · Genomic Taxonomy
Genomic taxonomy and content intelligence rebuild across 200M+ digital assets.
Assets reclassified: 200M+
Analysis time reduction: 60%
Metadata accuracy: +15%
Revenue impact: $215M+
Users on rebuilt system: 70M+
01 · CONTEXT
Who. What scale. What was at stake.
A major audio subscription platform with 70M+ active users ran its recommendation, personalization, and ad targeting systems on a proprietary genomic taxonomy. The taxonomy was the product. It was also the constraint. The original schema was brittle, hard to extend, and incompatible with the ML pipelines the business was moving toward.
02 · CONSTRAINT
The architectural problem.
The migration had to happen across 200M+ assets without breaking the downstream systems generating revenue in production. None of those systems could be taken offline. The second constraint: the new taxonomy had to unlock capabilities the original numeric genre schema had been blocking, not just be a cleaner version of the same structure.
03 · DECISION POINTS
Three decisions that shaped everything downstream.
Decision 01
Schema direction
Migrated from numeric genre identifiers to a classified, semantic, multi-dimensional tag structure. Each asset carries multiple dimensions of meaning rather than a single genre assignment. This unlocked cross-genre and mood-based recommendation logic the product had been unable to build.
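A minimal sketch of what moving from a single numeric genre ID to multi-dimensional tags can look like. The production schema is not public, so every name here (`LEGACY_GENRES`, `Tag`, `Asset`, `matching`, the IDs and the dimension names) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical legacy lookup: one numeric genre id per asset.
LEGACY_GENRES = {17: "jazz", 42: "electronic"}


@dataclass(frozen=True)
class Tag:
    dimension: str  # e.g. "genre", "mood", "tempo" (assumed dimension names)
    value: str


@dataclass
class Asset:
    asset_id: str
    tags: set[Tag] = field(default_factory=set)

    def values(self, dimension: str) -> set[str]:
        """All values this asset carries for one dimension."""
        return {t.value for t in self.tags if t.dimension == dimension}


def from_legacy(asset_id: str, genre_id: int) -> Asset:
    """Seed a multi-dimensional record from a single legacy genre id."""
    return Asset(asset_id, {Tag("genre", LEGACY_GENRES[genre_id])})


def matching(assets, **wanted):
    """Ids of assets that carry every requested dimension=value pair."""
    return [
        a.asset_id
        for a in assets
        if all(value in a.values(dim) for dim, value in wanted.items())
    ]


a = from_legacy("trk-1", 17)
a.tags.add(Tag("mood", "calm"))
b = from_legacy("trk-2", 42)
b.tags.add(Tag("mood", "calm"))
c = from_legacy("trk-3", 42)
c.tags.add(Tag("mood", "energetic"))
catalog = [a, b, c]

# A mood query cuts across genres, which a single genre id cannot express.
calm_tracks = matching(catalog, mood="calm")
```

Here `calm_tracks` contains both a jazz and an electronic track, the kind of cross-genre grouping that a one-ID-per-asset schema has no way to represent.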
Decision 02
Migration sequencing
Both schemas ran in parallel during migration. The old taxonomy stayed live for downstream systems while the new taxonomy was built alongside it. Cutover happened system by system, not all at once. This reduced production risk at the cost of temporary infrastructure complexity.
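One way to picture system-by-system cutover is a small router that decides, per downstream consumer, which taxonomy store answers a lookup. This is a sketch under assumptions, not the platform's actual mechanism: `TaxonomyRouter`, the in-memory dict stores, and the system names are all hypothetical.

```python
class TaxonomyRouter:
    """Serve each downstream system from the old or the new taxonomy store."""

    def __init__(self, legacy_store, new_store):
        self.legacy_store = legacy_store
        self.new_store = new_store
        self.cut_over = set()  # names of systems already switched to the new store

    def cutover(self, system):
        """Switch one system to the new taxonomy."""
        self.cut_over.add(system)

    def rollback(self, system):
        """Send one system back to the legacy taxonomy."""
        self.cut_over.discard(system)

    def lookup(self, system, asset_id):
        store = self.new_store if system in self.cut_over else self.legacy_store
        return store[asset_id]


# Both stores stay populated for the whole migration (illustrative data only).
legacy = {"trk-1": 17}
new = {"trk-1": {"genre": ["jazz"], "mood": ["calm"]}}

router = TaxonomyRouter(legacy, new)
router.cutover("recommendation")  # one system moves; the others are untouched

rec_view = router.lookup("recommendation", "trk-1")  # answered by the new store
ads_view = router.lookup("ad_targeting", "trk-1")    # still the legacy store
```

Keeping the switch per system means a problem in one consumer can be rolled back without touching the rest, which is the trade the section describes: lower production risk in exchange for running two stores at once.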
Decision 03
Knowledge graph integration
Taxonomy became a queryable graph, not a flat tag store. Relationships between assets, artists, moods, tempo, and context became traversable. This was the structural change that enabled the downstream revenue impact.
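To illustrate the difference between a flat tag store and a traversable graph, here is a toy adjacency-list sketch with a bounded breadth-first traversal. It assumes nothing about the real graph store or query language: `TaxonomyGraph`, the node naming scheme, and the relation labels are hypothetical, and edges are stored in both directions as a simplification.

```python
from collections import defaultdict, deque


class TaxonomyGraph:
    """Toy in-memory graph of assets, artists, moods, and similar nodes."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> {(relation, neighbor), ...}

    def link(self, a, relation, b):
        # Store both directions so traversal can start from either end.
        self.edges[a].add((relation, b))
        self.edges[b].add((relation, a))

    def within(self, start, max_hops):
        """Map each node reachable in at most max_hops to its hop distance."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand past the hop limit
            for _relation, neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        del seen[start]
        return seen


g = TaxonomyGraph()
g.link("track:a", "has_mood", "mood:calm")
g.link("track:b", "has_mood", "mood:calm")
g.link("track:b", "performed_by", "artist:x")
g.link("track:c", "performed_by", "artist:x")

near = g.within("track:a", 2)  # shared-mood neighbors
far = g.within("track:a", 4)   # also reaches track:c via the shared artist
```

In a flat tag store, `track:a` and `track:c` share no tag and look unrelated. In the graph they are connected through a mood and an artist, and the hop distance gives a rough signal of how close the relationship is.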
04 · SYSTEM
What was built.
Rebuilt genomic tagging system. Migrated from numeric to classified, semantic, multi-dimensional taxonomy across 200M+ assets.
New enrichment pipelines. Faster ingestion, cleaner labels, downstream-ready outputs for ML systems.
Knowledge graph integration. Taxonomy became a queryable graph with traversable relationships.
Parallel schema operation during migration. No production downtime. Old taxonomy maintained while new taxonomy was validated.
Cross-functional alignment across content, data, engineering, and product maintained throughout.
05 · OUTCOMES
All metrics from production systems.
200M+ assets reclassified under the new taxonomy
60% reduction in per-asset analysis time, from 17 minutes to 8 minutes per track
15% metadata accuracy improvement across the full catalog
Foundation for $215M+ in annualized revenue impact across recommendation, personalization, and ad targeting
70M+ users served by systems running on the rebuilt taxonomy
06 · DISCUSS FURTHER
The architecture above is public. What follows is a conversation.
Taxonomy schema design, migration sequencing decisions, ML pipeline integration approach, knowledge graph structure, team scaling model, and the specific enrichment pipeline architecture are covered in a 30-minute conversation offered to executives evaluating similar domain intelligence work.