# Proposal: Data Model v1 (Consolidated Schema)

- Status: approved
- Authors: Architecture Team, Data Team
- Owners: Architecture Lead, Data Lead, Compliance Lead
- Created: 2025-11-17
- Scope: spec
- Related: openspec/specs/data-model.md

## Summary

- Consolidate all entity schemas from approved feature specs into a unified data model with field-level data classification, relationships, and migration strategy.

## Motivation

- Ensure consistent data modeling across all features before implementation begins.
- Establish clear PHI/PII boundaries and retention policies at the schema level.
- Enable efficient backend development with well-defined entities and relationships.

## Goals / Non-Goals

- Goals: consolidated entity schemas, field-level data classes (Public/PII/PHI), relationships/foreign keys, indexing strategy, retention/soft-delete rules, migration versioning.
- Non-Goals: vendor-specific schema syntax (use portable DDL concepts); performance tuning (covered in implementation).

## Requirements

### Functional

- All entities from approved specs: User, Profile, ForumCategory, ForumThread, ForumPost, ForumReaction, ForumReport, BlogPost, PodcastEpisode, TributeEntry, Resource, MerchProduct, etc.
- Relationships clearly defined with foreign keys and cascade rules.
- Indexes for common query patterns (user lookups, thread pagination, search, etc.).

### Privacy & Compliance

- Every field tagged with data class: Public, PII, or PHI.
- PHI fields isolated where possible; encryption requirements noted.
- Retention policies per entity (e.g., soft-delete window, hard-delete rules).
- DSR support: export and delete operations mapped to entities.
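The first Privacy & Compliance requirement is that every field be tagged with a data class. The following minimal sketch shows one way to carry that tag in application code. It assumes Python dataclasses and stores each field's class in `dataclasses.field(metadata=...)`. The `DataClass` enum, the `tagged`, `untagged_fields`, and `redact_phi` helpers, and the choice to tag `id`/`user_id` as Public are all hypothetical and not part of the spec. The sketch shows how an untagged field can be caught mechanically and how PHI values can be masked before a record reaches an audit log.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Any, Optional


class DataClass(Enum):
    """Field-level data classes named in the proposal."""
    PUBLIC = "Public"
    PII = "PII"
    PHI = "PHI"


def tagged(data_class: DataClass, **kwargs: Any) -> Any:
    """Declare a dataclass field whose metadata records its data class."""
    return field(metadata={"data_class": data_class}, **kwargs)


@dataclass
class Profile:
    # Hypothetical subset of the Profile entity. display_name/avatar_url (PII)
    # and health_journey (PHI) follow the Data Classification Summary;
    # tagging id/user_id as Public is an assumption.
    id: int = tagged(DataClass.PUBLIC)
    user_id: int = tagged(DataClass.PUBLIC)
    display_name: str = tagged(DataClass.PII)
    avatar_url: Optional[str] = tagged(DataClass.PII, default=None)
    health_journey: Optional[str] = tagged(DataClass.PHI, default=None)


def untagged_fields(cls: type) -> list[str]:
    """Names of fields missing a data-class tag; a CI check could require []."""
    return [f.name for f in fields(cls) if "data_class" not in f.metadata]


def redact_phi(obj: Any) -> dict[str, Any]:
    """Return obj's fields as a dict with non-null PHI values masked."""
    out: dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get("data_class") is DataClass.PHI and value is not None:
            value = "[REDACTED]"
        out[f.name] = value
    return out


if __name__ == "__main__":
    profile = Profile(id=1, user_id=42, display_name="Sam", health_journey="private note")
    print(untagged_fields(Profile))
    print(redact_phi(profile))
```

Keeping the tag next to the field definition lets a check like `untagged_fields` fail a build when someone adds a column without a classification. The same metadata could also decide which fields a DSR export includes.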
## Data Model

### Core Entities

- User: id, email (PII), created_at, updated_at, deleted_at
- Profile: id, user_id (FK), display_name, pseudonym, pronouns, avatar_url, bio, health_journey (PHI, private by default), consent_flags, created_at, updated_at
- ForumCategory: id, name, description, order, created_at
- ForumThread: id, category_id (FK), author_id (FK User), title, pinned, locked, created_at, updated_at
- ForumPost: id, thread_id (FK), author_id (FK User), parent_post_id (FK ForumPost, nullable), content (may contain PHI), deleted_at, created_at, updated_at
- ForumReaction: id, post_id (FK), user_id (FK), emoji_code, created_at
- ForumReport: id, post_id (FK), reporter_id (FK User), reason, status, moderator_notes, resolved_at, created_at
- BlogPost: id, author_id (FK User), title, slug, content, published_at, created_at, updated_at
- PodcastEpisode: id, title, description, audio_url, duration, published_at, created_at
- TributeEntry: id, author_id (FK User), subject_name, memorial_text (may contain PHI), published, created_at, updated_at
- Resource: id, title, slug, content, access_tier (public/members), tags, created_at, updated_at
- MerchProduct: id, name, description, price, stock_count, created_at, updated_at
- Order: id, user_id (FK), total, status, shipping_address (PII), created_at, updated_at
- Consent: id, user_id (FK), consent_type, granted, granted_at, revoked_at

### Relationships

- User → Profile (1:1, cascade delete)
- User → ForumPost (1:N, soft-delete user → anonymize posts)
- User → ForumThread (1:N)
- ForumCategory → ForumThread (1:N)
- ForumThread → ForumPost (1:N, cascade delete)
- ForumPost → ForumReaction (1:N, cascade delete)
- ForumPost → ForumReport (1:N)
- User → BlogPost (1:N)
- User → TributeEntry (1:N)
- User → Order (1:N)

### Data Classification Summary

- Public: ForumCategory, PodcastEpisode, Resource (public tier), MerchProduct, BlogPost (published)
- PII: User.email, Profile.display_name, Order.shipping_address, Profile.avatar_url
- PHI: Profile.health_journey, ForumPost.content (context-dependent), TributeEntry.memorial_text (context-dependent)

### Indexing Strategy

- User: email (unique), created_at
- Profile: user_id (unique FK)
- ForumThread: category_id, author_id, created_at, updated_at
- ForumPost: thread_id, author_id, created_at
- BlogPost: slug (unique), author_id, published_at
- Resource: slug (unique), access_tier, tags (GIN/array index)
- Order: user_id, created_at

### Retention & Soft-Delete

- User: soft-delete (deleted_at); 90-day window before hard-delete; anonymize posts.
- ForumPost: soft-delete (deleted_at); 90-day window; on user delete → replace author with "[deleted]".
- BlogPost, TributeEntry: indefinite retention unless user requests DSR delete.
- Order: 7-year retention for compliance (tax/commerce), then hard-delete.

### Migration Strategy

- Versioned migrations (e.g., Alembic, Flyway, or similar).
- Idempotent scripts for rollback safety.
- Seed data for initial categories, default consents, sample resources.

## Security & Threat Model

- Encryption at rest: PII/PHI fields encrypted at database level or app level.
- Access controls: RBAC enforced at API layer; row-level security (RLS) for multi-tenancy if needed.
- Audit logging: all mutations on PHI/PII entities logged (excluding PHI content itself).

## Observability & Telemetry

- Schema change tracking in migration logs.
- Query performance metrics for critical paths (thread list, user profile fetch).

## Test Plan

- Unit tests for schema constraints (foreign keys, unique indexes).
- Integration tests for cascade deletes and soft-delete behavior.
- DSR export/delete tests: verify all user data is captured or purged.
- Retention tests: simulate time-based purges.

## Migration / Rollout Plan

- Deploy schema migrations to staging; run dry-run seeds.
- Validate with read-only queries before enabling writes.
- Rollback plan: revert migration, restore from backup if needed.
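The rules in the Retention & Soft-Delete section fit in one pure function. That also makes the Test Plan's time-based purge tests simple, because the caller passes in the current time instead of the function reading the clock. This is a sketch under stated assumptions. The function and constant names are hypothetical. The spec does not say which timestamp anchors the 7-year Order window, so `created_at` is assumed here. A DSR delete is modeled as a plain boolean flag.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Windows taken from the Retention & Soft-Delete section.
SOFT_DELETE_WINDOW = timedelta(days=90)  # User, ForumPost
ORDER_RETENTION_YEARS = 7                # Order (tax/commerce)


def add_years(ts: datetime, years: int) -> datetime:
    """Shift ts by whole calendar years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return ts.replace(year=ts.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return ts.replace(year=ts.year + years, day=28)


def hard_delete_due(
    entity: str,
    now: datetime,
    *,
    deleted_at: Optional[datetime] = None,
    created_at: Optional[datetime] = None,
    dsr_delete: bool = False,
) -> bool:
    """True if a row of `entity` may be hard-deleted at `now` (hypothetical helper)."""
    if entity in ("User", "ForumPost"):
        # Soft-deleted rows become purgeable once the 90-day window has passed.
        return deleted_at is not None and now >= deleted_at + SOFT_DELETE_WINDOW
    if entity == "Order":
        # Assumption: the 7-year window is measured from created_at.
        return created_at is not None and now >= add_years(created_at, ORDER_RETENTION_YEARS)
    if entity in ("BlogPost", "TributeEntry"):
        # Retained indefinitely unless the user requests a DSR delete.
        return dsr_delete
    raise ValueError(f"no retention rule defined for {entity!r}")


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(hard_delete_due("User", t0 + timedelta(days=89), deleted_at=t0))
    print(hard_delete_due("User", t0 + timedelta(days=90), deleted_at=t0))
```

Because `now` is an argument, a retention test can check the boundary 89 and 90 days after `deleted_at` without sleeping or mocking the system clock. The scheduled purge job would then only select candidate rows and call the function.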
## Risks & Mitigations

- Schema drift across features → enforce a single source of truth in `data-model.md`.
- PHI leakage → code reviews and automated scanning for PHI in logs.

## Alternatives Considered

- NoSQL vs relational: choosing relational (Postgres) for strong consistency, relationships, and mature tooling.
- Schema-per-feature vs unified: choosing unified for easier joins and data integrity.

## Work Breakdown

1. Document all entity schemas in `data-model.md`
2. Generate migration scripts from schema
3. Seed test data for development
4. Validate DSR export/delete coverage
5. Review with compliance and security teams

## Acceptance Criteria

- All approved feature entities documented in `data-model.md` with field-level data classes.
- Relationships and indexes defined.
- Retention and soft-delete rules specified.
- DSR export/delete coverage verified.
- Sign-off from compliance and security teams.

## Open Questions

- Specific database vendor: Postgres is assumed, but should the schema rely on vendor-specific features?
- Multi-region replication strategy (future proposal)?
- Real-time sync requirements for forum (WebSocket state management)?

## Slash Commands

- `/review areas=backend,compliance,security,data`
- `/apply spec=openspec/specs/data-model.md`
- `/archive link=`