Back to Deep Dives
Technical Deep Dive
Madhusudhan Kakurala 12 min read

Deep Dive: Understanding Sim Studio's Database Architecture

A comprehensive exploration of the PostgreSQL database schema powering Sim Studio's AI workflow builder - from normalized workflow graphs to vector embeddings

Sim Studio is an open-source visual workflow builder that enables developers to design, deploy, and monitor AI agent workflows. Behind its intuitive drag-and-drop canvas lies a sophisticated PostgreSQL database schema that persists everything from workflow definitions to execution traces, user authentication, and semantic search embeddings.

In this deep dive, we’ll explore the database architecture that powers over 60,000 developers building AI workflows on Sim Studio.

Overview: A Schema Built for Scale

The Sim Studio database schema is implemented using Drizzle ORM and consists of 38 tables organized into eight functional domains:

  1. Workflow System - Normalized workflow graphs with blocks, edges, and subflows
  2. Authentication & Authorization - Multi-provider OAuth with Better Auth
  3. Organizations & Workspaces - Multi-tenant architecture
  4. Execution Logging - Comprehensive traces with cost tracking
  5. Knowledge Base - Document storage with pgvector embeddings
  6. Scheduling & Webhooks - Cron jobs and external triggers
  7. Rate Limiting - Per-user and per-organization counters
  8. Permissions - Fine-grained access control

The schema leverages PostgreSQL’s advanced features including:

  • JSONB columns for flexible, semi-structured data
  • pgvector extension for semantic search capabilities
  • HNSW indexes for high-performance similarity queries
  • Atomic SQL operations for race-condition-free rate limiting

Part 1: The Workflow Data Model

At the heart of Sim Studio is its workflow system. Workflows are visual DAGs (Directed Acyclic Graphs) where blocks represent operations and edges represent data flow between them.

The Dual-State Architecture

Sim Studio maintains workflow definitions in two complementary formats:

1. Denormalized State (JSONB) - The workflow.state column contains the complete workflow as edited in the UI:

// workflow.state structure
{
  blocks: {
    "block-uuid-1": { type: "agent", name: "GPT-4 Agent", ... },
    "block-uuid-2": { type: "api", name: "Fetch Data", ... }
  },
  edges: [
    { source: "block-uuid-1", target: "block-uuid-2", ... }
  ],
  loops: { ... },
  parallels: { ... }
}

This denormalized format is the source of truth for workflow execution - it provides instant access to the complete workflow topology without joins.

2. Normalized Tables - The same data is decomposed into relational tables for efficient querying:

Core Workflow Tables

workflow - The Primary Table

ColumnTypeDescription
iduuidPrimary key
userIduuidOwner reference
workspaceIduuidWorkspace container
folderIduuidOptional folder for organization
nametextWorkflow name
descriptiontextWorkflow description
colortextUI color code (hex)
statejsonbComplete workflow definition
isDeployedbooleanDeployment status
runCountintegerExecution counter
lastRunAttimestampLast execution time
variablesjsonWorkflow-level variables

The state column deserves special attention - it’s a JSONB blob containing the entire workflow graph as visualized on the canvas. This design choice prioritizes read performance during execution over storage normalization.

workflow_blocks - Normalized Block Storage

CREATE TABLE workflow_blocks (
  id            uuid PRIMARY KEY,
  workflow_id   uuid REFERENCES workflow(id) ON DELETE CASCADE,
  block_id      text NOT NULL,          -- Internal block identifier
  block_type    text NOT NULL,          -- 'agent', 'api', 'function', etc.
  block_name    text NOT NULL,          -- User-defined name
  position      jsonb,                   -- {x, y} canvas coordinates
  sub_blocks    jsonb,                   -- Configuration and inputs
  outputs       jsonb,                   -- Output schema definition
  enabled       boolean DEFAULT true,
  metadata      jsonb
);

Block types in Sim Studio include:

  • Trigger blocks: start, webhook, schedule, api, chat
  • Processing blocks: agent, api, function, guardrails
  • Logic blocks: condition, router, loop, parallel
  • Output blocks: response, evaluator, wait

workflow_edges - Connection Graph

CREATE TABLE workflow_edges (
  id            uuid PRIMARY KEY,
  workflow_id   uuid REFERENCES workflow(id) ON DELETE CASCADE,
  edge_id       text NOT NULL,
  source        text NOT NULL,          -- Source block ID
  target        text NOT NULL,          -- Target block ID
  source_handle text,                   -- Output handle identifier
  target_handle text                    -- Input handle identifier
);

Edges define the DAG structure that determines execution order. The execution engine uses these to:

  1. Build dependency graphs
  2. Determine parallel execution opportunities
  3. Route data between blocks

workflow_subflows - Loops and Parallels

CREATE TABLE workflow_subflows (
  id            uuid PRIMARY KEY,
  workflow_id   uuid REFERENCES workflow(id) ON DELETE CASCADE,
  subflow_id    text NOT NULL,
  subflow_type  text NOT NULL,          -- 'loop' or 'parallel'
  config        jsonb,                   -- Iteration configuration
  block_ids     text[]                   -- Contained block IDs
);

Subflows are container structures that group blocks for:

  • Loop execution: ForEach, While, Do-While patterns
  • Parallel execution: Concurrent branch processing

The execution engine expands these into sentinel nodes during DAG compilation, preserving the acyclic property while enabling iteration.

Workflow Organization

workflow_folder - Hierarchical Organization

CREATE TABLE workflow_folder (
  id            uuid PRIMARY KEY,
  user_id       uuid REFERENCES "user"(id) ON DELETE CASCADE,
  workspace_id  uuid REFERENCES workspace(id) ON DELETE CASCADE,
  parent_id     uuid REFERENCES workflow_folder(id),  -- Self-reference
  name          text NOT NULL,
  color         text DEFAULT '#6B7280',
  is_expanded   boolean DEFAULT true,
  sort_order    integer DEFAULT 0
);

Folders support arbitrary nesting depth through self-referential parent_id, enabling project-style organization.


Part 2: Authentication & Multi-Tenancy

Sim Studio uses Better Auth for authentication, supporting multiple OAuth providers and enterprise SSO.

User & Session Management

user - Core Identity

CREATE TABLE "user" (
  id              text PRIMARY KEY,
  name            text NOT NULL,
  email           text NOT NULL UNIQUE,
  email_verified  boolean NOT NULL,
  image           text,
  stripe_customer_id  text,
  is_super_user   boolean DEFAULT false
);

session - Active Sessions

CREATE TABLE session (
  id              text PRIMARY KEY,
  user_id         text REFERENCES "user"(id) ON DELETE CASCADE,
  token           text NOT NULL UNIQUE,
  expires_at      timestamp NOT NULL,
  ip_address      text,
  user_agent      text,
  active_organization_id  text REFERENCES organization(id)
);

Sessions track the active organization context, enabling seamless workspace switching without re-authentication.

CREATE TABLE account (
  id                      text PRIMARY KEY,
  user_id                 text REFERENCES "user"(id) ON DELETE CASCADE,
  account_id              text NOT NULL,
  provider_id             text NOT NULL,           -- 'github', 'google', etc.
  access_token            text,                    -- Encrypted
  refresh_token           text,                    -- Encrypted
  access_token_expires_at timestamp,
  scope                   text,
  password                text,                    -- Hashed (email/password auth)
  UNIQUE(user_id, provider_id, account_id)
);

Multi-Tenant Architecture

organization - Team Containers

CREATE TABLE organization (
  id          uuid PRIMARY KEY,
  name        text NOT NULL,
  slug        text UNIQUE,              -- URL-safe identifier
  logo        text,
  metadata    text                      -- JSON string
);

member - Organization Membership

CREATE TABLE member (
  id              uuid PRIMARY KEY,
  organization_id uuid REFERENCES organization(id),
  user_id         uuid REFERENCES "user"(id),
  role            text NOT NULL          -- 'owner', 'admin', 'member'
);

workspace - Workflow Containers

Workspaces provide isolation within organizations:

CREATE TABLE workspace (
  id              uuid PRIMARY KEY,
  organization_id uuid REFERENCES organization(id),
  name            text NOT NULL,
  slug            text,
  -- Additional workspace settings
);

Subscription & Billing

CREATE TABLE subscription (
  id            uuid PRIMARY KEY,
  plan          text NOT NULL,           -- 'free', 'pro', 'team', 'enterprise'
  status        text NOT NULL,           -- 'active', 'canceled', 'past_due'
  seats         integer,
  reference_id  uuid NOT NULL,           -- User ID or Organization ID
  metadata      jsonb                    -- Stripe data and allowances
);

The reference_id polymorphically points to either a user (personal subscription) or organization (team subscription).


Part 3: Execution Logging & Observability

Every workflow execution generates comprehensive logs for debugging, cost tracking, and compliance.

workflow_execution_logs - The Execution Record

CREATE TABLE workflow_execution_logs (
  id                uuid PRIMARY KEY,
  workflow_id       uuid REFERENCES workflow(id),
  execution_id      text UNIQUE NOT NULL,
  state_snapshot_id uuid,                -- Workflow state at execution time
  level             text,                 -- 'info', 'error', 'success'
  trigger           text,                 -- 'manual', 'api', 'webhook', 'schedule', 'chat'
  started_at        timestamp,
  ended_at          timestamp,
  total_duration_ms integer,
  execution_data    jsonb,               -- Trace spans and block outputs
  cost              jsonb,               -- Detailed cost breakdown
  files             jsonb                -- Associated file uploads
);

The execution_data Structure

{
  "traceSpans": [
    {
      "blockId": "agent-1",
      "blockType": "agent",
      "startTime": "2025-01-15T10:00:00Z",
      "endTime": "2025-01-15T10:00:02Z",
      "input": { ... },
      "output": { ... },
      "tokens": { "prompt": 150, "completion": 200 }
    }
  ],
  "finalOutput": { ... },
  "enhanced": true
}

The cost Structure

{
  "total": 0.0045,
  "input": 0.0015,
  "output": 0.0030,
  "tokens": {
    "total": 350,
    "prompt": 150,
    "completion": 200
  },
  "models": {
    "gpt-4": { "calls": 1, "cost": 0.0045 }
  }
}

paused_executions - Human-in-the-Loop

CREATE TABLE paused_executions (
  id                uuid PRIMARY KEY,
  execution_id      text NOT NULL,
  workflow_id       uuid REFERENCES workflow(id),
  status            text,                 -- 'pending', 'fully_resumed'
  total_pause_count integer,
  resumed_count     integer,
  pause_data        jsonb                -- Pause states and user inputs
);

This table enables workflows to pause for human review/input and resume exactly where they left off.

user_stats - Aggregated Metrics

CREATE TABLE user_stats (
  user_id                     uuid PRIMARY KEY REFERENCES "user"(id),
  total_executions            integer DEFAULT 0,
  total_scheduled_executions  integer DEFAULT 0,
  total_webhook_executions    integer DEFAULT 0,
  total_api_executions        integer DEFAULT 0,
  last_active                 timestamp,
  metadata                    jsonb
);

Sim Studio’s knowledge base system enables RAG (Retrieval-Augmented Generation) workflows through semantic search.

knowledge_base - Container Configuration

CREATE TABLE knowledge_base (
  id                   uuid PRIMARY KEY,
  user_id              uuid REFERENCES "user"(id),
  workspace_id         uuid REFERENCES workspace(id),
  name                 text NOT NULL,
  description          text,
  token_count          integer DEFAULT 0,
  embedding_model      text,              -- 'text-embedding-3-small'
  embedding_dimension  integer,           -- 1536
  chunking_config      jsonb,
  deleted_at           timestamp          -- Soft delete
);

document - File Records

CREATE TABLE document (
  id                      uuid PRIMARY KEY,
  knowledge_base_id       uuid REFERENCES knowledge_base(id),
  filename                text NOT NULL,
  file_url                text,            -- S3/Blob storage URL
  file_size               integer,
  mime_type               text,
  chunk_count             integer,
  token_count             integer,
  processing_status       text,            -- 'pending', 'processing', 'completed', 'failed'
  processing_error        text,
  enabled                 boolean DEFAULT true,
  tag1 through tag7       text,            -- Custom metadata tags
  deleted_at              timestamp
);

The seven tag columns enable flexible metadata categorization without schema changes.

embedding - Vector Storage

CREATE TABLE embedding (
  id                uuid PRIMARY KEY,
  document_id       uuid REFERENCES document(id),
  knowledge_base_id uuid REFERENCES knowledge_base(id),
  chunk_index       integer,
  content           text NOT NULL,
  embedding         vector(1536) NOT NULL,  -- pgvector type
  token_count       integer,
  tag1 through tag7 text                     -- Inherited from document
);

-- HNSW index for fast similarity search
CREATE INDEX embedding_hnsw_idx ON embedding
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

The HNSW index parameters are tuned for 1536-dimensional embeddings (OpenAI’s default), providing sub-millisecond similarity queries.


Part 5: Scheduling & Webhooks

workflow_schedule - Cron Jobs

CREATE TABLE workflow_schedule (
  id              uuid PRIMARY KEY,
  workflow_id     uuid REFERENCES workflow(id),
  block_id        uuid,                  -- Schedule trigger block
  cron_expression text,
  status          text,                  -- 'active', 'disabled', 'paused'
  next_run_at     timestamp,
  last_ran_at     timestamp,
  last_queued_at  timestamp,
  last_failed_at  timestamp,
  failed_count    integer DEFAULT 0
);

-- Index for due schedule queries
CREATE INDEX schedule_next_run_idx ON workflow_schedule(next_run_at)
  WHERE status = 'active';

webhook - External Triggers

CREATE TABLE webhook (
  id              uuid PRIMARY KEY,
  workflow_id     uuid REFERENCES workflow(id),
  path            text UNIQUE,           -- Webhook URL path
  is_active       boolean DEFAULT true,
  provider        text,                  -- 'slack', 'github', 'twilio', 'generic'
  provider_config jsonb,                 -- Provider-specific auth/routing
  filters         jsonb                  -- Event filtering rules
);

Part 6: Rate Limiting

user_rate_limits - Atomic Counters

CREATE TABLE user_rate_limits (
  reference_id          text PRIMARY KEY,  -- User ID or Org ID
  sync_api_requests     integer DEFAULT 0,
  async_api_requests    integer DEFAULT 0,
  api_endpoint_requests integer DEFAULT 0,
  window_start          timestamp,
  last_request_at       timestamp,
  is_rate_limited       boolean DEFAULT false,
  rate_limit_reset_at   timestamp
);

Rate limits are enforced using atomic SQL increments:

UPDATE user_rate_limits
SET sync_api_requests = sync_api_requests + 1,
    last_request_at = NOW()
WHERE reference_id = $1
RETURNING sync_api_requests;

Rate Limit Tiers

PlanSync API/minAsync API/minExternal API/min
free105010
pro2520030
team7550060
enterprise1501000120

Part 7: Permissions System

permissions - Fine-Grained Access Control

CREATE TABLE permissions (
  id              uuid PRIMARY KEY,
  user_id         uuid REFERENCES "user"(id),
  entity_type     text NOT NULL,         -- 'workspace', 'workflow', 'knowledge_base'
  entity_id       uuid NOT NULL,
  permission_type text NOT NULL,         -- 'read', 'write', 'admin'
  UNIQUE(user_id, entity_type, entity_id)
);

Permissions are checked via efficient joins:

SELECT w.* FROM workflow w
LEFT JOIN permissions p ON p.entity_id = w.id
  AND p.entity_type = 'workflow'
  AND p.user_id = $1
WHERE w.user_id = $1
   OR p.permission_type IN ('read', 'write', 'admin');

Entity Relationship Diagram

┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│    user     │────<│   session    │     │  organization   │
└─────────────┘     └──────────────┘     └─────────────────┘
       │                   │                      │
       │                   │                      │
       ▼                   ▼                      ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│   account   │     │   member     │────>│   workspace     │
└─────────────┘     └──────────────┘     └─────────────────┘

       ┌──────────────────────────────────────────┤
       │                                          │
       ▼                                          ▼
┌─────────────────┐                    ┌─────────────────────┐
│ workflow_folder │                    │      workflow       │
└─────────────────┘                    └─────────────────────┘

       ┌──────────────┬──────────────┬───────────┼───────────┐
       │              │              │           │           │
       ▼              ▼              ▼           ▼           ▼
┌──────────────┐ ┌──────────┐ ┌───────────┐ ┌─────────┐ ┌─────────┐
│workflow_blocks│ │  edges   │ │ subflows  │ │ webhook │ │schedule │
└──────────────┘ └──────────┘ └───────────┘ └─────────┘ └─────────┘


┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│ knowledge_base  │────<│   document   │────<│    embedding    │
└─────────────────┘     └──────────────┘     └─────────────────┘

Design Patterns & Best Practices

1. Dual-State Pattern

The workflow system maintains both denormalized (JSONB) and normalized (relational) representations. This provides:

  • Fast reads during execution (single JSONB fetch)
  • Efficient queries for search and filtering (indexed columns)
  • Atomic updates via JSONB operations

2. Soft Deletes

Tables like document and knowledge_base use deleted_at timestamps instead of hard deletes, enabling:

  • Audit trails
  • Undo functionality
  • Compliance requirements

3. Polymorphic References

The subscription.reference_id and user_rate_limits.reference_id columns can point to either users or organizations, avoiding table duplication.

4. JSONB for Flexibility

Complex, evolving structures like block configurations and execution traces use JSONB columns, enabling schema evolution without migrations.

5. Strategic Indexing

  • Foreign key indexes for join performance
  • Timestamp indexes (DESC) for “recent items” queries
  • Partial indexes for filtered queries (e.g., active schedules)
  • HNSW vector indexes for similarity search

Conclusion

Sim Studio’s database schema represents a thoughtful balance between relational integrity and document flexibility. By combining:

  • Normalized tables for efficient querying and referential integrity
  • JSONB columns for complex, evolving structures
  • pgvector for AI-powered semantic search
  • Better Auth for enterprise-grade authentication

The architecture supports the demanding requirements of a modern AI workflow platform: real-time collaboration, comprehensive observability, multi-tenant isolation, and sub-second vector search across millions of embeddings.

For developers looking to contribute or deploy self-hosted instances, understanding this schema provides the foundation for extending Sim Studio’s capabilities while maintaining data integrity and performance.


Want to explore more? Check out the full schema on GitHub or dive into the official documentation.