6.8 KiB
DP-001: Schema v1.1 – Core Metadata Domains
Status: Accepted
Authors: @carol
Date: 2025-05-17
1. Context
We’ve landed a basic SQLite-backed files
table and a contentless FTS5 index. Before we build out higher-level features, we need to lock down our v1.1 metadata schema for:
- Hierarchical tags (
tags
+file_tags
) – alias resolution handled at query time - Custom attributes (
attributes
) - File-to-file relationships (
links
) - Named collections (
collections
+collection_files
) - Views (
views
)
Locking this schema now lets downstream CLI & GUI work against a stable model and ensures our migrations stay easy to reason about.
Alias relationships are resolved outside the table itself; there is no canonical_id
column.
2. Decision
Each migration will begin by enabling foreign-key enforcement and WAL journaling:
PRAGMA foreign_keys = ON;
PRAGMA journal_mode = WAL;
All foreign keys use ON DELETE CASCADE
so deleting a file, tag, etc. automatically cleans up dependents.
-
Bump to schema version 1.1 in our
schema_version
table. -
Provide four migration scripts, applied in order:
- 0001_initial_schema.sql – create core tables (
files
,tags
,file_tags
,attributes
), a contentless FTS5 table (files_fts
), core FTS triggers, and performance-critical indexes. - 0002_update_fts_and_triggers.sql – replace old tag/attr FTS triggers with
INSERT OR REPLACE
semantics for full-row refresh. - 0003_create_links_collections_views.sql – introduce
links
,collections
,collection_files
, andviews
tables. - 0004_fix_hierarchical_tags_fts.sql – refine FTS triggers to index full hierarchical tag-paths via a recursive CTE.
- 0001_initial_schema.sql – create core tables (
-
Expose this schema through our library (
libmarlin::db::open
) so any client sees a v1.1 store. -
Track the version in code via
SCHEMA_VERSION
and providecurrent_schema_version()
to query the DB.
3. ER Diagram
Below is the updated entity-relationship diagram (PlantUML):
@startuml
entity files {
* id : INTEGER <<PK>>
--
path : TEXT
size : INTEGER
mtime : INTEGER
hash : TEXT
}
entity tags {
* id : INTEGER <<PK>>
--
name : TEXT
parent_id : INTEGER <<FK>>
}
entity file_tags {
* file_id : INTEGER <<FK>>
* tag_id : INTEGER <<FK>>
}
entity attributes {
* id : INTEGER <<PK>>
--
file_id : INTEGER <<FK>>
key : TEXT
value : TEXT
}
entity links {
* id : INTEGER <<PK>>
--
src_file_id : INTEGER <<FK>>
dst_file_id : INTEGER <<FK>>
type : TEXT
}
entity collections {
* id : INTEGER <<PK>>
--
name : TEXT
}
entity collection_files {
* collection_id : INTEGER <<FK>>
* file_id : INTEGER <<FK>>
}
entity views {
* id : INTEGER <<PK>>
--
name : TEXT
query : TEXT
}
files ||--o{ file_tags
tags ||--o{ file_tags
files ||--o{ attributes
files ||--o{ links : "src_file_id"
files ||--o{ links : "dst_file_id"
collections ||--o{ collection_files
files ||--o{ collection_files
views ||..|| files : "exec via FTS"
@enduml
Or in plain-ASCII:
┌────────┐ ┌────────────┐ ┌───────┐
│ files │1────*──│ file_tags │*────1─│ tags │
└────────┘ └────────────┘ └───────┘
│
│1
*
┌────────────┐
│ attributes │
└────────────┘
┌────────┐ ┌────────┐ ┌────────┐
│ files │1──*──│ links │*───1──│ files │
└────────┘ └────────┘ └────────┘
┌─────────────┐ ┌──────────────────┐ ┌────────┐
│ collections │1──*─│ collection_files │*──1─│ files │
└─────────────┘ └──────────────────┘ └────────┘
┌─────────────┐
│ views │
│ (exec FTS) │
└─────────────┘
4. Migration Summary
File | Purpose |
---|---|
0001_initial_schema.sql | Core tables + contentless FTS + core triggers + indexes |
0002_update_fts_and_triggers.sql | Full-row FTS refresh on tag/attr changes |
0003_create_links_collections_views.sql | Add links , collections , collection_files , views |
0004_fix_hierarchical_tags_fts.sql | Recursive CTE for full tag-path indexing in FTS triggers |
0005_add_dirty_table.sql | Track modified files needing reindexing |
0006_drop_tags_canonical_id.sql | Remove legacy canonical_id column from tags |
Performance-Critical Indexes
idx_files_path
onfiles(path)
idx_files_hash
onfiles(hash)
idx_tags_name_parent
ontags(name, parent_id)
idx_file_tags_tag_id
onfile_tags(tag_id)
idx_attr_file_key
onattributes(file_id, key)
5. Example CLI Session
$ marlin init
Database initialised at ~/.local/share/marlin/index_*.db
Initial scan complete – indexed/updated 42 files
$ marlin link add ./todo.md ./projects/plan.md
Linked './todo.md' → './projects/plan.md'
$ marlin coll create "MyDocs"
Created collection 'MyDocs'
$ marlin view save tasks "tag:project AND TODO"
Saved view 'tasks' = tag:project AND TODO
$ marlin view list
tasks: tag:project AND TODO
$ marlin view exec tasks
~/Projects/Alpha/draft1.md
~/Projects/Beta/final.md
6. Consequences
- Backward compatibility: older v1.0 stores will be migrated on first open.
- Stability: downstream features (TUI, VS Code, web UI) can depend on a stable v1.1 schema.
- Simplicity: by consolidating metadata domains now, future migrations remain small and focused.
- Performance: v1.1 schema meets our cold-start P95 ≤ 3 s on a 100 k-file corpus (with CI-enforced benchmarks and the indexes above).
End of DP-001