Merge pull request #46 from PR0M3TH3AN/codex/improve-marlin-file-renaming-and-metadata-handling

Document rename handling design
thePR0M3TH3AN
2025-05-21 16:59:25 -04:00
committed by GitHub
2 changed files with 73 additions and 1 deletion


@@ -25,7 +25,7 @@
 | Phase / Sprint | Timeline | Focus & Rationale | ✦ Key UX Deliverables | △ Engineering artefacts / tasks | Definition of Done |
 | --------------------------------------------- | -------- | ---------------------------------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
 | ~~**Epic 1 — Scale & Reliability**~~ | ~~2025-Q2~~ | ~~Stay fast @ 100 k files~~ | ~~• `scan --dirty` (re-index touched rows only)~~ | ~~• DP-002 Dirty-flag design + FTS rebuild cadence<br>• Hyperfine benchmark script committed~~ | ~~Dirty scan vs full ≤ 15 % runtime on 100 k corpus; benchmark job passes~~ |
-| **Epic 2 — Live Mode & Self-Pruning Backups** | 2025-Q2 | “Just works” indexing, DB never explodes | • `marlin watch <dir>` (notify/FSEvents)<br>`backup --prune N` & auto-prune | • DP-003 file-watcher life-cycle & debouncing<br>• Integration test with inotify-sim <br>• Cron-style GitHub job for nightly prune | 8 h stress-watch alters 10 k files < 1 % misses; backup dir N |
+| **Epic 2 — Live Mode & Self-Pruning Backups** | 2025-Q2 | “Just works” indexing, DB never explodes | • `marlin watch <dir>` (notify/FSEvents)<br>`backup --prune N` & auto-prune<br>• rename/move tracking keeps paths current | • DP-003 file-watcher life-cycle & debouncing<br>• Integration test with inotify-sim<br>• Rename/Move handling spec & tests<br>• Cron-style GitHub job for nightly prune | 8 h stress-watch alters 10 k files < 1 % misses; backup dir N |
 | **Phase 3 — Content FTS + Annotations** | 2025-Q3 | Search inside files, leave notes | Grep-style snippet output (`-C3`)<br>`marlin annotate add/list` | • DP-004 content-blob strategy (inline vs ext-table)<br>• Syntax-highlight via `syntect` PoC<br>• New FTS triggers unit-tested | Indexes 1 GB corpus in ≤ 30 min; snippet CLI passes golden-file tests |
 | **Phase 4 — Versioning & Deduplication** | 2025-Q3 | Historic diffs, detect dupes | • `scan --rehash` (SHA-256)<br>`version diff <file>` | • DP-005 hash column + Bloom-de-dupe<br>• Binary diff adapter research | Diff on 10 MB file ≤ 500 ms; dupes listed via CLI |
 | **Phase 5 — Tag Aliases & Semantic Booster** | 2025-Q3 | Tame tag sprawl, start AI hints | • `tag alias add/ls/rm`<br>`tag suggest`, `summary` | • DP-006 embeddings size & model choice<br>• Vector store schema + k-NN index bench | 95 % of “foo/bar~foo” alias look-ups resolve in one hop; suggest CLI returns ≤ 150 ms |
@@ -46,6 +46,7 @@
 | Tarpaulin coverage gate | S0 | | |
 | Watch mode (FS events) | Epic1 | `marlin watch .` | DP002 |
 | Backup autoprune | Epic1 | `backup --prune N` | |
+| Rename/move tracking | Epic2 | automatic path update | |
 | Dirtyscan | Epic2 | `scan --dirty` | DP002 |
 | Grep snippets | Phase3 | `search -C3 …` | DP004 |
 | Hash / dedupe | Phase4 | `scan --rehash` | DP005 |


@@ -0,0 +1,71 @@
# Marlin — Rename & Move Handling
**Integration Specification · v0.1 (2025-05-19)**

---
## 0 · Scope
This document describes how Marlin should respond when files or folders are renamed or moved. It extends the file-watcher life-cycle design (DP-003) so that metadata stays consistent without a full rescan.
## 1 · Background
The current watcher maps every `notify::EventKind::Modify(_)` event, renames (`ModifyKind::Name(_)`) included, to the generic `EventPriority::Modify` and merely logs it:
```rust
let prio = match event.kind {
    EventKind::Create(_) => EventPriority::Create,
    EventKind::Remove(_) => EventPriority::Delete,
    // Renames arrive as Modify(ModifyKind::Name(_)) and are folded in here.
    EventKind::Modify(_) => EventPriority::Modify,
    EventKind::Access(_) => EventPriority::Access,
    _ => EventPriority::Modify,
};
// ...
for event_item in &evts_to_process {
    // Logging only: no database write happens for any event kind.
    info!("Processing event (DB available): {:?} for path {:?}",
          event_item.kind, event_item.path);
}
```
No database update occurs, so a renamed file keeps its old `path` in the `files` table. The schema does, however, already include a trigger that propagates `path` updates to the FTS index:
```sql
-- When a file's path changes
DROP TRIGGER IF EXISTS files_fts_au_file;
CREATE TRIGGER files_fts_au_file
AFTER UPDATE OF path ON files
BEGIN
  UPDATE files_fts
     SET path = NEW.path
   WHERE rowid = NEW.id;
END;
```
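As a first step, renames need their own arm instead of falling through to `EventPriority::Modify`. The sketch below assumes a new `EventPriority::Rename` variant, which does not exist in the current code. To keep it self-contained, `EventKind`, `ModifyKind` and `RenameMode` are trimmed local stand-ins shaped like the `notify` types of the same names (the real ones carry more variants and payloads). Ordering matters: Rust's `match` takes the first arm that matches, so `Modify(Name(_))` has to precede the catch-all `Modify(_)`.

```rust
// Trimmed stand-ins shaped like notify's event types; the real ones live in
// `notify::event` and carry more variants and payloads.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RenameMode { Any, To, From, Both, Other }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModifyKind { Any, Data, Metadata, Name(RenameMode), Other }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EventKind { Any, Access, Create, Modify(ModifyKind), Remove, Other }

/// Existing watcher priorities plus a hypothetical `Rename` variant.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EventPriority { Create, Delete, Modify, Access, Rename }

/// Same shape as the watcher's match, with renames split out.
pub fn priority_for(kind: EventKind) -> EventPriority {
    match kind {
        EventKind::Create => EventPriority::Create,
        EventKind::Remove => EventPriority::Delete,
        // Must come before the generic Modify arm: the first matching arm wins.
        EventKind::Modify(ModifyKind::Name(_)) => EventPriority::Rename,
        EventKind::Modify(_) => EventPriority::Modify,
        EventKind::Access => EventPriority::Access,
        _ => EventPriority::Modify,
    }
}
```

Here every `RenameMode` (`From`, `To`, `Both`, ...) maps to `Rename`; pairing the separate halves into one old/new move is left to a later stage.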
## 2 · Requirements
1. **Detect old and new paths** from rename events, which the `notify` crate reports as `EventKind::Modify(ModifyKind::Name(_))`.
2. **Update the `files` table** with the new absolute path when the target remains inside a scanned root.
3. **Mark the file as removed** if its new location lies outside every configured root.
4. **Batch updates** to avoid excessive writes during large folder moves.
5. **Add integration tests** that exercise rename and move scenarios on each supported platform.
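
Requirements 2 and 3, together with files crossing a watch boundary, reduce to one decision per move: is each end inside a configured root? A minimal sketch, assuming absolute paths normalised the same way as the roots; `MoveAction` and `classify_move` are hypothetical names. `Path::starts_with` compares whole path components, so `/data/notes2` is not treated as lying inside `/data/notes`.

```rust
use std::path::{Path, PathBuf};

/// What the indexer should do with one rename/move (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum MoveAction {
    /// Both ends inside a root: rewrite `files.path` (requirement 2).
    UpdatePath { old: PathBuf, new: PathBuf },
    /// Moved out of every root: mark the row as removed (requirement 3).
    MarkRemoved(PathBuf),
    /// Moved in from outside: treat like a newly created file.
    IndexNew(PathBuf),
    /// Neither end is watched: nothing to do.
    Ignore,
}

/// Lexical, component-wise containment check (no filesystem access).
fn inside_any(path: &Path, roots: &[PathBuf]) -> bool {
    roots.iter().any(|root| path.starts_with(root))
}

pub fn classify_move(old: &Path, new: &Path, roots: &[PathBuf]) -> MoveAction {
    match (inside_any(old, roots), inside_any(new, roots)) {
        (true, true) => MoveAction::UpdatePath {
            old: old.to_path_buf(),
            new: new.to_path_buf(),
        },
        (true, false) => MoveAction::MarkRemoved(old.to_path_buf()),
        (false, true) => MoveAction::IndexNew(new.to_path_buf()),
        (false, false) => MoveAction::Ignore,
    }
}
```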
## 3 · Implementation Sketch
* Extend `ProcessedEvent` to carry `old_path` and `new_path` for rename events.
* When the event queue is flushed, call `db::mark_dirty` for the affected row, then update its `files.path` column; the existing trigger keeps `files_fts` in sync.
* For directory renames, update all child paths with a single SQL `UPDATE ... WHERE path LIKE 'old/%'` inside one transaction. Escape `%` and `_` in the old prefix with an `ESCAPE` clause (both are `LIKE` wildcards), and note that SQLite's `LIKE` is case-insensitive for ASCII characters by default.
* Emit `Create` and `Remove` events for files that cross a watch-root boundary, so that `scan --dirty` can index or prune them.
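
One possible shape for the extended `ProcessedEvent` (only the two fields named above; existing fields omitted) and for the directory-rename statement is sketched below. As an alternative to the `LIKE 'old/%'` form, it compares an exact prefix with `substr`, which sidesteps `LIKE` wildcards and case folding, at the cost of not using a plain index on `path`. `RENAME_DIR_SQL` and `rewrite_child` are hypothetical names; the Rust function mirrors the SQL for a single row so the prefix arithmetic can be checked without a database.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the extended event record (other existing fields omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct ProcessedEvent {
    pub old_path: Option<PathBuf>, // Some(..) only for renames
    pub new_path: PathBuf,
}

/// Hypothetical directory-rename statement: ?1 = old dir, ?2 = new dir.
/// Matches strict descendants only (the prefix must be followed by '/').
pub const RENAME_DIR_SQL: &str = "UPDATE files \
     SET path = ?2 || substr(path, length(?1) + 1) \
     WHERE substr(path, 1, length(?1) + 1) = ?1 || '/'";

/// In-memory mirror of RENAME_DIR_SQL for one row: the rewritten path for a
/// strict descendant of `old_dir`, `None` for anything else.
pub fn rewrite_child(path: &Path, old_dir: &Path, new_dir: &Path) -> Option<PathBuf> {
    // strip_prefix is component-wise, so "/w/older" does not match "/w/old".
    let rest = path.strip_prefix(old_dir).ok()?;
    if rest.as_os_str().is_empty() {
        return None; // the directory itself, not a child
    }
    Some(new_dir.join(rest))
}
```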
## 4 · Edge Cases
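* **Cross-filesystem moves** cannot be atomic renames (`rename(2)` fails with `EXDEV`), so tools perform them as copy-then-delete and they surface as separate `Create` and `Remove` events; both must be handled.
* **Concurrent modifications** made while a file is being moved should resolve in favour of the newer metadata when `scan --dirty` next runs.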
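
When a rename is delivered as two halves, they have to be paired before the database is touched, and anything left unpaired falls back to the `Remove`/`Create` handling described above. A minimal sketch, assuming each half carries a shared cookie (Linux inotify tags `IN_MOVED_FROM`/`IN_MOVED_TO` with one; how `notify` surfaces it should be checked against the crate version in use) and that both halves land in the same debounce batch, `From` first. `RenameHalf`, `Resolved` and `pair_halves` are hypothetical.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// One half of a rename as delivered by the OS (hypothetical shape).
#[derive(Debug, Clone)]
pub enum RenameHalf {
    From { cookie: u32, path: PathBuf },
    To { cookie: u32, path: PathBuf },
}

/// What a flushed batch resolves to.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    Renamed { old: PathBuf, new: PathBuf },
    Removed(PathBuf), // unmatched From: moved away or out of scope
    Created(PathBuf), // unmatched To: moved in from elsewhere
}

/// Pairs From/To halves by cookie; unmatched halves degrade to
/// Remove/Create, the same shape a cross-filesystem move produces.
pub fn pair_halves(batch: Vec<RenameHalf>) -> Vec<Resolved> {
    let mut pending: HashMap<u32, PathBuf> = HashMap::new();
    let mut out = Vec::new();
    for half in batch {
        match half {
            RenameHalf::From { cookie, path } => {
                pending.insert(cookie, path);
            }
            RenameHalf::To { cookie, path } => match pending.remove(&cookie) {
                Some(old) => out.push(Resolved::Renamed { old, new: path }),
                None => out.push(Resolved::Created(path)),
            },
        }
    }
    // From halves that never met their To within this batch.
    let mut leftovers: Vec<PathBuf> = pending.into_values().collect();
    leftovers.sort(); // HashMap order is arbitrary; sort for determinism
    out.extend(leftovers.into_iter().map(Resolved::Removed));
    out
}
```

Sorting the leftovers only makes the output deterministic; a real watcher might instead hold unmatched `From` halves for one more batch before declaring them removed.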
## 5 · Future Work
Large-scale refactors (e.g. moving an entire project) may benefit from a high-level command that updates tags and links en masse. That is outside the scope of this spec, but accurate rename tracking is what makes it possible.

---
*End of document*