Mirror of https://github.com/PR0M3TH3AN/Marlin.git (synced 2025-09-08 07:08:44 +00:00)
Merge pull request #46 from PR0M3TH3AN/codex/improve-marlin-file-renaming-and-metadata-handling
Document rename handling design
@@ -25,7 +25,7 @@
| Phase / Sprint | Timeline | Focus & Rationale | ✦ Key UX Deliverables | △ Engineering artefacts / tasks | Definition of Done |
| --------------------------------------------- | -------- | ---------------------------------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
| ~~**Epic 1 — Scale & Reliability**~~ | ~~2025-Q2~~ | ~~Stay fast @ 100 k files~~ | ~~• `scan --dirty` (re-index touched rows only)~~ | ~~• DP-002 Dirty-flag design + FTS rebuild cadence<br>• Hyperfine benchmark script committed~~ | ~~Dirty scan vs full ≤ 15 % runtime on 100 k corpus; benchmark job passes~~ |
| **Epic 2 — Live Mode & Self-Pruning Backups** | 2025-Q2 | “Just works” indexing, DB never explodes | • `marlin watch <dir>` (notify/FSEvents)<br>• `backup --prune N` & auto-prune | • DP-003 file-watcher life-cycle & debouncing<br>• Integration test with inotify-sim <br>• Cron-style GitHub job for nightly prune | 8 h stress-watch alters 10 k files < 1 % misses; backup dir ≤ N |
| **Epic 2 — Live Mode & Self-Pruning Backups** | 2025-Q2 | “Just works” indexing, DB never explodes | • `marlin watch <dir>` (notify/FSEvents)<br>• `backup --prune N` & auto-prune<br>• rename/move tracking keeps paths current | • DP-003 file-watcher life-cycle & debouncing<br>• Integration test with inotify-sim<br>• Rename/Move handling spec & tests<br>• Cron-style GitHub job for nightly prune | 8 h stress-watch alters 10 k files < 1 % misses; backup dir ≤ N |
| **Phase 3 — Content FTS + Annotations** | 2025-Q3 | Search inside files, leave notes | • Grep-style snippet output (`-C3`)<br>• `marlin annotate add/list` | • DP-004 content-blob strategy (inline vs ext-table)<br>• Syntax-highlight via `syntect` PoC<br>• New FTS triggers unit-tested | Indexes 1 GB corpus in ≤ 30 min; snippet CLI passes golden-file tests |
| **Phase 4 — Versioning & Deduplication** | 2025-Q3 | Historic diffs, detect dupes | • `scan --rehash` (SHA-256)<br>• `version diff <file>` | • DP-005 hash column + Bloom-de-dupe<br>• Binary diff adapter research | Diff on 10 MB file ≤ 500 ms; dupes listed via CLI |
| **Phase 5 — Tag Aliases & Semantic Booster** | 2025-Q3 | Tame tag sprawl, start AI hints | • `tag alias add/ls/rm`<br>• `tag suggest`, `summary` | • DP-006 embeddings size & model choice<br>• Vector store schema + k-NN index bench | 95 % of “foo/bar~foo” alias look-ups resolve in one hop; suggest CLI returns ≤ 150 ms |
@@ -46,6 +46,7 @@
| Tarpaulin coverage gate | S0 | — | – |
| Watch mode (FS events) | Epic 1 | `marlin watch .` | DP‑002 |
| Backup auto‑prune | Epic 1 | `backup --prune N` | – |
| Rename/move tracking | Epic 2 | automatic path update | – |
| Dirty‑scan | Epic 2 | `scan --dirty` | DP‑002 |
| Grep snippets | Phase 3 | `search -C3 …` | DP‑004 |
| Hash / dedupe | Phase 4 | `scan --rehash` | DP‑005 |
docs/spec-details/Rename+Move-Handling.md (normal file, 71 lines)
@@ -0,0 +1,71 @@
# Marlin — Rename & Move Handling

**Integration Specification · v0.1 (2025-05-19)**

---

## 0 · Scope

This document outlines how Marlin should respond when files or folders are renamed or moved. It extends the watcher life‑cycle design (DP‑003) so that metadata remains consistent without requiring a full re‑scan.

## 1 · Background

The current watcher maps any `notify::EventKind::Modify(_)` (including renames) to the generic `EventPriority::Modify` and merely logs the event:

```
let prio = match event.kind {
    EventKind::Create(_) => EventPriority::Create,
    EventKind::Remove(_) => EventPriority::Delete,
    // Renames arrive here as Modify(ModifyKind::Name(_)) and are not distinguished.
    EventKind::Modify(_) => EventPriority::Modify,
    EventKind::Access(_) => EventPriority::Access,
    _ => EventPriority::Modify,
};
// ...
for event_item in &evts_to_process {
    // The event is only logged; nothing is written to the database.
    info!("Processing event (DB available): {:?} for path {:?}",
          event_item.kind, event_item.path);
}
```

No database update occurs, so renamed files keep their old `path` in the `files` table. The schema does have a trigger to propagate `path` updates to the FTS index:

```
-- When a file’s path changes
DROP TRIGGER IF EXISTS files_fts_au_file;
CREATE TRIGGER files_fts_au_file
AFTER UPDATE OF path ON files
BEGIN
    UPDATE files_fts
    SET path = NEW.path
    WHERE rowid = NEW.id;
END;
```

## 2 · Requirements

1. **Detect old and new paths** from rename events, which the `notify` crate reports as `EventKind::Modify(ModifyKind::Name(_))`.
2. **Update the `files` table** with the new absolute path when the target remains inside a scanned root.
3. **Mark as removed** if the new location is outside all configured roots.
4. **Batch updates** to avoid excessive writes during large folder moves.
5. **Add integration tests** that exercise rename and move scenarios across platforms.
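
Requirements 2 and 3 amount to a small decision over whether the old and new paths fall inside a scanned root. The following is a minimal sketch of that decision using only the standard library. The names `RenameAction`, `in_roots`, and `classify_rename` are hypothetical and do not come from the Marlin code base, and the `IndexNew` case is an assumption that follows from the boundary-crossing behaviour described in the implementation sketch.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical outcome of a rename (not an existing Marlin type).
#[derive(Debug, PartialEq)]
enum RenameAction {
    /// Both paths are inside a scanned root: rewrite `files.path`.
    UpdatePath { old: PathBuf, new: PathBuf },
    /// The file left every scanned root: mark the row as removed.
    MarkRemoved { old: PathBuf },
    /// The file entered a scanned root from outside: index it as new (assumption).
    IndexNew { new: PathBuf },
    /// Neither path is tracked: nothing to do.
    Ignore,
}

/// Returns true when `path` lies under at least one configured root.
fn in_roots(path: &Path, roots: &[PathBuf]) -> bool {
    // `Path::starts_with` compares whole components, so `/data/docs2`
    // is not treated as being inside `/data/docs`.
    roots.iter().any(|root| path.starts_with(root))
}

/// Decide what to do with a rename from `old` to `new`.
fn classify_rename(old: &Path, new: &Path, roots: &[PathBuf]) -> RenameAction {
    match (in_roots(old, roots), in_roots(new, roots)) {
        (true, true) => RenameAction::UpdatePath {
            old: old.to_path_buf(),
            new: new.to_path_buf(),
        },
        (true, false) => RenameAction::MarkRemoved { old: old.to_path_buf() },
        (false, true) => RenameAction::IndexNew { new: new.to_path_buf() },
        (false, false) => RenameAction::Ignore,
    }
}
```

Keeping the classification free of database calls means it can be unit-tested on every platform without a watcher running, which may help with requirement 5.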

## 3 · Implementation Sketch

* Extend `ProcessedEvent` to carry `old_path` and `new_path` for `Rename` events.
* Upon flushing events, call `db::mark_dirty` for the affected row, then update the `files.path` column. The existing trigger keeps `files_fts` in sync.
* For directory renames, update child paths with a single SQL `UPDATE ... WHERE path LIKE 'old/%'` inside a transaction, escaping any `%` or `_` in the old prefix so they are not treated as wildcards.
* Emit `Create` and `Remove` events for files crossing watch boundaries so `scan --dirty` can prune or index them accordingly.
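
The directory-rename step can be sketched as follows. This is an illustration under stated assumptions, not Marlin's implementation: `escape_like`, `RENAME_DIR_SQL`, `rename_dir_params`, and `rewrite_child` are hypothetical names, and the statement assumes paths are stored with `/` separators. It builds the parameters for one `UPDATE` that swaps the old prefix for the new one, and escapes `%`, `_`, and `\` so that literal characters in directory names are not read as `LIKE` wildcards.

```rust
/// Escape `%`, `_` and `\` so a path prefix can be used literally in a
/// SQL `LIKE ... ESCAPE '\'` pattern.
fn escape_like(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        if matches!(ch, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Hypothetical statement for a directory rename. Parameters:
///   ?1 = new directory prefix, with trailing '/'
///   ?2 = character length of the old prefix + 1 (SQLite `substr` is 1-based)
///   ?3 = escaped old prefix followed by '%'
const RENAME_DIR_SQL: &str =
    "UPDATE files SET path = ?1 || substr(path, ?2) WHERE path LIKE ?3 ESCAPE '\\'";

/// Build the three parameters for `RENAME_DIR_SQL`.
fn rename_dir_params(old_dir: &str, new_dir: &str) -> (String, i64, String) {
    let old_prefix = format!("{}/", old_dir.trim_end_matches('/'));
    let new_prefix = format!("{}/", new_dir.trim_end_matches('/'));
    let substr_start = old_prefix.chars().count() as i64 + 1;
    let pattern = format!("{}%", escape_like(&old_prefix));
    (new_prefix, substr_start, pattern)
}

/// The same rewrite in plain Rust, handy for checking the SQL logic in tests.
/// Returns `None` when `path` is not under `old_dir`.
fn rewrite_child(path: &str, old_dir: &str, new_dir: &str) -> Option<String> {
    let old_prefix = format!("{}/", old_dir.trim_end_matches('/'));
    let rest = path.strip_prefix(&old_prefix)?;
    Some(format!("{}/{}", new_dir.trim_end_matches('/'), rest))
}
```

Two caveats apply to this sketch. SQLite's `LIKE` is case-insensitive for ASCII characters by default, so on case-sensitive filesystems the pattern may match sibling directories that differ only in case unless `PRAGMA case_sensitive_like` is enabled or an exact prefix comparison is added. Also, SQLite triggers fire once per affected row, so the existing `files_fts_au_file` trigger would run for every child path touched by the statement.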

## 4 · Edge Cases

* **Cross-filesystem moves** are not atomic renames (tools fall back to copy + delete), so they may surface as `Remove` + `Create`; both should be handled.
* **Concurrent modifications** while moving should result in the newer metadata winning when `scan --dirty` runs.
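
The "newer metadata wins" rule can be illustrated with a small comparison on modification time. This is a sketch under assumptions: `FileMeta` and `newer_wins` are hypothetical names, the fields shown are only a subset of what Marlin may track, and resolving ties in favour of the on-disk value is an assumed policy rather than something this spec states.

```rust
use std::time::SystemTime;

/// Hypothetical snapshot of the metadata tracked for one file.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    size: u64,
    mtime: SystemTime,
}

/// Keep whichever record has the later modification time.
/// On a tie the on-disk value is kept (assumed policy).
fn newer_wins(stored: FileMeta, on_disk: FileMeta) -> FileMeta {
    if stored.mtime > on_disk.mtime {
        stored
    } else {
        on_disk
    }
}
```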

## 5 · Future Work

Large-scale refactors (e.g. moving an entire project) may benefit from a high‑level command that updates tags and links en masse. That is outside the scope of this spec, but accurate rename tracking makes it possible.

---

*End of document*