This commit is contained in:
thePR0M3TH3AN
2025-05-18 14:25:38 -04:00
parent 31d25b7eb4
commit 6157ac5233
3 changed files with 313 additions and 211 deletions

226
README.md
View File

@@ -1,194 +1,80 @@
![Marlin Logo](https://raw.githubusercontent.com/PR0M3TH3AN/Marlin/refs/heads/main/assets/png/marlin_logo.png)
# Marlin ― Delivery Roadmap **v3.2**
# Marlin
*Engineeringready revised 20250518*
**Marlin** is a lightweight, metadata-driven file indexer that runs **100 % on your computer**. It scans folders, stores paths and file stats in SQLite, lets you attach hierarchical **tags** and **custom attributes**, keeps timestamped **snapshots**, and offers instant full-text search via FTS5. _No cloud, no telemetry your data never leaves the machine._
> **Legend** engineering artefact uservisible deliverable
---
## Feature highlights
## 0 · Methodology primer  (what “Done” means)
| Area | What you get |
| ------------------- | ----------------------------------------------------------------------------------------------------- |
| **Safety** | Timestamped backups (`marlin backup`) and one-command restore (`marlin restore`) |
| **Resilience** | Versioned, idempotent schema migrations zero-downtime upgrades |
| **Indexing** | Fast multi-path scanner with SQLite WAL concurrency |
| **Metadata** | Hierarchical tags (`project/alpha`) & key-value attributes (`reviewed=yes`) |
| **Relations** | Typed file ↔ file links (`marlin link`) with backlinks viewer |
| **Collections / Views** | Named playlists (`marlin coll`) & saved searches (`marlin view`) for instant recall |
| **Search** | Prefix-aware FTS5 across paths, tags, attrs & links; optional `--exec` per match <br>(grep-style context snippets coming Q3) |
| **DX / Logs** | Structured tracing (`RUST_LOG=debug`) for every operation |
| Theme | Project ruleofthumb |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **Branching** | Trunkbased. Feature branch → PR → 2 reviews → squashmerge. |
| **Spec first** | Each epic begins with a **Design Proposal (DPxxx)** in `/docs/adr/` containing schema diffs, example CLI session, perf targets. |
| **Coverage** | Tarpaulin gate ≥ 85% **on lines touched this sprint** (checked in CI). |
| **Perf gate** | Coldstart P95  3 s on 100k files **unless overridden in DP**. Regressions fail CI. |
| **Docs** | CLI flags & examples land in `README.md` **same PR**. Docs tables (CLI cheatsheet, TUI keymap) are autogenerated during the build. |
| **Demo** | Closing each epic yields a ≤2min asciinema or GIF in `docs/demos/`. |
---
## How it works
## 1 · Birdseye table (engineering details + deliverables)
```text
┌──────────────┐ marlin scan ┌─────────────┐
│ your files │ ─────────────────────▶│ SQLite │
│ (any folder) │ │ files/tags │
└──────────────┘ tag / attr / link │ attrs / FTS │
▲ search / exec └──────┬──────┘
└────────── backup / restore ▼
timestamped snapshots
````
| Phase / Sprint | Timeline | Focus & Rationale | ✦ Key UX Deliverables | △ Engineering artefacts / tasks | Definition of Done |
| ----------------------------------------------- | ----------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| **Sprint0 — Bootstrap & CI Baseline** | **2025Q2<br>(now  30 May)** | CI scaffolding, coverage, crate split | — | • Split repo into **`libmarlin` (core)** + **`cli-bin`** + **`tui-bin`** <br>• Tarpaulin coverage + Hyperfine perf jobs wired <br>`build.rs` renders CLI cheatsheet from `commands.yaml` <br>• Docs / cheatsheet autogen step in GitHub Actions | `cargo test --all` passes with coverage gate  85%; docs artefacts appear in build; crates compile. |
| **Sprintα — Bedrock & Metadata Domains** | **31 May  13 Jun 2025** | Lock schema v1.1, first metadata objects | • CLI stubs: `marlin link / coll / view` <br>`marlin demo` interactive tour | • **DP001 Schema v1.1** (ER + migration scripts) <br>• Unit tests (`escape_fts`, `determine_scan_root`) <br>• GitHub Action for SQL dryrun | 100% migrations green; demo prints ✅; logo badge shows schema version. |
| **Epic1 — LiveWatch Mode & Backup Prune** | **2025Q2** | Continuous indexing via FS events; backups never explode | • `marlin watch <dir>` (inotify / FSEvents) <br>`backup --prune N` (autoprune pre and postcommand) | • **DP002** filewatch lifecycle & debounce strategy <br>• Changetable schema storing dirty file IDs <br>• Nightly prune CI job | 8h stresswatch alters 10k files  <1% missed; backup dir size  N; watch CPU idle <3%. |
| **Epic2 — Dirtyscan optimisation** | **2025Q2** | Reindex only paths marked dirty by watch table | `scan --dirty` | Reuse changetable from watch; Hyperfine benchmark script committed | Dirtyscan runtime 15% full scan on 100k corpus; bench job passes. |
| **Phase3 — Content FTS + Annotations** | 2025Q3 | Grep snippets, inline notes | `search -C3` grepstyle context <br>`annotate add/list` | • **DP004** contentblob strategy (inline vs exttable) <br>`syntect` highlight PoC | Indexes 1GB corpus ≤30min; snippet CLI golden tests pass. |
| **Phase4 — Versioning & Deduplication** | 2025Q3 | Historic diffs, SHA256 dedupe | • `scan --rehash` <br>`version diff <file>` | • **DP005** hash column + Bloomdedupe research | Diff on 10MB file ≤500ms; duplicate sets emitted by CLI. |
| **Phase5 — Tag Aliases & Semantic Booster** | 2025Q3 | Tame tag sprawl; start AI hints | • `tag alias add/ls/rm` <br>`tag suggest`, `summary` | • **DP006** embeddings size & kNN search bench | 95% alias lookups resolved in one hop; suggest query ≤150ms. |
| **Phase6 — Search DSL v2 & Smart Views** | 2025Q4 | AND/OR, ranges, structured grammar; smart folders | • New `nom` grammar <br>• Legacy parser behind **`--legacy-search`** (warn on use) | • **DP007** BNF + 30acceptance strings <br>• Lexer fuzz tests (`cargofuzz`) | Old queries keep working; 0 panics in fuzz run 1M cases. |
| **Phase7 — Structured Workflows & Templates** | 2025Q4 | State graph, relationship templates | • `state set/log` <br>`template apply` | • **DP008** workflow tables & YAML template spec <br>• Sample template e2e tests | Create template, apply to 20 files → all attrs/link rows present; illegal transitions blocked. |
| **Phase8 — TUIv1 + Lightweight Integrations** | 2026Q1 | Keyboard UI, VS Code sidebar | • **`marlintui`** binary (tiling panes, keymap) <br>• Readonly VS Code sidebar | • **DP009** TUI redraw budget & keymap <br>• Crate split fully consumed | TUI binary ≤2MB; scroll redraw ≤4ms; VS Code extension loads index. |
| **Phase9 — Dolphin Sidebar (MVP)** | 2026Q1 | Peek metadata inline in KDE Dolphin | • Qt/KIO sidebar | • **DP010** DB/IP bridge (DBus vs UNIX socket) <br>• CMake packaging script | Sidebar opens ≤150ms; passes KDE lint. |
| **Phase10 — Full GUI & Multidevice Sync** | 2026Q2 | Visual editor + optional sync backend | • Electron/Qt hybrid explorer UI <br>• Select & integrate sync (LiteFS / Postgres) | • **DP011** sync backend tradestudy <br>• Busytimeout/retry strategy for multiwriter mode | CRUD roundtrip <2s between two nodes; 25 GUI e2e tests green. |
---
## Prerequisites
### 2 · Feature crossmatrix (quick lookups)
| Requirement | Why |
| ------------------ | ----------------------------- |
| **Rust ≥ 1.77** | Build toolchain (`rustup.rs`) |
| C build essentials | Builds bundled SQLite (Linux) |
macOS & Windows users: let the Rust installer pull the matching build tools.
| Capability | Sprint / Phase | CLI / GUI element | Linked DP |
| -------------------------- | -------------- | ----------------------------------- | --------- |
| Crate split & docs autogen | S0 | | |
| Tarpaulin coverage gate | S0 | | |
| Watch mode (FS events) | Epic1 | `marlin watch .` | DP002 |
| Backup autoprune | Epic1 | `backup --prune N` | |
| Dirtyscan | Epic2 | `scan --dirty` | DP002 |
| Grep snippets | Phase3 | `search -C3 …` | DP004 |
| Hash / dedupe | Phase4 | `scan --rehash` | DP005 |
| Tag aliases | Phase5 | `tag alias` commands | DP006 |
| Search DSL v2 | Phase6 | new grammar, `--legacy-search` flag | DP007 |
| Relationship templates | Phase7 | `template new/apply` | DP008 |
| TUI v1 | Phase8 | `marlintui` | DP009 |
---
## Build & install
## 3 · Milestone acceptance checklist
```bash
git clone https://github.com/PR0M3TH3AN/Marlin.git
cd Marlin
cargo build --release
Before a milestone is declared **shipped**:
# (Optional) install into your PATH
sudo install -Dm755 target/release/marlin /usr/local/bin/marlin
```
* [ ] **Spec** DPxxx merged with schema diff, ASCIIcast demo
* [ ] **Tests** Tarpaulin 85% on changed lines; all suites green
* [ ] **Perf guard** script passes on CI matrix (Ubuntu 22, macOS 14)
* [ ] **Docs** autoregenerated; README & cheatsheet updated
* [ ] **Demo** asciinema/GIF committed and linked in release notes
* [ ] **Release tag** pushed; Cargo binary uploaded to GitHub Releases
---
## Quick start
## 4 · Next immediate actions
For a concise walkthrough—including **links, collections and views**—see
[**Quick start & Demo**](marlin_demo.md).
---
## Testing
Below is a **repeat-able 3-step flow** you can use **every time you pull fresh code**.
### 0Prepare once
```bash
# Put build artefacts in one place (faster incremental builds)
export CARGO_TARGET_DIR=target
```
### 1Build the new binary
```bash
git pull
cargo build --release
sudo install -Dm755 target/release/marlin /usr/local/bin/marlin
```
### 2Run the smoke-test suite
```bash
cargo test --test e2e -- --nocapture
```
*Streams CLI output live; exit-code 0 = all good.*
### 3(Optionally) run **all** tests
```bash
cargo test --all -- --nocapture
```
This now covers:
* unit tests in `src/**`
* positive & negative integration suites (`tests/pos.rs`, `tests/neg.rs`)
* doc-tests
#### One-liner helper
```bash
git pull && cargo build --release &&
sudo install -Dm755 target/release/marlin /usr/local/bin/marlin &&
cargo test --test e2e -- --nocapture
```
Alias it as `marlin-ci` for a 5-second upgrade-and-verify loop.
---
### Database location
| OS | Default path |
| ----------- | ----------------------------------------------- |
| **Linux** | `~/.local/share/marlin/index.db` |
| **macOS** | `~/Library/Application Support/marlin/index.db` |
| **Windows** | `%APPDATA%\marlin\index.db` |
Override:
```bash
export MARLIN_DB_PATH=/path/to/custom.db
```
---
## CLI reference
```text
marlin <COMMAND> [ARGS]
init create / migrate DB **and perform an initial scan of the cwd**
scan <PATHS>... walk directories & (re)index files
tag "<glob>" <tag_path> add hierarchical tag
attr set <pattern> <key> <val> set or update custom attribute
attr ls <path> list attributes
link add|rm|list|backlinks manage typed file-to-file relations
coll create|add|list manage named collections (“playlists”)
view save|list|exec save and run smart views (saved queries)
search <query> [--exec CMD] FTS5 query; optionally run CMD per hit
backup create timestamped snapshot in `backups/`
restore <snapshot.db> replace DB with snapshot
completions <shell> generate shell completions
```
### Attribute sub-commands
| Command | Example |
| ----------- | ------------------------------------------------ |
| `attr set` | `marlin attr set ~/Docs/**/*.pdf reviewed yes` |
| `attr ls` | `marlin attr ls ~/Docs/report.pdf` |
| JSON output | `marlin --format=json attr ls ~/Docs/report.pdf` |
---
## Backups & restore
```bash
marlin backup
# → ~/.local/share/marlin/backups/backup_2025-05-14_22-15-30.db
```
```bash
marlin restore ~/.local/share/marlin/backups/backup_2025-05-14_22-15-30.db
```
> Marlin also creates an **automatic safety backup before every non-`init` command.**
> *Auto-prune (`backup --prune <N>`) lands in Q2.*
---
## Upgrading
```bash
cargo install --path . --force # rebuild & replace installed binary
```
Versioned migrations preserve your data across upgrades.
---
## License
MIT see [`LICENSE`](LICENSE).
| # | Task | Owner | Due |
| - | ------------------------------ | ------ | ------------- |
| 1 | Crate split + CI baseline | @alice | **26May 25** |
| 2 | Tarpaulin + Hyperfine jobs | @bob | **26May 25** |
| 3 | **DP001 Schema v1.1** draft | @carol | **30May 25** |
| 4 | backup prune CLI + nightly job | @dave | **05Jun 25** |