This commit is contained in:
thePR0M3TH3AN
2025-05-14 15:31:46 -04:00
parent 1b893cd88e
commit 1368693d06
10 changed files with 435 additions and 0 deletions


@@ -0,0 +1,14 @@
[package]
name = "marlin"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = "1.0"
clap = { version = "4.5.2", features = ["derive"] }
directories = "5.0"
glob = "0.3"
rusqlite = { version = "0.31.0", features = ["bundled"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] }
walkdir = "2.5"

11
README.md Normal file

@@ -0,0 +1,11 @@
# 1. Build
cargo build --release
# 2. Initialise DB (idempotent)
./target/release/marlin init
# 3. Scan a directory
./target/release/marlin scan ~/Pictures
# 4. Tag all JPEGs in Pictures (quote the glob; use $HOME, since a quoted ~ is not expanded)
./target/release/marlin tag "$HOME/Pictures/**/*.jpg" vacation

186
features.md Normal file

@@ -0,0 +1,186 @@
# Marlin: Metadata-Driven File Explorer
*Version 2, 12 May 2025*
---
## 1  Key Features & Functionality
| Feature Area | Capabilities |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Tagging System** | • Unlimited, hierarchical or flat tags.<br>• Alias/synonym support with precedence rules (admin-defined canonical name).<br>• **Bulk tag editing** via multi-select context menu.<br>• Folder-to-Tag import with optional *watch & sync* mode so new sub-folders inherit tags automatically. |
| **Custom Metadata Attributes** | • User-defined fields (text, number, date, enum, boolean).<br>• Per-template **Custom Metadata Schemas** (e.g. *Photo* → *Date, Location*). |
| **File Relationships** | • Typed, directional or bidirectional links (*related to*, *duplicate of*, *cites*…).<br>• Plugin API can register new relationship sets. |
| **Version Control for Metadata** | • Every change logged; unlimited rollback.<br>• Side-by-side diff viewer and *blame* panel showing *who/when/what*.<br>• Offline edits stored locally and merged (Git-style optimistic merge with conflict prompts). |
| **Advanced Search & Smart Folders** | • Structured query syntax: `tag:ProjectX AND author:Alice`.<br>• Natural-language search (*"files Alice edited last month"*) with toggle to exact mode.<br>• Visual Query Builder showing live query string.<br>• Saved queries appear as virtual “smart folders” that update in real time. |
| **User Interface** | • Sidebar: tags, attributes, relationships.<br>• Drag-and-drop tagging; inline metadata editor.<br>• Search bar with autocomplete (Bloom-filter backed).<br>• **Dual View Mode**: metadata vs traditional folder; remembers preference per location.<br>• **Interactive 60-second tour** on first launch plus contextual tooltip help. |
| **Collaboration** | • Real-time metadata sync across devices via cloud or self-hosted relay.<br>• Conflict handling as per Version Control.<br>• Role-based permissions (read / write / admin) on tags & attributes. |
| **Performance & Scale** | • Sharded/distributed index optional for >1 M files.<br>• Query cache with LRU eviction.<br>• Target metrics (100 k files): cold start ≤ 3 s, complex query ≤ 150 ms (stretch 50 ms). |
| **Backup & Restore** | • Scheduled encrypted backups; export to JSON / XML.<br>• One-click restore from any point-in-time snapshot. |
| **Extensibility** | • Plugin system (TypeScript/JS), see §2.4.<br>• Python scripting hook for automation and batch tasks.<br>• REST/IPC API for external tools. |
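
The structured query syntax in the table (`tag:ProjectX AND author:Alice`) could be parsed along the following lines. This is a minimal sketch, not the project's parser: `Term` and `parse_and_query` are hypothetical names, and only `AND` between `key:value` terms is handled (no OR, NOT, grouping, or quoting).

```rust
/// One `key:value` term of a structured query, e.g. `tag:ProjectX`.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Term {
    pub key: String,
    pub value: String,
}

/// Parse a conjunction such as `tag:ProjectX AND author:Alice`.
///
/// Assumption: terms are separated by the literal ` AND ` and each term
/// contains at least one `:`; keys are lower-cased, values are kept as-is.
pub fn parse_and_query(input: &str) -> Result<Vec<Term>, String> {
    input
        .split(" AND ")
        .map(|raw| {
            let raw = raw.trim();
            match raw.split_once(':') {
                Some((key, value)) if !key.is_empty() && !value.is_empty() => Ok(Term {
                    key: key.to_lowercase(),
                    value: value.to_string(),
                }),
                _ => Err(format!("expected `key:value`, got `{raw}`")),
            }
        })
        .collect()
}
```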
---
## 2  Technical Implementation
### 2.1  Core Stack
| Component | Primary Choice | Notes |
| -------------- | -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| File Manager | **Dolphin (KDE)** with KIO-based plugins | GTK users can install a Nautilus extension (feature-parity subset). |
| Metadata Store | **SQLite + FTS5** (single-user) → optional **LiteFS/Postgres** for replication & multi-user scale. | Per-row AES-GCM encryption for sensitive fields; keys stored in OS keyring. |
| Indexer Daemon | Rust service using `notify` (inotify on Linux, FSEvents on macOS). | 100 ms debounce batches, async SQLite writes. |
| Cache | In-memory LRU + Bloom filter for autocomplete. | |
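
To illustrate the Bloom filter mentioned for autocomplete, here is a minimal sketch using only the standard library. `BloomFilter` and its methods are hypothetical names; the sizing (bit count, number of hashes) and the idea of inserting every prefix of a tag name are assumptions made for the example, not choices stated in this spec.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Tiny Bloom filter: may report false positives, never false negatives.
pub struct BloomFilter {
    bits: Vec<bool>,
    hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, hashes: u64) -> Self {
        Self {
            bits: vec![false; num_bits.max(1)],
            hashes: hashes.max(1),
        }
    }

    /// Derive the bit position for `item` under the given seed.
    fn index(&self, item: &str, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, item: &str) {
        for seed in 0..self.hashes {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    /// Insert `name` and every non-empty proper prefix of it, so that a
    /// partially typed tag can be checked (assumption of this sketch).
    pub fn insert_with_prefixes(&mut self, name: &str) {
        for (end, _) in name.char_indices().skip(1) {
            self.insert(&name[..end]);
        }
        self.insert(name);
    }

    /// `false` means definitely absent; `true` means possibly present.
    pub fn might_contain(&self, item: &str) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.index(item, seed)])
    }
}
```

A search bar could call `might_contain` on the typed prefix and skip the database lookup when it returns `false`; a `true` answer still needs a real query because of false positives.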
### 2.2  Database Schema (simplified)
```text
files(id PK, path, inode, size, mtime, ctime, hash)
tags(id PK, name, parent_id, canonical_id)
file_tags(file_id FK, tag_id FK)
attributes(id PK, file_id FK, key, value, value_type)
relationships(id PK, src_file_id FK, dst_file_id FK, rel_type, direction)
change_log(change_id PK, object_table, object_id, op, actor, ts, payload_json)
```
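
As a hedged illustration of how `tags.parent_id` and `tags.canonical_id` might be used, the sketch below resolves an alias to its canonical tag and builds a hierarchical path. The in-memory `Tag` struct, the function names, the `/` separator, and the convention that a NULL (or self-referencing) `canonical_id` marks a canonical tag are all assumptions; the schema above does not define them.

```rust
use std::collections::HashMap;

/// In-memory mirror of one `tags` row (hypothetical mapping).
#[derive(Debug, Clone)]
pub struct Tag {
    pub id: i64,
    pub name: String,
    pub parent_id: Option<i64>,
    pub canonical_id: Option<i64>,
}

/// Follow `canonical_id` links until a canonical tag is reached.
/// The loop is bounded by the table size, so an alias cycle yields `None`.
pub fn resolve_canonical(tags: &HashMap<i64, Tag>, id: i64) -> Option<&Tag> {
    let mut current = tags.get(&id)?;
    for _ in 0..tags.len() {
        match current.canonical_id {
            Some(next) if next != current.id => current = tags.get(&next)?,
            _ => return Some(current),
        }
    }
    None
}

/// Build a path such as `project/alpha` by walking `parent_id` upwards.
pub fn tag_path(tags: &HashMap<i64, Tag>, id: i64) -> Option<String> {
    let mut parts = Vec::new();
    let mut current = tags.get(&id)?;
    for _ in 0..=tags.len() {
        parts.push(current.name.as_str());
        match current.parent_id {
            Some(parent) => current = tags.get(&parent)?,
            None => {
                parts.reverse();
                return Some(parts.join("/"));
            }
        }
    }
    None // parent cycle
}
```

Resolving aliases before writing to `file_tags` would keep only canonical tag ids in the join table, which is one way to read the "precedence rules" in §1.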
### 2.3  Sync & Conflict Resolution
1. Each client appends to **change\_log** (CRDT-compatible delta).
2. Delta sync via WebSocket; server merges and rebroadcasts.
3. Conflicts → *Conflict Queue* UI (choose theirs / mine / merge).
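
The merge step above can be sketched with a last-write-wins rule, which §10 names as the fallback. This is an illustration under assumptions, not the server's algorithm: `Change` keeps only a few `change_log` columns, objects are identified by `object_id` alone, and ties on `ts` are broken by comparing `actor`.

```rust
use std::collections::HashMap;

/// Simplified `change_log` entry (hypothetical subset of the columns).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Change {
    pub object_id: i64,
    pub ts: u64, // assumed: milliseconds since the Unix epoch
    pub actor: String,
    pub payload_json: String,
}

/// Merge two change sets, keeping the newest change per object.
/// Equal timestamps are resolved by the lexicographically larger actor,
/// so every client reaches the same result regardless of arrival order.
pub fn merge_lww(local: &[Change], remote: &[Change]) -> Vec<Change> {
    let mut winners: HashMap<i64, &Change> = HashMap::new();
    for change in local.iter().chain(remote) {
        let replace = match winners.get(&change.object_id) {
            Some(current) => (change.ts, &change.actor) > (current.ts, &current.actor),
            None => true,
        };
        if replace {
            winners.insert(change.object_id, change);
        }
    }
    let mut merged: Vec<Change> = winners.into_values().cloned().collect();
    merged.sort_by_key(|c| c.object_id);
    merged
}
```

Changes that lose under this rule are silently dropped here; a Conflict Queue as described in step 3 would instead collect them for the user to review.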
### 2.4  Plugin API (TypeScript)
```ts
export interface MarlinPlugin {
  onInit(ctx: CoreContext): void;
  extendSchema?(db: Database): void; // e.g. add new relationship table
  addCommands?(ui: UIContext): void; // register menus, actions
}
```
Plugins run in a sandboxed process with whitelisted IPC calls.
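
On the host side, the whitelist check for plugin IPC calls might look roughly like the sketch below. `PluginManifest`, `IpcDecision`, `check_call`, and call names such as `tags.read` are hypothetical; the spec does not define the manifest format or the IPC call names.

```rust
use std::collections::HashSet;

/// What a plugin declared it needs (hypothetical manifest shape).
pub struct PluginManifest {
    pub name: String,
    pub allowed_calls: HashSet<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IpcDecision {
    Allow,
    Deny,
}

/// Allow an IPC call only if the plugin's manifest lists it explicitly;
/// anything not declared is denied by default.
pub fn check_call(manifest: &PluginManifest, call: &str) -> IpcDecision {
    if manifest.allowed_calls.contains(call) {
        IpcDecision::Allow
    } else {
        IpcDecision::Deny
    }
}
```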
---
## 3  UX & Accessibility
* **Keyboard-only workflow** audit (Tab / Shift-Tab / Space toggles).
* High-contrast theme; adheres to WCAG 2.1 AA.
* `Ctrl+Alt+V` toggles Dual View.
* Generated query string shown live under Visual Builder educates power users.
---
## 4  Performance Budget
| Metric | MVP | Stretch |
| ------------------------ | --------- | ---------- |
| Cold start (100 k files) | ≤ 3 s | 1 s |
| Complex AND/OR query | ≤ 150 ms | 50 ms |
| Sustained inserts | 5 k ops/s | 20 k ops/s |
Benchmarks run nightly; regressions block merge.
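
A regression gate of this kind could be as small as the sketch below, which computes a nearest-rank P95 over latency samples and compares it with the budget. The function names are hypothetical, and using P95 here is borrowed from the nightly benchmark rule in §10.4 rather than stated in this table.

```rust
/// Nearest-rank 95th percentile of latency samples in milliseconds.
/// Returns `None` for an empty sample set.
pub fn p95_ms(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based nearest rank: ceil(0.95 * n), computed in integers.
    let rank = (sorted.len() * 95 + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// `true` when the measured P95 stays within the budget (e.g. 150 ms
/// for the MVP complex-query target). An empty run counts as a failure.
pub fn within_budget(samples: &[f64], budget_ms: f64) -> bool {
    matches!(p95_ms(samples), Some(p) if p <= budget_ms)
}
```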
---
## 5  Security & Privacy
* **Role-based ACL** on tags/attributes.
* Per-change audit trail; logs rotated to cold storage (≥ 90 days online).
* Plugins confined by seccomp/AppArmor; no direct disk/network unless declared.
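
The role-based ACL could be checked as in the following sketch. It assumes the three roles from §1 (read / write / admin) form a strict hierarchy; the `Action` variants are hypothetical examples, not a list taken from the spec.

```rust
/// Roles from the feature table; declaration order defines the ranking
/// used by the derived `Ord` (Read < Write < Admin).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Read,
    Write,
    Admin,
}

/// Example operations on a tag or attribute (hypothetical).
#[derive(Debug, Clone, Copy)]
pub enum Action {
    View,
    Edit,
    ManagePermissions,
}

/// A role may perform an action if it ranks at least as high as the
/// minimum role that action requires.
pub fn is_allowed(role: Role, action: Action) -> bool {
    let required = match action {
        Action::View => Role::Read,
        Action::Edit => Role::Write,
        Action::ManagePermissions => Role::Admin,
    };
    role >= required
}
```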
---
## 6  Packaging & Distribution
* **Flatpak** (GNOME/KDE) and **AppImage** for portable builds.
* Background service runs as a systemd user unit: `--user marlin-indexerd.service`.
* CLI (`marlin-cli`) packaged for headless servers & CI.
---
## 7  Roadmap
| Milestone | Scope | Timeline |
| --------- | ----------------------------------------------------------------------------- | -------- |
| **M1** | Tagging, attributes, virtual folders, SQLite, Dolphin plugin | 6 weeks |
| **M2** | Sync service, version control, CLI | +6 weeks |
| **M3** | NLP search, Visual Builder, distributed index prototype | +6 weeks |
| **M4** | Plugin marketplace, enterprise auth (LDAP/OIDC), mobile companion (view-only) | +8 weeks |
---
## 8  Branding
* **Name**: **Marlin** (fast, precise).
* Icon: stylised sailfish fin forming a folder corner.
* Tagline: *“Cut through clutter.”*
* Domain: `marlinexplorer.io` (availability checked 2025-05-12).
---
## 9  Quick-Win Checklist (Sprint 0)
* [ ] Implement bulk metadata editor UI
* [ ] Write conflict-resolution spec & unit tests
* [ ] Build diff viewer prototype
* [ ] Keyboard-only navigation audit
* [ ] Establish performance CI with sample 100 k file corpus
---
## 10 Development Plan (Outline)
### 10.1 Process & Methodology
* **Framework**: 2-week Scrum sprints with Jira backlog, GitHub Projects mirror for public issues.
* **Branching**: trunk-based; feature branches → PR → required CI & 2 code-review approvals. *Main* auto-deploys a nightly Flatpak.
* **Definition of Done**: code + unit tests + docs + passing CI + demo video (for UI work).
* **CI/CD**: GitHub Actions matrix (Ubuntu 22.04, KDE Neon, Fedora 39) → Flatpak / AppImage artefacts, `cargo clippy`, coverage gate ≥ 85%.
### 10.2 Team & Roles (FTE-equivalent)
| Role | Core Skills | Allocation |
| ----------------------------- | -------------------------------- | ---------- |
| Lead Engineer | Rust, Qt/Kirigami, KIO | 1.0 |
| Backend Engineer | Rust, LiteFS/Postgres, WebSocket | 1.0 |
| Full-stack / Plugin Engineer | TypeScript, Node, IPC | 0.8 |
| UX / QA | Figma, accessibility, Playwright | 0.5 |
| DevOps (fractional) | CI, Flatpak, security hardening | 0.2 |
### 10.3 Roadmap → Sprint-level Tasks
| Sprint | Goal | Key Tasks | Exit Criteria |
| ---------------------- | -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| **S0 (2 wks)** | Project bootstrap | • Repo + CI skeleton<br>• SQLite schema + migrations<br>• `marlin-cli init` & basic scan<br>• Hyperfine perf baseline | CLI scans dir; tests pass; artefact builds |
| **S1-3 (M1, 6 wks)** | Tagging + virtual folders MVP | • Indexer daemon in Rust<br>• CRUD tags/attributes via CLI & DB<br>• Dolphin plugin: sidebar + tag view<br>• KIO `tags://` virtual folder<br>• Bulk-edit dialog | 100k-file corpus cold-start ≤ 3 s; user can tag files & navigate `tags://Urgent` |
| **S4-6 (M2, 6 wks)** | Sync & version control | • Change-log table + diff viewer<br>• LiteFS replication PoC<br>• WebSocket delta sync<br>• Conflict queue UI + last-write-wins fallback | Two devices sync metadata in <1 s round-trip; rollback works |
| **S7-9 (M3, 6 wks)** | NLP search & Visual Builder | • Integrate Tantivy FTS + ONNX intent model<br>• Toggle exact vs natural search<br>• QML Visual Builder with live query string | NL query "docs Alice edited last week" returns expected set in ≤ 300 ms |
| **S10-13 (M4, 8 wks)** | Plugin marketplace & mobile companion | • IPC sandbox + manifest spec<br>• Sample plugins (image EXIF auto-tagger)<br>• Flutter read-only client<br>• LDAP/OIDC enterprise auth | First external plugin published; mobile app lists smart folders |
### 10.4 Tooling & Infrastructure
* **Issue tracking**: Jira → labels `component/indexer`, `component/ui`.
* **Docs**: mkdocs-material hosted on GitHub Pages; automatic diagram generation via `cargo doc` + Mermaid.
* **Nightly Perf Benchmarks**: run in CI against 10k, 100k, 1M synthetic corpora; fail build if P95 query > target.
* **Security**: Dependabot, Trivy scans, optional SLSA level 2 provenance for releases.
### 10.5 Risks & Mitigations
| Risk | Impact | Mitigation |
| ------------------------------ | ---------------- | --------------------------------------------------------------------------- |
| CRDT complexity | Delays M2 | Ship LWW first; schedule CRDT refactor post-launch |
| File system event overflow | Index corruption | Debounce & auto-fallback to full rescan; alert user |
| Cross-distro packaging pain | Adoption drops | Stick to Flatpak; AppImage only for power users; collect telemetry (opt-in) |
| Scaling >1M files on slow HDD | Perf complaints | Offer "index on SSD" wizard; tune FTS page cache |
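
The "debounce & auto-fallback to full rescan" mitigation can be sketched as follows. This is a standard-library illustration with hypothetical names (`Debouncer`, `Flush`); it does not use the `notify` crate, and the caller passes in timestamps so the logic stays testable. The 100 ms window comes from §2.1, while the pending-path limit is an assumed tuning knob.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// What the indexer should do once events have settled.
#[derive(Debug, PartialEq, Eq)]
pub enum Flush {
    Paths(Vec<PathBuf>),
    FullRescan,
}

pub struct Debouncer {
    window: Duration,
    max_pending: usize,
    pending: BTreeSet<PathBuf>,
    overflowed: bool,
    last_event: Option<Instant>,
}

impl Debouncer {
    pub fn new(window: Duration, max_pending: usize) -> Self {
        Self {
            window,
            max_pending,
            pending: BTreeSet::new(),
            overflowed: false,
            last_event: None,
        }
    }

    /// Record a changed path. Past `max_pending` distinct paths the
    /// batch is discarded and a full rescan is scheduled instead.
    pub fn push(&mut self, path: PathBuf, now: Instant) {
        self.last_event = Some(now);
        if self.overflowed {
            return;
        }
        self.pending.insert(path);
        if self.pending.len() > self.max_pending {
            self.pending.clear();
            self.overflowed = true;
        }
    }

    /// Return a batch once no event has arrived for `window`.
    pub fn poll(&mut self, now: Instant) -> Option<Flush> {
        let last = self.last_event?;
        if now.duration_since(last) < self.window {
            return None;
        }
        self.last_event = None;
        if self.overflowed {
            self.overflowed = false;
            Some(Flush::FullRescan)
        } else {
            let batch = std::mem::take(&mut self.pending);
            Some(Flush::Paths(batch.into_iter().collect()))
        }
    }
}
```

Discarding the batch on overflow trades some redundant work (a full rescan) for a bounded memory footprint, which is the behaviour the mitigation column describes.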
### 10.6 Budget & Timeline Snapshot
* **Total dev time**: 30 weeks.
* **Buffer**: +10 % (3 weeks) for holidays & unknowns → **33 weeks** (\~8 months).
* **Rough budget** (3 FTE avg × 33 wks × \$150k/yr) ≈ **\$285k** payroll + \$15k ops / tooling.
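
The payroll figure follows from prorating an annual salary over the project weeks. A small sketch of that arithmetic, assuming a 52-week year (the function name is hypothetical):

```rust
/// Prorated payroll: FTE count × annual salary × (weeks / 52).
/// Assumes a 52-week year; with 3 FTE, 33 weeks and $150k/yr this
/// comes to roughly $285.6k, in line with the ≈ $285k above.
pub fn payroll_estimate(fte: f64, weeks: f64, annual_salary: f64) -> f64 {
    fte * annual_salary * (weeks / 52.0)
}
```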
---

29
src/cli.rs Normal file

@@ -0,0 +1,29 @@
use std::path::PathBuf;

use clap::{Parser, Subcommand};

/// Marlin metadata-driven file explorer (CLI utilities)
#[derive(Parser, Debug)]
#[command(author, version, about)]
pub struct Cli {
    #[command(subcommand)]
    pub command: Commands,
}

#[derive(Subcommand, Debug)]
pub enum Commands {
    /// Initialise the database (idempotent)
    Init,
    /// Scan a directory and populate the file index
    Scan {
        /// Directory to walk
        path: PathBuf,
    },
    /// Tag files matching a glob pattern
    Tag {
        /// Glob pattern (quote to avoid shell expansion)
        pattern: String,
        /// Tag name
        tag: String,
    },
}

31
src/config.rs Normal file

@@ -0,0 +1,31 @@
use std::path::{Path, PathBuf};

use anyhow::Result;
use directories::ProjectDirs;

/// Runtime configuration (currently just the DB path).
#[derive(Debug, Clone)]
pub struct Config {
    pub db_path: PathBuf,
}

impl Config {
    /// Resolve configuration from environment or XDG directories.
    pub fn load() -> Result<Self> {
        // Precedence: $MARLIN_DB_PATH, then the platform data dir, then ./index.db.
        let db_path = std::env::var_os("MARLIN_DB_PATH")
            .map(PathBuf::from)
            .or_else(|| {
                ProjectDirs::from("io", "Marlin", "marlin")
                    .map(|dirs| dirs.data_dir().join("index.db"))
            })
            .unwrap_or_else(|| Path::new("index.db").to_path_buf());

        std::fs::create_dir_all(
            db_path
                .parent()
                .expect("db_path should always have a parent directory"),
        )?;

        Ok(Self { db_path })
    }
}

22
src/db/migrations.sql Normal file

@@ -0,0 +1,22 @@
PRAGMA foreign_keys = ON;

CREATE TABLE IF NOT EXISTS files (
    id    INTEGER PRIMARY KEY,
    path  TEXT NOT NULL UNIQUE,
    size  INTEGER,
    mtime INTEGER
);

CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS file_tags (
    file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    tag_id  INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (file_id, tag_id)
);

CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);
CREATE INDEX IF NOT EXISTS idx_file_tags_tag_id ON file_tags(tag_id);

28
src/db/mod.rs Normal file

@@ -0,0 +1,28 @@
use std::path::Path;

use anyhow::Result;
use rusqlite::{params, Connection};

const MIGRATIONS_SQL: &str = include_str!("migrations.sql");

/// Open (or create) the SQLite database and run embedded migrations.
pub fn open<P: AsRef<Path>>(db_path: P) -> Result<Connection> {
    // No `mut` needed: `pragma_update` and `execute_batch` take `&self`.
    let conn = Connection::open(db_path)?;
    conn.pragma_update(None, "journal_mode", "WAL")?;
    conn.execute_batch(MIGRATIONS_SQL)?;
    Ok(conn)
}

/// Ensure a tag exists, returning its id.
pub fn ensure_tag(conn: &Connection, tag: &str) -> Result<i64> {
    // Insert if missing, then read the id back (works for new and existing tags).
    conn.execute(
        "INSERT OR IGNORE INTO tags(name) VALUES (?1)",
        params![tag],
    )?;
    let id: i64 = conn.query_row(
        "SELECT id FROM tags WHERE name = ?1",
        params![tag],
        |row| row.get(0),
    )?;
    Ok(id)
}

13
src/logging.rs Normal file

@@ -0,0 +1,13 @@
use tracing_subscriber::{fmt, EnvFilter};

/// Initialise global tracing subscriber.
///
/// Reads `RUST_LOG` for filtering, falls back to `info`.
pub fn init() {
    let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
    fmt()
        .with_target(false)
        .with_level(true)
        .with_env_filter(filter)
        .init();
}

60
src/main.rs Normal file

@@ -0,0 +1,60 @@
mod cli;
mod config;
mod db;
mod logging;
mod scan;

use anyhow::Result;
use clap::Parser; // trait must be in scope for `Cli::parse()`
use cli::{Cli, Commands};
use glob::glob;
use rusqlite::params;
use tracing::{error, info};

fn main() -> Result<()> {
    logging::init();

    let args = Cli::parse();
    let cfg = config::Config::load()?;
    // Opening the database also runs the embedded migrations.
    let conn = db::open(&cfg.db_path)?;

    match args.command {
        Commands::Init => {
            info!("database initialised at {}", cfg.db_path.display());
        }
        Commands::Scan { path } => {
            scan::scan_directory(&conn, &path)?;
        }
        Commands::Tag { pattern, tag } => {
            apply_tag(&conn, &pattern, &tag)?;
        }
    }

    Ok(())
}

/// Apply `tag` to every file that matches `pattern`.
fn apply_tag(conn: &rusqlite::Connection, pattern: &str, tag: &str) -> Result<()> {
    let tag_id = db::ensure_tag(conn, tag)?;
    let mut stmt_file = conn.prepare("SELECT id FROM files WHERE path = ?1")?;
    let mut stmt_insert = conn.prepare(
        "INSERT OR IGNORE INTO file_tags(file_id, tag_id) VALUES (?1, ?2)",
    )?;

    for entry in glob(pattern)? {
        match entry {
            Ok(path) => {
                let path_str = path.to_string_lossy();
                // Only files already present in the index can be tagged.
                if let Ok(file_id) =
                    stmt_file.query_row(params![path_str], |row| row.get::<_, i64>(0))
                {
                    stmt_insert.execute(params![file_id, tag_id])?;
                    info!(file = %path_str, tag = tag, "tagged");
                } else {
                    error!(file = %path_str, "file not in index; run `marlin scan` first");
                }
            }
            Err(e) => error!(error = %e, "glob error"),
        }
    }

    Ok(())
}

41
src/scan.rs Normal file

@@ -0,0 +1,41 @@
use std::fs;
use std::path::Path;

use anyhow::Result;
use rusqlite::{params, Connection};
use tracing::{debug, info};
use walkdir::WalkDir;

/// Recursively walk `root` and upsert file metadata.
pub fn scan_directory(conn: &Connection, root: &Path) -> Result<usize> {
    // `Connection::transaction` needs `&mut Connection`; with a shared
    // reference, `unchecked_transaction` is the variant that compiles.
    let tx = conn.unchecked_transaction()?;
    let mut count = 0usize;
    {
        // Scope the statement so its borrow of `tx` ends before `commit`.
        let mut stmt = tx.prepare(
            r#"
            INSERT INTO files(path, size, mtime)
            VALUES (?1, ?2, ?3)
            ON CONFLICT(path) DO UPDATE
            SET size = excluded.size,
                mtime = excluded.mtime
            "#,
        )?;

        // Unreadable entries are skipped; only regular files are indexed.
        for entry in WalkDir::new(root)
            .into_iter()
            .filter_map(Result::ok)
            .filter(|e| e.file_type().is_file())
        {
            let meta = fs::metadata(entry.path())?;
            let size = meta.len() as i64;
            let mtime = meta
                .modified()?
                .duration_since(std::time::UNIX_EPOCH)?
                .as_secs() as i64;
            let path_str = entry.path().to_string_lossy();
            stmt.execute(params![path_str, size, mtime])?;
            count += 1;
            debug!(file = %path_str, "indexed");
        }
    }
    tx.commit()?;
    info!(indexed = count, "scan complete");
    Ok(count)
}