> Sourced from [PHMbench/PHM-Vibench](https://github.com/PHMbench/PHM-Vibench) — [Apache-2.0](https://github.com/PHMbench/PHM-Vibench/blob/b7e62c4c97693d058b92b63b5ec6d0b201799a4f/src/model_factory/RNN/CLAUDE.md).
# CLAUDE.md
This file documents PHM-Vibench’s intent, architecture, and change strategy.
- Canonical onboarding + runnable path: `README.md` and `configs/README.md`
- Copy-paste run/validate commands: `AGENTS.md`
## Repository Overview
PHM-Vibench is a configuration-first benchmark platform for industrial vibration signal analysis (fault diagnosis /
predictive maintenance). Its core values are:
- Reproducible experiments (configs are the experiment contract).
- Modular extension (swap datasets/models/tasks/trainers via registries + factories, not by rewriting pipelines).
## Key Architecture Components
### Factory Design Pattern (wiring points)
- `src/data_factory/`: dataset loading, splits, preprocessing, dataloaders.
- `src/model_factory/`: backbones/foundation models + heads (registry-driven).
- `src/task_factory/`: training logic / metrics / losses (LightningModule-like tasks).
- `src/trainer_factory/`: PyTorch Lightning Trainer wiring (callbacks/loggers/devices).
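The registry + factory idea behind these wiring points can be sketched as follows. This is a minimal illustration, not PHM-Vibench's code: `MODEL_REGISTRY`, `register`, `build_model`, and `DummyDlinear` are hypothetical names. A component registers itself under a registry-style ID, and the factory builds it from the ID string found in YAML.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry: maps a registry-style ID to a component class.
MODEL_REGISTRY: Dict[str, Type] = {}


def register(component_id: str) -> Callable[[Type], Type]:
    """Class decorator that records a component under its registry ID."""
    def decorator(cls: Type) -> Type:
        if component_id in MODEL_REGISTRY:
            raise KeyError(f"duplicate registry ID: {component_id}")
        MODEL_REGISTRY[component_id] = cls
        return cls
    return decorator


def build_model(component_id: str, **kwargs: Any) -> Any:
    """Factory: instantiate a component from its ID (e.g. read from YAML)."""
    try:
        cls = MODEL_REGISTRY[component_id]
    except KeyError:
        raise KeyError(f"unregistered component ID: {component_id}") from None
    return cls(**kwargs)


@register("B_04_Dlinear")
class DummyDlinear:
    # Stand-in for a real backbone; it only stores its constructor argument.
    def __init__(self, seq_len: int = 1024):
        self.seq_len = seq_len


model = build_model("B_04_Dlinear", seq_len=512)
print(type(model).__name__, model.seq_len)  # DummyDlinear 512
```

Because pipelines only ever call the factory with an ID, adding a component means registering it, not editing pipeline imports.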
Module deep dives live under:
- `src/data_factory/CLAUDE.md`
- `src/model_factory/CLAUDE.md`
- `src/task_factory/CLAUDE.md`
- `src/trainer_factory/CLAUDE.md`
- Config loader internals: `src/configs/CLAUDE.md`
### Pipeline System (orchestrators)
Pipelines assemble factories in a fixed order (load config → build data → build model → build task → build trainer).
Common pipelines:
- `src/Pipeline_01_default.py`: standard single-stage pipeline.
- `src/Pipeline_02_pretrain_fewshot.py`: pretrain + few-shot pipeline.
- `src/Pipeline_03_multitask_pretrain_finetune.py`: multi-task pretrain/fine-tune pipeline.
- `src/Pipeline_ID.py`: ID-based ingestion variant.
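The fixed assembly order can be sketched with stub factories that record when they are called. This is an assumption-labelled illustration: `run_pipeline` and the factory signatures are hypothetical, and the config is treated as already loaded (the "load config" step).

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def run_pipeline(
    config: Config,
    build_data: Callable[[Config], Any],
    build_model: Callable[[Config], Any],
    build_task: Callable[[Config, Any], Any],
    build_trainer: Callable[[Config], Any],
) -> Dict[str, Any]:
    """Assemble components in the fixed order: data -> model -> task -> trainer."""
    data = build_data(config["data"])
    model = build_model(config["model"])
    task = build_task(config["task"], model)  # the task wraps the model
    trainer = build_trainer(config["trainer"])
    return {"data": data, "model": model, "task": task, "trainer": trainer}


# Stub factories that only record the order in which they are called.
calls: List[str] = []


def make_stub(name: str) -> Callable[..., str]:
    def stub(*args: Any) -> str:
        calls.append(name)
        return name
    return stub


cfg = {"data": {}, "model": {}, "task": {}, "trainer": {}}  # already-loaded config
parts = run_pipeline(cfg, make_stub("data"), make_stub("model"),
                     make_stub("task"), make_stub("trainer"))
print(calls)  # ['data', 'model', 'task', 'trainer']
```

A real pipeline would then hand the task and data to the Lightning trainer (for example via `Trainer.fit`); the pipeline variants differ in how many such stages they chain.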
## Configuration System (maintained)
### Single supported entrypoint (contract)
Run via:
```bash
python main.py --config <yaml> [--override key=value ...]
```
Pipeline is selected by YAML top-level `pipeline:` (not by a `--pipeline` CLI flag).
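How repeatable `--override key=value` flags could map onto nested YAML keys is sketched below. This is a hypothetical illustration, not `main.py`'s parser: `coerce` and `apply_overrides` are made-up helpers, the scalar-parsing rules are assumptions, and the `pipeline` value shown is illustrative. The pipeline is read from the resolved config rather than from a CLI flag.

```python
import argparse
from typing import Any, Dict, List


def coerce(text: str) -> Any:
    """Best-effort scalar parsing (assumed rules): bool, int, float, else str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted `a.b.c=value` strings to a nested dict in place; later wins."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must be key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = coerce(raw)
    return config


parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)
parser.add_argument("--override", action="append", default=[])  # repeatable flag
args = parser.parse_args(["--config", "demo.yaml",
                          "--override", "trainer.num_epochs=1",
                          "--override", "data.data_dir=/tmp/data"])

# Hypothetical loaded YAML; the pipeline choice lives in the config itself.
config = {"pipeline": "Pipeline_01_default", "trainer": {"num_epochs": 50}}
apply_overrides(config, args.override)
print(config["pipeline"], config["trainer"]["num_epochs"], config["data"]["data_dir"])
# Pipeline_01_default 1 /tmp/data
```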
### 5-block config model
All maintained configs are organized into:
- `environment` / `data` / `model` / `task` / `trainer`
Composition rules (low → high precedence):
1) `base_configs.*` YAML files
2) The demo YAML’s own block overrides
3) Optional machine-local override `configs/local/local.yaml` (or `--local_config ...`)
4) CLI `--override key=value` (repeatable)
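The precedence rules amount to a recursive dictionary merge applied layer by layer, lowest precedence first. This sketch assumes nested dicts merge key by key while scalars and lists are replaced wholesale; the real loader may treat lists differently, and the `gpus` key and all values are illustrative.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Layers in low -> high precedence order (keys and values are illustrative).
base_cfg = {"trainer": {"num_epochs": 100, "gpus": 1}, "data": {"batch_size": 32}}
demo_cfg = {"trainer": {"num_epochs": 50}}
local_cfg = {"data": {"data_dir": "/mnt/local/data"}}
cli_cfg = {"trainer": {"num_epochs": 1}}

resolved: Dict[str, Any] = {}
for layer in (base_cfg, demo_cfg, local_cfg, cli_cfg):
    resolved = deep_merge(resolved, layer)

print(resolved)
# {'trainer': {'num_epochs': 1, 'gpus': 1}, 'data': {'batch_size': 32, 'data_dir': '/mnt/local/data'}}
```

Keys a higher layer does not mention (here `trainer.gpus` and `data.batch_size`) survive from the lower layers.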
### Single Source of Truth (SSOT) + tooling
To reduce ambiguity and make configs traceable:
- Registry (authoritative index): `configs/config_registry.csv`
- Registry schema: `docs/config_registry_schema.md`
- Generated atlas (human-readable): `docs/CONFIG_ATLAS.md` (regen: `python -m scripts.gen_config_atlas`)
- Inspect tool: `python -m scripts.config_inspect` (resolved config + sources + instantiation targets + sanity)
- Schema validate: `python -m scripts.validate_configs` (loader-driven + pydantic; see `src/config_schema/`)
Start here for user-facing config docs: `configs/README.md`.
## Paper / Research Workflows (kept separate)
Paper-grade experiments live in a git submodule to avoid confusing the main onboarding path:
- `paper/2025-10_foundation_model_0_metric/` (init requires network):
- `git submodule update --init --recursive paper/2025-10_foundation_model_0_metric`
- See `paper/README_SUBMODULE.md`
Rule: do not make core repo validation depend on paper-only scripts/configs.
### HSE-Prompt research experiments (paper-only)
- Location: `paper/2025-10_foundation_model_0_metric/` (submodule)
- Goal: HSE/HSE-Prompt cross-system generalization studies
- If the submodule is not initialized, use `configs/demo/05_pretrain_fewshot/` and `configs/demo/06_pretrain_cddg/` as
the runnable reference in this repo.
## Configuration Standards (reduce ambiguity)
### Template source
- Maintained templates: `configs/demo/`
- Local research variants: `configs/experiments/<task_dataset_variant>/`
- Legacy configs (planned migration/removal): `configs/reference/` (do not template from this directory)
### Model component naming
Use registry-style IDs:
- embeddings: `E_**_*`
- backbones: `B_**_*`
- heads: `H_**_*`
Example (correct):
```yaml
model:
  type: "ISFM"
  name: "M_01_ISFM"
  embedding: "E_01_HSE"
  backbone: "B_04_Dlinear"
  task_head: "H_01_Linear_cla"
```
Example (incorrect; avoid ambiguous names):
```yaml
model:
  embedding: "HSE"      # ambiguous / not registry-addressed
  backbone: "Dlinear"   # missing prefix (B_**_)
  task_head: "Linear"   # missing prefix (H_**_)
```
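A quick lint for the naming convention can catch the incorrect form before a run. This sketch assumes the `**` in the patterns stands for a two-digit index (as in every example above) and that the top-level model `name` uses an `M_` prefix; `COMPONENT_ID` and `check_model_block` are hypothetical helpers, not part of the repo.

```python
import re
from typing import Dict, List

# Assumed pattern: one-letter kind, two-digit index, then a name.
COMPONENT_ID = re.compile(r"^(?P<kind>[EBHM])_(?P<index>\d{2})_(?P<name>\w+)$")

# Which prefix each `model` field is expected to carry (assumption).
KIND_BY_FIELD = {"name": "M", "embedding": "E", "backbone": "B", "task_head": "H"}


def check_model_block(model: Dict[str, str]) -> List[str]:
    """Return a list of problems for registry-style IDs in a `model` block."""
    problems = []
    for field, kind in KIND_BY_FIELD.items():
        value = model.get(field)
        if value is None:
            continue
        match = COMPONENT_ID.match(value)
        if match is None or match.group("kind") != kind:
            problems.append(f"{field}: {value!r} is not a {kind}_NN_* ID")
    return problems


good = {"name": "M_01_ISFM", "embedding": "E_01_HSE",
        "backbone": "B_04_Dlinear", "task_head": "H_01_Linear_cla"}
bad = {"embedding": "HSE", "backbone": "Dlinear", "task_head": "Linear"}
print(check_model_block(good))      # []
print(len(check_model_block(bad)))  # 3
```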
### Dataset selection (traceable / reproducible)
Prefer numeric IDs from metadata:
```yaml
task:
  target_system_id: [1, 2]
```
Those IDs should come from the metadata file referenced by `data.metadata_file` (typically the `Dataset_id` column).
How to verify the mapping (example for Excel metadata):
```bash
python - <<'PY'
import pandas as pd
df = pd.read_excel("data/metadata.xlsx")
print(df[["Dataset_id", "Name"]].drop_duplicates().sort_values("Dataset_id"))
PY
```
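Selecting systems by ID then reduces to an `isin` filter on the metadata table, with a guard against IDs that do not exist. The toy table below is made up (the `File` column and dataset names are placeholders), and `select_systems` is a hypothetical helper; only the `Dataset_id` and `Name` columns come from the check above.

```python
from typing import List

import pandas as pd

# Toy stand-in for the metadata file; names and the File column are placeholders.
metadata = pd.DataFrame({
    "Dataset_id": [1, 1, 2, 3],
    "Name": ["dataset_A", "dataset_A", "dataset_B", "dataset_C"],
    "File": ["a.mat", "b.mat", "c.csv", "d.csv"],
})


def select_systems(meta: pd.DataFrame, ids: List[int]) -> pd.DataFrame:
    """Keep rows whose Dataset_id is in `ids`; fail loudly on unknown IDs."""
    unknown = sorted(set(ids) - set(meta["Dataset_id"].tolist()))
    if unknown:
        raise ValueError(f"IDs not in metadata: {unknown}")
    return meta[meta["Dataset_id"].isin(ids)]


target_system_id = [1, 2]
selected = select_systems(metadata, target_system_id)
print(selected["Name"].unique().tolist())  # ['dataset_A', 'dataset_B']
```

Failing on unknown IDs keeps a typo in `target_system_id` from silently shrinking the experiment.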
## Common Pitfalls (and what to do)
### “Which value is actually used?”
Use inspect to get resolved config + field sources:
```bash
python -m scripts.config_inspect --config <yaml> --override key=value
```
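The "field sources" idea can be illustrated by remembering which layer last set each dotted key while resolving. This is only a sketch of the concept, assuming layers are plain nested dicts in low-to-high precedence order; it does not reproduce `config_inspect`'s actual output format, and `flatten` / `resolve_with_sources` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def resolve_with_sources(
    layers: List[Tuple[str, Dict[str, Any]]]
) -> Dict[str, Tuple[Any, str]]:
    """Walk layers low -> high precedence; remember the last layer that set each key."""
    resolved: Dict[str, Tuple[Any, str]] = {}
    for source, cfg in layers:
        for key, value in flatten(cfg).items():
            resolved[key] = (value, source)
    return resolved


layers = [
    ("base_configs", {"trainer": {"num_epochs": 100}}),
    ("demo yaml", {"trainer": {"num_epochs": 50}, "data": {"batch_size": 32}}),
    ("--override", {"trainer": {"num_epochs": 1}}),
]
for key, (value, source) in sorted(resolve_with_sources(layers).items()):
    print(f"{key} = {value!r}  (from {source})")
# data.batch_size = 32  (from demo yaml)
# trainer.num_epochs = 1  (from --override)
```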
### Component import errors
If a model component cannot be imported, the ID is usually unregistered or misspelled. Use:
```bash
python -m scripts.config_inspect --config <yaml> --dump targets --format yaml
```
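For the "mistyped ID" case, a close-match suggestion often pinpoints the typo. The sketch below uses the standard library's `difflib.get_close_matches`; the hard-coded ID list is taken from the model map in this file and stands in for the real registry, and `suggest` is a hypothetical helper.

```python
import difflib
from typing import List

# Stand-in for the real registry contents (IDs listed elsewhere in this file).
REGISTERED_IDS = ["E_01_HSE", "B_04_Dlinear", "B_06_TimesNet", "B_08_PatchTST",
                  "B_09_FNO", "H_01_Linear_cla", "H_03_Linear_pred"]


def suggest(component_id: str, registry: List[str] = REGISTERED_IDS) -> List[str]:
    """Return up to three registered IDs that look like `component_id`."""
    if component_id in registry:
        return [component_id]
    return difflib.get_close_matches(component_id, registry, n=3, cutoff=0.6)


print(suggest("B_04_DLinear")[0])  # B_04_Dlinear
```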
### Dataset / path errors
- Avoid hard-coded absolute paths in committed configs.
- Put machine-specific paths into `configs/local/local.yaml` or pass `--override data.data_dir=/path/to/...`.
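A pre-commit style check can flag absolute paths that slipped into a committed config. This is a sketch assuming the YAML has already been loaded into nested dicts; `find_absolute_paths` is a hypothetical helper, and it checks both POSIX and Windows path conventions via `pathlib`'s pure path classes.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Any, Dict, List


def find_absolute_paths(cfg: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted keys whose string value is an absolute POSIX or Windows path."""
    hits: List[str] = []
    for key, value in cfg.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            hits.extend(find_absolute_paths(value, dotted + "."))
        elif isinstance(value, str) and (
            PurePosixPath(value).is_absolute() or PureWindowsPath(value).is_absolute()
        ):
            hits.append(dotted)
    return hits


committed = {
    "data": {"data_dir": "/home/alice/datasets", "metadata_file": "metadata.xlsx"},
    "environment": {"output_dir": "results/demo"},
}
print(find_absolute_paths(committed))  # ['data.data_dir']
```

Anything this flags belongs in `configs/local/local.yaml` or an `--override` instead.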
## Common Development Commands
For a practical runbook and copy-paste commands, see [@AGENTS.md - Quick Commands].
Key commands include:
- Smoke test: `python main.py --config configs/demo/00_smoke/dummy_dg.yaml`
- Config inspect: `python -m scripts.config_inspect --config <yaml>`
- Validate: `python -m scripts.validate_configs`
- Tests: `python -m pytest test/`
## Results and Output (where files go)
- Default base directory is `save/`.
- If `environment.output_dir` is set, outputs go under that directory instead (many demos use `results/demo/...`).
- Final run directory is `base_dir/<experiment_name>/iter_<k>/` (see `src/configs/config_utils.py:path_name`).
Typical artifacts depend on trainer/task, but usually include checkpoints, metrics/logs, and a config snapshot.
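The directory rule above can be expressed as a small path computation. This is a simplified sketch, not `src/configs/config_utils.py:path_name`: `run_dir` is a hypothetical helper, and the real function may derive `<experiment_name>` differently or number iterations from another starting index.

```python
from pathlib import Path
from typing import Optional


def run_dir(experiment_name: str, iteration: int, output_dir: Optional[str] = None) -> Path:
    """base_dir/<experiment_name>/iter_<k>/, where base_dir defaults to save/."""
    base = Path(output_dir) if output_dir else Path("save")
    return base / experiment_name / f"iter_{iteration}"


print(run_dir("dummy_dg", 0).as_posix())                  # save/dummy_dg/iter_0
print(run_dir("dummy_dg", 0, "results/demo").as_posix())  # results/demo/dummy_dg/iter_0
```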
## Model Architecture (high-level map)
### Foundation models (ISFM family)
- `M_01_ISFM`, `M_02_ISFM`, `M_03_ISFM`
### Common backbones (examples)
- `B_04_Dlinear`, `B_06_TimesNet`, `B_08_PatchTST`, `B_09_FNO`
### Heads (examples)
- `H_01_Linear_cla` (classification), `H_03_Linear_pred` (prediction)
## Task Types (high-level map)
- Classification / DG / CDDG (domain generalization)
- FS / GFS (few-shot and generalized few-shot)
- Pretrain (self-supervised / contrastive pretraining)
## Environment Setup (practical notes)
- Core dependencies are listed in `requirements.txt`.
- Keep machine-specific paths out of committed configs; use `configs/local/local.yaml` or CLI `--override`.
## Important Notes (how to keep the repo maintainable)
- Always register new components via the appropriate factory/registry instead of hardcoding imports in pipelines.
- Keep demos runnable with minimal assumptions; prefer smoke-friendly defaults (e.g. `--override trainer.num_epochs=1`).
- When changing config structure, update SSOT (registry/atlas) and add a migration note if any user-facing key changes.
- Vibecoding (AI-assisted changes): default to the simplest viable implementation. Avoid over-engineering and
unnecessary defensive design; apply Occam’s razor; reason from first principles; iterate incrementally.
## Hard Constraints (Do Not Break)
- Do not introduce breaking changes to `main.py` public CLI or core YAML keyspaces (`environment/data/model/task/trainer`)
without a compatibility layer + migration notes.
- Docs must be “runnable by undergraduates + modifiable by PhD students”: minimal runnable commands first, then field
  explanations, pitfalls, and extension points, with links to the code that consumes each field.
- Config fields must be traceable: any key should answer “final value comes from where” + “consumed where”.
## Required Change Order (stable)
1) Registry → 2) Atlas → 3) Inspect → 4) Schema validate → 5) `configs/**/README.md` → 6) CI/tests + acceptance
## PR/Step Review Checklist
- Change list: what changed and why (tight scope).
- How to validate: copy-paste commands from `AGENTS.md`.
- Expected outputs: generated files + output directory pattern.
