Transcriber CLAUDE.md

Automated video-to-text transcription powered by ElevenLabs Scribe API with multi-language support and speaker diarization

by @kossakovsky

> Sourced from [kossakovsky/transcriber](https://github.com/kossakovsky/transcriber) — [Apache-2.0](https://github.com/kossakovsky/transcriber/blob/b2091d289aff9b844b790b91971d8a941dab36b1/CLAUDE.md).

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Node.js ESM project for automated video transcription using the ElevenLabs Scribe API. The project processes each video through a three-stage pipeline:

1. **Video** → Extract audio from video files
2. **Audio** → Transcribe audio using ElevenLabs Scribe API
3. **Text** → Save transcription results

## Environment Setup

**Required**: Create a `.env` file with:
```
ELEVENLABS_API_KEY=your_api_key_here
```

Get your API key from: https://elevenlabs.io/app/speech-to-text

## Running the Application

```bash
# Install dependencies
npm install

# Run the interactive transcription pipeline (default: English)
npm start

# Run with a specific language
npm start -- --lang=ru

# Using node directly
node index.js --lang=es
```

## Folder Structure

The project uses three main folders (all listed in `.gitignore`):

- **video/** - Place your video files here (.mp4, .mov)
- **audio/** - Extracted audio files in MP3 format (auto-generated)
- **text/** - Transcription results in .txt format (auto-generated)
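As a rough illustration of how the three folders relate, the sketch below maps a video file name to its audio and text outputs. The helper name `derivePaths` is hypothetical and not taken from the repository, and treating extensions case-insensitively is an assumption. The base-name mapping follows the usage example in this file (`video.mp4` produces `text/video.txt`).

```javascript
// Hypothetical helper (not from index.js): derive pipeline paths from a video file name.
const VIDEO_EXTENSIONS = [".mp4", ".mov"];

function derivePaths(videoFileName) {
  const dot = videoFileName.lastIndexOf(".");
  // Assumption: extensions are matched case-insensitively.
  const ext = dot === -1 ? "" : videoFileName.slice(dot).toLowerCase();
  if (!VIDEO_EXTENSIONS.includes(ext)) {
    throw new Error(`Unsupported video format: ${videoFileName}`);
  }
  const base = videoFileName.slice(0, dot);
  return {
    video: `video/${videoFileName}`,
    audio: `audio/${base}.mp3`,
    text: `text/${base}.txt`,
  };
}
```

For example, `derivePaths("video.mp4").text` is `"text/video.txt"`.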

## Architecture

### index.js (Main Entry Point)

**Core Workflow**:
1. Scans `video/` folder for .mp4 and .mov files
2. Presents interactive menu for each file (continue/skip/exit)
3. Extracts audio from video → saves as MP3 in `audio/` (skips if already exists)
4. Transcribes audio using ElevenLabs Scribe API → saves to `text/`
5. Automatically skips files that already have a .txt output
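The skip rules in the workflow above can be summarized as a small decision function. This is a minimal sketch, assuming the caller has already checked which outputs exist. The name `planSteps` and the step labels are illustrative only, and the interactive menu is left out.

```javascript
// Hypothetical sketch: decide which pipeline stages still need to run for one video.
function planSteps({ hasAudio, hasText }) {
  // A finished transcription means the whole file is skipped.
  if (hasText) return [];
  const steps = [];
  // Audio is only extracted when no MP3 exists yet.
  if (!hasAudio) steps.push("extract-audio");
  steps.push("transcribe");
  return steps;
}
```

A file with an existing MP3 but no `.txt` output would therefore go straight to transcription.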

**Interactive Features**:
- Uses `inquirer` to present user-friendly menus before processing each file
- User can choose to: continue processing, skip current file, or exit program
- Shows progress indicators (e.g., "File 3/10")
- Displays detailed file info (size, duration) before transcription

**Key Features**:
- **Smart file handling**: Checks for existing .txt and .mp3 files to avoid reprocessing
- **Large file support**: Checks each file against the ElevenLabs limits (3GB size, 10 hours duration) before processing
- **MP3 format**: Extracts audio as compressed MP3 using libmp3lame codec
- **Speaker diarization**: Automatically identifies different speakers (configurable)
- **Robust error handling**: Continues processing remaining files if one fails
- **English by default**: Transcription language defaults to English ("en") and can be overridden with `--lang`
- **High accuracy**: Uses ElevenLabs Scribe v1 model
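To show what speaker diarization makes possible, here is a hedged sketch that groups a word list into one line per speaker turn. It assumes the API response contains a `words` array whose items carry `text`, `type`, and `speaker_id` fields. The function name `formatBySpeaker` and the output layout are illustrative; the format that index.js actually writes to `text/` is not documented here.

```javascript
// Hypothetical formatter; the response shape ({ text, type, speaker_id }) is an assumption.
function formatBySpeaker(words) {
  const turns = [];
  let current = null;
  for (const word of words) {
    // Start a new turn whenever the speaker changes.
    if (current === null || current.speaker !== word.speaker_id) {
      current = { speaker: word.speaker_id, text: "" };
      turns.push(current);
    }
    current.text += word.text;
  }
  return turns.map(({ speaker, text }) => `${speaker}: ${text.trim()}`).join("\n");
}

// Sample data made up for illustration.
const sampleWords = [
  { text: "Hello", type: "word", speaker_id: "speaker_0" },
  { text: " ", type: "spacing", speaker_id: "speaker_0" },
  { text: "there", type: "word", speaker_id: "speaker_0" },
  { text: "Hi", type: "word", speaker_id: "speaker_1" },
];
console.log(formatBySpeaker(sampleWords));
```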

**ElevenLabs Scribe API Configuration** (lines 28-95):

The `TRANSCRIPTION_CONFIG` object contains all configurable parameters:
- `model_id`: "scribe_v1" (stable) or "scribe_v1_experimental"
- `language_code`: "en" (English) - set to null for auto-detection
- `diarize`: true - enables speaker identification
- `num_speakers`: null - auto-detect number of speakers (1-32 if specified)
- `diarization_threshold`: null - uses model default (typically 0.22)
- `tag_audio_events`: true - tags (laughter), (footsteps), etc.
- `timestamps_granularity`: "word" - options: "none", "word", "character"
- `temperature`: null - output randomness (0.0-2.0)
- `seed`: null - for deterministic results
- `use_multi_channel`: false - for multi-channel audio (max 5 channels)
- `file_format`: "other" - or "pcm_s16le_16" for specific format
- `enable_logging`: true - set false for zero-retention (enterprise only)
- `webhook`: false - send results to webhook instead of waiting
- `webhook_id`: null - specific webhook to use
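Many of these parameters default to `null`, meaning "let the API decide". One plausible way to turn such a config into multipart form fields is to drop the `null` entries and stringify the rest. The sketch below assumes that approach; `toFormFields` is a hypothetical name, the config is a shortened sample, and how index.js actually serializes the object is not shown in this file.

```javascript
// Shortened sample of the config described above.
const TRANSCRIPTION_CONFIG = {
  model_id: "scribe_v1",
  language_code: "en",
  diarize: true,
  num_speakers: null,
  tag_audio_events: true,
  timestamps_granularity: "word",
};

// Hypothetical helper: keep only parameters that are set, as [name, stringValue] pairs.
function toFormFields(config) {
  return Object.entries(config)
    .filter(([, value]) => value !== null && value !== undefined)
    .map(([key, value]) => [key, String(value)]);
}
```

Each resulting pair could then be passed to `append(name, value)` on a `form-data` instance alongside the audio file.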

**Command Line Arguments**:
- `--lang` - Override transcription language (e.g., `--lang=ru`, `--lang=es`)
  - If not specified, defaults to "en" (English)
  - Pass `--lang=null` to enable auto-detection
  - Parsed at startup by the `parseArgs()` function
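The rules above can be sketched as follows. This is not the repository's `parseArgs()`; the name `parseLang` is hypothetical, and falling back to the default on an empty value is an assumption.

```javascript
// Hypothetical sketch of the --lang handling described above.
function parseLang(argv, defaultLang = "en") {
  const prefix = "--lang=";
  const arg = argv.find((a) => a.startsWith(prefix));
  if (arg === undefined) return defaultLang;
  const value = arg.slice(prefix.length);
  // Assumption: an empty value falls back to the default.
  if (value === "") return defaultLang;
  // The literal string "null" requests auto-detection.
  return value === "null" ? null : value;
}
```

In a script this would typically be called as `parseLang(process.argv.slice(2))`.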

**Key Functions**:
- `parseArgs()` - Parses command line arguments (index.js:28)
- `getVideoFiles()` - Scans video directory for supported formats (index.js:136)
- `extractAudio()` - Uses fluent-ffmpeg to extract MP3 audio (index.js:159)
- `getAudioMetadata()` - Gets duration and size using ffprobe (index.js:186)
- `transcribeWithElevenLabs()` - Sends file to ElevenLabs API with config (index.js:216)
- `showFileMenu()` - Interactive inquirer menu (index.js:299)
- `transcribeAudioFile()` - Full transcription workflow with validation (index.js:332)
- `processVideoFile()` - Complete video→audio→text pipeline (index.js:398)
- `main()` - Entry point with initialization and loop (index.js:457)
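`extractAudio()` goes through fluent-ffmpeg, so the exact options it sets are not visible here. For orientation, the plain `ffmpeg` arguments for a comparable MP3 extraction look roughly like the sketch below. The helper name is hypothetical, and bitrate or other quality options are unknown and omitted.

```javascript
// Hypothetical helper: plain ffmpeg CLI arguments for extracting MP3 audio from a video.
function buildExtractAudioArgs(inputPath, outputPath) {
  return [
    "-i", inputPath,          // input video
    "-vn",                    // drop the video stream
    "-acodec", "libmp3lame",  // encode audio with the LAME MP3 encoder
    outputPath,
  ];
}

console.log(["ffmpeg", ...buildExtractAudioArgs("video/talk.mp4", "audio/talk.mp3")].join(" "));
```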

**File Limits**:
- Supported video formats: `.mp4`, `.mov` (index.js:103)
- Audio format: MP3 with libmp3lame codec
- File size limit: 3GB (ElevenLabs limit, validated at index.js:327)
- Duration limit: 10 hours (ElevenLabs limit, validated at index.js:333)
- API timeout: 20 minutes for large files (index.js:245)
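A pre-flight check against these limits might look like the following sketch. The constant and function names are hypothetical, and interpreting 3GB as 3 × 1024³ bytes is an assumption; the repository may use a different definition.

```javascript
// Assumption: 3GB is interpreted as 3 * 1024^3 bytes.
const MAX_FILE_SIZE_BYTES = 3 * 1024 ** 3;
const MAX_DURATION_SECONDS = 10 * 60 * 60; // 10 hours

// Hypothetical helper: returns a list of problems, empty when the file is within limits.
function checkLimits({ sizeBytes, durationSeconds }) {
  const problems = [];
  if (sizeBytes > MAX_FILE_SIZE_BYTES) {
    problems.push("File exceeds the 3GB size limit");
  }
  if (durationSeconds > MAX_DURATION_SECONDS) {
    problems.push("Audio exceeds the 10 hour duration limit");
  }
  return problems;
}
```

Returning a list rather than throwing lets the caller report every violated limit at once before skipping the file.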

## Dependencies

- **axios** - HTTP client for ElevenLabs API calls
- **dotenv** - Environment variable management
- **fluent-ffmpeg** - FFmpeg wrapper for audio/video processing
- **@ffmpeg-installer/ffmpeg** - Bundled FFmpeg binary (no system install needed)
- **@ffprobe-installer/ffprobe** - Bundled FFprobe binary for metadata extraction
- **form-data** - Multipart form data for file uploads
- **inquirer** - Interactive CLI prompts and menus

## Usage Example

```bash
# 1. Place video files in video/ folder
cp /path/to/your/video.mp4 ./video/

# 2. Run the interactive processor
npm start

# 3. Follow the menu prompts for each file
# Choose: ✅ Continue, ⏭️ Skip, or 🚪 Exit

# 4. Find transcription in text/ folder
cat ./text/video.txt
```

## Changing Transcription Settings

### Language Setting

**Via command line (recommended)**:
```bash
npm start -- --lang=ru     # Russian
npm start -- --lang=es     # Spanish
npm start -- --lang=null   # Auto-detect
```

**Via configuration file**:
Modify the `TRANSCRIPTION_CONFIG` object in index.js (lines 28-95):
```javascript
language_code: "en",     // English (default)
// language_code: null,  // alternative: auto-detect the language
```

**Note:** Command line arguments override the configuration file.

### Other Settings

**Disable speaker diarization**:
```javascript
diarize: false,
```

**Specify exact number of speakers**:
```javascript
num_speakers: 2,  // Must be between 1-32
```

**Change timestamp detail**:
```javascript
timestamps_granularity: "character",  // More granular than "word"
```

ElevenLabs Scribe supports 99 languages. See: https://elevenlabs.io/docs/capabilities/speech-to-text

## Error Handling

The script implements comprehensive error handling:
- Validates file size (max 3GB) and duration (max 10 hours) before processing
- Detailed error logging (API status, network errors, stack traces)
- Continues with remaining files if one fails
- Won't re-process files with existing .txt output
- Shows clear error messages in Russian with troubleshooting context
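The "continue with remaining files" behavior is commonly implemented with a `try`/`catch` inside the processing loop. The sketch below shows that pattern under stated assumptions: `processAll` and the `processOne` callback are hypothetical names, and the real loop in `main()` also handles the interactive menu.

```javascript
// Hypothetical sketch: process files one by one, collecting failures instead of aborting.
async function processAll(files, processOne) {
  const failures = [];
  for (const file of files) {
    try {
      await processOne(file);
    } catch (error) {
      // Record the failure and move on to the next file.
      failures.push({ file, message: error.message });
    }
  }
  return failures;
}
```

The returned list can be printed as a summary once every file has been attempted.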

## API Pricing

ElevenLabs Scribe pricing:
- Starting from $0.40 per hour of transcribed audio
- Lower rates available at scale with Enterprise plans
- More info: https://elevenlabs.io/speech-to-text
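For a rough budget estimate, the starting rate can be multiplied by the audio duration. The sketch below assumes a flat $0.40 per hour with no plan discounts or minimum charges; the function name is hypothetical and actual billing may differ.

```javascript
// Assumption: flat starting rate, ignoring plan tiers and billing granularity.
const ASSUMED_RATE_USD_PER_HOUR = 0.4;

// Hypothetical helper: estimated cost in USD, rounded to cents.
function estimateCostUsd(durationSeconds, ratePerHour = ASSUMED_RATE_USD_PER_HOUR) {
  const hours = durationSeconds / 3600;
  return Math.round(hours * ratePerHour * 100) / 100;
}
```

At this rate, a file at the 10 hour duration limit would come to about $4.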
