# CleftAI Audio Transcription & Text Summarization API

## Overview

CleftAI provides a production-ready API service for audio transcription and text summarization built on OpenAI's Whisper and GPT-4o models. The service converts audio files and text into well-formatted notes with automatic tag generation (exactly 3 comma-separated tags), supports merging, appending to, and updating notes, and uses API key authentication with asynchronous job processing.

## Base URL

`https://api.cleftai.com`

Endpoint paths in this document already include the `/api` prefix; for example, `POST /api/audio/process` resolves to `https://api.cleftai.com/api/audio/process`.

## Authentication

All endpoints require an API key, supplied in one of three ways:

- Authorization header: `Authorization: Bearer sk-proj-cleftai-your-api-key-here`
- Custom header: `X-API-Key: sk-proj-cleftai-your-api-key-here`
- Query parameter: `?api_key=sk-proj-cleftai-your-api-key-here`

API keys use an OpenAI-style prefix, `sk-proj-cleftai-`, and are issued and managed manually for security.

## API Endpoints

### Audio Processing

**POST /api/audio/process**

- Uploads an audio file for transcription with OpenAI Whisper
- Formats the transcript into organized notes with automatic tag generation (exactly 3 tags)
- Supported formats: mp3, wav, m4a, mp4, aac, ogg, webm (max 25MB)
- Parameters: audio_file (required), custom_instructions (optional), language (optional), webhook_url (optional)
- Returns: job_id for asynchronous processing

### Text Processing

**POST /api/text/summarize**

- Formats text directly into voice-memo-style notes with tag generation (exactly 3 tags)
- Parameters: text (required), custom_instructions (optional), webhook_url (optional)
- Returns: job_id for asynchronous processing

### Note Merging

**POST /api/notes/merge**

- Combines multiple existing notes into a single cohesive document
- Parameters: note_ids (array of at least 2 note UUIDs), custom_instructions (optional), webhook_url (optional)
- Returns: job_id for asynchronous processing

### Note Appending

**POST /api/notes/append**

- Transcribes an audio file and returns content to append to an existing note
- Parameters: target_note_id (UUID of the note to append to), audio_file (required), custom_instructions (optional), language (optional), webhook_url (optional)
- Returns: job_id for asynchronous processing, along with target_note_id so the client app can perform the append

### Note Updating

**POST /api/notes/update**

- Reprocesses an existing note with fresh AI analysis and optional custom instructions
- Updates the content while preserving the note's UUID
- Parameters: note_id (UUID), custom_instructions (optional), webhook_url (optional)
- Returns: job_id for asynchronous processing, with a `"reprocessed": true` flag

### Job Status Tracking

**GET /api/jobs/{job_id}**

- Retrieves the processing status and results of any job
- Returns: job status and, once the job completes, the note_id, tags (exactly 3), and formatted content

### Authentication Status

**GET /api/auth/status**

- Verifies that the API key is valid and returns its authentication status
- Returns: authentication confirmation and truncated API key info

## Response Format

A successfully completed job returns:

```json
{
  "success": true,
  "job_id": "unique-job-identifier",
  "status": "completed",
  "data": {
    "note_id": "unique-note-uuid",
    "tags": "meeting, project, deadline",
    "summary": "# Formatted Notes\n\n## Key Points\n- Point 1\n- Point 2\n\n## Action Items\n- [ ] Task 1\n- [ ] Task 2",
    "transcription": "Full transcription (audio jobs only)",
    "processing_info": {
      "whisper_model": "whisper-1",
      "gpt_model": "gpt-4o",
      "processing_time": 4.2,
      "word_count": 150
    },
    "reprocessed": false
  }
}
```

For update operations, the response includes `"reprocessed": true` to indicate that the note was reprocessed with fresh AI analysis.
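The job flow described above (submit a request, receive a `job_id`, then read `GET /api/jobs/{job_id}` until the job completes) can be sketched as a small TypeScript polling helper. This is a minimal sketch under stated assumptions, not an official client: the `JobResponse` types mirror only the fields in the sample response, `"failed"` is an assumed status name (only `"completed"` is documented), and `pollJob`, `parseTags`, and `FetchLike` are hypothetical names. The fetch function is injected so the loop can be exercised without network access.

```typescript
// Completed-job shape, mirroring the sample response above.
interface ProcessingInfo {
  whisper_model?: string; // optional here: may not apply to text-only jobs (assumption)
  gpt_model: string;
  processing_time: number;
  word_count: number;
}

interface JobData {
  note_id: string;
  tags: string; // exactly 3 comma-separated tags, e.g. "meeting, project, deadline"
  summary: string; // Markdown-formatted note
  transcription?: string; // audio jobs only
  processing_info: ProcessingInfo;
  reprocessed: boolean; // true for /api/notes/update jobs
}

interface JobResponse {
  success: boolean;
  job_id: string;
  status: string; // "completed" when done; other values are undocumented
  data?: JobData;
}

// Minimal fetch-like signature; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const BASE_URL = "https://api.cleftai.com";

// Split the comma-separated tag string into trimmed, non-empty tags.
function parseTags(tags: string): string[] {
  return tags
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll GET /api/jobs/{job_id} until the job completes, fails, or we give up.
async function pollJob(
  jobId: string,
  apiKey: string,
  fetchFn: FetchLike,
  intervalMs = 2000,
  maxAttempts = 300, // 300 polls x 2 s = 10 min, the documented audio-job timeout
): Promise<JobData> {
  const url = `${BASE_URL}/api/jobs/${encodeURIComponent(jobId)}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) {
      throw new Error(`Job status request failed with HTTP ${res.status}`);
    }
    const body = (await res.json()) as JobResponse;
    if (body.status === "completed" && body.data) {
      return body.data;
    }
    if (body.status === "failed") {
      // "failed" is an assumed status name, not documented by the API.
      throw new Error(`Job ${jobId} failed`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not complete after ${maxAttempts} polls`);
}
```

In a runtime with a global `fetch` (browsers, Node.js 18+), the built-in can be passed directly, e.g. `const note = await pollJob(jobId, apiKey, fetch)`. Supplying webhook_url when submitting a job avoids polling altogether.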
## Key Features

- **UUID Note System**: Every processed note receives a unique UUID that identifies it in later operations
- **Asynchronous Processing**: All jobs run asynchronously; status is available from GET /api/jobs/{job_id} or through an optional webhook
- **Custom Instructions**: Users can supply specific formatting instructions to tailor the output
- **Language Support**: An optional language parameter (an ISO-639-1 code such as 'en', 'es', 'fr', 'de') improves Whisper transcription accuracy
- **Webhook Support**: Optional webhook notification when a job completes
- **Voice Memo Formatting**: Specialized prompts convert transcripts into first-person notes
- **Automatic Tag Generation**: Exactly 3 relevant tags in comma-separated format for content categorization
- **Markdown Output**: Headings, task-list checkboxes, and bullet points
- **Note Operations**: Merge multiple notes, append content to existing notes, or update notes with fresh AI processing
- **File Validation**: Audio format and size validation with format detection and detailed error messages

## Rate Limits & Restrictions

- Audio files: 25MB maximum size
- Processing timeout: 10 minutes for audio jobs, 2 minutes for text jobs
- Supported audio formats: mp3, wav, m4a, mp4, aac, ogg, webm
- Rate limits may apply depending on the API key tier

## Example Usage

### Audio Processing

```bash
curl -X POST https://api.cleftai.com/api/audio/process \
  -H "Authorization: Bearer sk-proj-cleftai-your-api-key-here" \
  -F "audio_file=@meeting.mp3" \
  -F "custom_instructions=Focus on action items and decisions" \
  -F "language=en"
```

When `-F` is used, curl sets the `multipart/form-data` Content-Type header, including the required boundary, automatically.

### Text Processing

```bash
curl -X POST https://api.cleftai.com/api/text/summarize \
  -H "Authorization: Bearer sk-proj-cleftai-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Meeting discussion about project timeline and budget",
    "custom_instructions": "Create organized notes with checkboxes"
  }'
```

### Note Merging

```bash
curl -X POST https://api.cleftai.com/api/notes/merge \
  -H "Authorization: Bearer sk-proj-cleftai-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "note_ids": ["uuid1", "uuid2", "uuid3"],
    "custom_instructions": "Combine into cohesive summary"
  }'
```

### Note Appending

```bash
curl -X POST https://api.cleftai.com/api/notes/append \
  -H "Authorization: Bearer sk-proj-cleftai-your-api-key-here" \
  -F "audio_file=@additional_content.mp3" \
  -F "target_note_id=existing-note-uuid" \
  -F "custom_instructions=Format as bullet points" \
  -F "language=es"
```

### Note Updating

```bash
curl -X POST https://api.cleftai.com/api/notes/update \
  -H "Authorization: Bearer sk-proj-cleftai-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "note_id": "existing-note-uuid",
    "custom_instructions": "Focus more on actionable insights"
  }'
```

### Job Status Check

```bash
curl -X GET https://api.cleftai.com/api/jobs/job-id-here \
  -H "Authorization: Bearer sk-proj-cleftai-your-api-key-here"
```

## Technical Implementation

- **Backend**: Express.js with TypeScript
- **AI Models**: OpenAI whisper-1 for transcription, GPT-4o for text processing
- **Database**: PostgreSQL with Drizzle ORM
- **Authentication**: API key-based, accepted via the Authorization header, the X-API-Key header, or a query parameter
- **File Processing**: Multer for multipart uploads with validation
- **Error Handling**: Error responses with detailed messages

## Use Cases

- Voice memo transcription and formatting
- Meeting notes automation
- Interview transcript processing
- Lecture recording summarization
- Note organization and management
- Content consolidation from multiple sources
- Research note compilation

This API is designed for developers building applications that need reliable audio transcription and structured, AI-formatted notes.
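Because the service rejects audio files over 25MB or in unsupported formats, and a merge needs at least two note UUIDs, a client can run cheap pre-flight checks before submitting a job. The TypeScript sketch below is illustrative only: `checkAudioUpload` and `checkMergeRequest` are hypothetical helper names, the extension check is a heuristic (the service performs its own format detection), and "25MB" is interpreted conservatively as 25,000,000 bytes, which may differ from the server's exact limit.

```typescript
// Limits as documented under "Rate Limits & Restrictions".
const SUPPORTED_AUDIO_EXTENSIONS = ["mp3", "wav", "m4a", "mp4", "aac", "ogg", "webm"];
// Assumption: "25MB" treated conservatively as 25,000,000 bytes.
const MAX_AUDIO_BYTES = 25 * 1000 * 1000;

// Matches the canonical 8-4-4-4-12 hex UUID layout; the API only says "UUID".
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns a list of problems; an empty list means the upload looks acceptable.
function checkAudioUpload(fileName: string, sizeBytes: number): string[] {
  const problems: string[] = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (SUPPORTED_AUDIO_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(ext ? `unsupported format ".${ext}"` : "file name has no extension");
  }
  if (sizeBytes > MAX_AUDIO_BYTES) {
    problems.push(`file is ${sizeBytes} bytes; limit is ${MAX_AUDIO_BYTES}`);
  }
  return problems;
}

// Checks the documented merge constraint (at least 2 note UUIDs).
function checkMergeRequest(noteIds: string[]): string[] {
  const problems: string[] = [];
  if (noteIds.length < 2) {
    problems.push("merge requires at least 2 note_ids");
  }
  for (const id of noteIds) {
    if (!UUID_RE.test(id)) {
      problems.push(`"${id}" is not a UUID`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing lets a UI report every issue at once. The placeholder IDs in the merge curl example (`uuid1`, ...) would fail `checkMergeRequest`, as intended for real requests.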