Endpoint
POST /api/v1/music/persona
Creates a reusable AI singer persona from an existing clip. Once the persona is created, pass the returned persona ID (items[].id in the response) as persona_id to Generate Music to apply its vocal style to any new track.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name for the persona (e.g. "My Artist" or "我的歌手"). |
| clip_id | string | Yes | Source clip ID whose vocal style will be learned. |
Response
{
"code": 0,
"message": "ok",
"request_id": "req-1710000000000",
"data": {
"taskBatchId": "batch-xyz789",
"items": [
{
"id": "persona-abc123",
"status": "complete"
}
]
}
}
| Field | Type | Description |
|---|---|---|
| taskBatchId | string | Internal batch identifier |
| items | array | Created persona items |
| items[].id | string | Persona ID; pass this as persona_id in generation requests |
| items[].status | string | "complete" when the persona is ready to use |
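As a minimal sketch of reading the response described above, the following Python pulls the usable persona IDs out of the response body. The helper name `extract_persona_ids` is hypothetical, and treating a non-zero `code` as a failure is an assumption based only on the success example showing `"code": 0`.

```python
import json


def extract_persona_ids(response_text: str) -> list[str]:
    """Return the IDs of all items whose status is "complete"."""
    body = json.loads(response_text)
    # Assumption: a non-zero "code" signals failure (the success example uses 0).
    if body.get("code") != 0:
        raise RuntimeError(f"persona request failed: {body.get('message')}")
    items = body.get("data", {}).get("items", [])
    # Only items marked "complete" are ready to be used as persona_id.
    return [item["id"] for item in items if item.get("status") == "complete"]


# The sample response from this page.
sample = """
{
  "code": 0,
  "message": "ok",
  "request_id": "req-1710000000000",
  "data": {
    "taskBatchId": "batch-xyz789",
    "items": [{"id": "persona-abc123", "status": "complete"}]
  }
}
"""

print(extract_persona_ids(sample))  # ['persona-abc123']
```

Filtering on status rather than taking the first item keeps the caller from using a persona that is not yet ready.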
Using your Persona to generate music
Once you have a persona_id, pass it to any Generate Music request. The AI will sing the generated track in the learned vocal style.
curl -X POST https://api.example.com/api/v1/music/generate \
-H "Authorization: Bearer sk-mm-your-key" \
-H "Content-Type: application/json" \
-d '{
"prompt": "[Verse]\n月光照在窗台\n风轻轻吹来\n[Chorus]\n你是我心中的光",
"title": "心中的光",
"tags": "流行 抒情 女声",
"persona_id": "persona-abc123"
}'
persona_id works with both custom mode (prompt + tags) and inspiration mode (gpt_description_prompt). The vocal style is applied regardless of which generation mode you choose.
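To illustrate the two modes, here is a rough sketch of building the request body for each one in Python. The helper names are hypothetical. The field names (`prompt`, `title`, `tags`, `gpt_description_prompt`, `persona_id`) come from this page, but which fields each mode requires or additionally accepts is an assumption; check the Generate Music reference for the full list.

```python
import json
from typing import Optional


def custom_mode_payload(prompt: str, tags: str, persona_id: str,
                        title: Optional[str] = None) -> dict:
    """Body for custom mode: caller supplies lyrics (prompt) and style tags."""
    payload = {"prompt": prompt, "tags": tags, "persona_id": persona_id}
    if title is not None:
        payload["title"] = title
    return payload


def inspiration_mode_payload(description: str, persona_id: str) -> dict:
    """Body for inspiration mode: caller supplies a free-text description."""
    # Assumption: no other fields are needed in this mode.
    return {"gpt_description_prompt": description, "persona_id": persona_id}


custom = custom_mode_payload(
    prompt="[Verse]\n月光照在窗台\n风轻轻吹来\n[Chorus]\n你是我心中的光",
    tags="流行 抒情 女声",
    persona_id="persona-abc123",
    title="心中的光",
)
inspired = inspiration_mode_payload("a gentle pop ballad about moonlight",
                                    "persona-abc123")

# ensure_ascii=False keeps the Chinese lyrics readable in the printed JSON.
print(json.dumps(custom, ensure_ascii=False, indent=2))
print(json.dumps(inspired, ensure_ascii=False, indent=2))
```

In both cases the only persona-specific part is the `persona_id` key; the rest of the body is whatever the chosen mode already expects.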
Complete workflow
1. Record or generate a source clip with the target vocal style
↓
2. (Recommended) Run Stem Separation to isolate clean vocals
↓
3. POST /api/v1/music/persona → receive persona_id
↓
4. POST /api/v1/music/generate + persona_id → task_id
↓
5. Poll GET /api/v1/task/{task_id} → audio_url
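Steps 3 to 5 above could be chained roughly as follows. This is a sketch under stated assumptions, not a reference client: `generate_with_persona` and the `call` transport are hypothetical, and the locations `data.task_id` and `data.audio_url` are guesses modeled on the persona response envelope, since this page does not show the generate or task response bodies. The polling interval and retry limit are likewise illustrative.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: call(method, path, body) -> parsed JSON dict.
# Supply your own implementation that adds the Authorization header.
ApiCall = Callable[[str, str, Optional[dict]], dict]


def generate_with_persona(call: ApiCall, name: str, clip_id: str,
                          generate_body: dict,
                          poll_interval: float = 5.0,
                          max_polls: int = 60) -> str:
    """Create a persona, start a generation with it, and wait for audio_url."""
    # Step 3: create the persona and read its ID from items[0].id.
    persona = call("POST", "/api/v1/music/persona",
                   {"name": name, "clip_id": clip_id})
    persona_id = persona["data"]["items"][0]["id"]

    # Step 4: start generation with the persona applied.
    job = call("POST", "/api/v1/music/generate",
               {**generate_body, "persona_id": persona_id})
    task_id = job["data"]["task_id"]  # assumed field location

    # Step 5: poll the task until an audio URL appears or we give up.
    for _ in range(max_polls):
        task = call("GET", f"/api/v1/task/{task_id}", None)
        audio_url = task.get("data", {}).get("audio_url")  # assumed location
        if audio_url:
            return audio_url
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} produced no audio_url "
                       f"after {max_polls} polls")
```

Passing the transport in as a function keeps the workflow logic separate from HTTP details and makes it easy to exercise with a stub before pointing it at the real API.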
Create persona example
curl -X POST https://api.example.com/api/v1/music/persona \
-H "Authorization: Bearer sk-mm-your-key" \
-H "Content-Type: application/json" \
-d '{
"name": "My Custom Singer",
"clip_id": "abc123def456"
}'
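The same request can be expressed in Python with only the standard library. This is a sketch that mirrors the curl command above: `build_persona_request` is a hypothetical helper, and `https://api.example.com` and `sk-mm-your-key` are the same placeholders used in the curl example. The request is built but not sent here.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder host from the curl example


def build_persona_request(api_key: str, name: str,
                          clip_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/music/persona request."""
    body = json.dumps({"name": name, "clip_id": clip_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/music/persona",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_persona_request("sk-mm-your-key", "My Custom Singer",
                            "abc123def456")
print(req.get_method(), req.full_url)
# POST https://api.example.com/api/v1/music/persona
```

To send it, pass `req` to `urllib.request.urlopen` and decode the JSON response body.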
Choosing a source clip
For best results, use a clip with:
- Clear, isolated vocals (minimal background noise)
- At least 30 seconds of vocal content
- A consistent vocal style throughout
Run Stem Separation on a clip first to get a clean vocal track, then use that vocal audio as the source for persona creation.