Endpoint

POST /api/v1/music/persona
Creates a reusable AI singer persona from an existing clip. Once the persona is created, pass the returned persona ID (items[].id in the response) as persona_id to Generate Music to apply the vocal style to any new track.

Request body

Field | Type | Required | Description
name | string | Yes | Display name for the persona (e.g. "My Artist" or "我的歌手").
clip_id | string | Yes | Source clip ID whose vocal style will be learned.

Response

{
  "code": 0,
  "message": "ok",
  "request_id": "req-1710000000000",
  "data": {
    "taskBatchId": "batch-xyz789",
    "items": [
      {
        "id": "persona-abc123",
        "status": "complete"
      }
    ]
  }
}
Field | Type | Description
taskBatchId | string | Internal batch identifier
items | array | Created persona items
items[].id | string | Persona ID; pass this as persona_id in generation requests
items[].status | string | "complete" when persona is ready to use
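
As an illustration of reading this response, here is a minimal Python sketch that pulls the ready-to-use persona IDs out of the JSON body. The function name extract_persona_ids is hypothetical, and treating any non-zero code as an error is an assumption; the reference above only shows the success case (code 0).

```python
import json
from typing import List


def extract_persona_ids(response_text: str) -> List[str]:
    """Return the IDs of all personas whose status is "complete"."""
    body = json.loads(response_text)
    # Assumption: a non-zero "code" means the request failed.
    if body.get("code") != 0:
        raise RuntimeError(f"persona request failed: {body.get('message')}")
    items = body.get("data", {}).get("items", [])
    return [item["id"] for item in items if item.get("status") == "complete"]


# Sample body copied from the response example above.
sample_response = """
{
  "code": 0,
  "message": "ok",
  "request_id": "req-1710000000000",
  "data": {
    "taskBatchId": "batch-xyz789",
    "items": [{"id": "persona-abc123", "status": "complete"}]
  }
}
"""

print(extract_persona_ids(sample_response))
```

Items with any other status are skipped, so the caller only ever passes a usable ID on as persona_id.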

Using your Persona to generate music

Once you have a persona_id, pass it to any Generate Music request. The AI will sing the generated track in the learned vocal style.
curl -X POST https://api.example.com/api/v1/music/generate \
  -H "Authorization: Bearer sk-mm-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "[Verse]\n月光照在窗台\n风轻轻吹来\n[Chorus]\n你是我心中的光",
    "title": "心中的光",
    "tags": "流行 抒情 女声",
    "persona_id": "persona-abc123"
  }'
persona_id works with both custom mode (prompt + tags) and inspiration mode (gpt_description_prompt). The vocal style is applied regardless of which generation mode you choose.
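
To make the two modes concrete, the following Python sketch builds a Generate Music payload for either one and attaches persona_id in both cases. The helper build_generate_payload is hypothetical, and the rule that exactly one of prompt or gpt_description_prompt must be supplied is an assumption made for the sketch, not something the API reference states.

```python
from typing import Dict, Optional


def build_generate_payload(
    persona_id: str,
    *,
    prompt: Optional[str] = None,
    title: Optional[str] = None,
    tags: Optional[str] = None,
    gpt_description_prompt: Optional[str] = None,
) -> Dict[str, str]:
    """Build a request body for custom mode or inspiration mode."""
    # Assumption: the two modes are mutually exclusive.
    if (prompt is None) == (gpt_description_prompt is None):
        raise ValueError("supply exactly one of prompt or gpt_description_prompt")

    if prompt is not None:
        # Custom mode: lyrics in prompt, plus optional title and style tags.
        payload = {"prompt": prompt}
        if title is not None:
            payload["title"] = title
        if tags is not None:
            payload["tags"] = tags
    else:
        # Inspiration mode: a free-text description of the desired track.
        payload = {"gpt_description_prompt": gpt_description_prompt}

    # The persona is attached the same way in both modes.
    payload["persona_id"] = persona_id
    return payload


custom = build_generate_payload(
    "persona-abc123", prompt="[Verse]\nFirst line", tags="pop ballad"
)
inspired = build_generate_payload(
    "persona-abc123", gpt_description_prompt="a calm piano ballad"
)
print(custom)
print(inspired)
```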

Complete workflow

1. Record or generate a source clip with the target vocal style

2. (Recommended) Run Stem Separation to isolate clean vocals

3. POST /api/v1/music/persona  →  receive persona_id

4. POST /api/v1/music/generate  +  persona_id  →  task_id

5. Poll GET /api/v1/task/{task_id}  →  audio_url
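
Step 5 can be sketched as a small polling loop in Python. This is only an outline under stated assumptions: the task response is assumed to be a dictionary that gains an audio_url key once the track is ready, and fetch_task is a hypothetical callable standing in for the real GET /api/v1/task/{task_id} request. The polling interval and attempt limit are arbitrary choices, not documented values.

```python
import time
from typing import Callable, Dict


def poll_for_audio_url(
    fetch_task: Callable[[str], Dict],
    task_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_task until the task exposes an audio_url, then return it."""
    for attempt in range(max_attempts):
        task = fetch_task(task_id)
        # Assumption: a finished task carries a non-empty "audio_url".
        audio_url = task.get("audio_url")
        if audio_url:
            return audio_url
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} had no audio_url after {max_attempts} attempts")


# Offline demonstration with a fake fetcher instead of a real HTTP call.
fake_responses = iter([
    {"status": "pending"},
    {"status": "complete", "audio_url": "https://example.com/track.mp3"},
])
url = poll_for_audio_url(lambda _tid: next(fake_responses), "task-1", sleep=lambda _s: None)
print(url)
```

Passing the fetcher and the sleep function in as parameters keeps the loop testable without network access; in real use fetch_task would wrap an authenticated GET request.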

Create persona example

curl -X POST https://api.example.com/api/v1/music/persona \
  -H "Authorization: Bearer sk-mm-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Custom Singer",
    "clip_id": "abc123def456"
  }'
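
For readers who prefer Python to curl, the same request could be assembled with the standard library roughly as follows. The helper build_persona_request is hypothetical, api.example.com is the placeholder host from the curl example, and the sketch only constructs the request; the line that would send it is left commented out.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder host from the curl example


def build_persona_request(api_key: str, name: str, clip_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/music/persona request."""
    # Both fields are marked as required in the request body table.
    if not name or not clip_id:
        raise ValueError("name and clip_id are both required")
    # ensure_ascii=False keeps non-ASCII names readable in the UTF-8 body.
    body = json.dumps({"name": name, "clip_id": clip_id}, ensure_ascii=False)
    return urllib.request.Request(
        f"{API_BASE}/api/v1/music/persona",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_persona_request("sk-mm-your-key", "My Custom Singer", "abc123def456")
print(req.get_method(), req.full_url)

# To actually send it (requires a real host and key):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```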

Choosing a source clip

For best results, use a clip with:
  • Clear, isolated vocals (minimal background noise)
  • At least 30 seconds of vocal content
  • A consistent vocal style throughout
Run Stem Separation on a clip first to get a clean vocal track, then use that vocal audio as the source for persona creation.