Audio Processing API

Scalable audio REST API to convert, trim, concatenate, optimize, and compress audio files.

https://upcdn.io/W142hJk/audio/example.mp3?br=96

Account: /W142hJk/
API: audio
File Path: /example.mp3
Parameters: ?br=96

1 Upload your input

First, upload your audio file to Bytescale or make it accessible to Bytescale:

Use the Bytescale Dashboard to upload a file manually.

Use the Upload Widget, Bytescale SDKs or Bytescale API to upload a file programmatically.

Use our external storage options to process external audio.

2 Build your audio URL

Build an audio processing URL:

Get the raw URL for your file:

https://upcdn.io/W142hJk/raw/example.mp3

Replace "raw" with "audio":

https://upcdn.io/W142hJk/audio/example.mp3

Add querystring parameters to control the output:

https://upcdn.io/W142hJk/audio/example.mp3?br=96
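
The steps above can be sketched as a small helper. This is a minimal illustration rather than part of any Bytescale SDK: the function name buildAudioUrl and its array-of-pairs parameter format are assumptions, chosen so that a key such as br can appear more than once and keep its position.

```javascript
// Hypothetical helper (not a Bytescale SDK function): turns a raw file URL
// into an audio processing URL by swapping the "raw" path segment for "audio".
function buildAudioUrl(rawUrl, params = []) {
  const url = new URL(rawUrl);
  const segments = url.pathname.split("/"); // e.g. ["", "W142hJk", "raw", "example.mp3"]
  if (segments[2] !== "raw") {
    throw new Error(`Expected a raw URL, got: ${rawUrl}`);
  }
  segments[2] = "audio";

  // Parameters are [key, value] pairs so that order and repeats are preserved.
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");

  const base = `${url.origin}${segments.join("/")}`;
  return query === "" ? base : `${base}?${query}`;
}

console.log(buildAudioUrl("https://upcdn.io/W142hJk/raw/example.mp3", [["br", 96]]));
// https://upcdn.io/W142hJk/audio/example.mp3?br=96
```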

3 Play your audio

Play your audio by navigating to the URL from step 2.

By default, your audio will be encoded to AAC.

The default HTTP response is an HTML webpage with an embedded audio player. This is for debugging only: when embedding audio into your webpages and apps, override this behavior by specifying the f parameter.

Examples

Example #1: Embedding an audio file

To embed audio in a webpage using Video.js:

<!DOCTYPE html>
<html>
<head>
  <link href="https://unpkg.com/video.js@7/dist/video-js.min.css" rel="stylesheet">
  <script src="https://unpkg.com/video.js@7/dist/video.min.js"></script>
  <style type="text/css">
    .audio-container {
      height: 316px;
      max-width: 600px;
    }
  </style>
</head>
<body>
  <div class="audio-container">
    <video-js
      class="vjs-fill vjs-big-play-centered"
      controls
      preload="auto">
      <p class="vjs-no-js">To play this audio please enable JavaScript.</p>
    </video-js>
  </div>
  <script>
    var vid = document.querySelector('video-js');
    var player = videojs(vid, {responsive: true});
    player.on('loadedmetadata', function() {
      // Begin playing from the start of the audio. (Required for 'f=hls-aac-rt'.)
      player.currentTime(player.seekable().start(0));
    });
    player.src({
      src: 'https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac-rt&br=80&br=256',
      type: 'application/x-mpegURL'
    });
  </script>
</body>
</html>

The f=hls-aac-rt output format is designed to reduce the wait time for your listeners when the given audio has not been transcoded before. Like the other output formats, this audio format incurs an initial delay while transcoding starts. However, unlike the other formats, once transcoding begins the audio will be streamed to listeners during transcoding. As with the other formats, once transcoded, the resulting audio will be cached and will not need to be transcoded again.

Example #2: Creating MP3 audio

To create an MP3 file:

Upload an input file (e.g. an audio or video file) or create an external file source.

Replace /raw/ with /audio/ in the file's URL, and then append ?f=mp3 to the URL.

Navigate to the URL (i.e. request the URL using a simple GET request).

Wait for status: "Succeeded" in the JSON response.

The result will contain a URL to the MP3 file:

https://upcdn.io/W142hJk/audio/example.mp3?f=mp3

{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.mp3",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=mp3&a=/audio.mp3"
    }
  }
}
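
The "wait for status: Succeeded" step can be automated. The sketch below rests on stated assumptions: it assumes that requesting the same URL again returns the job's latest JSON, and it treats every status other than "Succeeded" as "not ready yet". The helper names, the polling interval, and the attempt limit are hypothetical.

```javascript
// Returns the artifact URL from a job response, or null if the job has not
// reached status "Succeeded" yet.
function extractArtifactUrl(job) {
  if (job.status !== "Succeeded") {
    return null;
  }
  return job.summary?.result?.artifactUrl ?? null;
}

// Polls the audio URL until the job reports "Succeeded".
// Assumption: requesting the same URL again returns the job's current state.
async function waitForArtifactUrl(audioUrl, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(audioUrl);
    if (!response.ok) {
      throw new Error(`Request failed with HTTP ${response.status}`);
    }
    const artifactUrl = extractArtifactUrl(await response.json());
    if (artifactUrl !== null) {
      return artifactUrl;
    }
    // Not ready yet: wait before asking again.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job did not succeed after ${maxAttempts} attempts`);
}
```

For example, waitForArtifactUrl("https://upcdn.io/W142hJk/audio/example.mp3?f=mp3") would resolve to the artifactUrl value from the response once the job has succeeded. The same pattern applies to the other asynchronous formats.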

Example #3: Creating AAC audio

To create an AAC file:

Upload an input file (e.g. an audio or video file) or create an external file source.

Replace /raw/ with /audio/ in the file's URL, and then append ?f=aac to the URL.

Navigate to the URL (i.e. request the URL using a simple GET request).

Wait for status: "Succeeded" in the JSON response.

The result will contain a URL to the AAC file:

https://upcdn.io/W142hJk/audio/example.mp3?f=aac

{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.aac",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=aac&a=/audio.aac"
    }
  }
}

Example #4: Creating WAV audio

To create a WAV file:

Upload an input file (e.g. an audio or video file) or create an external file source.

Replace /raw/ with /audio/ in the file's URL, and then append ?f=wav-riff to the URL.

Navigate to the URL (i.e. request the URL using a simple GET request).

Wait for status: "Succeeded" in the JSON response.

The result will contain a URL to the WAV file:

https://upcdn.io/W142hJk/audio/example.mp3?f=wav-riff

{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.wav",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=wav-riff&a=/audio.wav"
    }
  }
}

Example #5: Creating HLS audio with multiple bitrates

To create an HTTP Live Streaming (HLS) file:

Upload an input file (e.g. an audio or video file) or create an external file source.

Replace /raw/ with /audio/ in the file's URL, and then append ?f=hls-aac to the URL.

Add parameters from the Audio Transcoding API or Audio Compression API.

You can create adaptive bitrate (ABR) audio by specifying multiple groups of bitrate and/or sample rate parameters. The end-user's audio player will automatically switch to the most appropriate variant during playback. By default, a single 96 kbps variant is produced.

You can specify up to 10 variants. Each variant's parameters must be adjacent on the querystring. For example, br=80&sr=24&br=256&sr=48 specifies 2 variants (br=80&sr=24 and br=256&sr=48), whereas br=80&br=256&sr=24&sr=48 specifies 3 variants (br=80, br=256&sr=24, and sr=48), which is most likely a mistake. To force a split between groups of parameters, add next=true between them.

Navigate to the URL (i.e. request the URL using a simple GET request).

Wait for status: "Succeeded" in the JSON response.

The result will contain a URL to the HTTP Live Streaming (HLS) file:

https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac&br=80&br=256

{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.m3u8",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac&br=80&br=256&a=/audio.m3u8"
    }
  }
}
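
To check how a querystring splits into variants before sending it, the adjacency rule from this example can be modeled as below. This is a simplified model inferred from the two querystrings discussed in this example, not the service's actual parser: it assumes a new variant starts whenever a per-variant key repeats or next=true appears, and it assumes br, sr and rt are the per-variant keys. The function name groupVariants is hypothetical.

```javascript
// Keys assumed to belong to an individual variant (an assumption, see above).
const VARIANT_KEYS = new Set(["br", "sr", "rt"]);

// Splits a querystring into variant groups: a repeated key or next=true
// closes the current group and starts a new one.
function groupVariants(query) {
  const variants = [];
  let current = {};
  const flush = () => {
    if (Object.keys(current).length > 0) {
      variants.push(current);
      current = {};
    }
  };

  for (const [key, value] of new URLSearchParams(query)) {
    if (key === "next" && value === "true") {
      flush();
      continue;
    }
    if (!VARIANT_KEYS.has(key)) {
      continue; // e.g. f=hls-aac applies to the whole request
    }
    if (key in current) {
      flush();
    }
    current[key] = value;
  }
  flush();
  return variants;
}

console.log(groupVariants("br=80&sr=24&br=256&sr=48").length); // 2
console.log(groupVariants("br=80&br=256&sr=24&sr=48").length); // 3
```

If the count differs from the number of variants you intended, reorder the parameters so that each variant's parameters sit next to each other.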

Example #6: Creating HLS audio with real-time transcoding

Real-time transcoding allows you to return HLS manifests (.m3u8 files) while they're being transcoded, rather than having to wait for the full transcode job to complete.

To create HTTP Live Streaming (HLS) audio with real-time transcoding:

Complete the steps from creating HLS audio.

Replace f=hls-aac with f=hls-aac-rt.

The result will be an M3U8 file that's dynamically updated as new segments finish transcoding:

https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac-rt

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=2038521,AVERAGE-BANDWIDTH=2038521,CODECS="mp4a.40.2"
example.mp3!f=hls-aac-rt&a=/0f/manifest.m3u8
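
The variant entries in the manifest are relative URIs, so a client that reads the manifest itself (rather than handing it to a player such as Video.js) has to resolve them against the manifest's own URL. The following is a minimal sketch of that step using the standard URL class; the function name listVariants is hypothetical, and the parser only handles the tags shown in the sample manifest.

```javascript
// Extracts the variant playlists listed in an HLS master manifest and resolves
// their relative URIs against the URL the manifest was fetched from.
function listVariants(manifestText, manifestUrl) {
  const lines = manifestText.split(/\r?\n/).map(line => line.trim());
  const variants = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].startsWith("#EXT-X-STREAM-INF:")) {
      continue;
    }
    // Matches BANDWIDTH but not AVERAGE-BANDWIDTH.
    const bandwidth = /[:,]BANDWIDTH=(\d+)/.exec(lines[i]);
    // The variant URI is the next line that is neither blank nor a tag.
    const uri = lines.slice(i + 1).find(line => line !== "" && !line.startsWith("#"));
    if (uri === undefined) {
      continue;
    }
    variants.push({
      bandwidth: bandwidth ? Number(bandwidth[1]) : null,
      url: new URL(uri, manifestUrl).href
    });
  }
  return variants;
}

const manifest = [
  "#EXTM3U",
  "#EXT-X-VERSION:3",
  "#EXT-X-INDEPENDENT-SEGMENTS",
  '#EXT-X-STREAM-INF:BANDWIDTH=2038521,AVERAGE-BANDWIDTH=2038521,CODECS="mp4a.40.2"',
  "example.mp3!f=hls-aac-rt&a=/0f/manifest.m3u8"
].join("\n");

const variants = listVariants(manifest, "https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac-rt");
console.log(variants[0].url);
// https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac-rt&a=/0f/manifest.m3u8
```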

Example #7: Extracting audio metadata

The Audio Metadata API allows you to extract the audio file's duration, codec, and more.

To extract an audio file's duration using JavaScript:

<!DOCTYPE html>
<html>
<body>
  <p>Please wait, loading audio metadata...</p>
  <script>
    // Fetches the audio metadata and shows the duration of the first audio track.
    async function getAudioDuration() {
      const response = await fetch("https://upcdn.io/W142hJk/audio/example.mp4?f=meta");
      if (!response.ok) {
        throw new Error(`Metadata request failed with HTTP ${response.status}`);
      }
      const jsonData = await response.json();
      const audioTrack = (jsonData.tracks ?? []).find(x => x.type === "Audio");
      if (audioTrack === undefined) {
        alert("Cannot find audio metadata.");
      }
      else {
        alert(`Duration (seconds): ${audioTrack.duration}`);
      }
    }
    getAudioDuration().catch(e => alert(`Error: ${e}`));
  </script>
</body>
</html>

Supported Inputs

The Audio Processing API can transcode audio from video and audio files:

Supported Input Audio

The Audio Processing API can transcode audio from the following audio inputs:

File Extension(s)	Audio Container	Audio Codecs
.wma, .asf	Advanced Systems Format (ASF)	WMA, WMA2, WMA Pro
.fla, .flac	FLAC	FLAC
.mp3	MPEG-1 Layer 3	MP3
.ts, .m2ts	MPEG-2 TS	MP2, PCM
.aac, .mp4, .m4a	MPEG-4	AAC
.mka	Matroska Audio Container	Opus, FLAC
.oga	OGA	Opus, Vorbis, FLAC
.wav	Waveform Audio File	PCM

Supported Input Videos

The Audio Processing API can transcode audio from the following video inputs:

File Extension(s)	Video Container	Video Codecs
.gif	No Container	GIF 87a, GIF 89a
.m2v, .mpeg, .mpg	No Container	AVC (H.264), DV/DVCPRO, HEVC (H.265), MPEG-1, MPEG-2
.3g2	3G2	AVC (H.264), H.263, MPEG-4 part 2
.3gp	3GP	AVC (H.264), H.263, MPEG-4 part 2
.wmv	Advanced Systems Format (ASF)	VC-1
.flv	Adobe Flash	AVC (H.264), Flash 9 File, H.263
.avi	Audio Video Interleave (AVI)	Uncompressed, Canopus HQ, DivX/Xvid, DV/DVCPRO, MJPEG
.m3u8	HLS (MPEG-2 TS segments)	AVC (H.264), HEVC (H.265), MPEG-2
.mxf	Interoperable Master Format (IMF)	Apple ProRes, JPEG 2000 (J2K)
.mxf	Material Exchange Format (MXF)	Uncompressed, AVC (H.264), AVC Intra 50/100, Apple ProRes (4444, 4444 XQ, 422, 422 HQ, LT, Proxy), DV/DVCPRO, DV25, DV50, DVCPro HD, JPEG 2000 (J2K), MPEG-2, Panasonic P2, SonyXDCam, SonyXDCam MPEG-4 Proxy, VC-3
.mkv	Matroska	AVC (H.264), MPEG-2, MPEG-4 part 2, PCM, VC-1
.mpg, .mpeg, .m2p, .ps	MPEG Program Streams (MPEG-PS)	MPEG-2
.m2t, .ts, .tsv	MPEG Transport Streams (MPEG-TS)	AVC (H.264), HEVC (H.265), MPEG-2, VC-1
.dat, .m1v, .mpeg, .mpg, .mpv	MPEG-1 System Streams	MPEG-1, MPEG-2
.mp4, .mpeg4	MPEG-4	Uncompressed, DivX/Xvid, H.261, H.262, H.263, AVC (H.264), AVC Intra 50/100, HEVC (H.265), JPEG 2000, MPEG-2, MPEG-4 part 2, VC-1
.mov, .qt	QuickTime	Uncompressed, Apple ProRes (4444, 4444 XQ, 422, 422 HQ, LT, Proxy), DV/DVCPRO, DivX/Xvid, H.261, H.262, H.263, AVC (H.264), AVC Intra 50/100, HEVC (H.265), JPEG 2000 (J2K), MJPEG, MPEG-2, MPEG-4 part 2, QuickTime Animation (RLE)
.webm	WebM	VP8, VP9

Some codec profiles are not supported by Bytescale. In particular, AVC (H.264) High 4:4:4 Predictive is currently not supported. We aim to provide a full list of supported profiles in the near future.

Audio Metadata API

Use the Audio Metadata API to extract the duration, codec, and other information from an audio file.

Instructions:

Replace raw with audio in your audio URL.
Append ?f=meta to the URL.
The result will be a JSON payload describing the audio's tracks (see below).

Example audio metadata JSON response:

{
  "tracks": [
    {
      "bitRate": 159980,
      "bitRateMode": "VBR",
      "channels": 2,
      "codec": "AAC",
      "codecId": "mp4a-40-2",
      "frameCount": 35875,
      "frameRate": 46.875,
      "samplingRate": 48000,
      "title": "Stereo",
      "type": "Audio"
    }
  ]
}
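
As an illustration of consuming this payload, the sketch below picks the first audio track and converts two of its fields into friendlier units. The units are assumptions inferred from the magnitudes in the sample response (samplingRate in Hz, bitRate in bits per second), and the function name summarizeAudioTrack is hypothetical.

```javascript
// Summarizes the first audio track of a metadata response.
// Assumption: "samplingRate" is in Hz and "bitRate" is in bits per second,
// which matches the magnitudes in the sample response (48000 and 159980).
function summarizeAudioTrack(meta) {
  const track = (meta.tracks ?? []).find(t => t.type === "Audio");
  if (track === undefined) {
    return null;
  }
  return {
    codec: track.codec,
    channels: track.channels,
    sampleRateKHz: track.samplingRate / 1000,
    bitRateKbps: Math.round(track.bitRate / 1000)
  };
}

const sampleMeta = {
  tracks: [
    { bitRate: 159980, bitRateMode: "VBR", channels: 2, codec: "AAC", samplingRate: 48000, type: "Audio" }
  ]
};

console.log(summarizeAudioTrack(sampleMeta));
// { codec: 'AAC', channels: 2, sampleRateKHz: 48, bitRateKbps: 160 }
```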

Audio Transcoding API

Use the Audio Transcoding API to transcode your audio to a specific format.

Output Format f

Use the f parameter to change the output format of the audio:

Format	Transcoding	Compression	Browser Support
f=mp3	medium	good	all
f=aac (recommended)	medium	excellent	all
f=wav-riff	medium	none	none
f=wav-rf64	medium	none	none
f=hls-aac	medium	excellent	requires SDK
f=hls-aac-rt	fast	excellent	requires SDK

f=mp3

Transcodes the audio to MP3 (.mp3).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the MP3 file on job completion.

f=aac

Transcodes the audio to AAC (.aac).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the AAC file on job completion.

f=wav-riff

Transcodes the audio to Waveform (.wav) using the RIFF wave format.

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the WAV file on job completion.

f=wav-rf64

Transcodes the audio to Waveform (.wav) using the RF64 wave format (to support output audio larger than 4GB).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the WAV file on job completion.

f=hls-aac

Transcodes the audio to HLS AAC (.m3u8).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the M3U8 file on job completion.

Browser support: all browsers (requires an audio player SDK with HLS support, like Video.js)

f=hls-aac-rt

Transcodes the audio to HLS AAC (.m3u8) and returns the audio while it's being transcoded.

This output format is designed to reduce the wait time for your listeners when the given audio has not been transcoded before. Like the other output formats, this audio format incurs an initial delay while transcoding starts. However, unlike the other formats, once transcoding begins the audio will be streamed to listeners during transcoding. As with the other formats, once transcoded, the resulting audio will be cached and will not need to be transcoded again.

Caveat: This format introduces challenges for some audio players and audio SDKs due to the use of a live M3U8 playlist during transcoding. As such, we generally recommend using one of the asynchronous formats (which don't end with -rt) for a simpler implementation.

Response: M3U8

Browser support: all browsers (requires an audio player SDK with HLS support, like Video.js)

f=html-aac

Returns a webpage with an embedded audio player that's configured to play the requested audio in AAC.

Useful for sharing links to audio files and for previewing/debugging audio transformation parameters.

Response: HTML

This is the default value.

f=meta

Returns metadata for the audio file (duration, codec, etc.).

See the Audio Metadata API docs for more information.

Response: JSON (audio metadata)

Real-time Transcoding rt

rt=auto

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=auto) will be returned to the user while it's being transcoded only if the transcode rate is faster than the playback rate.

Only supported by f=hls-aac-rt and f=html-aac.

This is the default value.

rt=false

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=false) will never be returned to the user while it's being transcoded.

Use this option as a performance optimization (instead of using rt=auto) when you know the variant will always transcode at a slower rate than its playback rate:

• When rt=auto is used, the initial HTTP request for the M3U8 master manifest blocks until the first few segments of every rt=auto and rt=true variant have been transcoded, and only then returns the initial M3U8 playlist.

• In general, excluding slow-transcoding HLS variants reduces this latency.

If none of the HLS variants have rt=true or rt=auto then the fastest variant to transcode will be returned during transcoding.

Only supported by f=hls-aac-rt and f=html-aac.

rt=true

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=auto) will always be returned to the user while it's being transcoded.

Only supported by f=hls-aac-rt and f=html-aac.

Audio Compression API

Use the Audio Compression API to control the file size of your audio.

Bitrate br

br=<int>

Sets the output audio bitrate (kbps).

Supported values for f=aac, f=hls-aac, f=hls-aac-rt and f=html-aac:

16, 20, 24, 28, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 288, 320, 384, 448, 512, 576

Supported values for f=mp3:

16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296

Not applicable to f=wav-riff and f=wav-rf64 (Waveform audio is uncompressed, so its bitrate cannot be set).

Default: 96
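
Because only specific bitrates are accepted, it can help to validate or snap a requested value on the client before building the URL. The sketch below copies the two lists from this section; the function names are hypothetical, and snapping to the nearest supported value is a design choice of this example, since this page does not say how the API treats unsupported br values.

```javascript
// Bitrates (kbps) copied from the lists in this section.
const AAC_BITRATES = [16, 20, 24, 28, 32, 40, 48, 56, 64, 80, 96, 112, 128,
  160, 192, 224, 256, 288, 320, 384, 448, 512, 576];
// MP3: 16, then every multiple of 8 from 24 to 296.
const MP3_BITRATES = [16, ...Array.from({ length: 35 }, (_, i) => 24 + 8 * i)];

const BITRATES_BY_FORMAT = new Map([
  ["aac", AAC_BITRATES],
  ["hls-aac", AAC_BITRATES],
  ["hls-aac-rt", AAC_BITRATES],
  ["html-aac", AAC_BITRATES],
  ["mp3", MP3_BITRATES]
]);

// True if br=<kbps> appears in the list for the given f value.
function isSupportedBitrate(format, kbps) {
  return (BITRATES_BY_FORMAT.get(format) ?? []).includes(kbps);
}

// Returns the supported bitrate closest to the requested one
// (the lower value wins a tie).
function nearestSupportedBitrate(format, requestedKbps) {
  const supported = BITRATES_BY_FORMAT.get(format);
  if (supported === undefined) {
    throw new Error(`br is not applicable to f=${format}`);
  }
  return supported.reduce((best, candidate) =>
    Math.abs(candidate - requestedKbps) < Math.abs(best - requestedKbps) ? candidate : best);
}

console.log(isSupportedBitrate("mp3", 20));       // false
console.log(nearestSupportedBitrate("aac", 100)); // 96
```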

Sample Rate sr

sr=<number>

Sets the output audio sample rate (kHz).

Supported values for f=aac, f=hls-aac, f=hls-aac-rt and f=html-aac:

8, 12, 16, 22.05, 24, 32, 44.1, 48, 88.2, 96

Supported values for f=mp3:

22.05, 32, 44.1, 48

Supported values for f=wav-riff and f=wav-rf64:

8, 16, 22.05, 24, 32, 44.1, 48, 88.2, 96, 192

Note: the sample rate is adjusted automatically if the provided value is not supported at the requested bitrate for the requested audio format. For example, AAC only supports sample rates from 32 kHz to 48 kHz at a bitrate of 96 kbps.

Default: 48

Audio Trimming API

Use the Audio Trimming API to remove parts of the audio from the start and/or end.

Trim Start ts

ts=<number>

Sets the start position of audio, and removes all audio before that point.

If ts exceeds the length of the audio, then an error will be returned.

Supports numbers from 0 to 86399 with up to two decimal places. To provide frame accuracy for audio inputs, decimals will be interpreted as frame numbers, not milliseconds.

Trim End te

te=<number>

Sets the end position of audio, and removes all audio after that point.

If te exceeds the length of the audio, then no error will be returned, and the parameter effectively does nothing.

Supports numbers from 0 to 86399 with up to two decimal places. To provide frame accuracy for audio inputs, decimals will be interpreted as frame numbers, not milliseconds.
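
The limits on ts and te can be checked on the client before a request is made. The following is a minimal sketch with a hypothetical function name, trimQuery; it only validates the documented range and number of decimal places, and does not model how the decimals are interpreted as frame numbers.

```javascript
// Builds the ts/te querystring fragment, validating the documented limits:
// values from 0 to 86399 with at most two decimal places.
function trimQuery({ start, end } = {}) {
  const parts = [];
  for (const [key, value] of [["ts", start], ["te", end]]) {
    if (value === undefined) {
      continue; // this side of the audio is left untrimmed
    }
    const validFormat = /^\d+(\.\d{1,2})?$/.test(String(value));
    if (typeof value !== "number" || value < 0 || value > 86399 || !validFormat) {
      throw new RangeError(`${key} must be a number from 0 to 86399 with up to two decimal places`);
    }
    parts.push(`${key}=${value}`);
  }
  return parts.join("&");
}

console.log(trimQuery({ start: 1.5, end: 30 })); // ts=1.5&te=30
```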

Trim Mode tm

tm=after-repeat

Applies the trim specified by ts and/or te after the rp parameter is applied.

tm=before-repeat

Applies the trim specified by ts and/or te before the rp parameter is applied.

This is the default value.

Audio Concatenation API

Use the Audio Concatenation API to append additional audio files to the primary audio file's timeline.

Append append

append=<string>

Appends the audio from another media file (video or audio file) to the output.

You can specify this parameter multiple times to append multiple media files.

If you specify append multiple times, then the media files will be concatenated in the order of the querystring parameters, with the primary input audio (specified on the URL's file path) playing first.

To use: specify the "file path" attribute of another media file as the query parameter's value.

Repeat rp

rp=<int>

Number of times to play the audio file.

If this parameter appears after an append parameter, then it will repeat the appended audio file only.

If this parameter appears before any append parameters, then it will repeat the primary audio file only.

Default: 1
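
Because the position of rp relative to append decides which file is repeated, it is easiest to generate the querystring from a structured description. The sketch below is illustrative only: the function name concatQuery and the file paths are hypothetical, and it assumes that an rp placed directly after an append applies to that appended file.

```javascript
// Builds the append/rp portion of the querystring.
// primaryRepeat: number of times to play the primary file (rp before any append).
// appended: files to append, in playback order, each with an optional repeat.
function concatQuery(primaryRepeat, appended) {
  // Percent-encode values but keep "/" readable (it is permitted in a querystring).
  const encode = value => encodeURIComponent(String(value)).replace(/%2F/g, "/");
  const parts = [];
  if (primaryRepeat > 1) {
    parts.push(`rp=${primaryRepeat}`);
  }
  for (const { filePath, repeat = 1 } of appended) {
    parts.push(`append=${encode(filePath)}`);
    if (repeat > 1) {
      // Assumption: rp directly after an append repeats that appended file.
      parts.push(`rp=${repeat}`);
    }
  }
  return parts.join("&");
}

// Play the primary file twice, then a hypothetical /outro.mp3 three times.
console.log(concatQuery(2, [{ filePath: "/outro.mp3", repeat: 3 }]));
// rp=2&append=/outro.mp3&rp=3
```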

Audio pricing

The Audio Processing API is available on all Bytescale Plans.

Audio price list

Your processing quota (see pricing) is consumed by the output audio file's duration multiplied by a "processing multiplier": the codec of your output audio file determines the "processing multiplier" that will be used.

Audio files can be played an unlimited number of times.

Your processing quota will only be deducted once per URL: for the very first request to the URL.

There is a minimum billable duration of 10 seconds per audio file.

Audio billing example:

A 60-second audio file encoded to AAC would consume 45 seconds (60 × 0.75) from your monthly processing quota.

If the audio file is first played in January 2024, and is then played 100k times over the following 2 years, then you would be billed 45 seconds in January 2024 and 0 seconds in all the following months. (This assumes you never clear your permanent cache.)

Codec	Processing Multiplier
AAC	0.75
MP3	0.75
WAV	1.15
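
The billing rule can be expressed as a short calculation. This is a sketch for estimating usage, not an official calculator: the function name is hypothetical, and it assumes the 10-second minimum is applied to the duration before the processing multiplier, which this page does not state explicitly.

```javascript
// Processing multipliers copied from the table above.
const PROCESSING_MULTIPLIERS = { AAC: 0.75, MP3: 0.75, WAV: 1.15 };
const MIN_BILLABLE_SECONDS = 10;

// Estimates the processing quota (in seconds) consumed by one output file.
function quotaSecondsConsumed(durationSeconds, codec) {
  const multiplier = PROCESSING_MULTIPLIERS[codec];
  if (multiplier === undefined) {
    throw new Error(`Unknown codec: ${codec}`);
  }
  // Assumption: the 10-second minimum applies before the multiplier.
  return Math.max(durationSeconds, MIN_BILLABLE_SECONDS) * multiplier;
}

console.log(quotaSecondsConsumed(60, "AAC")); // 45
```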

HLS audio pricing

When using f=hls-aac, f=hls-aac-rt or f=html-aac (which uses f=hls-aac-rt internally) your processing quota will be consumed per HLS variant.

When using f=hls-aac-rt each real-time variant (rt=true or rt=auto) will have an additional 10 seconds added to its billable duration.

The default behavior for HLS outputs is to produce one HLS AAC variant.

You can change this behavior using the querystring parameters documented on this page.

HLS pricing example:

Given an input audio file of 60 seconds and the querystring ?f=hls-aac-rt&br=64&br=128&br=256&rt=false, you would be billed:

• 3×60 seconds for 3× HLS variants (br=64&br=128&br=256).
• 2×10 seconds for 2× HLS variants using real-time transcoding:
  - The first two variants on the querystring (br=64&br=128) do not specify rt parameters, so will default to rt=auto.
  - Per the pricing above, real-time variants incur an additional 10 seconds of billable duration.
• 200 seconds total billed duration: 3×60 + 2×10.
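
The same arithmetic can be reproduced in code. The sketch below only models what this section states (one charge per variant, plus 10 seconds per real-time variant when f=hls-aac-rt is used); the function name and the variant object shape are hypothetical, and the minimum billable duration is not modeled.

```javascript
const REALTIME_SURCHARGE_SECONDS = 10;

// variants: one object per HLS variant, e.g. { br: 64 } or { br: 256, rt: "false" }.
// A variant without an rt value defaults to rt=auto, which counts as real-time.
function billedHlsSeconds(durationSeconds, format, variants) {
  const realtimeCount = format === "hls-aac-rt"
    ? variants.filter(variant => (variant.rt ?? "auto") !== "false").length
    : 0;
  return variants.length * durationSeconds + realtimeCount * REALTIME_SURCHARGE_SECONDS;
}

// ?f=hls-aac-rt&br=64&br=128&br=256&rt=false on a 60-second input:
console.log(billedHlsSeconds(60, "hls-aac-rt", [{ br: 64 }, { br: 128 }, { br: 256, rt: "false" }]));
// 200
```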
