Audio Processing
================

Great sites and courses:

- [YouTube channel] Valerio Velardo - The sound of AI: https://www.youtube.com/c/ValerioVelardoTheSoundofAI

**Sample audio files:**

- [WAV] [STEREO] :download:`audio_stereo.wav`
- [WAV] [MONO] :download:`audio_mono.wav`
- [MP3] [MONO] :download:`audio_mono.mp3`
- [MP3] [STEREO] :download:`audio_stereo.mp3`
- [AC3] [5.1] :download:`audio_5_1.ac3`
- [OGG] [5 channels] :download:`audio_5channels.ogg`
- [FLAC] [STEREO] :download:`audio_stereo.flac`

Reading / Writing Wave files
############################

Reading
*******

.. code-block:: python

    import wave
    import numpy as np

    with wave.open("audio_stereo.wav", mode='rb') as wavefile:
        fs = wavefile.getframerate()
        ts = 1.0 / fs
        nb_samples = wavefile.getnframes()
        nb_channels = wavefile.getnchannels()
        sample_width = wavefile.getsampwidth()
        buffer = wavefile.readframes(nb_samples)

    # Get the numpy dtype corresponding to the sample width (in bytes)
    if sample_width == 1:
        sample_type = np.uint8  # 8-bit WAV samples are unsigned
    elif sample_width == 2:
        sample_type = np.int16
    elif sample_width == 4:
        sample_type = np.int32
    elif sample_width == 8:
        sample_type = np.int64
    else:
        # e.g. 24-bit files (sample_width == 3) have no matching numpy dtype
        raise NotImplementedError(f"Sampwidth: {sample_width}")

    # Convert the byte array to the correct data type
    # data.shape = (nb_samples * nb_channels)
    data = np.frombuffer(buffer, dtype=sample_type)

    # Split the interleaved channels
    # data.shape = (nb_channels, nb_samples)
    data = np.array([data[no_channel::nb_channels] for no_channel in range(nb_channels)])

To read the data chunk by chunk, call :code:`wavefile.readframes(n)` repeatedly; each call continues where the previous one stopped. Use :code:`wavefile.setpos(pos)` to move the file pointer to a given frame.

Writing
*******

.. code-block:: python

    import wave
    import numpy as np
    import matplotlib.pyplot as plt

    fs = 8000  # 8 kHz
    ts = 1 / fs
    nb_channels = 2
    sample_width = 4  # bytes per sample

    # Get the numpy dtype corresponding to the sample width
    # (8-bit WAV samples are unsigned and need an offset, so they are not handled here)
    if sample_width == 2:
        sample_type = np.int16
    elif sample_width == 4:
        sample_type = np.int32
    else:
        raise NotImplementedError(f"Sampwidth: {sample_width}")

    # Generating audio frames
    # 144 Hz sine wave moving from the left to the right channel
    N = 80000  # nb samples <=> 10 seconds at 8 kHz
    t = np.arange(0, N) * ts
    channel_right = (np.arange(N) / (N-1)) * np.sin(2 * np.pi * 144 * t)
    channel_left = (np.arange(N)[::-1] / (N-1)) * np.sin(2 * np.pi * 144 * t)

    # Plot the audio samples
    fig = plt.figure(figsize=(8, 4), tight_layout=True)
    ax1 = fig.add_subplot(211)
    ax1.grid()
    ax1.set_ylabel("Amplitude")
    ax1.set_title("Right Channel")
    ax1.plot(t, channel_right)
    ax2 = fig.add_subplot(212, sharex=ax1)
    ax2.grid()
    ax2.set_ylabel("Amplitude")
    ax2.set_xlabel("Time [s]")
    ax2.set_title("Left Channel")
    ax2.plot(t, channel_left)
    fig.savefig("audio_sample.png", dpi=200)

    # Convert the audio channels to the right dtype
    # For int32 going from -2 ** 31 to 2 ** 31 - 1
    channel_right = (channel_right * (2 ** (sample_width * 8 - 1) - 1)).astype(sample_type)
    channel_left = (channel_left * (2 ** (sample_width * 8 - 1) - 1)).astype(sample_type)

    # Create a single flattened (interleaved) array
    # [spl0_ch0, spl0_ch1, spl1_ch0, spl1_ch1, spl2_ch0, spl2_ch1, ...]
    # In a stereo WAV file, channel 0 is the left channel and channel 1 the right one
    data = np.ravel((channel_left, channel_right), order='F')
    data_bytes = data.tobytes()

    # Alternatively, you can pack the samples with struct (format 'i' matches int32)
    # import struct
    # data_bytes = struct.pack(f'<{len(data)}i', *data)

    with wave.open("audio_out.wav", "wb") as wavefile:
        wavefile.setnchannels(nb_channels)
        wavefile.setsampwidth(sample_width)
        wavefile.setframerate(fs)
        wavefile.writeframes(data_bytes)

.. image:: /assets/audio_processing/audio_sample.png
    :height: 250pt

.. note::

    Apart from the wave format, there is no simple, built-in way to write audio files in Python. It is therefore usually easiest to write the audio as a wave file first and then convert it with a tool such as **ffmpeg**.

Opening other audio file formats
################################

A convenient library for opening audio files other than wave files is **librosa**. You can install **librosa** by running:

.. code-block:: bash

    conda install -c conda-forge librosa

.. code-block:: python

    import librosa

    # Load an audio file as a floating point time series
    # Audio will be automatically resampled to the given rate (default sr=22050)
    # To preserve the native sampling rate of the file, use sr=None
    # Audio will be automatically converted to mono (default mono=True)
    # With mono=False, data.shape = (nb_channels, nb_samples)
    data, fs = librosa.load("audio_stereo.mp3", sr=None, mono=False)

**librosa** successfully read all the main audio file formats I had: wav, mp3, ogg, flac, ac3.

Under the hood, **librosa** uses **soundfile** and **audioread** to read audio files:

- soundfile: https://pysoundfile.readthedocs.io/en/latest/
- audioread: https://github.com/beetbox/audioread

From the librosa documentation:

    *librosa* uses ``soundfile`` and ``audioread`` for reading audio. As of v0.7, librosa uses ``soundfile`` by default, and falls back on ``audioread`` only when dealing with codecs unsupported by ``soundfile`` (notably, MP3, and some variants of WAV). For a list of codecs supported by ``soundfile``, see the *libsndfile* documentation.

------------------------------------------------------------

**Sources**:

- wave official documentation: https://docs.python.org/3/library/wave.html
- librosa: https://librosa.org/
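
As a complement to the remark on chunk-by-chunk reading in the *Reading* section, here is a minimal sketch, not a definitive implementation. It assumes 16-bit PCM data. The helper name ``iter_wav_chunks``, its parameters and the file ``chunk_demo.wav`` are hypothetical and only serve the illustration. The example first writes a short stereo file so that it can run on its own, then reads it back in chunks of 3000 frames.

.. code-block:: python

    import os
    import tempfile
    import wave

    import numpy as np


    def iter_wav_chunks(path, chunk_frames=1024, start_frame=0):
        """Yield (nb_channels, n) int16 arrays of at most chunk_frames frames."""
        with wave.open(path, mode="rb") as wavefile:
            if wavefile.getsampwidth() != 2:
                # Assumption of this sketch: 16-bit PCM only
                raise NotImplementedError("this sketch only handles 16-bit PCM")
            nb_channels = wavefile.getnchannels()
            wavefile.setpos(start_frame)  # move the file pointer to the first frame to read
            while True:
                # readframes() continues from the current position
                buffer = wavefile.readframes(chunk_frames)
                if not buffer:
                    break
                chunk = np.frombuffer(buffer, dtype="<i2")
                # De-interleave: one row per channel
                yield chunk.reshape(-1, nb_channels).T


    # Build a one-second 16-bit stereo file so the example is self-contained
    fs = 8000
    t = np.arange(fs) / fs
    left = (0.5 * np.sin(2 * np.pi * 144 * t) * 32767).astype("<i2")
    right = (0.25 * np.sin(2 * np.pi * 144 * t) * 32767).astype("<i2")
    demo_path = os.path.join(tempfile.mkdtemp(), "chunk_demo.wav")
    with wave.open(demo_path, "wb") as wavefile:
        wavefile.setnchannels(2)
        wavefile.setsampwidth(2)
        wavefile.setframerate(fs)
        # column_stack interleaves the samples as [left0, right0, left1, right1, ...]
        wavefile.writeframes(np.column_stack((left, right)).tobytes())

    # Read the file back chunk by chunk and reassemble the full signal
    chunks = list(iter_wav_chunks(demo_path, chunk_frames=3000))
    data = np.concatenate(chunks, axis=1)
    print([chunk.shape for chunk in chunks])

Each yielded array has the shape (nb_channels, n) used in the *Reading* section, so the chunks can be concatenated along axis 1 to recover the whole signal.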