Yaafe core features¶

Yaafe core audio features.

Available features¶

AmplitudeModulation¶

class yaafelib.yaafe_extensions.yaafefeatures.AmplitudeModulation¶

Tremolo and grain description, according to [SE2005] and [AE2001].

AmplitudeModulation uses Envelope to describe tremolo and grain. The analyzed frequency ranges are:

Tremolo : 4 - 8 Hz

Grain : 10 - 40 Hz

For each of these ranges, it computes :

Frequency of maximum energy in range

Difference of the energy of this frequency and the mean energy over all frequencies

Difference of the energy of this frequency and the mean energy in range

Product of the first two values.
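As a rough illustration of the four values listed above, the following sketch works on an already-computed envelope spectrum. The function name `modulation_descriptors`, its arguments and the toy spectrum are hypothetical, and the last returned value reads the final list item literally as peak frequency times the first difference. Yaafe's actual implementation may differ.

```python
def modulation_descriptors(freqs, energies, low_hz, high_hz):
    """Describe the strongest modulation inside [low_hz, high_hz].

    freqs and energies are parallel lists describing the envelope spectrum
    (hypothetical input; Yaafe derives it from the Envelope feature).
    """
    in_range = [(f, e) for f, e in zip(freqs, energies) if low_hz <= f <= high_hz]
    # Frequency of maximum energy in the range.
    peak_freq, peak_energy = max(in_range, key=lambda pair: pair[1])
    mean_all = sum(energies) / len(energies)
    mean_range = sum(e for _, e in in_range) / len(in_range)
    diff_all = peak_energy - mean_all
    diff_range = peak_energy - mean_range
    # Assumption: the "product" is peak frequency times the first difference.
    return peak_freq, diff_all, diff_range, peak_freq * diff_all


# Toy envelope spectrum, analysed over the tremolo range (4 - 8 Hz).
freqs = [2.0, 4.0, 6.0, 8.0, 10.0]
energies = [1.0, 2.0, 6.0, 3.0, 1.0]
tremolo = modulation_descriptors(freqs, energies, 4.0, 8.0)
```

The same call with the range 10 to 40 Hz would give the grain descriptors.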

[AE2001]

A.Eronen, Automatic musical instrument recognition. Master’s Thesis, Tampere University of Technology, 2001.

Parameters:

EnDecim (default=200): Decimation factor to compute envelope
blockSize (default=32768): output frames size
stepSize (default=16384): step between consecutive frames

Declaration example:

AmplitudeModulation EnDecim=200  blockSize=32768  stepSize=16384

See also

Envelope

AutoCorrelation¶

class yaafelib.yaafe_extensions.yaafefeatures.AutoCorrelation¶

Compute autocorrelation coefficients ac on each frame.

\[ ac(k) = \sum_{i=0}^{N-k-1} x(i)\, x(i+k) \]
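The autocorrelation sum above can be sketched in a few lines of plain Python. The function name `autocorrelation` is illustrative, and starting the kept coefficients at lag 0 is an assumption about how ACNbCoeffs is applied.

```python
def autocorrelation(frame, nb_coeffs):
    """Return ac(k) = sum_{i=0}^{N-k-1} x(i) * x(i+k) for k = 0 .. nb_coeffs-1."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(nb_coeffs)
    ]


# Assumption: coefficients are kept starting at lag 0.
coeffs = autocorrelation([1.0, 2.0, 3.0, 4.0], 3)
```

This direct form costs O(N * nb_coeffs) per frame, which is fine for illustration but slower than an FFT-based computation on long frames.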

Parameters:

ACNbCoeffs (default=49): Number of autocorrelation coefficients to keep
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

AutoCorrelation ACNbCoeffs=49  blockSize=1024  stepSize=512

See also

Frames

BeatHistogramSummary¶

class yaafelib.yaafe_extensions.yaafefeatures.BeatHistogramSummary¶

Compute the beat histogram according to [GT2002], but using OnsetDetectionFunction as onset detection function.

[GT2002]

George Tzanetakis, Musical Genre Classification of Audio Signals, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002.

Parameters:

ACPNbPeaks (default=3): Number of autocorrelation peaks to keep
BHSBeatFrameSize (default=128): Number of frames over which autocorrelation peaks are computed
BHSBeatFrameStep (default=64): Number of frames to skip between two consecutive autocorrelation peak computations
BHSHistogramFrameSize (default=40): Number of beat frames over which the histogram is computed
BHSHistogramFrameStep (default=40): Number of beat frames to skip between two consecutive histogram computations
FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
HInf (default=40): Minimal BPM to take into consideration
HNbBins (default=80): Number of histogram bins
HSup (default=200): Maximal BPM to take into consideration
NMANbFrames (default=5000): Number of frames to normalize together, -1 means all frames
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

BeatHistogramSummary ACPNbPeaks=3  BHSBeatFrameSize=128  BHSBeatFrameStep=64  BHSHistogramFrameSize=40  BHSHistogramFrameStep=40  FFTLength=0  FFTWindow=Hanning  HInf=40  HNbBins=80  HSup=200  NMANbFrames=5000  blockSize=1024  stepSize=512

See also

OnsetDetectionFunction

CQT¶

class yaafelib.yaafe_extensions.yaafefeatures.CQT¶

Compute the Constant-Q transform according to [CS2010] with improvements from [JPCQT].

[CS2010]

C.Schörkhuber and A.Klapuri, CONSTANT-Q TRANSFORM TOOLBOX FOR MUSIC PROCESSING, 7th Sound and Music Conference (SMC‘2010), 2010, Barcelona.

[JPCQT]

J.Prado, Calcul rapide de la transformée à Q constant, http://perso.telecom-paristech.fr/~prado/cqt/cqt_modif.pdf

Parameters:

CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
CQTBinsPerOctave (default=36): Number of bins per octave to consider
CQTMinFreq (default=73.42): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
CQTNbOctaves (default=3): Number of octaves to consider for analysis
stepSize (default=512): step between consecutive frames
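The CQTMinFreq rule and the resulting bin layout can be sketched as follows. The helper names are hypothetical, the 44100 Hz sample rate is only an example, and the geometric spacing f_k = f_min * 2^(k / binsPerOctave) is the textbook constant-Q layout, assumed here rather than taken from Yaafe's source.

```python
def resolve_min_freq(cqt_min_freq, sample_rate):
    """Apply the documented rule: values below 0.5 are a factor of the sample rate."""
    if cqt_min_freq < 0.5:
        return cqt_min_freq * sample_rate
    return cqt_min_freq


def cqt_center_frequencies(min_freq_hz, bins_per_octave=36, nb_octaves=3):
    """Geometrically spaced centre frequencies (textbook constant-Q layout, assumed)."""
    nb_bins = bins_per_octave * nb_octaves
    return [min_freq_hz * 2.0 ** (k / bins_per_octave) for k in range(nb_bins)]


# Default parameters: 73.42 Hz is at least 0.5, so it is read as Hertz.
min_freq = resolve_min_freq(73.42, 44100)
centers = cqt_center_frequencies(min_freq)
```

With the defaults this gives 36 * 3 = 108 bins, and the bin one octave up (index 36) sits at twice the minimal frequency.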

Declaration example:

CQT CQTAlign=c  CQTBinsPerOctave=36  CQTMinFreq=73.42  CQTNbOctaves=3  stepSize=512

See also

Frames

CQT2¶

class yaafelib.yaafe_extensions.yaafefeatures.CQT2¶

Compute the Constant-Q transform according to Blankertz’s implementation [BB], with improvements from [JP2010].

[BB]

B.Blankertz, The Constant Q Transform, http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html

[JP2010]

J.Prado, Transformée à Q constant, technical report 2010D004, Institut TELECOM, TELECOM ParisTech, CNRS LTCI, 2010.

Parameters:

CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
CQTBinsPerOctave (default=3): Number of bins per octave to consider
CQTMaxFreq (default=0.5): Maximum frequency. If <=0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
CQTMinFreq (default=97.999): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
stepSize (default=512): step between consecutive frames

Declaration example:

CQT2 CQTAlign=c  CQTBinsPerOctave=3  CQTMaxFreq=0.5  CQTMinFreq=97.999  stepSize=512

See also

Frames

Chords¶

class yaafelib.yaafe_extensions.yaafefeatures.Chords¶

Chords recognizes chords from chromagrams, according to L.Oudre’s algorithm [LO2011].

[LO2011]

Oudre, L. and Grenier, Y. and Fevotte, C., Chord recognition by fitting rescaled chroma vectors to chord templates, IEEE Transactions on Audio, Speech and Language Processing, vol. 19, pages 2222 - 2233, Sep. 2011.

Parameters:

ChordsSmoothing (default=1.5s): Chords smoothing duration
ChordsUse7 (default=0): If 1 then use 7th chords to enrich the chord dictionary, else use only major and minor chords
stepSize (default=512): step between consecutive frames

Declaration example:

Chords ChordsSmoothing=1.5s  ChordsUse7=0  stepSize=512

Chroma¶

class yaafelib.yaafe_extensions.yaafefeatures.Chroma¶

Chroma computes a short-term chromagram according to [BP2005].

[BP2005]

Bello, J.P. and Pickens, J. A Robust Mid-level Representation for Harmonic Content in Music Signals. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR-05), London, UK. September 2005.

Parameters:

CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
CQTBinsPerOctave (default=36): Number of bins per octave to consider
CQTMinFreq (default=73.42): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
CQTNbOctaves (default=3): Number of octaves to consider for analysis
CTInitDuration (default=15): Duration over which chroma bias initialisation is performed, in seconds.
ChromaSmoothing (default=0.75s): Chroma smoothing duration
stepSize (default=512): step between consecutive frames

Declaration example:

Chroma CQTAlign=c  CQTBinsPerOctave=36  CQTMinFreq=73.42  CQTNbOctaves=3  CTInitDuration=15  ChromaSmoothing=0.75s  stepSize=512

See also

CQT

Chroma2¶

class yaafelib.yaafe_extensions.yaafefeatures.Chroma2¶

Chroma2 computes a short-term pitch profile according to [ZK2006].

[ZK2006]

Zhu and M.S. Kankanhalli. Precise pitch profile feature extraction from musical audio for key detection. IEEE Transactions on Multimedia, 2006.

Parameters:

CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
CQTBinsPerOctave (default=48): Number of bins per octave to consider
CQTMinFreq (default=27.5): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
CQTNbOctaves (default=7): Number of octaves to consider for analysis
CZBinsPerSemitone (default=1): number of bins per semitone for the PCP
CZNbCQTBinsAggregatedToPCPBin (default=-1): number of CQT bins which are aggregated for each PCP bin. if -1 then use CQTBinsPerOctave / 24
CZTuning (default=440): frequency of the A4, in Hz.
stepSize (default=512): step between consecutive frames

Declaration example:

Chroma2 CQTAlign=c  CQTBinsPerOctave=48  CQTMinFreq=27.5  CQTNbOctaves=7  CZBinsPerSemitone=1  CZNbCQTBinsAggregatedToPCPBin=-1  CZTuning=440  stepSize=512

See also

CQT

ComplexDomainOnsetDetection¶

class yaafelib.yaafe_extensions.yaafefeatures.ComplexDomainOnsetDetection¶

Compute onset detection using a complex domain spectral flux method [CD2003].

[CD2003]

C.Duxbury et al., Complex domain onset detection for musical signals, Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

ComplexDomainOnsetDetection FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Frames

Energy¶

class yaafelib.yaafe_extensions.yaafefeatures.Energy¶

Compute energy as root mean square of an audio Frame.

\[ en = \sqrt{\frac{\sum_{i=0}^{N-1} x(i)^2}{N}} \]
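A minimal sketch of the root-mean-square formula above, in plain Python. The function name `rms_energy` is illustrative only.

```python
import math


def rms_energy(frame):
    """Root mean square of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# A square wave of amplitude 1 has an RMS value of 1.
energy = rms_energy([1.0, -1.0, 1.0, -1.0])
```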

Parameters:

blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

Energy blockSize=1024  stepSize=512

See also

Frames

Envelope¶

class yaafelib.yaafe_extensions.yaafefeatures.Envelope¶

Extract the amplitude envelope using a Hilbert transform, low-pass filtering and decimation.

Parameters:

EnDecim (default=200): Decimation factor to compute envelope
blockSize (default=32768): output frames size
stepSize (default=16384): step between consecutive frames

Declaration example:

Envelope EnDecim=200  blockSize=32768  stepSize=16384

See also

Frames

EnvelopeShapeStatistics¶

class yaafelib.yaafe_extensions.yaafefeatures.EnvelopeShapeStatistics¶

Centroid, spread, skewness and kurtosis of each frame’s amplitude envelope. For more details about moments, see Shape Statistics.

Parameters:

EnDecim (default=200): Decimation factor to compute envelope
blockSize (default=32768): output frames size
stepSize (default=16384): step between consecutive frames

Declaration example:

EnvelopeShapeStatistics EnDecim=200  blockSize=32768  stepSize=16384

See also

Envelope

Frames¶

class yaafelib.yaafe_extensions.yaafefeatures.Frames¶

Segment input signal into frames.

First frame has zeros on left half so that it is centered on time 0s, then consecutive frames are equally spaced. Consequently, frame i (starting from 0) is centered on sample i * stepSize.
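The centring rule described above can be illustrated with a small framing routine. The name `split_into_frames` is hypothetical, and two details are assumptions rather than documented behaviour: frames are emitted while their centre lies inside the signal, and the tail is padded with zeros.

```python
def split_into_frames(signal, block_size, step_size):
    """Cut signal into frames so that frame i is centred on sample i * step_size."""
    half = block_size // 2
    # Left padding puts sample 0 in the middle of the first frame.
    # Assumption: the right side is zero-padded so the last frames are full length.
    padded = [0.0] * half + list(signal) + [0.0] * block_size
    result = []
    i = 0
    while i * step_size < len(signal):  # assumption: stop once the centre leaves the signal
        start = i * step_size
        result.append(padded[start:start + block_size])
        i += 1
    return result


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
frames = split_into_frames(signal, block_size=4, step_size=2)
```

In every frame, the element at index `block_size // 2` is the sample the frame is centred on, so frame timestamps follow directly from `i * stepSize / sampleRate`.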

Parameters:

blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

Frames blockSize=1024  stepSize=512

LPC¶

class yaafelib.yaafe_extensions.yaafefeatures.LPC¶

Compute the Linear Predictor Coefficients (LPC) of a signal frame, using autocorrelation and the Levinson-Durbin algorithm. See [JM1975].

[JM1975]

Makhoul J., Linear Prediction: A Tutorial Review, Proc. IEEE, Vol. 63, pp. 561-580, 1975.

Parameters:

LPCNbCoeffs (default=2): Number of Linear Predictor Coefficients to compute
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

LPC LPCNbCoeffs=2  blockSize=1024  stepSize=512

See also

AutoCorrelation

LSF¶

class yaafelib.yaafe_extensions.yaafefeatures.LSF¶

Compute the Line Spectral Frequency (LSF) coefficients of a signal frame. Algorithm was adapted from ([TB2006], [SH1976]).

[TB2006]

Tom Backstrom, Carlo Magi, Properties of line spectrum pair polynomials–A review, Signal Processing, Volume 86, Issue 11, Special Section: Distributed Source Coding, November 2006, Pages 3286-3298, ISSN 0165-1684, DOI: 10.1016/j.sigpro.2006.01.010.

[SH1976]

Schussler, H., A stability theorem for discrete systems, Acoustics, Speech and Signal Processing, IEEE Transactions on , vol.24, no.1, pp. 87-89, Feb 1976

Parameters:

LSFDisplacement (default=1): LSF Displacement parameter: 1 for classical LSF, 0 for Schussler polynomials, >1 is a generalization
LSFNbCoeffs (default=10): Number of Line Spectral Frequencies to compute
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

LSF LSFDisplacement=1  LSFNbCoeffs=10  blockSize=1024  stepSize=512

See also

LPC

Loudness¶

class yaafelib.yaafe_extensions.yaafefeatures.Loudness¶

The loudness coefficients are the energy in each Bark band, normalized by the overall sum. See [GP2004] and [MG1997] for more details.

[MG1997]

Moore, Glasberg, et al., A Model for the Prediction of Thresholds Loudness and Partial Loudness., J. Audio Eng. Soc. 45: 224-240, 1997.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
LMode (default=Relative): “Specific” computes loudness without normalization, “Relative” normalizes the bands so that they sum to 1, “Total” just returns the sum of loudness over all bands.
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames
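The three LMode values can be illustrated on a list of per-band loudness values. This is a sketch only: `apply_lmode` is a hypothetical helper, the input is assumed to be already-computed Bark-band loudness, and the handling of an all-zero frame is an assumption.

```python
def apply_lmode(band_loudness, mode="Relative"):
    """Post-process per-band loudness according to the LMode description."""
    total = sum(band_loudness)
    if mode == "Specific":
        return list(band_loudness)
    if mode == "Relative":
        # Assumption: an all-zero frame is returned unchanged to avoid dividing by zero.
        return [b / total for b in band_loudness] if total else list(band_loudness)
    if mode == "Total":
        return [total]
    raise ValueError("unknown LMode: " + mode)


bands = [1.0, 3.0, 4.0]
relative = apply_lmode(bands, "Relative")
```

Note that "Total" produces a single value per frame, whereas the other two modes keep one value per band.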

Declaration example:

Loudness FFTLength=0  FFTWindow=Hanning  LMode=Relative  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

MFCC¶

class yaafelib.yaafe_extensions.yaafefeatures.MFCC¶

Compute the Mel-frequency cepstral coefficients [DM1980].

Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:

\[ melfreq = 1127 \cdot \log\left(1 + \frac{freq}{700}\right) \]

Each filter is a triangular filter with height \(2/(f_{max}-f_{min})\). Then MFCCs are computed as follows, using DCT II:

\[ mfcc = \mathrm{dct}(\log(\mathrm{abs}(\mathrm{fft}(\mathrm{hanning}(N) \cdot x)) \cdot \mathrm{MelFilterBank})) \]
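To make the mel-scale mapping concrete, the sketch below converts between Hertz and mel and derives band edges for the default filter bank. It assumes the logarithm is natural (the constant 1127 is close to 2595 / ln 10, which matches the common log10 form of the scale) and that band edges are equally spaced in mel. The function names are illustrative, not Yaafe API.

```python
import math


def hz_to_mel(freq):
    """melfreq = 1127 * log(1 + freq / 700), with a natural logarithm (assumed)."""
    return 1127.0 * math.log(1.0 + freq / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_band_edges(nb_filters=40, min_freq=130.0, max_freq=6854.0):
    """Return nb_filters + 2 edge frequencies, equally spaced on the mel scale."""
    low, high = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (high - low) / (nb_filters + 1)
    return [mel_to_hz(low + j * step) for j in range(nb_filters + 2)]


edges = mel_band_edges()
```

Under these assumptions, triangular filter j would span `edges[j]` to `edges[j + 2]` with its peak at `edges[j + 1]`.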

[DM1980]

S.B. Davis and P.Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28:357-366, 1980.

Parameters:

CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
MelNbFilters (default=40): Number of mel filters
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

MFCC CepsIgnoreFirstCoeff=1  CepsNbCoeffs=13  FFTWindow=Hanning  MelMaxFreq=6854.0  MelMinFreq=130.0  MelNbFilters=40  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

MagnitudeSpectrum¶

class yaafelib.yaafe_extensions.yaafefeatures.MagnitudeSpectrum¶

Compute frame’s magnitude spectrum, using an analysis window (Hanning or Hamming), or not.

\[ y = \mathrm{abs}(\mathrm{fft}(w * x)) \]

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

MagnitudeSpectrum FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Frames

MelSpectrum¶

class yaafelib.yaafe_extensions.yaafefeatures.MelSpectrum¶

Compute the Mel-frequency spectrum [DM1980].

Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:

\[ melfreq = 1127 \cdot \log\left(1 + \frac{freq}{700}\right) \]

Each filter is a triangular filter with height \(2/(f_{max}-f_{min})\).

Parameters:

FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
MelNbFilters (default=40): Number of mel filters
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

MelSpectrum FFTWindow=Hanning  MelMaxFreq=6854.0  MelMinFreq=130.0  MelNbFilters=40  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

OBSI¶

class yaafelib.yaafe_extensions.yaafefeatures.OBSI¶

Compute octave band signal intensity using a triangular octave filter bank ([SE2005]).

[SE2005]

S.Essid, Classification automatique des signaux audio-frequences: reconnaissance des instruments de musique. PhD, UPMC, 2005.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

OBSI FFTLength=0  FFTWindow=Hanning  OBSIMinFreq=27.5  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

OBSIR¶

class yaafelib.yaafe_extensions.yaafefeatures.OBSIR¶

Compute the log of the OBSI ratio between consecutive octaves.

Parameters:

DiffNbCoeffs (default=0): Maximum number of coeffs to keep. 0 keeps N-1 value (with N the input feature size)
FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

OBSIR DiffNbCoeffs=0  FFTLength=0  FFTWindow=Hanning  OBSIMinFreq=27.5  blockSize=1024  stepSize=512

See also

OBSI

OnsetDetectionFunction¶

class yaafelib.yaafe_extensions.yaafefeatures.OnsetDetectionFunction¶

Compute the onset detection function (spectral energy flux) according to the method of [MA2005].

[MA2005]

M.Alonso, G.Richard, B.David, EXTRACTING NOTE ONSETS FROM MUSICAL RECORDINGS, International Conference on Multimedia and Expo (IEEE-ICME‘05), Amsterdam, The Netherlands, 2005.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
NMANbFrames (default=5000): Number of frames to normalize together, -1 means all frames
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

OnsetDetectionFunction FFTLength=0  FFTWindow=Hanning  NMANbFrames=5000  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

PerceptualSharpness¶

class yaafelib.yaafe_extensions.yaafefeatures.PerceptualSharpness¶

Compute the sharpness of Loudness coefficients, according to [GP2004].

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

PerceptualSharpness FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Loudness

PerceptualSpread¶

class yaafelib.yaafe_extensions.yaafefeatures.PerceptualSpread¶

Compute the spread of Loudness coefficients, according to [GP2004].

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

PerceptualSpread FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Loudness

SpectralCrestFactorPerBand¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralCrestFactorPerBand¶

Compute spectral crest factor per log-spaced band of 1/4 octave.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralCrestFactorPerBand FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralDecrease¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralDecrease¶

Compute spectral decrease according to [GP2004].

\[ S_{decrease} = \frac{1}{\sum_{k=2}^{K} a_{k}} \sum_{k=2}^{K} \frac{a_{k} - a_{1}}{k-1} \]
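The spectral decrease formula above translates almost directly into code. In this sketch `magnitudes[0]` plays the role of a_1, so the divisor k - 1 becomes the zero-based index. The function name and the zero-denominator guard are assumptions.

```python
def spectral_decrease(magnitudes):
    """S_decrease for one magnitude spectrum a_1 .. a_K (stored zero-based)."""
    denominator = sum(magnitudes[1:])
    if denominator == 0.0:
        return 0.0  # assumption: silent upper bins give a decrease of 0
    # Zero-based index k stands for a_{k+1}, so the divisor (k+1) - 1 is k.
    numerator = sum(
        (magnitudes[k] - magnitudes[0]) / k for k in range(1, len(magnitudes))
    )
    return numerator / denominator


decrease = spectral_decrease([4.0, 2.0, 1.0])
```

A spectrum whose magnitudes fall away from the first bin, as in this example, yields a negative value.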

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralDecrease FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralFlatness¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralFlatness¶

Compute global spectral flatness using the ratio between geometric and arithmetic mean.

\[ S_{flatness} = \frac{\exp\left(\frac{1}{N}\sum_{k}\log(a_{k})\right)}{\frac{1}{N}\sum_{k} a_{k}} \]
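A minimal sketch of the geometric-to-arithmetic mean ratio above. The function name is illustrative, and returning 0 when a bin is zero (where the logarithm is undefined) is an assumption, not documented Yaafe behaviour.

```python
import math


def spectral_flatness(magnitudes):
    """Ratio of the geometric mean to the arithmetic mean of the magnitudes."""
    if any(a <= 0.0 for a in magnitudes):
        # Assumption: a zero bin makes the geometric mean, and so the flatness, zero.
        return 0.0
    n = len(magnitudes)
    geometric = math.exp(sum(math.log(a) for a in magnitudes) / n)
    arithmetic = sum(magnitudes) / n
    return geometric / arithmetic


flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])
peaky = spectral_flatness([1.0, 4.0])
```

A perfectly flat spectrum gives 1, and the value drops toward 0 as energy concentrates in a few bins.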

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralFlatness FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralFlatnessPerBand¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralFlatnessPerBand¶

Compute spectral flatness per log-spaced band of 1/4 octave, as proposed in MPEG7 standard.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralFlatnessPerBand FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralFlux¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralFlux¶

Compute the flux of the spectrum between consecutive frames.

\[ S_{flux} = \frac{\sum_{k}(a_{k}(t) - a_{k}(t-1))^2}{\sqrt{\sum_{k}a_{k}(t-1)^2} \sqrt{\sum_{k}a_{k}(t)^2}} \]
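The flux formula above can be sketched for two consecutive magnitude spectra. This covers only the case where all bins are used. The function name and the guard for silent frames are assumptions.

```python
import math


def spectral_flux(previous, current):
    """Squared spectral difference normalised by the two frame norms."""
    numerator = sum((c - p) ** 2 for p, c in zip(previous, current))
    norm_previous = math.sqrt(sum(p * p for p in previous))
    norm_current = math.sqrt(sum(c * c for c in current))
    if norm_previous == 0.0 or norm_current == 0.0:
        return 0.0  # assumption: flux is reported as 0 when a frame is silent
    return numerator / (norm_previous * norm_current)


flux = spectral_flux([3.0, 4.0], [4.0, 3.0])
```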

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
FluxSupport (default=All): support of flux computation. if ‘All’ then use all bins (default), if ‘Increase’ then use only bins which are increasing
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralFlux FFTLength=0  FFTWindow=Hanning  FluxSupport=All  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralIrregularity¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralIrregularity¶

Compute difference between consecutive CQT bins, see [Brown2000].

[Brown2000]

J.C. Brown, O.Houix, Stephen McAdams, Feature dependence in the automatic identification of musical woodwind instruments., Journal of the Acoustical Society of America, 109: 1064-1072, 2000.

Parameters:

CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
CQTBinsPerOctave (default=36): Number of bins per octave to consider
CQTMinFreq (default=73.42): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
CQTNbOctaves (default=3): Number of octaves to consider for analysis
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralIrregularity CQTAlign=c  CQTBinsPerOctave=36  CQTMinFreq=73.42  CQTNbOctaves=3  stepSize=512

See also

CQT

SpectralRolloff¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralRolloff¶

Spectral roll-off is the frequency below which 99% of the energy is contained. See [SS1997].
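One way to read this definition is as a cumulative sum over the bins of a one-sided magnitude spectrum. The sketch below makes several assumptions: energy is the squared magnitude, the spectrum has FFTLength / 2 + 1 bins, and the result is reported in Hertz. The function name is illustrative.

```python
def spectral_rolloff(magnitudes, sample_rate, fraction=0.99):
    """Frequency (Hz) of the first bin where cumulative energy reaches the fraction."""
    energies = [m * m for m in magnitudes]  # assumption: energy = squared magnitude
    threshold = fraction * sum(energies)
    fft_length = 2 * (len(magnitudes) - 1)  # assumption: one-sided spectrum
    cumulative = 0.0
    for k, energy in enumerate(energies):
        cumulative += energy
        if cumulative >= threshold:
            return k * sample_rate / fft_length
    return sample_rate / 2.0


rolloff = spectral_rolloff([1.0, 1.0, 1.0, 1.0, 0.0], sample_rate=8000)
```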

[SS1997]

E.Scheirer, M.Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing, p.1331-1334, 1997.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralRolloff FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralShapeStatistics¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralShapeStatistics¶

Compute shape statistics of MagnitudeSpectrum (see [GR2004]).

Shape statistics are centroid, spread, skewness and kurtosis, defined as follows:

\begin{align}
\mu_{i} &= \frac{\sum_{k=1}^{N} f_{k}^{i} \cdot a_{k}}{\sum_{k=1}^{N} a_{k}} \\
centroid &= \mu_{1} \\
spread &= \sqrt{\mu_{2} - \mu_{1}^{2}} \\
skewness &= \frac{2\mu_{1}^{3} - 3\mu_{1}\mu_{2} + \mu_{3}}{spread^{3}} \\
kurtosis &= \frac{-3\mu_{1}^{4} + 6\mu_{1}\mu_{2} - 4\mu_{1}\mu_{3} + \mu_{4}}{spread^{4}} - 3
\end{align}
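The first three statistics can be sketched from the raw moments defined above. Kurtosis is left out for brevity. The function name is illustrative, and whether f_k is a bin index or a frequency in Hertz is left to the caller, since the text does not say.

```python
import math


def shape_statistics(freqs, magnitudes):
    """Centroid, spread and skewness from amplitude-weighted raw moments."""
    total = sum(magnitudes)

    def moment(order):
        # mu_i = sum_k f_k^i * a_k / sum_k a_k
        return sum(f ** order * a for f, a in zip(freqs, magnitudes)) / total

    mu1, mu2, mu3 = moment(1), moment(2), moment(3)
    centroid = mu1
    spread = math.sqrt(mu2 - mu1 ** 2)
    skewness = (2 * mu1 ** 3 - 3 * mu1 * mu2 + mu3) / spread ** 3
    return centroid, spread, skewness


# A symmetric toy spectrum: the centroid is the middle bin and the skewness is 0.
centroid, spread, skewness = shape_statistics([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The sketch assumes a non-silent frame with energy in more than one bin, otherwise the total or the spread is zero and the divisions fail.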

[GR2004]

O.Gillet, G.Richard, Automatic transcription of drum loops. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada, 2004.

Parameters:

FFTLength (default=0): Frame’s length on which perform FFT. Original frame is padded with zeros or truncated to reach this size. If 0 then use original frame length.
FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
blockSize (default=1024): output frames size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralShapeStatistics FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

MagnitudeSpectrum

SpectralSlope¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralSlope¶

SpectralSlope is computed by linear regression of the spectral amplitude. (see [GP2004])

\[ S_{slope} = \frac{K\sum_{k}f_{k}a_{k} - \sum_{k}f_{k}\sum_{k}a_{k}}{K\sum_{k}f_{k}^2 - (\sum_{k}f_{k})^2} \]
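As a sketch of the linear regression, the code below computes the least-squares slope of amplitude against frequency, using the standard denominator K * sum(f^2) - (sum f)^2. The function name is illustrative, and the choice of f_k (bin index or Hertz) is left to the caller.

```python
def spectral_slope(freqs, magnitudes):
    """Least-squares slope of the magnitudes as a linear function of frequency."""
    k = len(freqs)
    sum_f = sum(freqs)
    sum_a = sum(magnitudes)
    sum_fa = sum(f * a for f, a in zip(freqs, magnitudes))
    sum_ff = sum(f * f for f in freqs)
    return (k * sum_fa - sum_f * sum_a) / (k * sum_ff - sum_f ** 2)


# Magnitudes that follow a = 2 * f + 1 exactly give a slope of 2.
slope = spectral_slope([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```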

Parameters:

FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
blockSize (default=1024): output frame size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralSlope FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
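
As a rough illustration of the linear regression, the sketch below computes the ordinary least-squares slope of the amplitudes a_k against the bin frequencies f_k, with K the number of bins. The function name `spectral_slope` is hypothetical, and any extra normalization Yaafe may apply is not modelled here.

```python
def spectral_slope(freqs, amps):
    """Least-squares slope of amps regressed on freqs (hypothetical helper)."""
    k = len(freqs)
    sum_f = sum(freqs)
    sum_a = sum(amps)
    sum_fa = sum(f * a for f, a in zip(freqs, amps))
    sum_ff = sum(f * f for f in freqs)
    # Standard closed form of the regression slope; the denominator is
    # zero only when all frequencies are identical.
    return (k * sum_fa - sum_f * sum_a) / (k * sum_ff - sum_f ** 2)
```

For amplitudes that lie exactly on a line, for example `amps = [1, 3, 5, 7]` over `freqs = [0, 1, 2, 3]`, the function returns the slope of that line.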

See also

MagnitudeSpectrum

SpectralVariation¶

class yaafelib.yaafe_extensions.yaafefeatures.SpectralVariation¶

SpectralVariation is 1 minus the normalized correlation of the spectrum between consecutive frames (see [GP2004]).

\[
S_{var} = 1 - \frac{\sum_{k}a_{k}(t-1)\,a_{k}(t)}{\sqrt{\sum_{k}a_{k}(t-1)^{2}}\,\sqrt{\sum_{k}a_{k}(t)^{2}}}
\]

[GP2004]

Geoffroy Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.

Parameters:

FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
blockSize (default=1024): output frame size
stepSize (default=512): step between consecutive frames

Declaration example:

SpectralVariation FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
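
The quantity can be sketched in a few lines of plain Python. The helper name `spectral_variation` is hypothetical, `prev` and `cur` stand for the magnitude spectra a_k(t-1) and a_k(t), and the sketch assumes both frames have non-zero energy; how Yaafe treats silent frames is not covered.

```python
import math


def spectral_variation(prev, cur):
    """1 minus the normalized correlation of two magnitude spectra.

    Hypothetical helper: assumes neither spectrum is all zeros.
    """
    cross = sum(p * c for p, c in zip(prev, cur))
    norm = math.sqrt(sum(p * p for p in prev)) * math.sqrt(sum(c * c for c in cur))
    return 1.0 - cross / norm
```

Two spectra that differ only by a gain factor give a value close to 0, while spectra with no overlapping energy give 1.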

See also

MagnitudeSpectrum

TemporalShapeStatistics¶

class yaafelib.yaafe_extensions.yaafefeatures.TemporalShapeStatistics¶

Compute shape statistics of signal frames.

Parameters:

blockSize (default=1024): output frame size
stepSize (default=512): step between consecutive frames

Declaration example:

TemporalShapeStatistics blockSize=1024  stepSize=512

See also

Frames

ZCR¶

class yaafelib.yaafe_extensions.yaafefeatures.ZCR¶

Compute the zero-crossing rate in frames (see [SS1997]).

Parameters:

blockSize (default=1024): output frame size
stepSize (default=512): step between consecutive frames

Declaration example:

ZCR blockSize=1024  stepSize=512
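
To make the roles of blockSize and stepSize concrete, here is a minimal sketch that cuts a signal into overlapping frames and computes one common definition of the zero-crossing rate: the fraction of consecutive sample pairs whose signs differ. Both function names are hypothetical, the framing does no padding at the signal edges, and Yaafe's exact framing and normalization may differ.

```python
def frames(signal, block_size=1024, step_size=512):
    """Yield frames of block_size samples, advancing by step_size (no padding)."""
    for start in range(0, len(signal) - block_size + 1, step_size):
        yield signal[start:start + block_size]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs with different signs (assumed definition)."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```

A usage example would be `[zero_crossing_rate(f) for f in frames(samples)]`, which yields one value per frame.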

See also

Frames

Available feature transforms¶

AutoCorrelationPeaksIntegrator¶

class yaafelib.yaafe_extensions.yaafefeatures.AutoCorrelationPeaksIntegrator¶

Feature transform that computes the peaks of the autocorrelation function and outputs the peak positions and amplitudes.

Parameters:

ACPInterPeakMinDist (default=5): Minimal distance between consecutive autocorrelation peaks, expressed in lags.
ACPNbPeaks (default=3): Number of autocorrelation peaks to keep
ACPNorm (default=No): No|BPM|Hz. Express the output in lags, BPM or Hz, respectively.
NbFrames (default=60): Number of frames to integrate together
StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

AutoCorrelationPeaksIntegrator ACPInterPeakMinDist=5  ACPNbPeaks=3  ACPNorm=No  NbFrames=60  StepNbFrames=30
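
The interplay of ACPNbPeaks and ACPInterPeakMinDist can be pictured with the sketch below. It is only one plausible peak-picking strategy (strongest local maxima first, rejecting any peak too close to one already kept) and is not taken from Yaafe's source. The names `autocorrelation_peaks`, `lag_to_hz` and `lag_to_bpm` are hypothetical, and the conversions assume that the rate of the input feature frames, `frame_rate`, is known in frames per second.

```python
def autocorrelation_peaks(ac, nb_peaks=3, min_dist=5):
    """Return up to nb_peaks (lag, amplitude) pairs, sorted by lag."""
    # Local maxima, excluding lag 0 and the last lag.
    candidates = [
        k for k in range(1, len(ac) - 1) if ac[k - 1] < ac[k] >= ac[k + 1]
    ]
    # Visit the strongest candidates first.
    candidates.sort(key=lambda k: ac[k], reverse=True)
    kept = []
    for k in candidates:
        if all(abs(k - j) >= min_dist for j in kept):
            kept.append(k)
        if len(kept) == nb_peaks:
            break
    return [(k, ac[k]) for k in sorted(kept)]


def lag_to_hz(lag, frame_rate):
    """A period of `lag` frames at `frame_rate` frames/s, expressed in Hz."""
    return frame_rate / lag


def lag_to_bpm(lag, frame_rate):
    """The same period expressed in beats per minute."""
    return 60.0 * frame_rate / lag
```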

Cepstrum¶

class yaafelib.yaafe_extensions.yaafefeatures.Cepstrum¶

Feature transform that computes the cepstrum coefficients of input feature frames (uses DCT-II).

\[
cep = \mathrm{dct}(\log(x))
\]

Parameters:

CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.

Declaration example:

Cepstrum CepsIgnoreFirstCoeff=1  CepsNbCoeffs=13
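
A minimal sketch of cep = dct(log(x)) in plain Python follows. The function names are hypothetical, the DCT-II is left unnormalized (Yaafe's scaling may differ), the input is assumed strictly positive, and the sketch assumes that CepsNbCoeffs coefficients are kept after the first one has been dropped.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def cepstrum(frame, nb_coeffs=13, ignore_first=1):
    """DCT-II of the log of a strictly positive frame (hypothetical helper)."""
    coeffs = dct_ii([math.log(v) for v in frame])
    # Assumption: nb_coeffs are kept after optionally skipping coefficient 0.
    start = 1 if ignore_first else 0
    return coeffs[start:start + nb_coeffs]
```

With a constant input frame, all the energy lands in coefficient 0, so the default `ignore_first=1` returns values close to zero.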

Derivate¶

class yaafelib.yaafe_extensions.yaafefeatures.Derivate¶

Compute temporal derivative of input feature. The derivative is approximated by an orthogonal polynomial fit over a finite length window. (see [RR1993] p.117).

\[
\frac{\partial x(t)}{\partial t} = \mu \sum_{k=-N}^{N} k\, x(t+k) \quad \text{where} \quad \mu = \frac{1}{\sum_{k=-N}^{N} k^{2}}
\]

[RR1993]

L. R. Rabiner, Fundamentals of Speech Processing. Prentice Hall Signal Processing Series. PTR Prentice-Hall, 1993.

Parameters:

DO1Len (default=4): Horizon used to compute order 1 derivative.
DO2Len (default=1): Horizon used to compute order 2 derivative. Unused if DOrder=1.
DOrder (default=1): Order of the derivative to compute.

Declaration example:

Derivate DO1Len=4  DO2Len=1  DOrder=1
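
The order 1 derivative can be sketched as the least-squares slope over a window of 2N+1 frames, with N playing the role of DO1Len. The helper name `derivative` is hypothetical, the weighted sum is divided by the sum of k^2 so that a linear ramp yields its own slope, and the handling of signal edges and of the order 2 derivative is left out.

```python
def derivative(series, t, horizon=4):
    """Least-squares slope of series around index t.

    Hypothetical helper: requires horizon <= t < len(series) - horizon.
    """
    lags = range(-horizon, horizon + 1)
    norm = sum(k * k for k in lags)
    return sum(k * series[t + k] for k in lags) / norm
```

For example, on the ramp `[3 * i + 1 for i in range(20)]` the estimate at `t=10` equals the ramp's slope.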

HistogramIntegrator¶

class yaafelib.yaafe_extensions.yaafefeatures.HistogramIntegrator¶

Feature transform that computes a histogram of the input values.

Parameters:

HInf (default=0): Minimal value to take into consideration
HNbBins (default=10): Number of histogram bins
HSup (default=1): Maximal value to take into consideration
HWeighted (default=0): Set it to 1 if input values are weighted. If 1, the input is considered to be a list of (value, weight) pairs.
NbFrames (default=60): Number of frames to integrate together
StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

HistogramIntegrator HInf=0  HNbBins=10  HSup=1  HWeighted=0  NbFrames=60  StepNbFrames=30
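
The sketch below shows one way the parameters could fit together for scalar input values. It is an assumption-laden illustration rather than Yaafe's code: the names `histogram` and `histogram_integrator` are hypothetical, bins have equal width, values outside [HInf, HSup] are ignored, and StepNbFrames is treated as the hop between the starts of consecutive integration windows.

```python
def histogram(values, h_inf=0.0, h_sup=1.0, nb_bins=10, weights=None):
    """Equal-width histogram of values in [h_inf, h_sup]."""
    counts = [0.0] * nb_bins
    width = (h_sup - h_inf) / nb_bins
    if weights is None:
        weights = [1.0] * len(values)
    for v, w in zip(values, weights):
        if v < h_inf or v > h_sup:
            continue  # assumption: out-of-range values are dropped
        # The upper bound h_sup falls into the last bin.
        idx = min(int((v - h_inf) / width), nb_bins - 1)
        counts[idx] += w
    return counts


def histogram_integrator(series, nb_frames=60, step_nb_frames=30, **kwargs):
    """One histogram per window of nb_frames values."""
    return [
        histogram(series[s:s + nb_frames], **kwargs)
        for s in range(0, len(series) - nb_frames + 1, step_nb_frames)
    ]
```

With HWeighted=1, the (value, weight) pairs would be split into the `values` and `weights` arguments before calling `histogram`.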

SlopeIntegrator¶

class yaafelib.yaafe_extensions.yaafefeatures.SlopeIntegrator¶

Feature transform that computes the slope of the input feature over the given number of frames.

Parameters:

NbFrames (default=60): Number of frames to integrate together
StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

SlopeIntegrator NbFrames=60  StepNbFrames=30
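
As an illustration, the slope over each window can be taken as the least-squares slope of the feature values against the frame index. This is a sketch under assumptions, not Yaafe's implementation: the function names are hypothetical, the input is a scalar feature, the slope is expressed per frame, and StepNbFrames is treated as the hop between window starts.

```python
def slope(values):
    """Least-squares slope of values against their index (needs >= 2 values)."""
    n = len(values)
    mean_t = (n - 1) / 2.0
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def slope_integrator(series, nb_frames=60, step_nb_frames=30):
    """One slope per window of nb_frames values."""
    return [
        slope(series[s:s + nb_frames])
        for s in range(0, len(series) - nb_frames + 1, step_nb_frames)
    ]
```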

StatisticalIntegrator¶

class yaafelib.yaafe_extensions.yaafefeatures.StatisticalIntegrator¶

Feature transform that computes the temporal mean and standard deviation of the input feature over the given number of frames.

Parameters:

NbFrames (default=60): Number of frames to integrate together
SICompute (default=MeanStddev): if ‘MeanStddev’, compute mean and standard deviation; if ‘Mean’, compute only the mean; if ‘Stddev’, compute only the standard deviation.
StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

StatisticalIntegrator NbFrames=60  SICompute=MeanStddev  StepNbFrames=30
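
A minimal sketch of this integration for a scalar feature is given below. The function names are hypothetical, the standard deviation is the population form (dividing by the window length), which is an assumption about Yaafe's convention, and StepNbFrames is treated as the hop between window starts.

```python
import math


def mean_stddev(values):
    """Mean and population standard deviation of a window."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


def statistical_integrator(series, nb_frames=60, step_nb_frames=30):
    """One (mean, stddev) pair per window of nb_frames values."""
    return [
        mean_stddev(series[s:s + nb_frames])
        for s in range(0, len(series) - nb_frames + 1, step_nb_frames)
    ]
```

With the defaults, 120 input frames produce three overlapping windows starting at frames 0, 30 and 60.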