Yaafe core features

Yaafe core audio features.

Available features

AmplitudeModulation

class yaafelib.yaafe_extensions.yaafefeatures.AmplitudeModulation

Tremolo and Grain description, according to [SE2005] and [AE2001].

AmplitudeModulation uses Envelope to describe tremolo and grain. The analyzed frequency ranges are:

  • Tremolo : 4 - 8 Hz
  • Grain : 10 - 40 Hz

For each of these ranges, it computes:

  1. Frequency of maximum energy in range
  2. Difference of the energy of this frequency and the mean energy over all frequencies
  3. Difference of the energy of this frequency and the mean energy in range
  4. Product of the first two values.
[AE2001]A.Eronen, Automatic musical instrument recognition. Master’s Thesis, Tampere University of Technology, 2001.
Parameters:
  • EnDecim (default=200): Decimation factor to compute envelope
  • blockSize (default=32768): output frames size
  • stepSize (default=16384): step between consecutive frames

Declaration example:

AmplitudeModulation EnDecim=200  blockSize=32768  stepSize=16384

See also

Envelope

AutoCorrelation

class yaafelib.yaafe_extensions.yaafefeatures.AutoCorrelation

Compute autocorrelation coefficients ac on each frame.

\[ ac(k) = \sum_{i=0}^{N-k-1} x(i)\,x(i+k) \]

Parameters:
  • ACNbCoeffs (default=49): Number of autocorrelation coefficients to keep
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

AutoCorrelation ACNbCoeffs=49  blockSize=1024  stepSize=512
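
The autocorrelation sum for this feature can be written directly in a few lines. The sketch below is only an illustration in pure Python, not Yaafe's implementation; the function name `autocorrelation` and the choice of returning lags 0 to `nb_coeffs - 1` are assumptions.

```python
def autocorrelation(frame, nb_coeffs):
    """Return ac(k) = sum_{i=0}^{N-k-1} x(i) * x(i+k) for the first lags.

    Assumption: the kept coefficients are lags 0 .. nb_coeffs-1.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(min(nb_coeffs, n))
    ]


ac = autocorrelation([1.0, 2.0, 3.0, 4.0], nb_coeffs=3)
print(ac)
```

Lag 0 is the frame energy (the sum of squares), which is why it is always the largest coefficient.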

See also

Frames

BeatHistogramSummary

class yaafelib.yaafe_extensions.yaafefeatures.BeatHistogramSummary

Compute the beat histogram according to [GT2002], but using OnsetDetectionFunction as onset detection function.

[GT2002]George Tzanetakis and Perry Cook, Musical Genre Classification of Audio Signals, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002.

Parameters:
  • ACPNbPeaks (default=3): Number of autocorrelation peaks to keep
  • BHSBeatFrameSize (default=128): Number of frames over which autocorrelation peaks are computed
  • BHSBeatFrameStep (default=64): Number of frames to skip between two consecutive autocorrelation peak computations
  • BHSHistogramFrameSize (default=40): Number of beat frames over which histogram is computed
  • BHSHistogramFrameStep (default=40): Number of beat frames to skip between two consecutive histogram computations
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • HInf (default=40): Minimal BPM to take into consideration
  • HNbBins (default=80): Nb bins of histogram
  • HSup (default=200): Maximal BPM to take into consideration
  • NMANbFrames (default=5000): Number of frames to normalize together, -1 means all frames
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

BeatHistogramSummary ACPNbPeaks=3  BHSBeatFrameSize=128  BHSBeatFrameStep=64  BHSHistogramFrameSize=40  BHSHistogramFrameStep=40  FFTLength=0  FFTWindow=Hanning  HInf=40  HNbBins=80  HSup=200  NMANbFrames=5000  blockSize=1024  stepSize=512

CQT

class yaafelib.yaafe_extensions.yaafefeatures.CQT

Compute the Constant-Q transform according to [CS2010] with improvements from [JPCQT].

[CS2010]C.Schörkhuber and A.Klapuri, CONSTANT-Q TRANSFORM TOOLBOX FOR MUSIC PROCESSING, 7th Sound and Music Conference (SMC‘2010), 2010, Barcelona.
[JPCQT]J.Prado, Calcul rapide de la transformée à Q constant, http://perso.telecom-paristech.fr/~prado/cqt/cqt_modif.pdf
Parameters:
  • CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
  • CQTBinsPerOctave (default=36): Number of bins per octave to consider
  • CQTMinFreq (default=73.42): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
  • CQTNbOctaves (default=3): Number of octaves to consider for analysis
  • stepSize (default=512): step between consecutive frames

Declaration example:

CQT CQTAlign=c  CQTBinsPerOctave=36  CQTMinFreq=73.42  CQTNbOctaves=3  stepSize=512
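
The parameters above determine a geometric grid of analysis frequencies. The sketch below shows the standard constant-Q spacing, f_k = f_min * 2^(k / binsPerOctave), together with the documented rule for interpreting CQTMinFreq. The helper names are hypothetical, and the assumption that the output has exactly CQTBinsPerOctave * CQTNbOctaves bins is not confirmed by this page.

```python
def resolve_frequency(value, sample_rate):
    """Interpret a frequency parameter as documented for CQTMinFreq:
    values below 0.5 are a factor of the sample rate, others are in Hertz."""
    return value * sample_rate if value < 0.5 else value


def cqt_center_frequencies(min_freq=73.42, bins_per_octave=36,
                           nb_octaves=3, sample_rate=44100.0):
    """Geometrically spaced bin frequencies (standard constant-Q layout).

    Assumption: bins_per_octave * nb_octaves bins in total.
    """
    fmin = resolve_frequency(min_freq, sample_rate)
    nb_bins = bins_per_octave * nb_octaves
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(nb_bins)]


freqs = cqt_center_frequencies()
print(len(freqs), freqs[0], freqs[36])
```

With the defaults, bin 36 sits one octave above bin 0, i.e. at twice CQTMinFreq.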

See also

Frames

CQT2

class yaafelib.yaafe_extensions.yaafefeatures.CQT2

Compute the Constant-Q transform according to Blankertz’s implementation [BB], with improvements from [JP2010].

[BB]B.Blankertz, The Constant Q Transform, http://wwwmath.uni-muenster.de/logik/Personen/blankertz/constQ/constQ.html
[JP2010]J.Prado, Transformée à Q constant, technical report 2010D004, Institut TELECOM, TELECOM ParisTech, CNRS LTCI, 2010.
Parameters:
  • CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
  • CQTBinsPerOctave (default=3): Number of bins per octave to consider
  • CQTMaxFreq (default=0.5): Maximum frequency. If <=0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
  • CQTMinFreq (default=97.999): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
  • stepSize (default=512): step between consecutive frames

Declaration example:

CQT2 CQTAlign=c  CQTBinsPerOctave=3  CQTMaxFreq=0.5  CQTMinFreq=97.999  stepSize=512

See also

Frames

Chords

class yaafelib.yaafe_extensions.yaafefeatures.Chords

Chords recognizes chords from chromagrams, according to L.Oudre’s algorithm [LO2011].

[LO2011]Oudre, L. and Grenier, Y. and Fevotte, C., Chord recognition by fitting rescaled chroma vectors to chord templates, IEEE Transactions on Audio, Speech and Language Processing, vol. 19, pages 2222 - 2233, Sep. 2011.
Parameters:
  • ChordsSmoothing (default=1.5s): Chords smoothing duration
  • ChordsUse7 (default=0): If 1 then use 7th chords to enrich chord dictionary, else use only major and minor chords
  • stepSize (default=512): step between consecutive frames

Declaration example:

Chords ChordsSmoothing=1.5s  ChordsUse7=0  stepSize=512

Chroma

class yaafelib.yaafe_extensions.yaafefeatures.Chroma

Chroma computes short-term chromagram according to [BP2005].

[BP2005]Bello, J.P. and Pickens, J. A Robust Mid-level Representation for Harmonic Content in Music Signals. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR-05), London, UK. September 2005.
Parameters:
  • CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
  • CQTBinsPerOctave (default=36): Number of bins per octave to consider
  • CQTMinFreq (default=73.42): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
  • CQTNbOctaves (default=3): Number of octaves to consider for analysis
  • CTInitDuration (default=15): Duration over which to perform chroma bias initialisation, in seconds.
  • ChromaSmoothing (default=0.75s): Chroma smoothing duration
  • stepSize (default=512): step between consecutive frames

Declaration example:

Chroma CQTAlign=c  CQTBinsPerOctave=36  CQTMinFreq=73.42  CQTNbOctaves=3  CTInitDuration=15  ChromaSmoothing=0.75s  stepSize=512

See also

CQT

Chroma2

class yaafelib.yaafe_extensions.yaafefeatures.Chroma2

Chroma2 computes short-term pitch profile according to [ZK2006].

[ZK2006]Y. Zhu and M.S. Kankanhalli. Precise pitch profile feature extraction from musical audio for key detection. IEEE Transactions on Multimedia, 2006.
Parameters:
  • CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
  • CQTBinsPerOctave (default=48): Number of bins per octave to consider
  • CQTMinFreq (default=27.5): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
  • CQTNbOctaves (default=7): Number of octaves to consider for analysis
  • CZBinsPerSemitone (default=1): number of bins per semitone for the PCP
  • CZNbCQTBinsAggregatedToPCPBin (default=-1): number of CQT bins which are aggregated for each PCP bin. if -1 then use CQTBinsPerOctave / 24
  • CZTuning (default=440): frequency of the A4, in Hz.
  • stepSize (default=512): step between consecutive frames

Declaration example:

Chroma2 CQTAlign=c  CQTBinsPerOctave=48  CQTMinFreq=27.5  CQTNbOctaves=7  CZBinsPerSemitone=1  CZNbCQTBinsAggregatedToPCPBin=-1  CZTuning=440  stepSize=512

See also

CQT

ComplexDomainOnsetDetection

class yaafelib.yaafe_extensions.yaafefeatures.ComplexDomainOnsetDetection

Compute onset detection using a complex domain spectral flux method [CD2003].

[CD2003]C.Duxbury et al., Complex domain onset detection for musical signals, Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003
Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

ComplexDomainOnsetDetection FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Frames

Energy

class yaafelib.yaafe_extensions.yaafefeatures.Energy

Compute energy as root mean square of an audio Frame.

\[ en = \sqrt{\frac{\sum_{i=0}^{N-1} x(i)^2}{N}} \]

Parameters:
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

Energy blockSize=1024  stepSize=512
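
As a quick illustration of the root-mean-square definition, here is a minimal pure-Python sketch. The function name `rms_energy` is hypothetical and the code is not taken from Yaafe.

```python
import math


def rms_energy(frame):
    """Root mean square of one frame: sqrt(sum(x(i)^2) / N)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


print(rms_energy([1.0, -1.0, 1.0, -1.0]))
print(rms_energy([3.0, 4.0]))
```

A full-scale square wave has an RMS of 1, while a sine wave of the same peak amplitude would give about 0.707.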

See also

Frames

Envelope

class yaafelib.yaafe_extensions.yaafefeatures.Envelope

Extract amplitude envelope using Hilbert transform, low-pass filtering and decimation.

Parameters:
  • EnDecim (default=200): Decimation factor to compute envelope
  • blockSize (default=32768): output frames size
  • stepSize (default=16384): step between consecutive frames

Declaration example:

Envelope EnDecim=200  blockSize=32768  stepSize=16384

See also

Frames

EnvelopeShapeStatistics

class yaafelib.yaafe_extensions.yaafefeatures.EnvelopeShapeStatistics

Centroid, spread, skewness and kurtosis of each frame’s amplitude envelope. For more details about moments, see Shape Statistics.

Parameters:
  • EnDecim (default=200): Decimation factor to compute envelope
  • blockSize (default=32768): output frames size
  • stepSize (default=16384): step between consecutive frames

Declaration example:

EnvelopeShapeStatistics EnDecim=200  blockSize=32768  stepSize=16384

See also

Envelope

Frames

class yaafelib.yaafe_extensions.yaafefeatures.Frames

Segment input signal into frames.

First frame has zeros on left half so that it is centered on time 0s, then consecutive frames are equally spaced. Consequently, frame i (starting from 0) is centered on sample i * stepSize.

Parameters:
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

Frames blockSize=1024  stepSize=512
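
The centering rule described above (frame i is centered on sample i * stepSize, with zeros before the start of the signal) can be sketched as follows. This is an illustration only: the function name `split_frames`, the zero padding at the end of the signal, and the rule for when to stop producing frames are assumptions, not documented Yaafe behaviour.

```python
def split_frames(signal, block_size=1024, step_size=512):
    """Cut `signal` into frames of `block_size` samples.

    Frame i starts at i * step_size - block_size // 2, so that it is
    centered on sample i * step_size. Samples outside the signal are zeros.
    Assumption: frames are produced while their center lies inside the signal.
    """
    half = block_size // 2
    frames = []
    i = 0
    while i * step_size < len(signal):
        start = i * step_size - half
        frames.append([
            signal[j] if 0 <= j < len(signal) else 0.0
            for j in range(start, start + block_size)
        ])
        i += 1
    return frames


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
frames = split_frames(signal, block_size=4, step_size=2)
for frame in frames:
    print(frame)
```

The first frame has zeros in its left half, which matches the description of a frame centered on time 0 s.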

LPC

class yaafelib.yaafe_extensions.yaafefeatures.LPC

Compute the Linear Predictor Coefficients (LPC) of a signal frame. It uses autocorrelation and the Levinson-Durbin algorithm. See [JM1975].

[JM1975]Makhoul J., Linear Prediction: A Tutorial Review, Proc. IEEE, Vol. 63, pp. 561-580, 1975.
Parameters:
  • LPCNbCoeffs (default=2): Number of Linear Predictor Coefficients to compute
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

LPC LPCNbCoeffs=2  blockSize=1024  stepSize=512
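
For readers unfamiliar with the Levinson-Durbin recursion, the sketch below solves the autocorrelation normal equations in pure Python. It is a textbook version, not Yaafe's code: the function name and the sign convention (prediction error filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p) are assumptions, and Yaafe may order or sign its output differently.

```python
def levinson_durbin(ac, order):
    """Return (a, err): predictor coefficients a_1..a_order and the final
    prediction error, from autocorrelation values ac[0..order].

    Assumed convention: A(z) = 1 + a_1 z^-1 + ... + a_order z^-order.
    """
    a = [0.0] * order
    err = ac[0]
    for m in range(order):
        if err == 0.0:
            break  # silent frame: keep the remaining coefficients at zero
        # Reflection coefficient for order m + 1.
        acc = ac[m + 1] + sum(a[j] * ac[m - j] for j in range(m))
        k = -acc / err
        prev = a[:]
        for j in range(m):
            a[j] = prev[j] + k * prev[m - 1 - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an ideal first-order process with pole 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

For this input the second coefficient vanishes, because a first-order model already explains the autocorrelation sequence.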

See also

AutoCorrelation

LSF

class yaafelib.yaafe_extensions.yaafefeatures.LSF

Compute the Line Spectral Frequency (LSF) coefficients of a signal frame. Algorithm was adapted from ([TB2006], [SH1976]).

[TB2006]Tom Backstrom, Carlo Magi, Properties of line spectrum pair polynomials–A review, Signal Processing, Volume 86, Issue 11, Special Section: Distributed Source Coding, November 2006, Pages 3286-3298, ISSN 0165-1684, DOI: 10.1016/j.sigpro.2006.01.010.
[SH1976]Schussler, H., A stability theorem for discrete systems, Acoustics, Speech and Signal Processing, IEEE Transactions on , vol.24, no.1, pp. 87-89, Feb 1976
Parameters:
  • LSFDisplacement (default=1): LSF Displacement parameter: 1 for classical LSF, 0 for Schussler polynomials, >1 is a generalization
  • LSFNbCoeffs (default=10): Number of Line Spectral Frequencies to compute
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

LSF LSFDisplacement=1  LSFNbCoeffs=10  blockSize=1024  stepSize=512

See also

LPC

Loudness

class yaafelib.yaafe_extensions.yaafefeatures.Loudness

The loudness coefficients are the energy in each Bark band, normalized by the overall sum. See [GP2004] and [MG1997] for more details.

[MG1997]Moore, Glasberg, et al., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, J. Audio Eng. Soc. 45: 224-240, 1997.
Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • LMode (default=Relative): “Specific” computes loudness without normalization, “Relative” normalizes each band so that they sum to 1, “Total” just returns the sum of Loudness in all bands.
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

Loudness FFTLength=0  FFTWindow=Hanning  LMode=Relative  blockSize=1024  stepSize=512
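
The three LMode values differ only in how the per-band loudness values are post-processed. The sketch below illustrates that step on an already computed list of band values; the function name `apply_lmode` and the handling of an all-zero frame are assumptions, and the Bark-band analysis itself is not shown.

```python
def apply_lmode(band_loudness, mode="Relative"):
    """Post-process per-band loudness values according to LMode."""
    total = sum(band_loudness)
    if mode == "Specific":
        return list(band_loudness)  # no normalization
    if mode == "Relative":
        # Assumption: an all-zero frame is returned unchanged.
        if total == 0:
            return list(band_loudness)
        return [b / total for b in band_loudness]
    if mode == "Total":
        return [total]  # a single value per frame
    raise ValueError("unknown LMode: %r" % mode)


bands = [1.0, 1.0, 2.0]
print(apply_lmode(bands, "Specific"))
print(apply_lmode(bands, "Relative"))
print(apply_lmode(bands, "Total"))
```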

MFCC

class yaafelib.yaafe_extensions.yaafefeatures.MFCC

Compute the Mel-frequency cepstral coefficients [DM1980].

Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:

\[ melfreq = 1127 \log\left(1 + \frac{freq}{700}\right) \]

Each filter is a triangular filter with height $2/(f_{max}-f_{min})$. Then MFCCs are computed as follows, using DCT II:

\[ mfcc = dct(\log(abs(fft(hanning(N) \cdot x)) \cdot MelFilterBank)) \]

[DM1980]S.B. Davis and P.Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28 :357-366, 1980.
Parameters:
  • CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
  • CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
  • MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
  • MelNbFilters (default=40): Number of mel filters
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

MFCC CepsIgnoreFirstCoeff=1  CepsNbCoeffs=13  FFTWindow=Hanning  MelMaxFreq=6854.0  MelMinFreq=130.0  MelNbFilters=40  blockSize=1024  stepSize=512
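
To make the mel-scale mapping concrete, the sketch below converts between Hertz and mel and lays out filter edge frequencies between MelMinFreq and MelMaxFreq. It reads log as the natural logarithm, which is the usual companion of the constant 1127. The helper names, and the layout in which filter i spans edges i to i + 2, are assumptions rather than documented Yaafe behaviour.

```python
import math


def hz_to_mel(freq_hz):
    """melfreq = 1127 * ln(1 + freq / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_filter_edges(min_freq=130.0, max_freq=6854.0, nb_filters=40):
    """Return nb_filters + 2 edge frequencies, equally spaced in mel.

    Assumption: triangular filter i rises from edges[i] to edges[i + 1]
    and falls back to zero at edges[i + 2].
    """
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (hi - lo) / (nb_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(nb_filters + 2)]


edges = mel_filter_edges()
print(len(edges), edges[0], edges[-1])
```

Because the edges are equally spaced in mel, the filters become progressively wider in Hertz toward high frequencies.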

MagnitudeSpectrum

class yaafelib.yaafe_extensions.yaafefeatures.MagnitudeSpectrum

Compute frame’s magnitude spectrum, using an analysis window (Hanning or Hamming), or not.

\[ y = abs(fft(w \cdot x)) \]

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

MagnitudeSpectrum FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
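
The windowed magnitude spectrum can be illustrated with a direct (slow) DFT in pure Python. This is a sketch for understanding, not an efficient or authoritative implementation: the symmetric Hanning definition and the choice of returning N/2 + 1 bins are assumptions.

```python
import cmath
import math


def hanning(n):
    """Symmetric Hanning window (assumed definition)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame, window=None):
    """Return abs(DFT(w * x)) for bins 0 .. N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    w = window if window is not None else [1.0] * n
    xw = [wi * xi for wi, xi in zip(w, frame)]
    return [
        abs(sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# A cosine with exactly 2 cycles in 8 samples concentrates in bin 2.
frame = [math.cos(2.0 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = magnitude_spectrum(frame)
windowed = magnitude_spectrum(frame, hanning(len(frame)))
print(spectrum)
```

Without a window the energy falls in a single bin; applying the window spreads it over the neighbouring bins.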

See also

Frames

MelSpectrum

class yaafelib.yaafe_extensions.yaafefeatures.MelSpectrum

Compute the Mel-frequency spectrum [DM1980].

Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:

\[ melfreq = 1127 \log\left(1 + \frac{freq}{700}\right) \]

Each filter is a triangular filter with height $2/(f_{max}-f_{min})$.

Parameters:
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
  • MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
  • MelNbFilters (default=40): Number of mel filters
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

MelSpectrum FFTWindow=Hanning  MelMaxFreq=6854.0  MelMinFreq=130.0  MelNbFilters=40  blockSize=1024  stepSize=512

OBSI

class yaafelib.yaafe_extensions.yaafefeatures.OBSI

Compute Octave band signal intensity using a triangular octave filter bank ([SE2005]).

[SE2005]S.Essid, Classification automatique des signaux audio-frequences: reconnaissance des instruments de musique. PhD, UPMC, 2005.
Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

OBSI FFTLength=0  FFTWindow=Hanning  OBSIMinFreq=27.5  blockSize=1024  stepSize=512

OBSIR

class yaafelib.yaafe_extensions.yaafefeatures.OBSIR

Compute log of OBSI ratio between consecutive octaves.

Parameters:
  • DiffNbCoeffs (default=0): Maximum number of coeffs to keep. 0 keeps N-1 values (with N the input feature size)
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

OBSIR DiffNbCoeffs=0  FFTLength=0  FFTWindow=Hanning  OBSIMinFreq=27.5  blockSize=1024  stepSize=512

See also

OBSI

OnsetDetectionFunction

class yaafelib.yaafe_extensions.yaafefeatures.OnsetDetectionFunction

Compute onset detection function (spectral energy flux) according to [MA2005] method.

[MA2005]M.Alonso, G.Richard, B.David, EXTRACTING NOTE ONSETS FROM MUSICAL RECORDINGS, International Conference on Multimedia and Expo (IEEE-ICME‘05), Amsterdam, The Netherlands, 2005.

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • NMANbFrames (default=5000): Number of frames to normalize together, -1 means all frames
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

OnsetDetectionFunction FFTLength=0  FFTWindow=Hanning  NMANbFrames=5000  blockSize=1024  stepSize=512

PerceptualSharpness

class yaafelib.yaafe_extensions.yaafefeatures.PerceptualSharpness

Compute the sharpness of Loudness coefficients, according to [GP2004].

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

PerceptualSharpness FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Loudness

PerceptualSpread

class yaafelib.yaafe_extensions.yaafefeatures.PerceptualSpread

Compute the spread of Loudness coefficients, according to [GP2004].

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

PerceptualSpread FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

See also

Loudness

SpectralCrestFactorPerBand

class yaafelib.yaafe_extensions.yaafefeatures.SpectralCrestFactorPerBand

Compute spectral crest factor per log-spaced band of 1/4 octave.

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralCrestFactorPerBand FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

SpectralDecrease

class yaafelib.yaafe_extensions.yaafefeatures.SpectralDecrease

Compute spectral decrease according to [GP2004].

\[ S_{decrease} = \frac{1}{\sum_{k=2}^{K} a_{k}} \sum_{k=2}^{K} \frac{a_{k} - a_{1}}{k-1} \]

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralDecrease FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
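
The spectral decrease formula translates almost literally into code once the 1-based index k is shifted to Python's 0-based indexing. The sketch below is illustrative; the function name and the value returned for a spectrum whose bins 2..K sum to zero are assumptions.

```python
def spectral_decrease(a):
    """Spectral decrease of magnitudes a[0..K-1] (a[0] plays the role of a_1).

    With j = k - 1, the term (a_k - a_1) / (k - 1) becomes (a[j] - a[0]) / j.
    """
    denom = sum(a[1:])
    if denom == 0:
        return 0.0  # assumption: defined as 0 when the denominator vanishes
    return sum((a[j] - a[0]) / j for j in range(1, len(a))) / denom


print(spectral_decrease([4.0, 2.0, 1.0]))
```

A spectrum whose magnitude falls with frequency, as in this example, yields a negative value.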

SpectralFlatness

class yaafelib.yaafe_extensions.yaafefeatures.SpectralFlatness

Compute global spectral flatness using the ratio between geometric and arithmetic mean.

\[ S_{flatness} = \frac{\exp\left(\frac{1}{N}\sum_{k}\log(a_{k})\right)}{\frac{1}{N}\sum_{k}a_{k}} \]

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralFlatness FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
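
The ratio of geometric to arithmetic mean can be sketched as follows. The geometric mean is computed through logarithms to avoid overflow; the function name and the value returned when a bin is zero (where the logarithm is undefined) are assumptions.

```python
import math


def spectral_flatness(a):
    """Geometric mean of the magnitudes divided by their arithmetic mean."""
    n = len(a)
    if any(v <= 0 for v in a):
        return 0.0  # assumption: a zero bin makes the geometric mean 0
    geometric = math.exp(sum(math.log(v) for v in a) / n)
    arithmetic = sum(a) / n
    return geometric / arithmetic


print(spectral_flatness([2.0, 2.0, 2.0, 2.0]))
print(spectral_flatness([1.0, 4.0]))
```

A perfectly flat spectrum gives 1, and the value decreases toward 0 as the spectrum becomes more peaked.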

SpectralFlatnessPerBand

class yaafelib.yaafe_extensions.yaafefeatures.SpectralFlatnessPerBand

Compute spectral flatness per log-spaced band of 1/4 octave, as proposed in MPEG7 standard.

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralFlatnessPerBand FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512

SpectralFlux

class yaafelib.yaafe_extensions.yaafefeatures.SpectralFlux

Compute flux of spectrum between consecutive frames.

\[ S_{flux} = \frac{\sum_{k}(a_{k}(t) - a_{k}(t-1))^2}{\sqrt{\sum_{k}a_{k}(t-1)^2} \sqrt{\sum_{k}a_{k}(t)^2}} \]

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • FluxSupport (default=All): Support of flux computation. If ‘All’ then use all bins (default), if ‘Increase’ then use only bins which are increasing
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralFlux FFTLength=0  FFTWindow=Hanning  FluxSupport=All  blockSize=1024  stepSize=512
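
The flux between two consecutive magnitude spectra, including the effect of FluxSupport, can be sketched as below. The function name, the choice to keep the normalization unchanged in ‘Increase’ mode, and the value returned for silent frames are assumptions.

```python
import math


def spectral_flux(prev, cur, support="All"):
    """Normalized squared difference between two magnitude spectra.

    support="Increase" keeps only bins whose magnitude grew.
    Assumption: the denominator uses all bins in both modes.
    """
    num = 0.0
    for p, c in zip(prev, cur):
        d = c - p
        if support == "All" or d > 0:
            num += d * d
    norm = math.sqrt(sum(p * p for p in prev)) * math.sqrt(sum(c * c for c in cur))
    return num / norm if norm > 0 else 0.0  # assumption: 0 for silent frames


print(spectral_flux([3.0, 4.0], [4.0, 3.0]))
print(spectral_flux([3.0, 4.0], [4.0, 3.0], support="Increase"))
```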

SpectralIrregularity

class yaafelib.yaafe_extensions.yaafefeatures.SpectralIrregularity

Compute difference between consecutive CQT bins, see [Brown2000].

[Brown2000]J.C. Brown, O.Houix, Stephen McAdams, Feature dependence in the automatic identification of musical woodwind instruments., Journal of the Acoustical Society of America, 109: 1064-1072, 2000.
Parameters:
  • CQTAlign (default=c): Alignment of cqt kernels on analysis frame. ‘l’ to the left, ‘c’ to the center, ‘r’ to the right
  • CQTBinsPerOctave (default=36): Number of bins per octave to consider
  • CQTMinFreq (default=73.42): Minimal frequency. If <0.5 then assume it’s a factor of sampleRate else assume it’s expressed in Hertz.
  • CQTNbOctaves (default=3): Number of octaves to consider for analysis
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralIrregularity CQTAlign=c  CQTBinsPerOctave=36  CQTMinFreq=73.42  CQTNbOctaves=3  stepSize=512

See also

CQT

SpectralRolloff

class yaafelib.yaafe_extensions.yaafefeatures.SpectralRolloff

Spectral roll-off is the frequency below which 99% of the energy is contained. See [SS1997].

[SS1997]E.Scheirer, M.Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing, p.1331-1334, 1997.
Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralRolloff FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
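
The roll-off frequency can be found by accumulating energy bin by bin until the threshold is reached. The sketch below is an illustration under several assumptions: energy is taken as squared magnitude, the spectrum covers bins 0 to Nyquist inclusive, and the function and parameter names are hypothetical.

```python
def spectral_rolloff(magnitudes, sample_rate, fraction=0.99):
    """Frequency (Hz) below which `fraction` of the spectral energy lies.

    Assumptions: at least two bins, spanning 0 Hz .. sample_rate / 2
    inclusive, and energy measured as squared magnitude.
    """
    energies = [m * m for m in magnitudes]
    threshold = fraction * sum(energies)
    bin_width = sample_rate / (2.0 * (len(magnitudes) - 1))
    cumulative = 0.0
    for k, energy in enumerate(energies):
        cumulative += energy
        if cumulative >= threshold:
            return k * bin_width
    return sample_rate / 2.0


print(spectral_rolloff([1.0, 1.0, 1.0, 1.0, 0.0], sample_rate=8.0))
```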

SpectralShapeStatistics

class yaafelib.yaafe_extensions.yaafefeatures.SpectralShapeStatistics

Compute shape statistics of MagnitudeSpectrum (see [GR2004]).

Shape Statistics are centroid, spread, skewness and kurtosis, defined as follows:

\[
\begin{aligned}
\mu_{i} &= \frac{\sum_{k=1}^{N} f_{k}^{i}\, a_{k}}{\sum_{k=1}^{N} a_{k}} \\
centroid &= \mu_{1} \\
spread &= \sqrt{\mu_{2} - \mu_{1}^{2}} \\
skewness &= \frac{2\mu_{1}^{3} - 3\mu_{1}\mu_{2} + \mu_{3}}{spread^{3}} \\
kurtosis &= \frac{-3\mu_{1}^{4} + 6\mu_{1}^{2}\mu_{2} - 4\mu_{1}\mu_{3} + \mu_{4}}{spread^{4}} - 3
\end{aligned}
\]

[GR2004]O.Gillet, G.Richard, Automatic transcription of drum loops. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada, 2004.
Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralShapeStatistics FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
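
The four shape statistics follow from the first four raw moments of the frequency axis weighted by the magnitudes. The sketch below uses the standard central-moment expansions (with the 6 * mu1^2 * mu2 term in the kurtosis); the function name is hypothetical, and the code assumes a non-zero spectrum with non-zero spread.

```python
import math


def shape_statistics(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a weighted distribution.

    mu_i = sum(f_k^i * a_k) / sum(a_k) are the raw moments.
    Assumes sum(mags) > 0 and a non-zero spread.
    """
    total = sum(mags)
    m1, m2, m3, m4 = (
        sum(f ** i * a for f, a in zip(freqs, mags)) / total
        for i in range(1, 5)
    )
    centroid = m1
    spread = math.sqrt(m2 - m1 ** 2)
    skewness = (2 * m1 ** 3 - 3 * m1 * m2 + m3) / spread ** 3
    kurtosis = (-3 * m1 ** 4 + 6 * m1 ** 2 * m2 - 4 * m1 * m3 + m4) / spread ** 4 - 3
    return centroid, spread, skewness, kurtosis


centroid, spread, skewness, kurtosis = shape_statistics([0.0, 1.0, 2.0], [1.0, 2.0, 1.0])
print(centroid, spread, skewness, kurtosis)
```

The example spectrum is symmetric around its centroid, so its skewness is zero.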

SpectralSlope

class yaafelib.yaafe_extensions.yaafefeatures.SpectralSlope

SpectralSlope is computed by linear regression of the spectral amplitude. (see [GP2004])

\[ S_{slope} = \frac{K\sum_{k}f_{k}a_{k} - \sum_{k}f_{k}\sum_{k}a_{k}}{K\sum_{k}f_{k}^2 - \left(\sum_{k}f_{k}\right)^2} \]

Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralSlope FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
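
As a rough illustration, the regression can be written directly as an ordinary least-squares slope of amplitude against frequency. The function name `spectral_slope` is hypothetical, and `freqs` and `amps` are assumed to be the K bin frequencies and amplitudes of one frame; Yaafe's own scaling may differ.

```python
def spectral_slope(freqs, amps):
    """Least-squares slope of amplitude versus frequency (illustrative)."""
    K = len(freqs)
    sum_f = sum(freqs)
    sum_a = sum(amps)
    sum_fa = sum(f * a for f, a in zip(freqs, amps))
    sum_ff = sum(f * f for f in freqs)
    # Standard closed form of the linear-regression slope.
    return (K * sum_fa - sum_f * sum_a) / (K * sum_ff - sum_f ** 2)
```

A spectrum whose amplitude falls linearly with frequency, for example `amps = [3.0, 2.5, 2.0, 1.5]` over `freqs = [0.0, 1.0, 2.0, 3.0]`, gives a slope of -0.5.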

SpectralVariation

class yaafelib.yaafe_extensions.yaafefeatures.SpectralVariation

SpectralVariation is the normalized correlation of spectrum between consecutive frames. (see [GP2004])

\[
S_{var} = 1 - \frac{\sum_{k} a_{k}(t-1)\, a_{k}(t)}{\sqrt{\sum_{k} a_{k}(t-1)^{2}}\, \sqrt{\sum_{k} a_{k}(t)^{2}}}
\]

[GP2004] Geoffroy Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.
Parameters:
  • FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
  • FFTWindow (default=Hanning): Weighting window to apply before the FFT. One of Hanning|Hamming|None.
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

SpectralVariation FFTLength=0  FFTWindow=Hanning  blockSize=1024  stepSize=512
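
A minimal sketch of this measure, assuming `prev_amps` and `cur_amps` are the amplitude spectra a_k(t-1) and a_k(t) of two consecutive frames. The function name is hypothetical, and the handling of silent (all-zero) frames is left out because it is not specified here.

```python
import math

def spectral_variation(prev_amps, cur_amps):
    """One minus the normalized correlation of two spectra (illustrative).

    Assumes neither spectrum is all zeros.
    """
    dot = sum(p * c for p, c in zip(prev_amps, cur_amps))
    norm_prev = math.sqrt(sum(p * p for p in prev_amps))
    norm_cur = math.sqrt(sum(c * c for c in cur_amps))
    return 1.0 - dot / (norm_prev * norm_cur)
```

Two spectra that differ only by a gain factor give a value close to 0, while spectra with no overlapping energy give 1.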

TemporalShapeStatistics

class yaafelib.yaafe_extensions.yaafefeatures.TemporalShapeStatistics

Compute shape statistics (centroid, spread, skewness and kurtosis) of signal frames.

Parameters:
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

TemporalShapeStatistics blockSize=1024  stepSize=512

See also

Frames

ZCR

class yaafelib.yaafe_extensions.yaafefeatures.ZCR

Compute the zero-crossing rate in frames (see [SS1997]).

Parameters:
  • blockSize (default=1024): output frames size
  • stepSize (default=512): step between consecutive frames

Declaration example:

ZCR blockSize=1024  stepSize=512

See also

Frames
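
The sketch below illustrates how blockSize and stepSize cut a signal into frames, and one common definition of the zero-crossing rate. Both helpers are hypothetical: the framing ignores any padding Yaafe may apply at signal boundaries, and the normalization by the number of sample pairs is an assumption, since the exact convention is not given here.

```python
def frames(signal, block_size=1024, step_size=512):
    """Yield consecutive frames of block_size samples, step_size apart."""
    for start in range(0, len(signal) - block_size + 1, step_size):
        yield signal[start:start + block_size]

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# One rate per frame, mirroring "ZCR blockSize=4 stepSize=2" on a toy signal.
rates = [zero_crossing_rate(f) for f in frames([1, -1, 1, -1, 2, 3, 4, 5], 4, 2)]
```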

Available feature transforms

AutoCorrelationPeaksIntegrator

class yaafelib.yaafe_extensions.yaafefeatures.AutoCorrelationPeaksIntegrator

Feature transform that computes the peaks of the autocorrelation function and outputs the peak positions and amplitudes.

Parameters:
  • ACPInterPeakMinDist (default=5): Minimal distance between consecutive autocorrelation peaks, expressed in lags.
  • ACPNbPeaks (default=3): Number of autocorrelation peaks to keep
  • ACPNorm (default=No): One of No|BPM|Hz. Normalizes the output so that it is expressed in lags, BPM or Hz, respectively.
  • NbFrames (default=60): Number of frames to integrate together
  • StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

AutoCorrelationPeaksIntegrator ACPInterPeakMinDist=5  ACPNbPeaks=3  ACPNorm=No  NbFrames=60  StepNbFrames=30
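
How ACPNbPeaks and ACPInterPeakMinDist might interact can be sketched with a greedy peak picker. This is only one plausible strategy, not necessarily the one Yaafe uses: the function `pick_peaks` is hypothetical, peaks are taken to be strict local maxima, and the strongest peaks are kept first while rejecting any candidate closer than `min_dist` lags to a peak already kept.

```python
def pick_peaks(ac, nb_peaks=3, min_dist=5):
    """Return up to nb_peaks (lag, amplitude) pairs from an autocorrelation."""
    # Strict local maxima; lag 0 and the last lag are never candidates.
    candidates = [
        k for k in range(1, len(ac) - 1)
        if ac[k] > ac[k - 1] and ac[k] > ac[k + 1]
    ]
    # Strongest first, skipping peaks too close to one already kept.
    candidates.sort(key=lambda k: ac[k], reverse=True)
    kept = []
    for k in candidates:
        if all(abs(k - j) >= min_dist for j in kept):
            kept.append(k)
        if len(kept) == nb_peaks:
            break
    return [(k, ac[k]) for k in sorted(kept)]
```

With `min_dist=5`, a weaker peak two lags away from a stronger one is dropped, while a peak seven lags away is kept.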

Cepstrum

class yaafelib.yaafe_extensions.yaafefeatures.Cepstrum

Feature transform that computes cepstrum coefficients of input feature frames (uses DCT-II).

\[
cep = \mathrm{dct}(\log(x))
\]

Parameters:
  • CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it.
  • CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.

Declaration example:

Cepstrum CepsIgnoreFirstCoeff=1  CepsNbCoeffs=13
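
A small sketch of the transform, using an unnormalized DCT-II written out explicitly. The function name is hypothetical, the scaling of Yaafe's DCT is not stated here, and it is an assumption that `ignore_first=1` shifts the kept range so that `nb_coeffs` coefficients are still returned. Input values must be strictly positive for the logarithm to be defined.

```python
import math

def cepstrum(frame, nb_coeffs=13, ignore_first=1):
    """Unnormalized DCT-II of the log of a frame (illustrative)."""
    n = len(frame)
    log_x = [math.log(v) for v in frame]  # requires v > 0
    first = 1 if ignore_first else 0
    return [
        sum(log_x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(first, first + nb_coeffs)
    ]
```

For a flat input frame, all coefficients except the first (k = 0) vanish, which is why the first coefficient is often treated as an overall level term and dropped.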

Derivate

class yaafelib.yaafe_extensions.yaafefeatures.Derivate

Compute temporal derivative of input feature. The derivative is approximated by an orthogonal polynomial fit over a finite length window. (see [RR1993] p.117).

\[
\frac{\partial x(t)}{\partial t} = \mu \sum_{k=-N}^{N} k \cdot x(t+k)
\quad \text{where} \quad
\mu = \frac{1}{\sum_{k=-N}^{N} k^{2}}
\]

[RR1993] L. R. Rabiner, Fundamentals of Speech Processing. Prentice Hall Signal Processing Series. PTR Prentice-Hall, 1993.
Parameters:
  • DO1Len (default=4): Horizon used to compute order 1 derivative.
  • DO2Len (default=1): Horizon used to compute order 2 derivative. Useless if DOrder=1.
  • DOrder (default=1): Order of the derivative to compute.

Declaration example:

Derivate DO1Len=4  DO2Len=1  DOrder=1
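
The first-order case can be sketched as a weighted sum over a window of 2N+1 values. Several points are assumptions: the function name is hypothetical, the horizon N is taken to be DO1Len, the sum is normalized by the sum of k^2 so that a linear ramp yields its own slope, and edge values are repeated at the boundaries.

```python
def derivate(values, horizon=4):
    """First-order derivative estimate by a least-squares line fit."""
    norm = sum(k * k for k in range(-horizon, horizon + 1))
    last = len(values) - 1
    out = []
    for t in range(len(values)):
        acc = 0.0
        for k in range(-horizon, horizon + 1):
            idx = min(max(t + k, 0), last)  # assumption: repeat edge values
            acc += k * values[idx]
        out.append(acc / norm)
    return out
```

Away from the boundaries, `derivate([2.0 * t for t in range(20)])` returns 2.0, the slope of the ramp.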

HistogramIntegrator

class yaafelib.yaafe_extensions.yaafefeatures.HistogramIntegrator

Feature transform that computes a histogram of input values.

Parameters:
  • HInf (default=0): Minimal value to take into consideration
  • HNbBins (default=10): Number of histogram bins
  • HSup (default=1): Maximal value to take into consideration
  • HWeighted (default=0): Set to 1 if input values are weighted. If 1, the input is treated as a list of (value, weight) pairs.
  • NbFrames (default=60): Number of frames to integrate together
  • StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

HistogramIntegrator HInf=0  HNbBins=10  HSup=1  HWeighted=0  NbFrames=60  StepNbFrames=30
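
The role of HInf, HSup, HNbBins and HWeighted can be illustrated with a small equal-width histogram. The function is hypothetical, and two choices are assumptions rather than documented behaviour: values outside [HInf, HSup] are ignored, and a value equal to HSup falls into the last bin.

```python
def histogram(values, h_inf=0.0, h_sup=1.0, nb_bins=10, weighted=False):
    """Equal-width histogram over [h_inf, h_sup] (illustrative)."""
    counts = [0.0] * nb_bins
    width = (h_sup - h_inf) / nb_bins
    # Unweighted input is treated as (value, 1.0) pairs.
    pairs = values if weighted else [(v, 1.0) for v in values]
    for v, w in pairs:
        if h_inf <= v <= h_sup:
            idx = min(int((v - h_inf) / width), nb_bins - 1)
            counts[idx] += w
    return counts
```

With `weighted=True`, each input item is a `(value, weight)` pair and the weight is added to the bin instead of a unit count.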

SlopeIntegrator

class yaafelib.yaafe_extensions.yaafefeatures.SlopeIntegrator

Feature transform that computes the slope of the input feature over the given number of frames.

Parameters:
  • NbFrames (default=60): Number of frames to integrate together
  • StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

SlopeIntegrator NbFrames=60  StepNbFrames=30
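
One way to picture this transform is a least-squares line fit of the feature value against the frame index inside each integration window. The sketch below assumes a scalar feature, treats StepNbFrames as the hop between window starts, and drops incomplete trailing windows; the function name is hypothetical and none of these choices is confirmed by the description above.

```python
def slope_integrator(values, nb_frames=60, step_nb_frames=30):
    """Least-squares slope of a scalar feature over sliding windows."""
    out = []
    for start in range(0, len(values) - nb_frames + 1, step_nb_frames):
        window = values[start:start + nb_frames]
        n = len(window)
        mean_t = (n - 1) / 2
        mean_v = sum(window) / n
        num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(window))
        den = sum((t - mean_t) ** 2 for t in range(n))
        out.append(num / den)
    return out
```

A feature that grows by 0.5 per frame yields 0.5 for every window, regardless of where the window starts.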

StatisticalIntegrator

class yaafelib.yaafe_extensions.yaafefeatures.StatisticalIntegrator

Feature transform that computes the temporal mean and standard deviation of the input feature over the given number of frames.

Parameters:
  • NbFrames (default=60): Number of frames to integrate together
  • SICompute (default=MeanStddev): if ‘MeanStddev’, compute the mean and standard deviation; if ‘Mean’, compute only the mean; if ‘Stddev’, compute only the standard deviation.
  • StepNbFrames (default=30): Number of frames to skip between two integrations

Declaration example:

StatisticalIntegrator NbFrames=60  SICompute=MeanStddev  StepNbFrames=30
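
The integration over NbFrames with a hop of StepNbFrames can be sketched as follows for a scalar feature. The function name is hypothetical, incomplete trailing windows are dropped, and the population standard deviation (division by N) is an assumption, since the normalization is not stated here.

```python
import math

def statistical_integrator(values, nb_frames=60, step_nb_frames=30,
                           compute="MeanStddev"):
    """Mean and/or standard deviation over sliding windows (illustrative)."""
    out = []
    for start in range(0, len(values) - nb_frames + 1, step_nb_frames):
        window = values[start:start + nb_frames]
        mean = sum(window) / len(window)
        # Assumption: population standard deviation (divide by N).
        std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
        if compute == "Mean":
            out.append((mean,))
        elif compute == "Stddev":
            out.append((std,))
        else:
            out.append((mean, std))
    return out
```

With the defaults, each output item summarizes 60 input frames and consecutive windows overlap by half.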