DeepDrum: An Adaptive Conditional Neural Network for generating drum rhythms

Considering music as a sequence of events with multiple complex dependencies, the Long Short-Term Memory (LSTM) architecture has proven very efficient in learning and reproducing musical styles. However, the generation of rhythms requires additional information regarding musical structure and accompanying instruments. DeepDrum is an adaptive Neural Network capable of generating drum rhythms under constraints imposed by Feed-Forward (Conditional) Layers which contain information about:

  • Guitar Performance and Polyphony
  • Bass Performance
  • Tempo Information
  • Metrical Structure
  • Grouping (Phrasing)


Training Data & Architecture

The corpus consists of 70 songs from two progressive rock bands with common musical characteristics, collected manually from web tablature learning sources. Following the methodology for conditional composition of Makris et al. (2017, 2018), the input training data is represented as text words with one-hot encodings. The dataset is available upon request.
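As a rough illustration of the "text words with one-hot encodings" representation, the sketch below maps each distinct word to an index and encodes it as a vector with a single 1. The token names (`kick`, `hihat`, and so on) and the helper functions are hypothetical and chosen only for this example; the actual vocabulary of the corpus is not given here.

```python
def build_vocabulary(words):
    """Map each distinct word to an integer index, in order of first appearance."""
    vocab = {}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a list of zeros with a single 1 at the index of `word`."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


# Hypothetical drum "words"; the real token set of the corpus is an assumption here.
drum_events = ["kick", "hihat", "snare", "hihat", "kick+hihat", "snare"]

vocab = build_vocabulary(drum_events)
encoded = [one_hot(word, vocab) for word in drum_events]

for word, vec in zip(drum_events, encoded):
    print(f"{word:12s} {vec}")
```

Each time-step then becomes a fixed-length vector whose length equals the vocabulary size, which is the form a softmax output layer can predict directly.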

The proposed architecture comprises separate modules for predicting the next drum event. The LSTM module learns sequences of consecutive drum events, while the Conditional (FF) module handles musical information regarding guitar, bass, metrical structure, tempo and grouping. At each time-step, this information is the sum of the one-hot feature vectors of the Conditional input space over the consecutive time-steps inside a moving window, which covers past, current and future time-steps. DeepDrum has 3 input spaces for different elements of the drums, leading to 3 LSTM Block modules, while the Conditional input space is split into 2 FF modules. The Pre-FF carries information about the past time-steps and is merged with each corresponding drum input. The Post-FF carries information about the current and future time-steps and is merged with each LSTM block output, leading to independent softmax outputs.
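The moving-window summation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the window sizes (`past`, `future`), the function name `window_sums`, and the toy 3-dimensional conditional vectors are all hypothetical, and time-steps that fall outside the sequence are simply skipped, which may differ from how the original system pads its input.

```python
def window_sums(cond, t, past, future):
    """Sum one-hot conditional vectors around time-step t.

    Returns (pre, post):
      pre  sums steps t-past .. t-1   (past context, as for the Pre-FF input)
      post sums steps t .. t+future   (current and future context, as for the Post-FF input)
    Steps outside the sequence are skipped (an assumption of this sketch).
    """
    dim = len(cond[0])

    def add(lo, hi):
        total = [0] * dim
        for i in range(max(lo, 0), min(hi, len(cond))):
            for j, value in enumerate(cond[i]):
                total[j] += value
        return total

    pre = add(t - past, t)
    post = add(t, t + future + 1)
    return pre, post


# Toy conditional sequence: five time-steps, three hypothetical feature classes.
cond = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

pre, post = window_sums(cond, t=2, past=2, future=2)
print("pre: ", pre)
print("post:", post)
```

Because the vectors are summed rather than concatenated, the input size of each FF module stays fixed regardless of the window length; the cost is that the order of events inside the window is lost.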

DeepDrum Neural Architecture

Concerning the configuration, we used two stacked LSTM layers and single FF layers (linear activation) with 256 hidden units, along with a dropout of 0.2 on every connection. Our experiments used the Keras library with TensorFlow as the backend.

Experimental Setup & Generations

We validated the proposed architecture by generating drum rhythms under the conditions given by pieces that were not included in the training set, some of which have musical characteristics not encountered in any piece of the training corpus (e.g. time signatures 3/8, 9/8). Four of these pieces nevertheless pertain to the learned style of the training corpus (denoted PT for Porcupine Tree and PF for Pink Floyd). In addition, we used two pieces from a different genre, Disco (denoted AB for Abba). Multiple generations were produced from initial seed sentences at different stages of the learning process, with an adjustable diversity parameter.

  1. PF1 with 4/4 Time Signature and low tempo. 
  2. PF2 with 4/4 Time Signature and moderate tempo. 
  3. PT1 with 7/8 Time Signature and moderate tempo. 
  4. PT2 with continuous Time Signature changes (4/4, 3/8, 5/8 and 7/8) and high tempo. 
  5. AB1 with sparse Time Signature changes (2/4 and 4/4) and high tempo. 
  6. AB2 with 4/4 Time Signature and high tempo.
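The text does not specify how the adjustable diversity parameter is applied. A common choice in Keras-style sequence generation is temperature sampling, where the softmax output is rescaled before the next event is drawn. The sketch below assumes that interpretation; the function names and the example probabilities are hypothetical.

```python
import math
import random


def apply_diversity(probs, diversity):
    """Rescale a probability distribution as p_i ** (1 / diversity), renormalised.

    diversity < 1 sharpens the distribution (safer, more repetitive output);
    diversity > 1 flattens it (more varied, riskier output).
    """
    logits = [math.log(max(p, 1e-12)) / diversity for p in probs]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_event(probs, diversity, rng=random):
    """Draw the index of the next event from the rescaled distribution."""
    scaled = apply_diversity(probs, diversity)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


# Hypothetical softmax output over three drum events.
probs = [0.7, 0.2, 0.1]

for diversity in (0.5, 1.0, 2.0):
    scaled = apply_diversity(probs, diversity)
    print(diversity, [round(p, 3) for p in scaled])

print("sampled index:", sample_event(probs, diversity=1.0))
```

With a diversity of 1.0 the network's own distribution is used unchanged, so lower and higher values can be compared against it when auditioning generations.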

Finally, you can listen to a compilation from an early version of the CNSL Drum Generator, using the above training data, in the style of PT and PF.

Please read and cite our work if you like:

Makris, Dimos, Maximos Kaliakatsos-Papakostas, and Katia Lida Kermanidis. “DeepDrum: An Adaptive Conditional Neural Network.” arXiv preprint arXiv:1809.06127 (2018).

For more details and analysis, please visit the Conditional Neural Sequence Learners for Drums’ Generation page and read the paper published by Springer.

Makris, Dimos, et al. “Conditional neural sequence learners for generating drums’ rhythms.” Neural Computing and Applications (2018): 1-12.