A Statistical Investigation of Long Memory in Language and Music

This repository contains code, data, and illustrative Jupyter notebooks detailing the analysis of natural language and music data, along with RNN models trained on such data, using statistical tools developed for long memory stochastic processes. Full details of the methodology and the associated experimental results can be found in the paper:

A Statistical Investigation of Long Memory in Language and Music
Alexander Greaves-Tunnell and Zaid Harchaoui
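
For readers new to long memory, the following is a minimal sketch of one classical estimator of the memory parameter d: the univariate log-periodogram (GPH) regression. It is not the code in src/eval/longmem_estimation.py and is not claimed to be the estimator used in the paper. The function names gph_estimate and simulate_fractional_noise, the default bandwidth m = sqrt(n), and the truncation length n_lags are all assumptions made for illustration.

```python
import numpy as np


def gph_estimate(x, m=None):
    """Log-periodogram (GPH) estimate of the memory parameter d of a 1-D series.

    m is the number of low Fourier frequencies used in the regression;
    sqrt(n) is a common rule of thumb, not a recommendation from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n  # first m nonzero Fourier frequencies
    dft = np.fft.rfft(x - x.mean())
    periodogram = np.abs(dft[j]) ** 2 / (2.0 * np.pi * n)
    # Near frequency zero a long memory spectrum behaves like
    # c * |2 sin(lam / 2)|^(-2d), so log I(lam) is roughly linear in the
    # regressor below, with slope d.
    regressor = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
    slope, _intercept = np.polyfit(regressor, np.log(periodogram), 1)
    return slope


def simulate_fractional_noise(d, n, rng, n_lags=2000):
    """Approximate ARFIMA(0, d, 0) noise with a truncated MA(infinity) filter."""
    k = np.arange(1, n_lags + 1)
    # MA coefficients of (1 - B)^(-d): psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    noise = rng.standard_normal(n + n_lags)
    return np.convolve(noise, psi, mode="valid")  # length n


rng = np.random.default_rng(0)
white = rng.standard_normal(4096)                      # short memory, d = 0
long_mem = simulate_fractional_noise(0.4, 4096, rng)   # long memory, d = 0.4
print("white noise d_hat:", gph_estimate(white, m=256))
print("fractional noise d_hat:", gph_estimate(long_mem, m=256))
```

An estimate near zero is consistent with short memory, while a clearly positive estimate suggests long memory. The sequences studied in the paper are multivariate, so this scalar sketch is only a conceptual starting point.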

Structure of the repository

The main concepts and experimental results of the paper are illustrated in the Jupyter notebook Long_Memory_in_Language_and_Music.ipynb; we recommend that interested users start here for an introduction to the statistical long memory analysis of RNNs. The companion notebook Data_Downloads_and_Embedding.ipynb offers further information and additional code for users interested in the details of the experiments reported in the paper.

Dependencies

The code is implemented in Python 3. The major dependencies are:

NumPy
SciPy
PyTorch (the code was written for PyTorch 0.4 but is compatible with version 1.0)
matplotlib
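
As a quick sanity check once the environment is set up, a short script along these lines can report which of the dependencies above are importable and at what version. The helper name check_dependencies is hypothetical and not part of this repository.

```python
import importlib

# Import names of the major dependencies listed above.
REQUIRED = ["numpy", "scipy", "torch", "matplotlib"]


def check_dependencies(names=REQUIRED):
    """Return {module name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Some modules do not expose __version__; report that explicitly.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_dependencies().items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```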

Setup

We recommend using Anaconda to create a virtual environment in which this code can be run. With Anaconda installed, a user can clone this repo and set up an environment with the required dependencies via:

git clone https://github.com/alecgt/RNN_long_memory.git
cd RNN_long_memory/
conda env create -f RNN_long_memory.yml

Note: GPU access is neither assumed nor required to run the examples in Long_Memory_in_Language_and_Music.ipynb or the long memory evaluation tools in src/eval/longmem_estimation.py. However, a GPU is highly recommended for users who wish to train their own RNNs.
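
To see whether PyTorch can use a GPU in the current environment, a check like the one below may help. The helper pick_device is a hypothetical name used only for illustration; it falls back to the CPU when PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to PyTorch's .to(device) when moving a model or tensor.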

Data

The repository includes the example text and music data needed to run the code in Long_Memory_in_Language_and_Music.ipynb. Two additional steps are required:

Unzip the embedded music data:

cd data/
tar -xvzf bach_cello_suite.tar.gz

Obtain the GloVe word vectors used to embed the text data. Specifically, we require that the file glove.6B.50d.txt be downloaded to the data/ directory. This is available from Stanford NLP:

cd data/
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
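
Once the file is in place, the vectors can be read into memory. The sketch below is one way to do so and is not the loading code used in the notebooks. It assumes the standard GloVe text format (one word per line followed by its space-separated vector components) and that it is run from the repository root; the function names parse_glove and load_glove are hypothetical.

```python
import numpy as np


def parse_glove(lines, dim=50):
    """Parse GloVe text lines ("word v1 ... v_dim") into {word: vector}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        values = [float(v) for v in parts[1:]]
        vectors[parts[0]] = np.array(values, dtype=np.float32)
    return vectors


def load_glove(path="data/glove.6B.50d.txt", dim=50):
    """Read a GloVe text file from disk (path assumes the repository root)."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove(handle, dim=dim)
```

For example, load_glove()["music"] would return a 50-dimensional float32 vector if the word is in the vocabulary.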

Citation

If you find these tools useful in your own work, please cite:

Greaves-Tunnell, A. & Harchaoui, Z. (2019). A Statistical Investigation of Long Memory in Language
  and Music. Proceedings of the 36th International Conference on Machine Learning, in PMLR 97:2394-2403

The BibTeX reference is:

@InProceedings{pmlr-v97-greaves-tunnell19a,
  title = 	 {A Statistical Investigation of Long Memory in Language and Music},
  author = 	 {Greaves-Tunnell, Alexander and Harchaoui, Zaid},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2394--2403},
  year = 	 {2019},
  volume = 	 {97}
}

Acknowledgments

This work was supported by the Big Data for Genomics and Neuroscience Training Grant 8T32LM012419, NSF TRIPODS Award CCF-1740551, the program "Learning in Machines and Brains" of CIFAR, and faculty research awards.
