# pliers
A Python 3 package for automated feature extraction.
## Status
* [![Build Status](https://travis-ci.org/tyarkoni/pliers.svg?branch=master)](https://travis-ci.org/tyarkoni/pliers)
* [![Coverage Status](https://coveralls.io/repos/github/tyarkoni/pliers/badge.svg?branch=master)](https://coveralls.io/github/tyarkoni/pliers?branch=master)
## Overview
Pliers is a Python package for automated extraction of features from multimodal stimuli. It provides a unified, standardized interface to dozens of different feature extraction tools and services--including many state-of-the-art deep learning-based APIs. It's designed to let you rapidly and flexibly extract all kinds of useful information from videos, images, audio, and text.
You might benefit from pliers if you need to accomplish any of the following tasks (and many others!):
* Identify objects or faces in a series of images
* Transcribe the speech in an audio or video file
* Apply sentiment analysis to text
* Extract musical features from an audio clip
* Apply a part-of-speech tagger to a block of text
Each of the above tasks can typically be accomplished in 2-3 lines of code with pliers. Combining them *all*--and returning a single, standardized, integrated DataFrame as the result--might take a bit more work. Say, maybe 5 or 6 lines.
In a nutshell, pliers provides an extremely high-level, unified interface to a very large number of feature extraction tools that span a wide range of modalities.
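To make that concrete, here's a minimal sketch of a combined workflow. It assumes the optional `face_recognition` dependency is installed and uses two of pliers' face extractors; swap in whatever extractors suit your stimuli.

```python
from os.path import join
from pliers.extractors import (FaceRecognitionFaceLocationsExtractor,
                               FaceRecognitionFaceLandmarksExtractor,
                               merge_results)
from pliers.tests.utils import get_test_data_path

# Run two different extractors over the same image, then merge the
# separate results into a single standardized DataFrame.
image = join(get_test_data_path(), 'image', 'obama.jpg')
extractors = [FaceRecognitionFaceLocationsExtractor(),
              FaceRecognitionFaceLandmarksExtractor()]
results = [ext.transform(image) for ext in extractors]
df = merge_results(results)
```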
## How to cite
If you use pliers in your work, please cite both the pliers GitHub repository (http://github.com/tyarkoni/pliers) and the following paper:
> McNamara, Q., De La Vega, A., & Yarkoni, T. (2017, August). [Developing a comprehensive framework for multimodal feature extraction](https://dl.acm.org/citation.cfm?id=3098075). In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1567-1574). ACM.
## Documentation
The official pliers documentation is quite thorough, and contains a comprehensive [quickstart](http://tyarkoni.github.io/pliers/quickstart.html) doc (also available below), [user guide](http://tyarkoni.github.io/pliers/) and complete [API Reference](http://tyarkoni.github.io/pliers/reference.html).
## Installation
For the latest release:
> pip install pliers
Or, if you want to work on the bleeding edge:
> pip install git+https://github.com/tyarkoni/pliers.git
### Dependencies
By default, installing pliers with pip will only install third-party libraries that are essential for pliers to function properly. These libraries are listed in `requirements.txt`. However, because pliers provides interfaces to a large number of feature extraction tools, there are literally dozens of other optional dependencies that may be required depending on what kinds of features you plan to extract (see `optional-dependencies.txt`). To be on the safe side, you can install all of the optional dependencies with pip:
> pip install -r optional-dependencies.txt
Note, however, that some of these Python dependencies have their own (possibly platform-dependent) requirements. For example, `python-magic` requires `libmagic` (see the `python-magic` documentation for installation instructions), and without it you'll be relegated to loading all your stims explicitly rather than passing in filenames (i.e., `stim = VideoStim('my_video.mp4')` will work fine, but passing `'my_video.mp4'` directly to an `Extractor` may not). Additionally, the Python OpenCV bindings require OpenCV3--but relatively few of the feature extractors in pliers currently depend on OpenCV, so you may not need to bother with this. Similarly, the `TesseractConverter` requires the tesseract OCR library, but no other `Transformer` does, so unless you're planning to capture text from images, you're probably safe.
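As a minimal sketch of the difference (the file paths here are placeholders):

```python
from pliers.stimuli import ImageStim, VideoStim

# Explicitly constructing a stim always works, because the class itself
# tells pliers what kind of input it's dealing with...
video = VideoStim('my_video.mp4')
image = ImageStim('my_image.jpg')

# ...whereas passing a raw filename to a Transformer requires python-magic
# (and libmagic) to infer the stim type from the file.
```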
### API Keys
While installing pliers itself is usually straightforward, setting up some of the web-based feature extraction APIs that pliers interfaces with can take a bit more effort. For example, pliers includes support for face and object recognition via Google’s Cloud Vision API, and enables conversion of audio files to text transcripts via several different speech-to-text services. While some of these APIs are free to use (and virtually all provide a limited number of free monthly calls), they all require each user to register for their own API credentials. This means that, in order to get the most out of pliers, you’ll probably need to spend some time registering accounts on a number of different websites. More details on API key setup are available [here](http://tyarkoni.github.io/pliers/installation.html#api-keys).
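As one concrete example, Google's client libraries (which pliers relies on for its Google extractors) read service-account credentials from a standard environment variable, so a typical setup step looks like this (the credentials path is a placeholder, and other services use their own variable names--see the linked docs):

```python
import os

# Point Google's client libraries at your service-account credentials.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/credentials.json'

from pliers.extractors import GoogleVisionAPIFaceExtractor
ext = GoogleVisionAPIFaceExtractor()
```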
## Quickstart
A detailed user guide can be found in the [pliers documentation](http://tyarkoni.github.io/pliers/); below we provide a few brief examples illustrating the flexibility and utility of the package. An executable Jupyter Notebook containing all of the examples can be found [here](https://github.com/tyarkoni/pliers/blob/master/examples/Quickstart.ipynb).
### Face detection
This first example uses the `face_recognition` package's location extraction method to detect the location of Barack Obama's face within a single image. The tools used to do this are completely local (i.e., the image isn't sent to an external API).
We output the result as a pandas DataFrame; the `face_locations` column contains the coordinates of the bounding box in CSS format (i.e., top, right, bottom, and left edges).
```python
from os.path import join
from pliers.extractors import FaceRecognitionFaceLocationsExtractor
from pliers.tests.utils import get_test_data_path

# A picture of Barack Obama, bundled with pliers' test data
image = join(get_test_data_path(), 'image', 'obama.jpg')
# Initialize Extractor
ext = FaceRecognitionFaceLocationsExtractor()
# Apply Extractor to image
result = ext.transform(image)
result.to_df()
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>onset</th>
<th>order</th>
<th>duration</th>
<th>object_id</th>
<th>face_locations</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>(142, 349, 409, 82)</td>
</tr>
</tbody>
</table>
</div>
### Face detection with multiple inputs
What if we want to run the face detector on multiple images? Naively, we could just loop over input images and apply the `Extractor` to each one. But pliers makes this even easier by natively accepting iterables as inputs. The following code is almost identical to the snippet above. The only notable difference is that the result we get back is now also a list (because the features extracted from each image are stored separately), so we need to explicitly combine the results using the `merge_results` utility.
```python
from os.path import join
from pliers.extractors import (FaceRecognitionFaceLocationsExtractor,
                               merge_results)
from pliers.tests.utils import get_test_data_path

# Three images from pliers' test data; only some of them contain faces
images = ['apple.jpg', 'obama.jpg', 'thai_people.jpg']
images = [join(get_test_data_path(), 'image', img) for img in images]
ext = FaceRecognitionFaceLocationsExtractor()
results = ext.transform(images)
df = merge_results(results)
df
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>source_file</th>
<th>onset</th>
<th>class</th>
<th>filename</th>
<th>stim_name</th>
<th>history</th>
<th>duration</th>
<th>order</th>
<th>object_id</th>
<th>FaceRecognitionFaceLocationsExtractor#face_locations</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>/Users/tal/Dropbox/Code/pliers/pliers/tests/da...</td>
<td>NaN</td>
<td>ImageStim</td>
<td>/Users/tal/Dropbox/Code/pliers/pliers/tests/da...</td>
<td>obama.jpg</td>
<td></td>
<td>NaN</td>
<td>NaN</td>
<td>0</td>
<td>(142, 349, 409, 82)</td>
</tr>
<tr>
<th>1</th>
<td>/Users/tal/Dropbox/Code/pliers/pliers/tests/da...</td>
<td>NaN</td>