iPhylo: cloud

Roderic D. M. Page


Tuesday, June 15, 2021

Compiling a C++ application to run on Heroku

TL;DR: Use a buildpack and pass "LDFLAGS=--static" --disable-shared to configure.

I use Heroku to host most of my websites, and since I mostly use PHP for web development this has worked fine. However, every so often I write an app that calls an external program written in, say, C++. Up until now I've had to host these apps on my own web servers. Today I finally bit the bullet and learned how to add a C++ program to a Heroku-hosted site.

In this case I wanted to add CRF++ to an app for parsing citations. I'd read on Stack Overflow that you could simply log into your Heroku instance using

heroku run bash

and compile the code there. I tried that for CRF++ but got a load of g++ errors, culminating in:

configure: error: Your compiler is not powerful enough to compile CRF++.

Turns out that the g++ compiler is only available at build time, that is, when the Heroku instance is being built before it is deployed. Once it is deployed g++ is no longer available (I'm assuming because Heroku tries to keep each running instance as small as possible).

So, next I tried using a buildpack, specifically felkr/heroku-buildpack-cpp. I forked this buildpack, and added it to my Heroku app (using the "Settings" tab). I put the source code for CRF++ into the root folder of the GitHub repository for the app (which makes things messy, but this is where the buildpack looks for either a Makefile or configure), so that when the app is deployed CRF++ is compiled. Yay! Update: with a couple of tweaks I moved all the code into a folder called src and now things are a bit tidier.

Not so fast. I then did

heroku run bash

and tried running the executable:


heroku run bash -a <my app name>

./crf_learn

/app/.libs/crf_learn: error while loading shared libraries: libcrfpp.so.0: cannot open shared object file: No such file or directory

For whatever reason the executable is looking for a shared library which doesn’t exist (this brought back many painful memories of dealing with C++ compilers on Macs, Windows, and Linux back in the day). To fix this I edited the buildpack compile script to pass "LDFLAGS=--static" --disable-shared to configure. This tells the build to produce static versions of the libraries and executable, so nothing has to be found at run time. After redeploying the app once again, everything worked!
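As a rough illustration of that build step, here is a minimal Python sketch of the same configure invocation. The buildpack itself uses a shell compile script, so the helper names `configure_command` and `build` are hypothetical, as is the idea of driving the build from Python; only the two configure arguments come from the post.

```python
import subprocess
from pathlib import Path
from typing import List


def configure_command() -> List[str]:
    """Return the configure invocation that requests a static build.

    The two arguments are the ones quoted in the post; in a shell script
    they would be written as: ./configure "LDFLAGS=--static" --disable-shared
    """
    return ["./configure", "LDFLAGS=--static", "--disable-shared"]


def build(source_dir: str) -> None:
    """Run configure and then make inside source_dir (hypothetical helper).

    check=True makes a failed step raise CalledProcessError, so a broken
    build stops the deployment instead of shipping a bad executable.
    """
    src = Path(source_dir)
    subprocess.run(configure_command(), cwd=src, check=True)
    subprocess.run(["make"], cwd=src, check=True)
```

Calling `build("src")` would only make sense at build time, since (as noted above) the compiler is not available once the app is deployed.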

The actual website itself is a mess at the moment ~~so I won't share the link just yet~~ (Update: see Citation parsing tool released for details), but it's great to know that I can have both a C++ executable and a PHP script hosted together without (too much) pain. As always, Google and Stack Overflow are your friends.

Friday, April 17, 2020

A planetary computer for Earth

Came across Microsoft's announcement of "A planetary computer for a sustainable future through the power of AI", complete with a glossy video featuring Lucas Joppa @lucasjoppa (see also @Microsoft_Green and #AIforEarth).

On the one hand it's great to see super smart people with lots of resources tackling important questions, but it's hard to escape the feeling that this is the classic technology company approach of framing difficult problems in ways that match the solutions they have to offer. Is the reason that biodiversity is declining simply that we have lacked computational resources, that our AI isn't good enough? And while forests that have been stripped of both their megafauna and previous human inhabitants make for photogenic backdrops, biodiversity can be a lot messier (and more dangerous). Still, it will be interesting to see how this plays out, and what sort of problems the planetary computer is used to tackle.

Friday, July 31, 2015

Towards a new BioStor

One of my pet projects is BioStor, which has been running since 2009 (gulp). BioStor extracts articles from the Biodiversity Heritage Library (details here: http://dx.doi.org/10.1186/1471-2105-12-187), and currently has over 110,000 articles, all open access. The site itself is showing its age, both in terms of performance and design, so I've wanted to update it for a while now. I made a demo in 2012 of BioStor in the Cloud, but other stuff got in the way of finishing it, and the service that it ran on (Pagodabox) released a new version of their toolkit, so BioStor in the Cloud died.

At last I've found the time to tackle this again, motivated in part because I've had to move BioStor to a new server, and its performance has been pretty poor. The next version of BioStor is currently sitting at http://biostor.gopagoda.io (the images and map views are good ways to enter the site). It's still being populated, and there is code to tweak, but it's starting to look good enough to use. It has a cleaner article display, built-in search (making things much more findable), support for citation styles using citeproc-js, and display of altmetrics (e.g., Another variation on the gymnure theme: description of a new species of Hylomys (Lipotyphla, Erinaceidae, Galericinae)).

Once all the data has been moved across and I've cleaned up a few things I plan to make biostor.org point to this new version.

Thursday, November 22, 2012

BioStor in the cloud

Quick note on an experimental version of BioStor that is (mostly) hosted in the cloud. BioStor currently runs on a Mac Mini and uses MySQL as the database. For a number of reasons (it's running on a Mac Mini and my knowledge of optimising MySQL is limited) BioStor is struggling a bit. It's also gathered a lot of cruft as I've worked on ways to map article citations to the rather messy metadata in BHL.

So, I've started to play with a version that runs in the cloud using my favourite database, CouchDB. The data is hosted by Cloudant, which now provides full text search powered by Lucene. Essentially, I simply take article-level metadata from BioStor in BibJSON format and push that to Cloudant. I then wrote a simple wrapper around querying CouchDB, coupled that with the Documentcloud Viewer to display articles and citeproc-js to format the citations (not exactly fun, but someone is bound to ask for them), and we have a simple, searchable database of literature.
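To make the "push BibJSON to Cloudant" step concrete, here is a minimal Python sketch. It assumes CouchDB's standard document API (an HTTP PUT of a JSON body to `/database/document-id`). The account URL, database name, document id, and the article record itself are all made up for illustration, and the BibJSON fields shown are only a plausible subset. The request is built but not sent, and authentication is left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def couchdb_put_request(base_url, database, doc_id, document):
    """Build (but do not send) a PUT request storing one JSON document.

    CouchDB creates or updates a document with PUT /{db}/{docid}.
    """
    url = "{}/{}/{}".format(
        base_url.rstrip("/"),
        quote(database, safe=""),
        quote(doc_id, safe=""),
    )
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A made-up article record in a BibJSON-like shape (illustrative only).
article = {
    "title": "An example article",
    "author": [{"name": "A. N. Author"}],
    "year": "1900",
    "journal": {"name": "An Example Journal", "volume": "1", "pages": "1-10"},
}

# Hypothetical account, database, and document id.
request = couchdb_put_request(
    "https://example.cloudant.com", "biostor", "article-1", article
)
```

Passing `request` to `urllib.request.urlopen` would send it; a real deployment would also need credentials, and updating an existing document requires supplying its current revision.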

If you want to try the cloud-based version go to http://biostor-cloud.pagodabox.com/ (code on GitHub).

I've been wanting to do this for a while, partly because this is how I will implement my entry in EOL's computational data challenge, but also because CrossRef's Metadata search shows the power of finding references simply by using full text search (I've shamelessly borrowed some of the interface styling from Karl Ward's code). David Shorthouse demonstrates what you can do using CrossRef's tool in his post Conference Tweets in the Age of Information Overconsumption. Given how much time I spend trying to parse taxonomic citations and match them to articles in CrossRef's database, or BioStor, I'm looking forward to making this easier.

There are two major limitations of this cloud version of BioStor (apart from the fact it has only a subset of the articles in BioStor). The first is that the page images are still being served from my Mac Mini, so they can be a bit slow to load. I've put the metadata and the search engine in the cloud, but not the images (we're talking a terabyte or two of bitmaps).

The other limitation is that there's no API. I hope to address this shortly, perhaps mimicking the CrossRef API so if one has code that talks to CrossRef it could just as easily talk to BioStor.

Tuesday, July 14, 2009

Zotero: creating bibliographies in the cloud

Lately I've become more and more interested in moving data off my machine(s) and into the cloud. I'm keen to do this partly to avoid having data in one place (e.g., a machine at work) when I need it someplace else (e.g., at home), and there are great tools for doing this (such as the wonderful Dropbox).

As a developer, the cloud appeals, not so much because of the compute power that some are salivating over, but because it may free me from having to create my own software. For example, some time ago I created an OpenURL resolver to help me find articles online. I harvest a bunch of sources, such as CrossRef, PubMed, some OAI repositories, etc., but there are always times when I find a reference online that I'd like to add, and that reference doesn't have an identifier such as a DOI.

Typically I add these manually, or by importing a file. I could write some interface code to add (and edit) a bibliographic reference (and, indeed I did some time back), but wouldn't it be great if somebody else had done this for me?

Well, there are some tools out there for handling bibliographies online, such as Connotea, Mendeley, and Zotero (a Firefox add-on). Initially I was skeptical of Zotero (and I'm not a big Firefox user), but now that I'm looking for a place to store obscure papers it's rapidly growing on me. I like the fact that I can add references in situ, and that I can upload PDFs (which can be stored remotely on a WebDAV disk such as an iDisk). But what makes Zotero even more attractive is that it generates an RSS feed of my bibliography, which I can then harvest just as I harvest other resources.
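As a rough idea of what harvesting such a feed involves, here is a minimal Python sketch that pulls titles and links out of an RSS 2.0 document using only the standard library. The sample feed below is invented for illustration; the structure of a real Zotero feed may differ, and `harvest_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample feed in plain RSS 2.0 (not real Zotero output).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example library</title>
    <item>
      <title>An obscure paper</title>
      <link>https://example.org/items/1</link>
    </item>
    <item>
      <title>Another obscure paper</title>
      <link>https://example.org/items/2</link>
    </item>
  </channel>
</rss>"""


def harvest_items(feed_xml):
    """Return a list of {"title", "link"} dicts, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                # findtext returns the default when the child is missing
                "title": item.findtext("title", default="").strip(),
                "link": item.findtext("link", default="").strip(),
            }
        )
    return items


references = harvest_items(SAMPLE_FEED)
```

Each harvested entry could then be fed into the same pipeline used for the other sources, with the feed fetched periodically rather than read from a string.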

Using a resource like Zotero saves me the hassle of having to write my own bibliographic editor, plus I benefit from using a tool that's a lot more polished than one I could make. Because of this, and my experience with the Google Spreadsheets API, I'm ultimately aiming to never have to write a user interface again. If I write services, and rely on third parties to make tools that can either generate services I can use, or consume my services, then my life becomes a lot simpler.

OK, perhaps I exaggerate. I like making interfaces, such as my eBio09 entry, or the experiment with SpaceTree. However, I can imagine a situation where I don't have to write a data entry interface ever again.

Tuesday, April 21, 2009

CrossRef fail - at least we're not alone...

CrossRef has been having some issues with its OpenURL resolver over the weekend, which means that attempts to retrieve metadata from a DOI, or to find a DOI from metadata, have been thwarted. While annoying (see The dangers of the ‘free’ cloud: The Case of CrossRef), in one sense it's reassuring that it's not just biodiversity data providers that are having problems with service availability.
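For readers unfamiliar with the two kinds of lookup mentioned here, the following Python sketch builds the corresponding query URLs. The endpoint and the parameter names (`pid`, `id`, `noredirect`, `title`, `volume`, `spage`, `date`) reflect my understanding of CrossRef's OpenURL interface and should be treated as assumptions to check against CrossRef's documentation. The e-mail address and the bibliographic values are illustrative, and nothing is actually fetched.

```python
from urllib.parse import urlencode

# Assumed endpoint for CrossRef's OpenURL resolver.
OPENURL_ENDPOINT = "https://www.crossref.org/openurl"


def doi_to_metadata_url(doi, pid):
    """URL asking the resolver for the metadata behind a DOI."""
    params = {"pid": pid, "id": "doi:" + doi, "noredirect": "true"}
    return OPENURL_ENDPOINT + "?" + urlencode(params)


def metadata_to_doi_url(pid, journal, volume, start_page, year):
    """URL asking the resolver to find a DOI from bibliographic metadata."""
    params = {
        "pid": pid,
        "title": journal,  # journal title
        "volume": volume,
        "spage": start_page,  # first page
        "date": year,
        "noredirect": "true",
    }
    return OPENURL_ENDPOINT + "?" + urlencode(params)


# Illustrative values only; the pid is a placeholder e-mail address.
by_doi = doi_to_metadata_url("10.1186/1471-2105-12-187", "me@example.org")
by_metadata = metadata_to_doi_url(
    "me@example.org", "BMC Bioinformatics", "12", "187", "2011"
)
```

When the resolver is down, both kinds of request fail, which is why an outage blocks lookups in either direction.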