Web-enable your Research/Project with an API
This post is intended to help data scientists and engineers who, in some capacity, have implemented routines/algorithms/data that perform a specialized function (e.g. machine learning) using a dynamically typed language, such as Python. The goal is to web-enable these routines/algorithms using an application programming interface (API).
Exposing these functions/data as an API allows for:
- Easier, faster, and consistent sharing of functionality/data that could further progress the research. A good example of this is the Materials Project from the Lawrence Berkeley National Laboratory. They deemed it necessary for the scientific community to have access to their data, hence exposing it as an API.
- Generating revenue, either as a business entity (e.g. university spinout) or simply to cover the costs of running the API infrastructure. One case that comes to mind is Text Processing. You can also check out a list of 50+ Machine Learning APIs (some require a paid subscription).
Another general reason for implementing an API layer on your functionality/data is that web technology and the available libraries and tools have progressed to a point where complexity is kept to a minimum, letting you pick and choose your preferred technology stack with ease.
For this post, we will use Python as the programming language because aside from its deep history with the scientific community, it has several scientific and numeric packages available for it (e.g. NumPy, SciPy, etc). If you’re already using some of these, then this post applies to you (even if we’re not going to use these libraries). If Python is not your cup of tea, rest assured that the concepts described here apply all the same. The approach/steps below assume that you could be coming from different languages/platforms.
To add an API layer to your project, we would need three main ingredients:
- Web hosting
- Your project, and
- Framework/library to implement a REST API
Let’s represent them as diagrams so you can visualize how they will fit together.
The final implementation will look like the following:
What is REST?
REST stands for Representational State Transfer. It is a web “architecture” that lets us access/modify web resources through HTTP (yes similar to the browser “http://” prefix). For example, under the REST architecture, your endpoint ideally should be accessible using a browser as “http://yourdomain.org/yourproject/endpoint?parameter=value”. This is an alternative to web services (SOAP, XML, etc). For most programming languages, there is a REST framework available.
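To make the URL structure concrete, here is a minimal sketch of how a client could assemble such a REST URL in Python. The domain, path, and parameter are the placeholders from the example above, and `build_rest_url` is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urlencode

# Placeholder endpoint taken from the example above; not a real service.
BASE_URL = "http://yourdomain.org/yourproject/endpoint"


def build_rest_url(base_url, **params):
    """Append URL-encoded query parameters to a base endpoint URL."""
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


url = build_rest_url(BASE_URL, parameter="value")
print(url)
```

Using `urlencode` rather than string concatenation takes care of escaping spaces and special characters in parameter values.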
4 steps to gather these ingredients and put them all together:
- Searching for web hosting
- Searching for a REST framework library
- Choosing your endpoints
- Putting it all together
1. Searching for web hosting
Web hosting is a service that allows us to host our application so it can be served over the web. The server where our application resides will “serve” requests from external people/software that want to access our API. Depending on the language you used for your application, there are usually free or cheap options for web hosting. In most cases you’d probably want to try a Freemium model where usage has limits (e.g. 500MB storage, 1GHz processor, etc). There are also “cloud” options where you can “spin” up virtual machines on demand (useful when demand for your API spikes unexpectedly, for example as a result of being featured on a popular syndication site like kdnuggets.com). These cloud options are usually better because they have additional bells and whistles such as an integrated development environment.
Since we can’t account for all permutations of programming languages, hosting, and cloud options out there, you could cut through the Google search and go straight to Quora.
Quora is a question and answer website moderated by a community of users. With Quora, people can “vote up” answers they like, eventually putting the best answer at the top. More importantly, these answers have additional commentary that provides context. Here’s a search for “python hosting”. Another site to check out is Stack Overflow, which caters to more technical and specific programming questions.
The last result at the bottom seems relevant to us – http://www.quora.com/Python-programming-language-1/Where-do-you-host-your-Python-based-web-apps
There’s no silver bullet to choosing the right hosting, that is unless you test them all out individually. In some cases, you don’t have to deploy a fully working application to decide whether it’s the right platform/hosting or not. If it’s starting to “get in the way” (e.g. getting too many errors, few support systems, etc.) while you’re trying to do something, then it’s safe to try another one. It’s that simple.
This is of course biased, given that everyone will have different levels of technical background, experience, etc. The key thing here is to choose the one that helps you.
In my case, I eventually chose PythonAnywhere simply because they have this hovering help box that guides you throughout the process of deploying a “Hello World” application:
What is Hello World?
Hello World is a term that refers to the simplest code one could write in a specific language to get it to print out “Hello World”. It is a good way to introduce novice programmers to a new language or platform.
2. Searching for a REST framework library
Now that we know about Quora, this should be quick – What is a good Python framework for building a RESTful API?
The majority seems to like Flask. It just so happens that PythonAnywhere supports it:
We’re in luck! (Actually I did the two Quora searches above simultaneously before I decided on a combination of PythonAnywhere and Flask. So I’m cheating. But you get the point :P).
3. Choosing your application (endpoints)
If you already have an application that you want to put an API layer on, then you don’t have to search for a project. In my case though, I had to find an example that I can use for this post.
You do however have to figure out the application functionality or data that you want to expose as an API endpoint. This will dictate the REST structure of your endpoint. Let’s use an example Python application I picked to demonstrate this.
Just a few days ago someone on Hacker News posted an algorithm for summarizing paragraphs of text using Python. I figured the programming world could benefit from it if the algorithm were exposed as an API. But what information does my API need to summarize text? Looking at the source code snippet for the application:
The main SummaryTool object accepts “content” and consequently a “title”. As a REST endpoint, this could look like “http://yourdomain.org/yourproject/summarize?title=&content=<content here>”
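To see how such a URL maps back to the two inputs, here is a small sketch that pulls “title” and “content” out of a URL of that shape using only the standard library. The function name `extract_summarize_params` and the sample values are assumptions made for illustration; a web framework normally does this parsing for you.

```python
from urllib.parse import parse_qs, urlsplit


def extract_summarize_params(url):
    """Return the (title, content) query parameters from a summarize URL."""
    # keep_blank_values=True keeps parameters that are present but empty, e.g. "title=".
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    title = query.get("title", [""])[0]
    content = query.get("content", [""])[0]
    return title, content


# Sample URL using the placeholder domain from the post.
example = (
    "http://yourdomain.org/yourproject/summarize"
    "?title=My+Title&content=Some+text+to+summarize."
)
title, content = extract_summarize_params(example)
print(title)
print(content)
```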
4. Putting it all together
I would assume that a majority of you at this point have only been reading and haven’t actually signed up/tried the service/library I mentioned above. That’s ok. Now’s the time to do that.
Let’s start with PythonAnywhere and Flask:
a. Go to https://www.pythonanywhere.com and sign up for a “Noob” account. Like I mentioned above, you can try out these services first and decide later if you want to commit.
After signing up, choose “I want to create a web application”.
b. It’s pretty much smooth sailing from here. Just follow the instructions in the green box at the top, which will show you how to create your Hello World application. We will build on top of this Hello World app later to import the Flask library and add the Summarization code.
There are two interesting points here that you need to be mindful of during this process:
Your domain will look similar to this:
It’s ok *not* to upgrade at this point.
Next is the framework. Click on “Flask”, then click Next to choose the default file name for your project.
It will take a few minutes to spin up your new application, but that should be all you have to do to have a working Hello World Python app running on PythonAnywhere with Flask.
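For reference, a Flask Hello World app typically looks something like the minimal sketch below. This shows the general shape only and is not necessarily the exact file PythonAnywhere generates; the route and the returned text are illustrative.

```python
from flask import Flask

# The application object that the hosting service's web server loads.
app = Flask(__name__)


@app.route("/")
def hello_world():
    """Respond to requests for the site root with a plain-text greeting."""
    return "Hello World"
```

On a hosted service the platform runs `app` for you, so there is no need to start a server yourself.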
c. You can go to the browser to access it:
Now let’s integrate our Summarization code with this new Hello World app. I’ve pasted the code here for your convenience.
But here are the relevant parts that were added:
The first yellow box at the top highlights the libraries required by the Summarization algorithm (copied over at the bottom in the picture), plus the urllib2. The 2nd yellow box highlights imports that are required by our additional code highlighted in the 3rd yellow box.
The 3rd yellow box highlights the main addition to the original Hello World code, which describes the new endpoint route “/summarize” accepting two parameters, “title” and “content”. We then pass them to the SummaryTool object, and the results are sent back as JSON to the calling application.
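The sketch below shows roughly what such a “/summarize” route could look like in Flask. It is not the code from the screenshot: `naive_summary` is a hypothetical stand-in for the SummaryTool algorithm (it just keeps the first sentence of each paragraph), and the error response for a missing “content” parameter is an added assumption.

```python
import re

from flask import Flask, jsonify, request

app = Flask(__name__)


def naive_summary(title, content):
    """Hypothetical stand-in summarizer: keep the first sentence of each paragraph."""
    paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
    first_sentences = [
        re.split(r"(?<=[.!?])\s+", p, maxsplit=1)[0] for p in paragraphs
    ]
    return {"title": title, "summary": " ".join(first_sentences)}


@app.route("/")
def hello_world():
    return "Hello World"


@app.route("/summarize")
def summarize():
    # Read the two query parameters described above; both default to empty strings.
    title = request.args.get("title", "")
    content = request.args.get("content", "")
    if not content:
        return jsonify({"error": "the 'content' parameter is required"}), 400
    # jsonify serializes the dict and sets the JSON content type on the response.
    return jsonify(naive_summary(title, content))
```

To use the real algorithm, the body of `naive_summary` would be replaced with calls to the summarization code.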
What is JSON?
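JSON (JavaScript Object Notation) is a lightweight, text-based format for structured data such as key/value pairs and lists, and most programming languages can read and write it. As a minimal sketch, this is how a result could be turned into JSON text and back with Python’s standard library; the field names and values are made up for illustration.

```python
import json

# Illustrative result; the real API's field names may differ.
result = {"title": "Example title", "summary": "A shortened version of the text."}

payload = json.dumps(result)   # Python dict -> JSON text sent over the wire
decoded = json.loads(payload)  # JSON text -> Python dict on the receiving side

print(payload)
```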
Once you’ve edited your Hello World application like this, make sure to save and “Reload” your application to apply the changes. You can reload your app from the Dashboard -> Web tab:
Our API is now ready!
Try it out by opening your new “/summarize” endpoint in a browser, passing “title” and “content” as parameters. You should get a summarized output based on the title and content provided to the API.
To recap: We have API-enabled our Python app using a web hosting/cloud service and a REST framework library. This will make it accessible to the rest of the software community, and will hopefully further the progress of its research and foster innovative implementations/mashups.
There are other considerations not covered in this post such as security (OAuth, HTTP Basic, Query key parameters, etc), scalability, rate-limiting/throttling, and other API “to-dos”. I would suggest that you check this exhaustive list of 43 Things to think about when Designing, Testing, and Releasing your API. There’s quite a number of things to check there. Your API journey has just begun!
I’ve saved the best step for last – promoting your API to developers through Mashape.
Mashape is a marketplace where developers can discover, distribute, and monetize private and public APIs. We make it easy for you to make your API known to the developer world. It’s free to join and add your API to Mashape. You can check out these tutorials on how you can add your API to Mashape – http://docs.mashape.com/add-api
In fact, I put the API we created here up on Mashape. You can test it here – https://www.mashape.com/ismaelc/summarizer-tool
Once your API is up on Mashape, it is just as easy to start adding pricing plans to your API.
Hope this post inspired/helped you to turn your research/projects/apps into APIs. If you have questions or ideas for future posts, feel free to email me at firstname.lastname@example.org