Mastering the Art of APIs – Part 1
By Simon Plenderleith
07 September 2010 | Category: Uncategorized
Editor’s Note: In his first article for Think Vitamin Simon Plenderleith looks at the wonderful world of APIs. Simon will be back with two further posts on APIs complete with code examples to get you started.
Many sites and web services now provide APIs for us data-hungry developers to consume.
These APIs deliver a wealth of valuable data that you can integrate into your sites and apps to create a richer user experience. Like most features, however, third-party APIs that you have no direct control over come with costs of their own.
In this article I’m going to discuss some of the key issues you should try to think through before you crack out the code.
As the old adage goes, failing to plan is planning to fail, and there’s nothing more embarrassing than discovering that the sexy new API you’ve just integrated into your site is ruining its stability and turning off users. With a little extra planning you can ensure a smooth ride for everyone.
A Good Plan Today is Better than a Perfect Plan Tomorrow
Using APIs needn’t be a hassle, but you really need to know an API’s requirements, and the potential issues it may cause for your site, as early as possible.
Ideally, before you get coding, you should note down a rough plan of how everything will interact and details of the data that will be passed back and forth. This should help you to identify potential bottlenecks before they arise, and also prepare you for coding around any API-specific caveats.
For example, many APIs that provide “live” data streams have a set interval for refreshing their data (often caching data that is served up by their API so that requests for data don’t constantly hit their database).
In the case of an API that provides weather forecast data, this could mean that you won’t gain any advantage by requesting new data on every page load, so it’ll be more efficient to leave a five-minute interval between your API requests. One way to handle this kind of issue is to schedule a script with Cron to retrieve the data (which I’ll cover further on).
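If you’d rather not set up Cron, the same five-minute rule can also be enforced in-process. Below is a minimal Python sketch, assuming a long-running app server where an in-memory copy survives between page requests; `IntervalCache` and `fetch_forecast` are hypothetical names, and the injectable clock exists only to make the behaviour easy to demonstrate.

```python
import time
from typing import Any, Callable, Optional


class IntervalCache:
    """Calls `fetch` at most once per `interval` seconds; otherwise returns the stored result."""

    def __init__(self, fetch: Callable[[], Any], interval: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._interval = interval
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Only hit the API if we have never fetched, or the interval has elapsed.
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a stand-in fetch function and a fake clock (both hypothetical).
calls = []

def fetch_forecast():
    calls.append(1)
    return {"temp_c": 14}

fake_now = [0.0]
cache = IntervalCache(fetch_forecast, interval=300, clock=lambda: fake_now[0])
cache.get()          # first call hits the "API"
fake_now[0] = 120.0
cache.get()          # within five minutes: served from memory
fake_now[0] = 301.0
cache.get()          # interval elapsed: fetched again
print(len(calls))    # 2
```

Every page load can call `cache.get()` freely; only the first call in each five-minute window reaches the API.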
You may also want to display a “Last updated on” date and time when you display the data on the page, so that users know how recent the information is. Giving your users context is really important in ensuring that the information you’re providing them with is as rich and useful as possible.
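A label like this is easy to build from the time you last fetched the data. Here is a minimal sketch, assuming you store that fetch time as a timezone-aware UTC `datetime`; the `last_updated_label` function and its wording are illustrative choices, not part of any API.

```python
from datetime import datetime, timezone


def last_updated_label(fetched_at: datetime, now: datetime) -> str:
    """Build a human-readable 'Last updated' string for data fetched at `fetched_at`."""
    minutes = int((now - fetched_at).total_seconds() // 60)
    # Assumes fetched_at is in UTC, hence the hard-coded "UTC" suffix.
    stamp = fetched_at.strftime("%d %B %Y at %H:%M UTC")
    if minutes < 1:
        age = "less than a minute ago"
    elif minutes == 1:
        age = "1 minute ago"
    else:
        age = f"{minutes} minutes ago"
    return f"Last updated on {stamp} ({age})"


fetched = datetime(2010, 9, 7, 14, 5, tzinfo=timezone.utc)
now = datetime(2010, 9, 7, 14, 9, 30, tzinfo=timezone.utc)
print(last_updated_label(fetched, now))
# Last updated on 07 September 2010 at 14:05 UTC (4 minutes ago)
```

Showing both the absolute time and the relative age helps users judge the data at a glance.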
Authentication & API Keys
Delayed data streams are just one thing that you may have to take into account. Another thing to consider is API authentication. Many APIs require you to authenticate with a username and password, or by using OAuth. Others simply require you to pass an API key through with every request, in which case you’ll likely need to request one from the service provider.
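For key-based APIs, it’s simplest to attach the key to every request in one place rather than scattering it through your code. Below is a minimal Python sketch using only the standard library. The endpoint URL, the `api_key` parameter name and the `build_request` helper are hypothetical, since each provider documents its own conventions (some expect the key in an HTTP header instead).

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.example.com/v1/forecast"  # hypothetical endpoint
API_KEY = "your-api-key-here"                      # issued by the provider


def build_request(params: dict) -> Request:
    """Attach the API key to every request as a query-string parameter."""
    query = dict(params, api_key=API_KEY)  # 'api_key' is an assumed parameter name
    return Request(f"{API_BASE}?{urlencode(query)}", headers={"Accept": "application/json"})


req = build_request({"city": "London", "format": "json"})
print(req.full_url)
# https://api.example.com/v1/forecast?city=London&format=json&api_key=your-api-key-here
```

Passing `req` to `urllib.request.urlopen()` would then send it.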
Most services will issue you with an API key straight away, but some require approval of your site or application before they give you access. Make sure you factor this into your development schedule, especially if it’s tight and there are critical features that are dependent on API access.
If you spend some time getting familiar with the API you intend to use, and know the specifics of how and what data will be passed around, you should be in pretty good shape to get coding. Remember to keep it simple. Working with APIs isn’t usually too difficult, but planning the finer details will allow the bigger parts of the picture to come together more easily.
Keep it Light
One of the keys to painless API integration is to try to keep everything as light as possible, whether in terms of the amount of data transferred, the frequency of API requests or the loading speed of pages that depend on an API. If you’re not careful, third-party APIs can end up causing a significant bottleneck on your site.
When you have the option, it’s best to request data in the format that is most appropriate to the scripting language that you’re using and the data that’s being transferred. You can usually pull back data in at least one of the following formats: JSON, XML, serialized PHP, YAML or CSV.
Each of these formats has different native language support and data structure types. For example, PHP natively supports JSON parsing (since 5.2.0), as do the JavaScript engines of many modern web browsers, whereas other languages such as Ruby have a variety of supporting libraries and toolkits.
PHP also natively supports XML parsing, and most scripting languages have good libraries and toolkits for this as well.
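Python likewise ships parsers for both formats in its standard library. This sketch decodes the same made-up forecast from JSON and from XML. Note that the XML values arrive as text you have to convert yourself, whereas JSON numbers come back as numbers.

```python
import json
import xml.etree.ElementTree as ET

# The same (made-up) forecast in both formats.
json_payload = '{"city": "London", "temp_c": 14, "summary": "Light rain"}'
xml_payload = (
    "<forecast><city>London</city><temp_c>14</temp_c>"
    "<summary>Light rain</summary></forecast>"
)

# JSON maps straight onto native dicts, strings and numbers.
from_json = json.loads(json_payload)

# XML gives you a tree of elements; values come back as strings to convert yourself.
root = ET.fromstring(xml_payload)
from_xml = {
    "city": root.findtext("city"),
    "temp_c": int(root.findtext("temp_c")),
    "summary": root.findtext("summary"),
}

print(from_json == from_xml)  # True
```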
JSON Vs XML
The major advantage of JSON over XML is that it’s an incredibly light format with minimal cruft around the actual data, meaning that there’s less data to be transferred and parsed. This is because JSON is a “data interchange” format, whereas XML is a document markup language (like HTML).
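To put a rough number on the difference, this small Python sketch serialises the same three (made-up) readings both ways using only the standard library. The exact ratio depends on your data and on how a given API formats its XML, so treat it as an illustration rather than a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Three made-up hourly readings.
readings = [
    {"time": "09:00", "temp_c": 12},
    {"time": "10:00", "temp_c": 13},
    {"time": "11:00", "temp_c": 15},
]

# Compact JSON: the overhead is quotes, braces, commas and the key names.
as_json = json.dumps(readings, separators=(",", ":"))

# Equivalent XML: every value needs both an opening and a closing tag.
root = ET.Element("readings")
for r in readings:
    item = ET.SubElement(root, "reading")
    ET.SubElement(item, "time").text = r["time"]
    ET.SubElement(item, "temp_c").text = str(r["temp_c"])
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), len(as_xml))  # 88 189
```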
It makes sense then to use JSON for small chunks of data and try to only use XML when you’re dealing with fully formed documents or XML-RPC based web services.
By using an efficient data format, your API calls will be faster, as there is less data to transfer, and your scripts will use less memory, as there is less data to parse. There is always going to be some overhead in receiving and parsing data from an API, but if you can request data in a format that is easy to work with in your scripting language of choice, it will make life a lot easier.
The Not Quite “Real Time Web”
Despite the popular idea of the “real-time web”, a lot of the data that you display to your users doesn’t have to be current to the second. When you’re planning how to integrate an API, it’s worth taking some time to figure out what requests can be cached or delayed (e.g. by being added to a queueing system).
In the earlier example of an API that provides weather forecast data, if the service only updates the data available via its API every five minutes, then it makes sense to cache the data that you receive and refresh that cache no more often than every five minutes.
In this particular scenario it would be best to have a script on your server that calls the API on a Cron schedule and saves the data that is returned to a file or database, or even better a fast in-memory caching system such as memcached.
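Here’s a minimal sketch of such a Cron script in Python. It writes to a JSON file because that needs nothing beyond the standard library; with memcached you would swap the file write for a call through a memcached client library. The `refresh_cache` name, the cache file layout and the crontab line in the docstring are illustrative assumptions, and `fetch` stands in for your real API call.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable


def refresh_cache(fetch: Callable[[], Any], cache_path: str) -> None:
    """Fetch fresh data and atomically replace the cache file.

    Intended to be run from Cron, e.g. every five minutes (hypothetical path):
        */5 * * * * /usr/bin/python3 /path/to/refresh_forecast.py
    """
    # Call the API first: if it fails, the previous cache is left untouched.
    payload = {"fetched_at": time.time(), "data": fetch()}
    directory = os.path.dirname(os.path.abspath(cache_path))
    # Write to a temp file in the same directory so readers never see a half-written cache.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(payload, tmp)
        os.replace(tmp_path, cache_path)  # atomic rename over the old cache file
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Writing to a temporary file and renaming it means a page that reads the cache mid-refresh still gets a complete copy.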
This cached data can then be read and displayed to your users by the front-end scripts on your website. You’ll now be able to serve pages to your users faster, as you don’t need to make an API call on every page request, and you’ll also be reducing your dependency on the API itself.
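On the front end, a page then only needs to read that stored copy. Below is a minimal sketch, assuming the Cron script writes a JSON file of the form `{"fetched_at": <Unix timestamp>, "data": ...}`; the file layout, the `read_cached` name and the fifteen-minute staleness cut-off are all illustrative assumptions.

```python
import json
import time
from typing import Any, Optional, Tuple


def read_cached(cache_path: str, max_age: float = 15 * 60) -> Tuple[Optional[Any], Optional[float]]:
    """Return (data, age_in_seconds) from the cache file without calling the API.

    Returns (None, None) if the cache is missing, unreadable, or older than
    `max_age` seconds, so the page can show a fallback message instead.
    """
    try:
        with open(cache_path) as f:
            payload = json.load(f)
        fetched_at = float(payload["fetched_at"])
        data = payload["data"]
    except (OSError, ValueError, KeyError, TypeError):
        return None, None
    age = time.time() - fetched_at
    if age > max_age:
        return None, None
    return data, age
```

The returned age is also what you’d use to tell users how recent the information is.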
Queuing
Another way to help keep things speedy for your users is to put your API calls in a queue. If your users don’t need to see the results from an API call straight away then you can just drop the API requests into a database table. This database table can then function as a queue, with a Cron scheduled script on the server pulling back the records periodically and making the API calls.
Once a queued API call has been made, you can store its results in the database so that the user can view them in your application. For more complex queuing solutions you might want to consider using something like Beanstalk, which is a fast in-memory work queue service.
These are just simple examples of where caching and queueing can come in handy when working with APIs, but I’ll be exploring these subjects in more detail in my next article.
Know Your Rates
It’s important to know the rate limits of any APIs you’re using. Most services are pretty generous with their API usage allowances, but if you’re expecting to put a heavy load on an API, make sure you speak to the company that provides the service.
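Whatever limit you end up with, it’s worth enforcing it on your side as well, so that a traffic spike can’t push you over it. The sketch below is a simple sliding-window limiter; the limit of three calls per minute in the usage example is made up for illustration, so substitute whatever your provider documents. When `try_acquire()` returns `False`, you’d serve cached data or queue the request instead of calling the API.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Sliding-window limiter: allow at most `max_calls` in any `period`-second window."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Drop timestamps that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = RateLimiter(max_calls=3, period=60, clock=lambda: fake_now[0])
print([limiter.try_acquire() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 60.0
print(limiter.try_acquire())  # True: the earlier calls have left the window
```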
It’s usually greatly appreciated – nobody wants to be up at 3am fixing servers that have been overloaded by a heavy API user – and often you can arrange something that keeps everyone happy. It’s not just polite, either: if you take down their API, it will affect the stability of your own website or app, and they’re unlikely to let you continue using their service!
It’s easy to forget to plan for high levels of API usage and its effects when you’re developing shiny new features, or when traffic is low, but it could take you offline if you don’t. If you’re always planning for heavy loads and worst case scenarios, then ultimately you’ll be able to build a more robust and scalable application.
Don’t Pick Sides, Sit on the Fence
The beauty of APIs is that as long as you send them the right information, it doesn’t matter how you call them. On the web, this means that you can mix and match server-side (PHP, Ruby, Python etc.) and client-side (JavaScript) scripting when working with APIs.
One of the biggest potential issues with making API requests through a server-side script is that a bunch more processing power and memory will be sucked up on your server, especially if you’re pulling in large chunks of data.
If you’re sending or receiving sensitive or private data then unfortunately this is the only viable option, as calling an API through client-side scripting potentially exposes the data that is being transferred back and forth. However, if this doesn’t matter for some of your API requests (perhaps even all of them) then it’s a piece of cake to make API calls on the client-side.
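The most common method of calling APIs on the client-side is with JavaScript and JSONP. To make things even easier, you can use a JavaScript library like jQuery, which will handle the nitty-gritty of making the API call and parsing the returned JSON. Note that a JSONP call isn’t made via XMLHttpRequest (typically referred to as ‘AJAX’), which browsers traditionally restrict to the page’s own domain; instead, the API URL is loaded through a dynamically inserted script tag, and jQuery takes care of that for you.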
The great thing about making API calls directly in this way is that there’s no resource hit on your server, as the JavaScript is executed in the client’s web browser and this then interacts directly with the API. This means that even if you’re receiving heavy traffic you don’t have to worry about the API calls causing a bottleneck on your server.
Options
As with server-side scripts, caching is your friend in helping lighten the load on the client-side. You can reduce future requests to the API and the subsequent processing on the client-side by caching data in JavaScript cookies, or for larger pieces of data you might want to consider Web Storage (although it is still a W3C draft and not all browsers fully support it).
The great part about all of this is that you have options in terms of how you work with APIs. If your server’s getting bogged down with server-side scripts calling an API then you might be able to make some of those calls on the client-side and eliminate any impact they’re having on your server.
The nice thing about calling APIs from server-side scripts, however, is that it gives you more control over how you handle things like processing and caching the data.
Unless you’re dealing with sensitive data there’s no right or wrong way of calling APIs, but there may be significant benefits to handling things on the client-side instead of the server-side and vice versa. Knowing how you’re going to tackle these kinds of challenges will make your planning far more complete and reduce the chance of growing pains further down the line.
Wrapping Up
Hopefully this article has given you some useful starting points for thinking about how you work with APIs. Over the coming weeks, I’m going to be writing a couple more articles with code examples that will take a more in-depth look at the issues I’ve covered here.
Meanwhile, if you fancy digging further into everything API related, here are some handy links (as well as a recap of some of the links in this article):
- Apigee – Free API analytics and management
- OAuth Community Site
- Introducing JSON
- JSONP on Wikipedia
- Cron on Wikipedia
- Memcached – Free & open source, high-performance, distributed memory object caching system
- Beanstalk – A simple, fast workqueue service

