Use APIs to do market research

By Andrew Odewahn
July 29, 2009 | Comments: 4

The question "How long is the ideal technical video?" recently came up in an internal O'Reilly discussion. People chimed in with a variety of opinions, ranging from "I like something an hour long that's really meaty" to "If it's longer than 2 minutes, there's no way I'll watch it." These sorts of basic product attribute questions are crucial elements in any product or marketing strategy, but it's often simply too difficult or expensive to get timely information when you're trying to get a new project going.

However, a quick script that pulls data from a relevant website's API can often give you an answer that's good enough. It's certainly not a definitive answer, but introducing real-world data into a conversation has a wonderful way of helping people focus on core assumptions.

This post gives you a few tips on using APIs for market research, and provides a case study of how we used the YouTube API to do some off-the-cuff research about technical videos.

Identify a relevant API

The first step is to find a credible API that has enough data to be relevant and reliable. Programmableweb is a great place to start. With over 63 categories, you can usually find something relevant. For example, want to know the best time of year to host an event? Use an Event API to find out when events in your industry are held. (For extra credit, cross-reference this against a Mapping API to find popular locations.) Want to do some competitive research about a new blog or content site you're launching? Use an API in the Feed category. Looking for the market penetration of Panko? Use one of the Food APIs.

For O'Reilly's question about video length, the YouTube Search API was a no-brainer. The API call looks something like this:

http://gdata.youtube.com/feeds/api/videos?q=java+programming

It returns results in a variety of formats (XML, JSON, etc.), comes from a reputable source, and is easily queryable. Bingo.

Explore the API's query string options
Once you've identified the API you want to work with, the next step is to determine the query string parameters that control the results that are returned. This is often the hardest part, as APIs are usually poorly documented or inconsistent. However, you can usually find what you're looking for if you're persistent and willing to dig.

For example, I found a great video called YouTube APIs: Search Explained that had practically everything you'd want to know about the query string parameters for the YouTube API. Most of the major APIs will have something similar, but again, you'll probably have to dig.

Here are the parameters that I wound up using for my video project:

- q: The search term used to find videos; note that this must be URL encoded. Values: java+programming, mysql, etc.
- max-results: The number of results returned on a single page; if you want more than 10 results, you have to pull out the link for the next page of results. Value: 10
- orderby: Determines how the results are sorted (views, rating, etc.). Value: viewCount
- alt: Determines the format in which the results are returned (XML, JSON, etc.). I like to use json because it's easy to pull out the data you want. Value: json
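As the max-results note suggests, getting more than one page means following the feed's link for the next page of results. Here's a minimal sketch of how you might pull that link out, assuming the gdata-style feed structure shown in this post, where the parsed feed has a 'link' list of {'rel': ..., 'href': ...} dictionaries (the sample feed fragment below is hand-built for illustration):

```python
def next_page_url(feed):
    """Return the URL of the next page of results, or None on the last page.

    Assumes a gdata-style parsed feed where feed['link'] is a list of
    {'rel': ..., 'href': ...} dictionaries.
    """
    for link in feed.get('link', []):
        if link.get('rel') == 'next':
            return link['href']
    return None

# A hand-built feed fragment, for illustration:
feed = {'link': [
    {'rel': 'self',
     'href': 'http://gdata.youtube.com/feeds/api/videos?q=java+programming'},
    {'rel': 'next',
     'href': 'http://gdata.youtube.com/feeds/api/videos?q=java+programming&start-index=11'},
]}
```

You'd call this in a loop, re-fetching until it returns None.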

Here's how to pass these parameters in the URL for the YouTube API:

http://gdata.youtube.com/feeds/api/videos?q=java+programming&max-results=10&orderby=viewCount&alt=json
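Rather than pasting the query string together by hand, you can build it with urlencode from Python's standard library, which also takes care of the URL encoding for you. A quick sketch using the parameters above (the try/except import handles both the Python 2 and Python 3 locations of urlencode):

```python
try:
    from urllib import urlencode             # Python 2
except ImportError:
    from urllib.parse import urlencode       # Python 3

params = {
    'q': 'java programming',    # urlencode converts the space to '+' for us
    'max-results': 10,
    'orderby': 'viewCount',
    'alt': 'json',
}
url = 'http://gdata.youtube.com/feeds/api/videos?' + urlencode(params)
```

This makes it easy to loop over a list of search terms without worrying about encoding each one.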

Explore the data in the feed

Once you've identified an appropriate API and query string parameters, the next step is to identify what you want to pull out. The best way to learn is to just browse through the result set until you start to see the patterns.

There are several tools to help. For XML data, you can use your browser's default XML stylesheet. JSON data is slightly more complicated because browsers have no built-in JSON viewer, but there's a great Firefox plugin called JSONview that does much the same thing. The next figure shows the JSONview-formatted output for the YouTube query from the last section.

[Figure: JSONview-formatted output of the YouTube query]
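If you'd rather explore from a script than a browser, Python's standard json module can pretty-print the data the same way (the entry below is a tiny hand-built fragment shaped like the YouTube feed, just for illustration):

```python
import json

# A tiny hand-built fragment shaped like a YouTube feed entry
entry = {'title': {'$t': 'Java Programming Tutorial'},
         'yt$statistics': {'viewCount': '123456'}}

# indent gives one key per line; sort_keys makes fields easy to scan for
pretty = json.dumps(entry, indent=2, sort_keys=True)
print(pretty)
```

Browsing output like this is usually enough to spot the field names you'll want to extract.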

Write a Parser
It's unlikely that you'll be able to use common analytical tools (Excel, R, etc) directly against the data. So, you'll need to write a parser to pull out the elements you want and put them into some sort of simpler format (like a tab-delimited text file) for analysis.

This is where JSON really shines. There are libraries for practically every major language that you can use to read the data directly into a data structure inside your program, so you can spend your time analyzing data, not writing complex parsers.

So, if you have a choice of formats in an API, opt for JSON whenever possible. (Other people may have different opinions -- I'd love to hear people's experiences with XML tools).
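To see what I mean, here's a minimal example of decoding a JSON response straight into nested dictionaries and lists, using the standard json module (simplejson, used in the script below, behaves the same way; the raw string is a made-up fragment in the feed's shape):

```python
import json

# A made-up response fragment in the gdata feed shape
raw = '{"feed": {"entry": [{"title": {"$t": "MySQL basics"}}]}}'

data = json.loads(raw)                          # one call: string -> dicts and lists
first_title = data['feed']['entry'][0]['title']['$t']
```

No parsing code at all; you just index into the structure.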

Here, for example, is a script called yt_ranks.py that processes data from the YouTube API. (It uses simplejson to decode the JSON data, so you'll need to install this module to use the script.) The following snippet shows how simple it is to extract the data elements we needed to answer our video length question.


...
search_results = urllib2.urlopen(url)  # url is a string holding the YouTube API query
json = simplejson.loads(search_results.read())
for r in json['feed']['entry']:
    try:
        title = r['title']['$t'].encode('ascii', 'replace')
        rating = r['gd$rating']['average']
        view_count = r['yt$statistics']['viewCount']
        duration = r['media$group']['yt$duration']['seconds']
        published = r['published']['$t'].encode('ascii', 'replace')
        id = r['id']['$t'].encode('ascii', 'replace')
        # term is the search term read by the enclosing loop (not shown)
        print "%s\t%s\t%s\t%s\t%s\t%s" % (term.rstrip("\n"), title, duration, view_count, rating, published)
    except KeyError:
        pass  # skip entries missing a field (e.g., videos with no rating yet)
...

This data is printed as a tab-delimited string to stdout; you can use output redirection to send it to a file.

Analyze the results
Once you've got the data file, you'll need to comb through it to weed out garbage data. For example, in our video project, I had to cull videos that were related to a technical term but not relevant to my project. (Did you know that "Ajax" is a soccer club in the Netherlands? I didn't until this project.)

Once you've gone through the data -- and don't neglect this step, since nothing will destroy an argument quicker than a ludicrous search result! -- you can then import the results into Excel or some other tool and have at it.

Good luck, and I'd love to hear about any projects where you've used an API for market research.

P.S.

Here's what I concluded about technical videos:

~25% of the top 10 videos are under 2 minutes
~50% of the top 10 are under 6 minutes
~75% of the top 10 are under 10 minutes

Then there's a long dead zone until you get past the one-hour mark, when the percentage of videos shoots back up (for example, videos over 60 minutes account for 7 percent of the top 10). So, from this, I'd say videos should be under 10 minutes or over an hour, but not in between.

