The Land of Oz Ozzie Liu

Scraping 3rd-Party Ticket Prices using StubHub's API

Update 3/5/2017: I’m glad my blog post has been helpful for a lot of users who are interested in getting data from StubHub’s API using Python. But since the blog post last June, there have been a lot of changes, and I wanted to update my post with the latest details.

First, StubHub has deprecated the original Inventory Search API in favor of a newer version, so the code is updated to reflect that.

If you’re having trouble subscribing to InventorySearchAPI-v2, see the last section of this post to manually subscribe to it.

Background

I was backpacking through Southeast Asia for the past few weeks, hence the break in posts. It was a much needed break but now that I’m back, I have some fun projects to blog about.

First, I have a colleague working in entertainment that needed to gather ticket prices on 3rd party sales to get a reading on the popularity and pricing of events. Currently, the colleague’s company pays a large sum to a contractor to scrape this data daily. We wanted to see if this was possible and easy to do using the StubHub API.

Initial Problem

The output product from the contractor’s tool is a simple flat file with event, venue information, and all the tickets currently listed for sale with seat information, quantity, and prices.

This sounds pretty straightforward, and I should be able to use StubHub’s API to gather this information. So I start by doing some homework.

You can find my full code on GitHub

Getting Started with StubHub’s API

StubHub provides a robust set of APIs to access its site, with pretty thorough documentation, including a Getting Started Guide that covers signing up for a StubHub account and requesting API keys.

So I spent a few hours reading up on the developer interface to create a proof of concept for my colleague.

Step 1 - Obtaining StubHub User Access Token

The first step in using the API is to request an authorization token that my Python app will use. StubHub has some instructions using a REST client, but it’s a little different with Python.

Requesting an authorization token requires us to encode our Consumer Key and Consumer Secret in base64. First I enter my StubHub user account and my API info:

import requests
import base64

## Enter user's API key, secret, and Stubhub login
app_token = input('Enter app token: ')
consumer_key = input('Enter consumer key: ')
consumer_secret = input('Enter consumer secret: ')
stubhub_username = input('Enter Stubhub username (email): ')
stubhub_password = input('Enter Stubhub password: ')

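Then I concatenate the key and secret with a colon as per the instructions, and create the basic authorization token by encoding it in base64. Note that base64 is an encoding, not encryption, so the token should be treated as carefully as the secret itself.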

combo = consumer_key + ':' + consumer_secret
# b64encode returns bytes; decode to str so it can be joined with 'Basic ' below
basic_authorization_token = base64.b64encode(combo.encode('utf-8')).decode('utf-8')
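To make the point about base64 concrete, here is a minimal sketch showing that the token can be decoded right back into the key and secret. The helper name `build_basic_auth_header` and the credentials are made up for illustration; they are not part of StubHub's API.

```python
import base64


def build_basic_auth_header(consumer_key, consumer_secret):
    """Return the value for an HTTP Basic Authorization header."""
    combo = consumer_key + ':' + consumer_secret
    # b64encode works on bytes and returns bytes; decode back to str so it
    # can be concatenated with 'Basic '.
    token = base64.b64encode(combo.encode('utf-8')).decode('utf-8')
    return 'Basic ' + token


# Placeholder credentials, for illustration only.
header_value = build_basic_auth_header('my_key', 'my_secret')

# Decoding the token recovers the original pair, which is why the header
# should only ever travel over HTTPS.
recovered = base64.b64decode(header_value.split(' ', 1)[1]).decode('utf-8')
```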

Now I create a POST request with the appropriate headers and use requests to talk to StubHub. I store the parsed response in token_response and retrieve two values in particular: the access_token, which is what I’m after, and my user_GUID, which comes from the response headers and will be handy for some API calls.

url = 'https://api.stubhub.com/login'
headers = {
        'Content-Type':'application/x-www-form-urlencoded',
        'Authorization':'Basic '+basic_authorization_token,}
body = {
        'grant_type':'password',
        'username':stubhub_username,
        'password':stubhub_password,
        'scope':'PRODUCTION'}

r = requests.post(url, headers=headers, data=body)
print(r)
print(r.text)

token_response = r.json()
access_token = token_response['access_token']
user_GUID = r.headers['X-StubHub-User-GUID']
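If the login fails, the lookups above raise a fairly unhelpful KeyError. A small sketch of a more defensive version follows. The function name `parse_login_response` is hypothetical, and treating HTTP 200 as the only success status is my assumption, not something I've confirmed against StubHub's documentation.

```python
def parse_login_response(status_code, payload, response_headers):
    """Pull the access token and user GUID out of a login response.

    The three arguments correspond to r.status_code, r.json() and
    r.headers from the requests call above.
    """
    # Assumption: a successful login answers with HTTP 200.
    if status_code != 200:
        raise RuntimeError('Login failed with HTTP status %s' % status_code)
    if 'access_token' not in payload:
        raise KeyError('Login response has no access_token field')
    # The GUID arrives as a response header; .get() returns None if absent.
    user_guid = response_headers.get('X-StubHub-User-GUID')
    return payload['access_token'], user_guid


# Made-up values standing in for a real response.
token, guid = parse_login_response(
    200,
    {'access_token': 'abc123', 'token_type': 'bearer'},
    {'X-StubHub-User-GUID': 'GUID-0001'},
)
```

With a live response this would be called as parse_login_response(r.status_code, r.json(), r.headers).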

Step 2 - Searching Inventory of an Event

To find the ticket inventory of an event, we’ll use the InventorySearch API. Of course we’ll need a specific Event ID.

There are 2 ways to get this. Let’s say my app is to track the prices of Hamilton tickets. On the event’s StubHub page, there’s a unique 7-digit number in the URL; that’s the Event ID. Just copy that number:

[Screenshot: the Event ID in the event page URL]
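If you have a list of event page URLs, copying the number by hand gets tedious. Here is a rough Python 3 sketch that pulls the last all-digit path segment out of a URL. The helper name `extract_event_id` and the example URL layout are my own assumptions, so check them against real StubHub page addresses before relying on this.

```python
from urllib.parse import urlparse


def extract_event_id(url):
    """Return the last all-digit path segment of a URL, or None."""
    segments = [s for s in urlparse(url).path.split('/') if s]
    numeric = [s for s in segments if s.isdigit()]
    return numeric[-1] if numeric else None


# Hypothetical URL; the real page address may be laid out differently.
example_url = 'https://www.stubhub.com/hamilton-tickets/event/9670859/'
eventid = extract_event_id(example_url)
```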

The second way involves using the EventSearchAPI - v2, much like searching for an event on the website. I leave this to the reader to explore.

With the Event ID, now it’s just a matter of making a get request with the proper headers:

inventory_url = 'https://api.stubhub.com/search/inventory/v2'
eventid = '9670859'
data = {'eventid':eventid, 'rows':200}
headers['Authorization'] = 'Bearer ' + access_token
headers['Accept'] = 'application/json'
headers['Accept-Encoding'] = 'application/json'

inventory = requests.get(inventory_url, headers=headers, params=data)

One thing to note is that this API returns 100 rows by default. If I want more, I can add rows as a parameter, as I did above. See the API documentation for more details. inventory holds the JSON response, which I’ll convert to a dictionary with:

inv = inventory.json()
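For events with more listings than one request returns, the results have to be fetched in pages. The sketch below shows the general pattern with the API call swapped out for a fake function so it runs offline. It assumes the API accepts a start offset alongside rows; I haven't verified that parameter name here, so confirm it in the API documentation.

```python
def fetch_all_listings(fetch_page, rows=200):
    """Collect listings page by page until a short or empty page appears.

    fetch_page(start, rows) must return a list of listing dicts.
    """
    listings = []
    start = 0
    while True:
        page = fetch_page(start, rows)
        listings.extend(page)
        # A page shorter than requested means there is nothing left.
        if len(page) < rows:
            break
        start += len(page)
    return listings


# Stand-in for the real API call, so the sketch runs without a network.
fake_inventory = [{'listingId': i} for i in range(450)]


def fake_fetch(start, rows):
    return fake_inventory[start:start + rows]


all_listings = fetch_all_listings(fake_fetch, rows=200)
```

In real use, fetch_page would wrap the requests.get call above, passing the offset and row count in params and returning the listing key of the parsed JSON. Keep in mind that every page counts as one API request.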

In particular, I want to see the ticket listing, so I’ll call the listing key:

import pprint
pprint.pprint(inv['listing'])

[{u'currentPrice': {u'amount': 663.3, u'currency': u'USD'},
  u'deliveryMethodList': [2],
  u'deliveryTypeList': [2],
  u'dirtyTicketInd': False,
  u'listingId': 1207961705,
  u'listingPrice': {u'amount': 560.0, u'currency': u'USD'},
  u'quantity': 2,
  u'row': u'G',
  u'seatNumbers': u'9,11',
  u'sectionId': 659009,
  u'sectionName': u'Mezzanine Rear Sides',
  u'sellerOwnInd': 0,
  u'sellerSectionName': u'Mezzanine Rear Sides',
  u'splitOption': u'0',
  u'splitVector': [2],
  u'ticketSplit': u'2',
  u'zoneId': 105098,
  u'zoneName': u'Mezzanine Rear'},
  ...

Now I want to convert the list of listings to a Pandas DataFrame. Since the currentPrice column is a nested dictionary with the ticket price and currency, I extract just the amount as a new column in my DataFrame:

import pandas as pd
listing_df = pd.DataFrame(inv['listing'])
listing_df['amount'] = listing_df.apply(
    lambda x: x['currentPrice']['amount'], axis=1)
listing_df.to_csv('tickets_listing.csv', index=False)
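The same flattening can also be done without pandas. This is only a sketch using the standard library, with a hypothetical helper `flatten_listing` and one listing trimmed from the sample output above. It turns every nested price dictionary into separate amount and currency columns.

```python
import csv
import io


def flatten_listing(listing):
    """Copy a listing dict, replacing nested price dicts with flat columns."""
    flat = {}
    for key, value in listing.items():
        if isinstance(value, dict) and 'amount' in value:
            flat[key + '_amount'] = value['amount']
            flat[key + '_currency'] = value.get('currency')
        else:
            flat[key] = value
    return flat


# One listing, trimmed from the sample output above.
sample = {
    'currentPrice': {'amount': 663.3, 'currency': 'USD'},
    'listingPrice': {'amount': 560.0, 'currency': 'USD'},
    'quantity': 2,
    'row': 'G',
    'sectionName': 'Mezzanine Rear Sides',
}
rows = [flatten_listing(sample)]

# Write to an in-memory buffer here; use open(..., 'w', newline='') for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```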

Here’s what the CSV file looks like now:

[Screenshot: the exported CSV file]

Step 3 - Adding Event and Venue Info

I have the ticket information, but what if I want to know some more details about the venue?

In that case, I use StubHub’s EventSearchAPI to get the details.

I already have the eventID, so I just add it to the new URL, and take a peek at the response in dict form:

info_url = 'https://api.stubhub.com/catalog/events/v2/' + eventid
info = requests.get(info_url, headers=headers)
pprint.pprint(info.json())

{u'ancestors': {u'categories': [{u'id': 174}, {u'id': 700188}],
                u'groupings': [{u'id': 1500226}],
                u'performers': [{u'id': 1500227}]},
 u'bobId': 1,
 u'categories': {u'primaryCategory': {u'id': 700188,
                                      u'name': u'Musicals Tickets'}},
 u'currencyCode': u'USD',
 u'description': u'Hamilton New York Tickets',
 u'eventDateLocal': u'2016-10-22T20:00:00-04:00',
 u'eventDateUTC': u'2016-10-23T00:00:00+0000',
 u'eventMeta': {u'keywords': u'Hamilton Richard Rodgers Theatre, Hamilton New York, Hamilton New York 10/22 0800 PM, Hamilton New York 10/22, Hamilton Richard Rodgers Theatre 10/22, buy, sell, tickets, ticket',
                u'locale': u'en_US',
                u'primaryAct': u'Hamilton New York',
                u'primaryName': u'Hamilton New York',
                u'secondaryName': u'Hamilton',
                u'seoDescription': u'Hamilton 08:00 PM',
                u'seoTitle': u'Hamilton Richard Rodgers Theatre New York Tickets - 2016-10-22'},
 ... }

Lots of relevant info here, and it’s just a matter of extracting what I need from the dict. Then I can add it to my DataFrame before exporting the final result to CSV.
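As a sketch of that extraction step, the snippet below picks a few fields using only keys visible in the response above and attaches them to each ticket row. The helper name `extract_event_info` and the output column names are my own choices. Venue fields sit in the part of the response elided above, so they are left out here.

```python
def extract_event_info(info):
    """Pick a few descriptive fields out of the event response dict."""
    meta = info.get('eventMeta', {})
    category = info.get('categories', {}).get('primaryCategory', {})
    return {
        'event_name': meta.get('primaryName'),
        'event_description': info.get('description'),
        'event_date_local': info.get('eventDateLocal'),
        'category': category.get('name'),
    }


# Trimmed copy of the response shown above.
sample_info = {
    'categories': {'primaryCategory': {'id': 700188,
                                       'name': 'Musicals Tickets'}},
    'description': 'Hamilton New York Tickets',
    'eventDateLocal': '2016-10-22T20:00:00-04:00',
    'eventMeta': {'primaryName': 'Hamilton New York'},
}
event_info = extract_event_info(sample_info)

# Attach the same event fields to every ticket row.
ticket_rows = [{'listingId': 1207961705, 'quantity': 2}]
combined = [dict(row, **event_info) for row in ticket_rows]
```

With the DataFrame from Step 2, listing_df.assign(**event_info) should add the same values as constant columns.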

Conclusion

After doing some more data cleaning to match the report’s format, I sent it over to my colleague. In a few hours, I was able to show that I can use StubHub’s API to gather the ticket data required. But there were some limitations:

My friend needed this data for about 1,200 events every day. With StubHub’s free tier, I am limited to 10 requests per minute. If each event takes 2 API calls, that is 2,400 calls, so this report would take 240 minutes (4 hours) to generate every day on the free tier. I’m sure there’s a way to pay StubHub for higher-tier access.

For now, I’ve proved that the report is possible with some API calls. And he’ll be exploring some next steps with his team.
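The arithmetic behind that estimate, plus the spacing between calls needed to stay under the limit, can be sketched as follows. The function names are made up for illustration, and the numbers are the ones quoted above.

```python
def report_duration_hours(events, calls_per_event, calls_per_minute):
    """Hours needed to make every call when running at the rate limit."""
    total_calls = events * calls_per_event
    minutes = total_calls / float(calls_per_minute)
    return minutes / 60.0


def seconds_between_calls(calls_per_minute):
    """Minimum spacing between requests to stay at or under the limit."""
    return 60.0 / calls_per_minute


hours = report_duration_hours(events=1200, calls_per_event=2,
                              calls_per_minute=10)
delay = seconds_between_calls(10)
print(hours)  # 4.0
print(delay)  # 6.0
```

In a real loop over events, calling time.sleep(delay) between requests is one simple way to keep to that spacing.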

Update - Subscribing to StubHub’s Inventory Search API - v2

Around December or January, StubHub deprecated version 1 of their Inventory Search API in favor of version 2, but a lot of people, including me, had some difficulty subscribing to the new API. It looks like subscribing to “All API” doesn’t include v2 yet, and clicking on InventorySearchAPI-v2 didn’t go anywhere.

But I found a workaround by selecting InventorySearchAPI and then manually changing the URL from “v1” to “v2”.

Then subscribe to it with your application! You can also just use this link: https://developer.stubhub.com/store/apis/info?name=InventorySearchAPI&version=v2&provider=runiu&category=Search&api=InventorySearchAPI