Scraping 3rd-Party Ticket Prices Using Stubhub's API

Jun 21, 2016    

I have a colleague working in entertainment who needed to gather ticket prices from third-party sales to gauge the popularity and pricing of events. Currently, the colleague’s company pays a large sum to a contractor to scrape this data daily. We wanted to see whether this could be done easily with the StubHub API.

Update 3/5/2017: I'm glad this post has been helpful to so many readers getting started with pulling data from StubHub's API using Python. A lot has changed since I wrote it last June, so I've updated the post with the latest details.

First, StubHub deprecated the Inventory Search API in favor of a newer version, so the code below has been updated to reflect that.

If you're having trouble subscribing to InventorySearchAPI-v2, see the last section of this post for how to subscribe to it manually.

Initial Problem

The output product from the contractor’s tool is a simple flat file with event, venue information, and all the tickets currently listed for sale with seat information, quantity, and prices.

This sounds pretty straightforward, and I should be able to use StubHub’s API to gather this information. So I started by doing some homework.

You can find my full code on GitHub

Getting Started with StubHub’s API

StubHub provides a robust set of APIs for accessing its site, with pretty thorough documentation, including a Getting Started Guide that covers signing up for a StubHub account and requesting API keys.

So I spent a few hours reading up on the developer interface to create a proof of concept for my colleague.

Step 1 - Obtaining StubHub User Access Token

The first step in using the API is to request an authorization token that my Python app will use. StubHub has some instructions using a REST client, but it’s a little different with Python.

Requesting an authorization token requires us to encode our Consumer Key and Consumer Secret (Base64 is an encoding, not encryption). First I enter my StubHub user account and my API info:

import requests
import base64

## Enter user's API key, secret, and Stubhub login
app_token = input('Enter app token: ')
consumer_key = input('Enter consumer key: ')
consumer_secret = input('Enter consumer secret: ')
stubhub_username = input('Enter Stubhub username (email): ')
stubhub_password = input('Enter Stubhub password: ')

Then I concatenate the key and secret with a colon, as per the instructions, and create the basic authorization token by encoding the result in Base64.

combo = consumer_key + ':' + consumer_secret
# b64encode returns bytes; decode to text so it can be joined with 'Basic ' later
basic_authorization_token = base64.b64encode(combo.encode('utf-8')).decode('utf-8')
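The encoding step is easy to get subtly wrong (bytes versus text), so here is a minimal, self-contained sketch of it. The helper name `build_basic_auth_header` is my own for illustration and is not part of StubHub's API or of `requests`.

```python
import base64


def build_basic_auth_header(consumer_key, consumer_secret):
    """Return the value for an HTTP Basic 'Authorization' header.

    Hypothetical helper: joins key and secret with a colon,
    Base64-encodes the result, and returns it as text.
    """
    combo = '{}:{}'.format(consumer_key, consumer_secret)
    token = base64.b64encode(combo.encode('utf-8')).decode('utf-8')
    return 'Basic ' + token


# Placeholder credentials, not real ones
print(build_basic_auth_header('key', 'secret'))
```

Decoding the token back with `base64.b64decode` returns the original `key:secret` pair, which is a quick reminder that this step hides nothing and the header should only ever travel over HTTPS.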

Now I create a POST request with the appropriate headers and use requests to talk to StubHub. I store the response in token_response and retrieve two fields in particular: access_token is what I’m after, and user_GUID will be handy for some API calls.

url = 'https://api.stubhub.com/login'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Authorization': 'Basic ' + basic_authorization_token,
}
body = {
    'grant_type': 'password',
    'username': stubhub_username,
    'password': stubhub_password,
    'scope': 'PRODUCTION',
}

r = requests.post(url, headers=headers, data=body)
print(r)
print(r.text)

# Parse the JSON body and pull out the token and the user GUID header
token_response = r.json()
access_token = token_response['access_token']
user_GUID = r.headers['X-StubHub-User-GUID']
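If the credentials are wrong, the lookup of `access_token` above fails with a bare `KeyError`. As a rough sketch, the parsing could be wrapped in a small function that fails with a clearer message. The function name `parse_login_response` is hypothetical; `raise_for_status()`, `json()`, and `headers` are the standard attributes of a `requests` response.

```python
def parse_login_response(response):
    """Return (access_token, user_guid) from a login response, or raise.

    Hypothetical helper: `response` is expected to behave like a
    requests.Response object.
    """
    # requests raises HTTPError here for 4xx/5xx status codes
    response.raise_for_status()
    payload = response.json()
    if 'access_token' not in payload:
        raise ValueError('Login response has no access_token: %r' % (payload,))
    # The GUID arrives as a response header, not in the JSON body
    user_guid = response.headers.get('X-StubHub-User-GUID')
    return payload['access_token'], user_guid
```

With the request above, this would be called as `access_token, user_GUID = parse_login_response(r)`.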

Step 2 - Searching Inventory of an Event

To find the ticket inventory of an event, we’ll use the InventorySearch API. Of course we’ll need a specific Event ID.

There are two ways to get this. Let’s say my app is to track the prices of Hamilton tickets. On the event’s StubHub page, there’s a unique 7-digit number in the URL; that’s the Event ID. Just copy that number:

[Image: eventid]
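Copying the number by hand is fine for one event. For a list of event pages, a small regular expression can pull it out instead. This is only a sketch: the function name is mine, and the example URL is a made-up shape for illustration, so check it against real StubHub URLs before relying on it.

```python
import re


def extract_event_id(url):
    """Return the first run of 7 or more digits in the URL, or None.

    Hypothetical helper; assumes the Event ID is the first long
    digit run in the page URL.
    """
    match = re.search(r'\d{7,}', url)
    return match.group(0) if match else None


# Made-up URL shape, used only to exercise the function
example_url = 'https://www.stubhub.com/hamilton-tickets/event/9670859/'
print(extract_event_id(example_url))
```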

The second way is to use EventSearchAPI - v2, which works much like searching for an event on the website. I leave this for the reader to explore.

With the Event ID in hand, it’s just a matter of making a GET request with the proper headers:

inventory_url = 'https://api.stubhub.com/search/inventory/v2'
eventid = '9670859'
data = {'eventid': eventid, 'rows': 200}

# Reuse the headers dict from the login step, swapping in the bearer token
headers['Authorization'] = 'Bearer ' + access_token
headers['Accept'] = 'application/json'
# Accept-Encoding negotiates compression (gzip, deflate), not a media type,
# so it is left at the requests default

inventory = requests.get(inventory_url, headers=headers, params=data)

One thing to note: this API returns 100 rows by default. If I want more, I can pass rows as a parameter. See the API documentation for more details. inventory holds the JSON response, which I’ll convert to a dictionary with:

inv = inventory.json()
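When an event has more listings than a single request returns, the results have to be fetched page by page. Here is a rough sketch of that loop. `fetch_all_listings` and `fetch_page` are names I made up, and the `start` offset parameter mentioned in the comment is an assumption that should be checked against the InventorySearch documentation.

```python
def fetch_all_listings(fetch_page, rows=200):
    """Collect listings page by page until a short or empty page comes back.

    fetch_page(start, rows) is a hypothetical callable that returns the
    parsed JSON dict for one inventory request.
    """
    listings = []
    start = 0
    while True:
        page = fetch_page(start, rows)
        batch = page.get('listing') or []
        listings.extend(batch)
        # A page shorter than `rows` means there is nothing left to fetch
        if len(batch) < rows:
            break
        start += rows
    return listings


# Assumed wiring with requests (the 'start' parameter is unverified):
# fetch_page = lambda start, rows: requests.get(
#     inventory_url, headers=headers,
#     params={'eventid': eventid, 'start': start, 'rows': rows}).json()
```

Passing the fetch function in keeps the loop testable without network access. Every page is one more API call against the rate limit.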

In particular, I want to see the ticket listing, so I’ll call the listing key:

import pprint
pprint.pprint(inv['listing'])

[{u'currentPrice': {u'amount': 663.3, u'currency': u'USD'},
  u'deliveryMethodList': [2],
  u'deliveryTypeList': [2],
  u'dirtyTicketInd': False,
  u'listingId': 1207961705,
  u'listingPrice': {u'amount': 560.0, u'currency': u'USD'},
  u'quantity': 2,
  u'row': u'G',
  u'seatNumbers': u'9,11',
  u'sectionId': 659009,
  u'sectionName': u'Mezzanine Rear Sides',
  u'sellerOwnInd': 0,
  u'sellerSectionName': u'Mezzanine Rear Sides',
  u'splitOption': u'0',
  u'splitVector': [2],
  u'ticketSplit': u'2',
  u'zoneId': 105098,
  u'zoneName': u'Mezzanine Rear'},
  ...

Now I want to convert the list of listings to a pandas DataFrame. Since the currentPrice column holds a nested dictionary with ticket price and currency, I extract just the amount as a new column in my DataFrame:

import pandas as pd
listing_df = pd.DataFrame(inv['listing'])
# Pull the numeric amount out of each nested currentPrice dict
listing_df['amount'] = listing_df['currentPrice'].apply(
    lambda price: price['amount'])
listing_df.to_csv('tickets_listing.csv', index=False)

Here’s what the CSV file looks like now:

[Image: csv]

Step 3 - Adding Event and Venue Info

I have the ticket information, but what if I want to know some more details about the venue?

In that case, I use StubHub’s EventSearchAPI to get the details.

I already have the Event ID, so I just append it to the new URL and take a peek at the response in dict form:

info_url = 'https://api.stubhub.com/catalog/events/v2/' + eventid
info = requests.get(info_url, headers=headers)
pprint.pprint(info.json())

{u'ancestors': {u'categories': [{u'id': 174}, {u'id': 700188}],
                u'groupings': [{u'id': 1500226}],
                u'performers': [{u'id': 1500227}]},
 u'bobId': 1,
 u'categories': {u'primaryCategory': {u'id': 700188,
                                      u'name': u'Musicals Tickets'}},
 u'currencyCode': u'USD',
 u'description': u'Hamilton New York Tickets',
 u'eventDateLocal': u'2016-10-22T20:00:00-04:00',
 u'eventDateUTC': u'2016-10-23T00:00:00+0000',
 u'eventMeta': {u'keywords': u'Hamilton Richard Rodgers Theatre, Hamilton New York, Hamilton New York 10/22 0800 PM, Hamilton New York 10/22, Hamilton Richard Rodgers Theatre 10/22, buy, sell, tickets, ticket',
                u'locale': u'en_US',
                u'primaryAct': u'Hamilton New York',
                u'primaryName': u'Hamilton New York',
                u'secondaryName': u'Hamilton',
                u'seoDescription': u'Hamilton 08:00 PM',
                u'seoTitle': u'Hamilton Richard Rodgers Theatre New York Tickets - 2016-10-22'},
 ... }

Lots of relevant info here, and it’s just a matter of extracting what I need from the dict. Then I can add it to my DataFrame before exporting the final result to CSV.
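As a sketch of that extraction, the function below flattens a few of the fields visible in the response above into a plain dict. The function name and the output column names are my own choices. The venue keys are cut off in the printed output, so I have not guessed at them here.

```python
def extract_event_fields(info):
    """Pick a few top-level and nested fields out of the event dict.

    Hypothetical helper; only uses keys visible in the sample response.
    Missing keys come back as None instead of raising.
    """
    meta = info.get('eventMeta', {})
    return {
        'eventDescription': info.get('description'),
        'eventDateLocal': info.get('eventDateLocal'),
        'eventDateUTC': info.get('eventDateUTC'),
        'currencyCode': info.get('currencyCode'),
        'primaryName': meta.get('primaryName'),
    }


# Trimmed copy of the sample response shown above
sample = {
    'currencyCode': 'USD',
    'description': 'Hamilton New York Tickets',
    'eventDateLocal': '2016-10-22T20:00:00-04:00',
    'eventDateUTC': '2016-10-23T00:00:00+0000',
    'eventMeta': {'primaryName': 'Hamilton New York'},
}
fields = extract_event_fields(sample)
```

Each value can then be assigned to the DataFrame as a constant column, for example `listing_df['eventDateLocal'] = fields['eventDateLocal']`, so that every ticket row carries its event details.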

Conclusion

After doing some more data cleaning to match the report’s format, I sent it over to my colleague. In a few hours, I was able to show that I can use StubHub’s API to gather the ticket data required. But there were some limitations:

My colleague needed this data for about 1,200 events every day. With StubHub’s free tier, I am limited to 10 requests per minute. If each event takes 2 API calls, that is 2,400 calls, so this report would take 4 hours to generate every day on the free tier. I’m sure there’s a way to pay StubHub for higher-tier access.
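The arithmetic behind that estimate, plus a very simple way to stay under the limit, can be sketched as follows. Both function names are made up for illustration. The throttle is deliberately naive: it pauses a fixed interval between items and ignores how long each request itself takes.

```python
import time


def estimate_runtime_hours(num_events, calls_per_event, calls_per_minute):
    """Hours needed to make all calls at the given rate limit."""
    total_calls = num_events * calls_per_event
    return total_calls / float(calls_per_minute) / 60


def throttled(items, calls_per_minute, sleep=time.sleep):
    """Yield items one at a time, pausing between them to respect the limit.

    `sleep` is injectable so the pacing can be checked without waiting.
    """
    interval = 60.0 / calls_per_minute
    for index, item in enumerate(items):
        if index:
            sleep(interval)
        yield item


# 1,200 events x 2 calls at 10 calls per minute
hours = estimate_runtime_hours(1200, 2, 10)
```

A loop such as `for event_id in throttled(event_ids, 10):` would then make one call every 6 seconds. An event that needs 2 calls would need a pause between those as well.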

For now, I’ve proved that the report is possible with some API calls, and he’ll be exploring next steps with his team.

Update - Subscribing to Stubhub’s Inventory Search API - v2

Around December or January, StubHub deprecated version 1 of its Inventory Search API in favor of version 2, but a lot of people, including me, had some difficulty subscribing to the new version. It looks like subscribing to “All API” doesn’t include v2 yet, and clicking on InventorySearchAPI-v2 didn’t go anywhere.

But I found a workaround: select InventorySearchAPI and then manually change the URL from “v1” to “v2”.

Then subscribe your application to it! You can also just use this link: https://developer.stubhub.com/store/apis/info?name=InventorySearchAPI&version=v2&provider=runiu&category=Search&api=InventorySearchAPI
