Extracting health data from Fitbit

Last time, I laid out the idea of rigging up a pipeline to enable data-driven self-reflection and improvement. The first step is getting data. We’ll start with Fitbit. This post steps through using an API to grab the numbers, doing a little cleaning, and storing the result as a CSV. We won’t get too nitty-gritty. If you’re looking for more detail, I’d highly recommend checking out this awesome blog post!

Step 1: Register your personal app with Fitbit

Start by registering your personal app with Fitbit at their fancy developer portal. The aforementioned post details the right inputs for the form. Once you’re done, navigate to your newly created app and grab the contents of the “OAuth 2.0 Client ID” and “Client Secret”. In the spirit of not posting creds for all the internet to enjoy, I stored mine locally in a yaml file.

import yaml
with open("../creds.yml", 'r') as stream:
    creds = yaml.safe_load(stream)
    
CLIENT_ID = creds['fitbit']['client_id']
CLIENT_SECRET = creds['fitbit']['client_secret']
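For reference, here is a minimal sketch of the shape that snippet expects from creds.yml. The layout is inferred from the keys the code reads, the values are placeholders, and require_fitbit_creds is a hypothetical helper (not part of any library) that fails with a readable message when a key is missing.

```python
# What yaml.safe_load would return for a creds.yml laid out like this:
#
#   fitbit:
#     client_id: "YOUR_CLIENT_ID"
#     client_secret: "YOUR_CLIENT_SECRET"
example_creds = {
    "fitbit": {
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
    }
}


def require_fitbit_creds(creds):
    """Return (client_id, client_secret), or raise if either one is missing."""
    section = (creds or {}).get("fitbit") or {}
    missing = [k for k in ("client_id", "client_secret") if not section.get(k)]
    if missing:
        raise KeyError("creds.yml is missing fitbit keys: " + ", ".join(missing))
    return section["client_id"], section["client_secret"]


client_id, client_secret = require_fitbit_creds(example_creds)
```

Checking up front beats discovering an empty secret halfway through the OAuth dance.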

Step 2: Authorize and connect to Fitbit

We’ll make heavy use of the python-fitbit package from Jake from Orcas. We also install specific versions of the requests-oauthlib and oauthlib libraries to avoid errors when you try to authorize.

!git clone https://github.com/orcasgit/python-fitbit
!pip3 install -r python-fitbit/requirements/base.txt
!pip3 install --upgrade requests-oauthlib==1.1.0 oauthlib==2.1.0

Navigate to the python-fitbit directory.

cd python-fitbit

And grab the tools contained in the gather_keys_oauth2 module to get authorized with the API.

from gather_keys_oauth2 import OAuth2Server

server = OAuth2Server(CLIENT_ID, CLIENT_SECRET)
server.browser_authorize()
keys = server.fitbit.client.session.token
ACCESS_TOKEN, REFRESH_TOKEN = str(keys['access_token']), str(keys['refresh_token'])

If you’re like me, you hit a 500 error here. The Fitbit help forum suggests, and my experience corroborates, that you can get around this by downgrading to requests-oauthlib==1.1.0 and oauthlib==2.1.0.

Finally, go ahead and create a client to ferry your Fitbit data requests.

import fitbit

client = fitbit.Fitbit(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    oauth2=True,
    access_token=ACCESS_TOKEN,
    refresh_token=REFRESH_TOKEN
)

If you made it this far, you’ve successfully created a bridge to your Fitbit data. Now it’s time for the good stuff - getting and manipulating your numbers!

Step 3: Get your data

Start by establishing the range of time we want to pull. Fitbit gets testy if you request more than 100 days of data. We’ll abide by this until we have a need for more historical data.

from datetime import datetime, timedelta

today = datetime.now()
base_date = today - timedelta(days=100) # Fitbit doesn't like if we request more than 100 days. Meh. Fine.
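If you do eventually want more history, one option is to split a long range into smaller windows and request each one separately. The sketch below is only that: date_windows is a hypothetical helper, and max_days=100 is an assumption based on the limit described above, so adjust it to whatever the endpoint actually accepts.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=100):
    """Yield (window_start, window_end) pairs covering start..end inclusive,
    each spanning at most max_days calendar days."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


# 2019-01-01 through 2019-10-19 is 292 days, so this yields three windows
windows = list(date_windows(date(2019, 1, 1), date(2019, 10, 19)))
```

Each pair could then be passed as base_date and end_date to a separate request, keeping in mind that every window costs one API call.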

Next, let’s use Fitbit’s Web API to request our data! We’ll simplify this by using the time_series method built into the python-fitbit package. Heads up: Fitbit rate-limits requests (its documentation cites 150 per user per hour). Be careful not to over-ask or you’ll be locked out until the limit resets.

sleeps = client.time_series(resource='sleep', base_date=base_date, end_date=today)

# Remove the minute by minute data of last night's sleep for nicer printing
example = sleeps['sleep'][0]
example.pop('minuteData', None) 
example

{'awakeCount': 3,
 'awakeDuration': 4,
 'awakeningsCount': 13,
 'dateOfSleep': '2019-10-19',
 'duration': 28440000,
 'efficiency': 94,
 'endTime': '2019-10-19T08:02:30.000',
 'logId': 24320700816,
 'minutesAfterWakeup': 0,
 'minutesAsleep': 447,
 'minutesAwake': 27,
 'minutesToFallAsleep': 0,
 'restlessCount': 10,
 'restlessDuration': 23,
 'startTime': '2019-10-19T00:08:30.000',
 'timeInBed': 474}
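Because Fitbit limits how many requests you can make, it can help to save each payload to disk while you experiment, so re-running a cell doesn’t spend another request. Here is a minimal sketch of that idea; cached_fetch, the cache folder, and the key naming are all hypothetical, and fake_fetch stands in for a real API call.

```python
import json
import tempfile
from pathlib import Path


def cached_fetch(cache_dir, key, fetch):
    """Return the payload stored under `key`, calling fetch() only on a cache miss."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    payload = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return payload


# Stand-in for an API call so the sketch runs without credentials
calls = []

def fake_fetch():
    calls.append(1)
    return {"sleep": []}

with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch(tmp, "sleep_demo", fake_fetch)   # miss: calls fake_fetch
    second = cached_fetch(tmp, "sleep_demo", fake_fetch)  # hit: read from disk
```

With a real client, the third argument would be something like lambda: client.time_series(resource='sleep', base_date=base_date, end_date=today), and the key should include the date range so stale data isn’t reused by accident.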

Pull down the rest of your data. I grabbed the stuff that interests me, but there’s a lot more! Check out the Fitbit API documentation to learn about what’s possible.

activities = client.time_series(resource='activities/distance', base_date=base_date, end_date=today)
weights = client.time_series(resource='body/weight', base_date=base_date, end_date=today)
fats = client.time_series(resource='body/fat', base_date=base_date, end_date=today)
bmis = client.time_series(resource='body/bmi', base_date=base_date, end_date=today)

Most of the payloads returned by requests are pretty simple. I wrote a function to handle those cases.

def unload_simple_json(data, d=None):
    if not d:
        d = {'dateTime': [], 'value': []}
    for entry in data:
        for k in d.keys():
            d[k].append(entry[k])
    return d

d = {'efficiency': [],'minutesAsleep': [], 'startTime': [], 'endTime': [], 'awakeningsCount': [], 'dateOfSleep': []}
sleeps = unload_simple_json(sleeps['sleep'], d=d)
activities = unload_simple_json(activities['activities-distance'])
weights = unload_simple_json(weights['body-weight'])
fats = unload_simple_json(fats['body-fat'])
bmis = unload_simple_json(bmis['body-bmi'])
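To see what the function does, here it is run on a made-up payload shaped like the simple time-series responses: a list of entries, each with a dateTime and a value. The numbers are illustrative, not real data.

```python
def unload_simple_json(data, d=None):
    # Same function as above, repeated so this example runs on its own
    if not d:
        d = {'dateTime': [], 'value': []}
    for entry in data:
        for k in d.keys():
            d[k].append(entry[k])
    return d


# Made-up payload in the shape of a simple time-series response
sample = {
    'body-weight': [
        {'dateTime': '2019-10-18', 'value': '70.0'},
        {'dateTime': '2019-10-19', 'value': '70.2'},
    ]
}

columns = unload_simple_json(sample['body-weight'])
# columns == {'dateTime': ['2019-10-18', '2019-10-19'], 'value': ['70.0', '70.2']}
```

The result is a dict of equal-length lists, which is exactly the shape pandas likes. As far as I can tell these endpoints return values as strings, so something like pd.to_numeric may be needed before doing arithmetic; check your own payload before relying on that.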

Some endpoints return differently structured payloads, so we’ll have to write some custom logic.

hearts = client.time_series(resource='activities/heart', base_date=base_date, end_date=today)
hearts['activities-heart'][0]

d = {
    'Fat Burn': [],
    'Cardio': [],
    'Peak': [],
    'dateTime': [],
    'restingHeartRate': []
}

for entry in hearts['activities-heart']:
    d['dateTime'].append(entry.get('dateTime'))
    d['restingHeartRate'].append(entry['value'].get('restingHeartRate'))

    # Flatten the embedded list of zones. A zone missing from a given day
    # becomes None so every column keeps exactly one value per date.
    zones = {z['name']: z.get('minutes') for z in entry['value'].get('heartRateZones', [])}
    for k in ('Fat Burn', 'Cardio', 'Peak'):
        d[k].append(zones.get(k))
hearts = d
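A made-up entry helps show the nested shape being flattened here: the zones arrive as a list of dicts tucked inside value. The sketch below flattens one entry into one row. flatten_heart_entry is a hypothetical helper and the numbers are invented. It fills None for any zone that is missing on a given day, which keeps every column the same length.

```python
def flatten_heart_entry(entry, zone_names=('Fat Burn', 'Cardio', 'Peak')):
    """Turn one nested heart-rate entry into a flat dict with one key per column."""
    zones = {z['name']: z.get('minutes')
             for z in entry['value'].get('heartRateZones', [])}
    row = {
        'dateTime': entry.get('dateTime'),
        'restingHeartRate': entry['value'].get('restingHeartRate'),
    }
    for name in zone_names:
        row[name] = zones.get(name)  # None when the zone is absent
    return row


# Invented entry; 'Peak' is deliberately left out to show the None fill
sample_entry = {
    'dateTime': '2019-10-19',
    'value': {
        'restingHeartRate': 58,
        'heartRateZones': [
            {'name': 'Out of Range', 'minutes': 1200},
            {'name': 'Fat Burn', 'minutes': 180},
            {'name': 'Cardio', 'minutes': 12},
        ],
    },
}

row = flatten_heart_entry(sample_entry)
```

A list of such rows can go straight into pd.DataFrame, which is an alternative to building the dict of lists by hand.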

Step 4: Organize and store your data

Next we’ll organize our data into pandas dataframes. Let’s start by setting up a foundational table with all the dates we’ll need.

import pandas as pd

df = pd.DataFrame(
    pd.date_range(start=base_date, end=today)
    ).rename(columns={0: 'dateTime'})
df['dateTime'] = pd.to_datetime(df['dateTime']).dt.date
df.head()

	dateTime
0	2019-07-12
1	2019-07-13
2	2019-07-14
3	2019-07-15
4	2019-07-16

I renamed these keys to avoid merging them as duplicate column names later.

weights['weight'] = weights.pop('value')
fats['fat_pct'] = fats.pop('value')
bmis['bmi'] = bmis.pop('value')
sleeps['dateTime'] = sleeps.pop('dateOfSleep')

Now merge up your data!

for d in [activities, weights, fats, bmis, hearts]:
    frame = pd.DataFrame(d)
    frame['dateTime'] = pd.to_datetime(frame['dateTime']).dt.date
    df = df.merge(frame, how='left', on='dateTime')
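A toy version of one of these merges shows why the left join onto the date table is useful: every date survives, and days without a measurement simply come through as NaN. The weights below are invented for illustration.

```python
import pandas as pd

# Foundational table: one row per calendar date
dates = pd.DataFrame({'dateTime': pd.date_range('2019-10-17', '2019-10-19')})
dates['dateTime'] = dates['dateTime'].dt.date

# Invented weigh-ins with no entry for 2019-10-18
toy_weights = pd.DataFrame({
    'dateTime': ['2019-10-17', '2019-10-19'],
    'weight': ['70.0', '70.2'],
})
# Convert to date objects so the key matches the date table's type
toy_weights['dateTime'] = pd.to_datetime(toy_weights['dateTime']).dt.date

merged = dates.merge(toy_weights, how='left', on='dateTime')
```

Here merged has three rows, and the weight for 2019-10-18 is NaN. The conversion to date objects on both sides matters, since the join only matches keys of the same type.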

The sleep data has multiple entries for some dates (naps, or a night logged in two pieces). A simple aggregation should be enough to de-duplicate with enough fidelity for the rough analyses we’ll be running to start.

sleeps = pd.DataFrame(sleeps)
sleeps[sleeps.duplicated(subset='dateTime', keep=False)]

	efficiency	minutesAsleep	startTime	endTime	awakeningsCount	dateTime
2	96	71	2019-10-17T22:02:00.000	2019-10-17T23:16:30.000	1	2019-10-17
3	96	358	2019-10-17T00:21:00.000	2019-10-17T06:34:00.000	10	2019-10-17
44	87	109	2019-09-06T10:37:30.000	2019-09-06T12:43:30.000	9	2019-09-06
45	88	278	2019-09-06T01:41:00.000	2019-09-06T06:57:00.000	13	2019-09-06
85	81	60	2019-07-14T08:21:00.000	2019-07-14T09:35:30.000	3	2019-07-14
86	91	329	2019-07-14T01:02:30.000	2019-07-14T07:12:00.000	9	2019-07-14
87	93	56	2019-07-13T19:25:30.000	2019-07-13T20:36:30.000	6	2019-07-13
88	90	572	2019-07-12T22:28:00.000	2019-07-13T09:04:30.000	27	2019-07-13

Looks like we got ‘em.

sleeps = sleeps \
    .groupby('dateTime', as_index=False) \
    .agg({'startTime': 'min', 'endTime': 'max', 'minutesAsleep': 'sum', 'efficiency': 'mean'})
sleeps[sleeps.duplicated(subset='dateTime', keep=False)]

	dateTime	startTime	endTime	minutesAsleep	efficiency

sleeps['dateTime'] = pd.to_datetime(sleeps['dateTime']).dt.date
df = df.merge(pd.DataFrame(sleeps), how='left', on='dateTime')
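To make the aggregation concrete, here it is on three rows. The first two are the 2019-10-17 duplicates from the table above; the third is invented so there is a second date to group.

```python
import pandas as pd

toy = pd.DataFrame({
    'dateTime': ['2019-10-17', '2019-10-17', '2019-10-18'],
    'startTime': ['2019-10-17T22:02:00.000', '2019-10-17T00:21:00.000',
                  '2019-10-18T00:30:00.000'],
    'endTime': ['2019-10-17T23:16:30.000', '2019-10-17T06:34:00.000',
                '2019-10-18T07:00:00.000'],
    'minutesAsleep': [71, 358, 380],
    'efficiency': [96, 96, 90],
})

# One row per date: earliest start, latest end, total minutes, average efficiency
nightly = (
    toy.groupby('dateTime', as_index=False)
       .agg({'startTime': 'min', 'endTime': 'max',
             'minutesAsleep': 'sum', 'efficiency': 'mean'})
)
```

The two 2019-10-17 rows collapse into one with 429 minutes asleep. Taking min and max of the timestamps works even though they are strings, because ISO 8601 timestamps sort the same way alphabetically as they do chronologically. Two caveats for later analyses: the start-to-end span now covers any waking hours between the two sleeps, and the efficiency is an unweighted mean, so a short nap counts as much as a full night.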

Clean up column names. I must confess that I’m not crazy about camelCase.

df = df.rename(columns={
    'dateTime': 'date',
    'efficiency': 'sleep_efficiency',
    'minutesAsleep': 'sleep_minutes', 
    'startTime': 'sleep_start_at',
    'endTime': 'sleep_end_at',
    'awakeningsCount': 'wake_count',
    'Fat Burn': 'fat_burn_minutes',
    'Cardio': 'cardio_minutes',
    'Peak': 'peak_minutes',
    'restingHeartRate': 'resting_heart_rate'
})

And finally export as a CSV!

df.to_csv('../../data/{}_to_{}_fitbit_data.csv'.format(base_date.date(), today.date()), index=False)
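One small snag: to_csv raises an error if the target folder doesn’t exist yet. A sketch of one way around that is below. export_path is a hypothetical helper; it builds the same file name as above and creates the folder first. The temporary directory is only there so the example runs anywhere.

```python
import tempfile
from datetime import date
from pathlib import Path


def export_path(data_dir, start, end):
    """Build '<start>_to_<end>_fitbit_data.csv' under data_dir, creating the folder."""
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{start}_to_{end}_fitbit_data.csv"


with tempfile.TemporaryDirectory() as tmp:
    target = export_path(Path(tmp) / "data", date(2019, 7, 12), date(2019, 10, 19))
    folder_exists = target.parent.is_dir()
```

In the notebook this would be used as df.to_csv(export_path('../../data', base_date.date(), today.date()), index=False). Keep in mind that the relative path is resolved from the current working directory, which is python-fitbit after the earlier cd.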