Extracting health data from Fitbit
Last time, I laid out the idea of rigging a pipeline to enable data-driven self-reflection and improvement. The first step is getting data. We’ll start with Fitbit. This post steps through using an API to grab the numbers, doing a little cleaning, and storing the result as a CSV. We won’t get too nitty-gritty. If you’re looking for more detail, I’d highly recommend checking out this awesome blog post!
Step 1: Register your personal app with Fitbit
Start by registering your personal app with Fitbit at their fancy developer portal. The aforementioned post details the right inputs for the form. Once you’re done, navigate to your newly created app and grab the contents of the “OAuth 2.0 Client ID” and “Client Secret” fields. In the spirit of not posting creds for all the internet to enjoy, I stored mine locally in a YAML file.
import yaml
with open("../creds.yml", 'r') as stream:
    creds = yaml.safe_load(stream)
CLIENT_ID = creds['fitbit']['client_id']
CLIENT_SECRET = creds['fitbit']['client_secret']
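If the YAML file is missing a key, the lookups above fail with a bare KeyError. Here is a minimal sketch of a friendlier check, assuming the parsed file is a nested dict shaped the way the snippet above reads it (a fitbit section holding client_id and client_secret). The helper name get_fitbit_creds and the placeholder values are made up for illustration.

```python
def get_fitbit_creds(creds):
    """Return (client_id, client_secret) from a parsed creds dict.

    Hypothetical helper: raises a descriptive error instead of a bare KeyError.
    """
    section = creds.get('fitbit') or {}
    missing = [k for k in ('client_id', 'client_secret') if not section.get(k)]
    if missing:
        raise ValueError('creds.yml is missing fitbit keys: ' + ', '.join(missing))
    return section['client_id'], section['client_secret']


# Placeholder values, not real credentials.
sample = {'fitbit': {'client_id': 'ABC123', 'client_secret': 'not-a-real-secret'}}
client_id, client_secret = get_fitbit_creds(sample)
```

In the notebook you would pass it the dict returned by yaml.safe_load instead of the sample.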
Step 2: Authorize and connect to Fitbit
We’ll make heavy use of the python-fitbit package from Jake from Orcas. We also install specific versions of the requests-oauthlib and oauthlib libraries to avoid errors when you try to authorize.
!git clone https://github.com/orcasgit/python-fitbit
!pip3 install -r python-fitbit/requirements/base.txt
!pip3 install --upgrade requests-oauthlib==1.1.0 oauthlib==2.1.0
Navigate to the python-fitbit directory.
cd python-fitbit
And grab the tools contained in the gather_keys_oauth2 module to get authorized with the API.
from gather_keys_oauth2 import *
server = OAuth2Server(CLIENT_ID, CLIENT_SECRET)
server.browser_authorize()
keys = server.fitbit.client.session.token
ACCESS_TOKEN, REFRESH_TOKEN = str(keys['access_token']), str(keys['refresh_token'])
If you’re like me, you hit a 500 error here. The Fitbit help forum suggests, and my experience corroborates, that you can get around this by downgrading to requests-oauthlib==1.1.0 and oauthlib==2.1.0.
Finally, go ahead and create a client to ferry your Fitbit data requests.
import fitbit
client = fitbit.Fitbit(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    oauth2=True,
    access_token=ACCESS_TOKEN,
    refresh_token=REFRESH_TOKEN
)
If you made it this far, you’ve successfully created a bridge to your Fitbit data. Now it’s time for the good stuff - getting and manipulating your numbers!
Step 3: Get your data
Start by establishing the range of time we want to pull. Fitbit gets testy if you request more than 100 days of data. We’ll abide by this until we have a need for more historical data.
from datetime import datetime, timedelta
today = datetime.now()
base_date = today - timedelta(days=100) # Fitbit doesn't like if we request more than 100 days. Meh. Fine.
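If you do eventually want more history, one option is to split a long range into windows that each stay within the limit and request them one at a time. This is a rough sketch using only the standard library. The helper name date_windows is hypothetical, and treating the limit as "end date at most 100 days after the start date" is my assumption, mirroring the timedelta(days=100) above.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=100):
    """Yield (window_start, window_end) pairs covering start..end inclusive.

    Each window's end is at most max_days after its start.
    """
    window_start = start
    while window_start <= end:
        window_end = min(window_start + timedelta(days=max_days), end)
        yield window_start, window_end
        # The next window begins the day after this one ends, so no day repeats.
        window_start = window_end + timedelta(days=1)


windows = list(date_windows(date(2019, 1, 1), date(2019, 10, 19)))
```

Each pair could then be passed as base_date and end_date to a separate request, keeping the request limit in mind.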
Next, let’s use Fitbit’s Web API to request our data! We’ll simplify this by using the time_series method built into our python-fitbit package. Heads up: Fitbit limits the number of daily requests you can make. Be careful not to over-ask or you’ll have to wait a day to try again.
sleeps = client.time_series(resource='sleep', base_date=base_date, end_date=today)
# Remove the minute by minute data of last night's sleep for nicer printing
example = sleeps['sleep'][0]
example.pop('minuteData', None)
example
{'awakeCount': 3,
'awakeDuration': 4,
'awakeningsCount': 13,
'dateOfSleep': '2019-10-19',
'duration': 28440000,
'efficiency': 94,
'endTime': '2019-10-19T08:02:30.000',
'logId': 24320700816,
'minutesAfterWakeup': 0,
'minutesAsleep': 447,
'minutesAwake': 27,
'minutesToFallAsleep': 0,
'restlessCount': 10,
'restlessDuration': 23,
'startTime': '2019-10-19T00:08:30.000',
'timeInBed': 474}
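Since requests are limited, it can help to save each payload to disk while you experiment, so rerunning a cell doesn’t spend another request. Here is a minimal sketch of that idea, not part of python-fitbit. The names cached_fetch and fake_fetch are made up, and fake_fetch stands in for a real call such as client.time_series(...).

```python
import json
import os
import tempfile


def cached_fetch(path, fetch):
    """Return the JSON payload stored at path, calling fetch() only on a cache miss."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    payload = fetch()
    with open(path, 'w') as f:
        json.dump(payload, f)
    return payload


# Stand-in for a real API call; it records how many times it was invoked.
calls = []


def fake_fetch():
    calls.append(1)
    return {'sleep': [{'dateOfSleep': '2019-10-19', 'minutesAsleep': 447}]}


cache_path = os.path.join(tempfile.mkdtemp(), 'sleep.json')
first = cached_fetch(cache_path, fake_fetch)
second = cached_fetch(cache_path, fake_fetch)  # served from disk, no second call
```

Delete the cached file whenever you want fresh data.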
Pull down the rest of your data. I grabbed the stuff that interests me, but there’s a lot more! Check out the Fitbit API documentation to learn about what’s possible.
activities = client.time_series(resource='activities/distance', base_date=base_date, end_date=today)
weights = client.time_series(resource='body/weight', base_date=base_date, end_date=today)
fats = client.time_series(resource='body/fat', base_date=base_date, end_date=today)
bmis = client.time_series(resource='body/bmi', base_date=base_date, end_date=today)
Most of the payloads returned by requests are pretty simple. I wrote a function to handle those cases.
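def unload_simple_json(data, d=None):
    # Default to the two keys most time-series payloads share
    if not d:
        d = {'dateTime': [], 'value': []}
    # Collect each requested key into its own list (one list per future column)
    for entry in data:
        for k in d.keys():
            d[k].append(entry[k])
    return d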
d = {'efficiency': [],'minutesAsleep': [], 'startTime': [], 'endTime': [], 'awakeningsCount': [], 'dateOfSleep': []}
sleeps = unload_simple_json(sleeps['sleep'], d=d)
activities = unload_simple_json(activities['activities-distance'])
weights = unload_simple_json(weights['body-weight'])
fats = unload_simple_json(fats['body-fat'])
bmis = unload_simple_json(bmis['body-bmi'])
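To make the reshaping concrete, here is the same function run on a tiny hand-written payload. The dates and values are made up. Only the shape (a list of entries with dateTime and value keys) is taken from the code above.

```python
def unload_simple_json(data, d=None):
    """Collect the keys named in d from each entry into parallel lists."""
    if not d:
        d = {'dateTime': [], 'value': []}
    for entry in data:
        for k in d.keys():
            d[k].append(entry[k])
    return d


# Made-up sample shaped like the 'body-weight' payload used above.
sample = {'body-weight': [
    {'dateTime': '2019-10-18', 'value': '70.1'},
    {'dateTime': '2019-10-19', 'value': '69.8'},
]}
columns = unload_simple_json(sample['body-weight'])
```

The result is a dict of equal-length lists, one per key, which is exactly the shape pd.DataFrame accepts later on.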
Some endpoints return differently structured payloads, so we’ll have to write some custom logic.
hearts = client.time_series(resource='activities/heart', base_date=base_date, end_date=today)
hearts['activities-heart'][0]
d = {
    'Fat Burn': [],
    'Cardio': [],
    'Peak': [],
    'dateTime': [],
    'restingHeartRate': []
}
for entry in list(hearts['activities-heart']):
    d['dateTime'].append(entry.get('dateTime'))
    d['restingHeartRate'].append(entry['value'].get('restingHeartRate'))
    # Flattening embedded list
    for k in d.keys():
        for value in entry['value']['heartRateZones']:
            if value['name'] == k:
                d[k].append(value.get('minutes'))
hearts = d
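One thing to watch with the loop above: a zone’s list only grows when that zone appears in the day’s entry, so a day without one would leave the lists different lengths. Whether Fitbit ever omits a zone is an assumption on my part, but here is a sketch of a variant that pads with None either way. The function name flatten_hearts and the sample numbers are made up.

```python
ZONES = ('Fat Burn', 'Cardio', 'Peak')


def flatten_hearts(entries):
    """Flatten heart-rate entries into equal-length column lists."""
    out = {'dateTime': [], 'restingHeartRate': [], **{z: [] for z in ZONES}}
    for entry in entries:
        value = entry.get('value', {})
        # Map zone name -> minutes for this day
        minutes = {z['name']: z.get('minutes') for z in value.get('heartRateZones', [])}
        out['dateTime'].append(entry.get('dateTime'))
        out['restingHeartRate'].append(value.get('restingHeartRate'))
        for zone in ZONES:
            out[zone].append(minutes.get(zone))  # None when the zone is absent
    return out


# Made-up sample; the second day has no resting heart rate and no 'Peak' zone.
sample = [
    {'dateTime': '2019-10-18', 'value': {'restingHeartRate': 58, 'heartRateZones': [
        {'name': 'Out of Range', 'minutes': 1200},
        {'name': 'Fat Burn', 'minutes': 180},
        {'name': 'Cardio', 'minutes': 12},
        {'name': 'Peak', 'minutes': 3},
    ]}},
    {'dateTime': '2019-10-19', 'value': {'heartRateZones': [
        {'name': 'Fat Burn', 'minutes': 95},
        {'name': 'Cardio', 'minutes': 0},
    ]}},
]
flat = flatten_hearts(sample)
```

Every list ends up with one slot per day, so the dict converts cleanly to a dataframe.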
Step 4: Organize and store your data
Next, we’ll organize our data into pandas dataframes. Let’s start by setting up a foundational table with all the dates we’ll need.
import pandas as pd
df = pd.DataFrame(
    pd.date_range(start=base_date, end=today)
).rename(columns={0: 'dateTime'})
df['dateTime'] = pd.to_datetime(df['dateTime']).dt.date
df.head()
|   | dateTime |
|---|---|
| 0 | 2019-07-12 |
| 1 | 2019-07-13 |
| 2 | 2019-07-14 |
| 3 | 2019-07-15 |
| 4 | 2019-07-16 |
Rename the generic keys first so they don’t collide as duplicate column names when we merge.
weights['weight'] = weights.pop('value')
fats['fat_pct'] = fats.pop('value')
bmis['bmi'] = bmis.pop('value')
sleeps['dateTime'] = sleeps.pop('dateOfSleep')
Now merge up your data!
for d in [activities, weights, fats, bmis, hearts]:
    d = pd.DataFrame(d)
    # Match the date type used in the foundational table
    d['dateTime'] = pd.to_datetime(d['dateTime']).dt.date
    df = df.merge(d, how='left', on='dateTime')
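The left merge is what keeps every date in the table even when a metric has no reading for that day. Here is a small sketch with made-up weights to show the effect. The names spine and missing_days are just for this example.

```python
import pandas as pd

# Made-up values; the date spine mirrors the foundational table built earlier.
spine = pd.DataFrame({'dateTime': pd.date_range('2019-10-17', '2019-10-19')})
spine['dateTime'] = spine['dateTime'].dt.date

weights = pd.DataFrame({'dateTime': ['2019-10-17', '2019-10-19'], 'weight': [70.1, 69.8]})
weights['dateTime'] = pd.to_datetime(weights['dateTime']).dt.date

# how='left' keeps all three dates; the day without a weigh-in gets NaN.
merged = spine.merge(weights, how='left', on='dateTime')
missing_days = merged.loc[merged['weight'].isna(), 'dateTime'].tolist()
```

Both sides need the same date type in dateTime, which is why the loop above converts each frame before merging.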
The sleep data has multiple values for each date. A simple aggregation should be enough to de-duplicate with enough fidelity for the rough analyses we’ll be running to start.
sleeps = pd.DataFrame(sleeps)
sleeps[sleeps.duplicated(subset='dateTime', keep=False)]
|   | efficiency | minutesAsleep | startTime | endTime | awakeningsCount | dateTime |
|---|---|---|---|---|---|---|
| 2 | 96 | 71 | 2019-10-17T22:02:00.000 | 2019-10-17T23:16:30.000 | 1 | 2019-10-17 |
| 3 | 96 | 358 | 2019-10-17T00:21:00.000 | 2019-10-17T06:34:00.000 | 10 | 2019-10-17 |
| 44 | 87 | 109 | 2019-09-06T10:37:30.000 | 2019-09-06T12:43:30.000 | 9 | 2019-09-06 |
| 45 | 88 | 278 | 2019-09-06T01:41:00.000 | 2019-09-06T06:57:00.000 | 13 | 2019-09-06 |
| 85 | 81 | 60 | 2019-07-14T08:21:00.000 | 2019-07-14T09:35:30.000 | 3 | 2019-07-14 |
| 86 | 91 | 329 | 2019-07-14T01:02:30.000 | 2019-07-14T07:12:00.000 | 9 | 2019-07-14 |
| 87 | 93 | 56 | 2019-07-13T19:25:30.000 | 2019-07-13T20:36:30.000 | 6 | 2019-07-13 |
| 88 | 90 | 572 | 2019-07-12T22:28:00.000 | 2019-07-13T09:04:30.000 | 27 | 2019-07-13 |
Looks like we got ‘em.
# Collapse multiple sleep logs per date into a single row
sleeps = sleeps \
    .groupby('dateTime', as_index=False) \
    .agg({'startTime': 'min', 'endTime': 'max', 'minutesAsleep': 'sum', 'efficiency': 'mean'})
# Confirm that no duplicated dates remain
sleeps[sleeps.duplicated(subset='dateTime', keep=False)]
|   | dateTime | startTime | endTime | minutesAsleep | efficiency |
|---|---|---|---|---|---|
sleeps['dateTime'] = pd.to_datetime(sleeps['dateTime']).dt.date
df = df.merge(pd.DataFrame(sleeps), how='left', on='dateTime')
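To see what the aggregation does to a single date, here is a sketch run on the two 2019-10-17 rows from the duplicates table above, plus one made-up single-log day. The timestamps are still strings at this point, but ISO 8601 strings sort chronologically, so min and max pick the earliest start and latest end.

```python
import pandas as pd

# Two rows copied from the duplicated-dates table above, plus one made-up single-log day.
sleeps = pd.DataFrame({
    'dateTime': ['2019-10-17', '2019-10-17', '2019-10-16'],
    'startTime': ['2019-10-17T22:02:00.000', '2019-10-17T00:21:00.000', '2019-10-16T00:10:00.000'],
    'endTime': ['2019-10-17T23:16:30.000', '2019-10-17T06:34:00.000', '2019-10-16T07:00:00.000'],
    'minutesAsleep': [71, 358, 400],
    'efficiency': [96, 96, 92],
})

daily = (
    sleeps
    .groupby('dateTime', as_index=False)
    .agg({'startTime': 'min', 'endTime': 'max', 'minutesAsleep': 'sum', 'efficiency': 'mean'})
)
row = daily[daily['dateTime'] == '2019-10-17'].iloc[0]
```

Note that the mean efficiency is unweighted, so a one-hour nap counts as much as a full night. That seems acceptable for rough analyses, but it is a simplification.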
Clean up column names. I must confess that I’m not crazy about camelCase.
df = df.rename(columns={
    'dateTime': 'date',
    'efficiency': 'sleep_efficiency',
    'minutesAsleep': 'sleep_minutes',
    'startTime': 'sleep_start_at',
    'endTime': 'sleep_end_at',
    'awakeningsCount': 'wake_count',
    'Fat Burn': 'fat_burn_minutes',
    'Cardio': 'cardio_minutes',
    'Peak': 'peak_minutes',
    'restingHeartRate': 'resting_heart_rate'
})
And finally export as a CSV!
df.to_csv('../../data/{}_to_{}_fitbit_data.csv'.format(base_date.date(), today.date()), index=False)
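For a quick sanity check on the export, here is a sketch that writes a tiny made-up frame to a temporary directory using the same filename pattern and reads it back. The dates and values are examples only, and the temp directory stands in for the real data folder.

```python
import os
import tempfile
from datetime import date

import pandas as pd

base_date, today = date(2019, 7, 12), date(2019, 10, 20)  # example dates only
filename = '{}_to_{}_fitbit_data.csv'.format(base_date, today)

# Made-up two-row frame standing in for the merged table.
df = pd.DataFrame({'date': [date(2019, 7, 12), date(2019, 7, 13)], 'sleep_minutes': [572.0, None]})

path = os.path.join(tempfile.mkdtemp(), filename)
df.to_csv(path, index=False)

# parse_dates turns the 'date' strings back into timestamps on load.
restored = pd.read_csv(path, parse_dates=['date'])
```

Missing values are written as empty cells and come back as NaN, so gaps in the data survive the round trip.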