Datasets Overview
We have put together a collection of datasets for your use. You will find the datasets here.
Structure
The data is represented in a CSV file in a single structured format:
participant-id | session-id | timestamp (s) | source | DATA_TYPE_1 | DATA_TYPE_2 | AND ETC |
---|---|---|---|---|---|---|
-- | -- | -- | -- | -- | -- | -- |
The CSV parsers in our starter apps are built to handle files in this format. As long as you import a CSV file in this format, the data will be parsed and stored correctly in the mobile applications.
Details
participant-id: ID of the participant
session-id: Data collected from a participant can be broken down into sessions. This value is optional, because not every dataset is divided into sessions.
timestamp: Unix timestamp (time since the epoch) in milliseconds
source: name of the sensor
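As a rough illustration of the layout described above, the following Python sketch reads a file whose first four columns are the fixed fields and whose remaining columns are data types. The sample rows, the data column names (`heart_rate`, `temperature`), and the `parse_dataset` helper are hypothetical and not part of the starter apps. The sketch also assumes the timestamp is in milliseconds, as stated under Details.

```python
import csv
import io
from datetime import datetime, timezone

FIXED_COLUMNS = 4  # participant-id, session-id, timestamp, source


def parse_dataset(text):
    """Parse CSV text in the shared format into a list of dicts."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    data_names = header[FIXED_COLUMNS:]  # every column after the fixed ones
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        row = [cell.strip() for cell in row]
        participant_id, session_id, timestamp, source = row[:FIXED_COLUMNS]
        values = {}
        for name, cell in zip(data_names, row[FIXED_COLUMNS:]):
            if cell != "":  # a sensor may not report every data type
                values[name] = float(cell)
        records.append({
            "participant_id": participant_id,
            "session_id": session_id or None,  # session-id is optional
            # Assumption: timestamp is milliseconds since the Unix epoch.
            "time": datetime.fromtimestamp(int(timestamp) / 1000, tz=timezone.utc),
            "source": source,
            "values": values,
        })
    return records


# Hypothetical sample data, for illustration only.
sample = """participant-id,session-id,timestamp,source,heart_rate,temperature
5,2,1500000000000,empatica,72,33.1
1,,1500000001000,fitbit,80,
"""
records = parse_dataset(sample)
```

Reading the data columns by position rather than by name keeps the sketch independent of which data types a particular file contains.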
Dataset Details
OpenICE: Generate Simulated Data
OpenICE will let you generate simulated data by creating an ICE Device Adapter under the Data Recorder app. Check out their webpage for more info: https://www.openice.info/
MIMIC II
https://physionet.org/mimic2/demo/
This data was collected from ICU patients who later passed away. The data is de-identified and all doctors' notes have been removed. You will find information collected in the ICU, such as lab values, medications, diagnoses, and procedures. A more detailed introduction to the MIMIC II dataset is provided here. Check out the image below for more information on how the data is structured:
Dataset Size: 4000 patients
How to download it: Scroll all the way to the bottom (in the link above) and download the `mimic2dead.sql.gz` file. You can import this data using MySQL Workbench.
Fitbit Dataset
Participant IDs: 1-2
Devices Used: Fitbit
Data Collected: weight, nutrients, calories, steps, activity, and sleep
Details: This data was collected on a Fitbit and obtained from a public website located here.
Fitabase (mTurk) Dataset
Participant IDs: Various IDs
Devices Used: Fitbit
Data Collected: activity, calories, intensities, steps, heart rate, sleep, and weight
Details: This data was collected on a Fitbit and obtained from a public website located here: https://zenodo.org/record/53894.
Empatica Dataset
Participant IDs: 5-10
Devices Used: Empatica
Data Collected:
Details: This data was collected on an Empatica device. Multiple participants were run through studies. See participant ids 5 to 10.
Stress Datasets (EDA, ECG, EEG)
Participant IDs: 5-10, 11-13, 19
Devices Used: ECG - Hexoskin, EDA - Empatica, EEG - Mindwave and/or Muse
Data Collected:
- EDA (for participants 5-10)
  - heart rate, rr interval, skin conductance, and temperature
- EDA (for participants 11-13 and 19)
  - temperature, electrodermal activity, photoplethysmograph data, accelerometer sensor data, time between heart beats, and heart rate
- ECG (for participants 11-13 and 19)
  - acceleration, breathing rate, activity, ecg, cadence, expiration, heart rate, inspiration, ventilation, nn interval, rr interval, steps, and tidal volume
- EEG (for participants 11-13 and 19)
  - mindwave: attention, signal quality, meditation level, band power
  - muse: raw eeg values
Details:
The datasets on GitHub include only the processed EDA data. The other data (ECG and EEG) was too large to put on GitHub, so you will have to download it separately. Check out the raw, unprocessed stress datasets here: https://ibm.box.com/s/fobxq6z5ah49l8f6xc2vfvgf6t2dou6s. Note that these datasets are unprocessed and are in their own formats. Please see the information below on how to understand the raw datasets.
Processed EDA Data:
This data was collected on an Empatica device. Multiple participants were run through studies. See participant ids 11-13 and 19.
- ACC.csv: Data from a 3-axis accelerometer sensor
  - The accelerometer was configured to measure acceleration in the range [-2g, 2g]. Therefore the unit in this file is 1/64 g.
  - Data for the x, y, and z axes is in the 5th, 6th, and 7th columns, respectively.
- BVP.csv: Data from a photoplethysmograph
- EDA.csv: Data from an electrodermal activity sensor, expressed in microsiemens (μS)
- HR.csv: Average heart rate extracted from the BVP signal
  - The 5th column is the sample rate expressed in Hz.
- IBI.csv: Time between an individual's heartbeats, extracted from the BVP signal
  - The 5th column is the duration in seconds (s) of the detected inter-beat interval (i.e., the time elapsed since the previous beat).
- TEMP.csv: Data from the temperature sensor, expressed in degrees Celsius (°C)
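Since ACC.csv stores acceleration in units of 1/64 g, a reading has to be divided by 64 to obtain a value in g. The sketch below shows one way to do that, assuming the file has a header row and that x, y, and z sit in the 5th, 6th, and 7th columns as described above. The `acc_rows_to_g` helper, the column names `x`, `y`, `z`, and the sample row are hypothetical.

```python
import csv
import io

ACC_UNITS_PER_G = 64  # the file stores acceleration in units of 1/64 g


def acc_rows_to_g(text, has_header=True):
    """Return a list of (x, y, z) tuples in g from ACC.csv-style text."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader)  # assumption: the first line is a header row
    result = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Columns 5, 6, and 7 (1-based) are indices 4, 5, and 6 (0-based).
        x, y, z = (float(row[i]) / ACC_UNITS_PER_G for i in (4, 5, 6))
        result.append((x, y, z))
    return result


# Hypothetical sample row, for illustration only.
acc_sample = """participant-id,session-id,timestamp,source,x,y,z
11,,1500000000000,ACC,64,-32,0
"""
acc_in_g = acc_rows_to_g(acc_sample)
```

For the sample row, a raw value of 64 corresponds to 1 g and -32 corresponds to -0.5 g.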
Raw, Unprocessed ECG Data:
This data was collected from Hexoskin. The .wav files are the compressed version of the data collected from Hexoskin. Check out this page for more information on how to parse them.
The CSV files were generated from Hexoskin APIs. Check out this page for more information: https://api.hexoskin.com/docs/index.html#introduction
Note: the folder names for the raw stress datasets follow this structure: "subjectID__location__date". Location is either 1 (in the field) or 0 (in the lab). The date is in the format yymmdd. Also keep in mind that the subject IDs in the raw datasets (on Box) do not map directly to the participant IDs in the processed datasets (on GitHub). Take a look at the participant-id-mapping.csv file to understand the mapping.
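The folder naming scheme in the note above can be split into its three parts with a few lines of Python. This is a minimal sketch: the `RawFolder` type, the `parse_folder_name` helper, and the example folder name are hypothetical, and the subject ID it returns still needs to be translated with participant-id-mapping.csv.

```python
from datetime import date, datetime
from typing import NamedTuple


class RawFolder(NamedTuple):
    subject_id: str    # raw subject ID (not the processed participant ID)
    in_field: bool     # True for location 1 (field), False for 0 (lab)
    recorded_on: date  # parsed from the yymmdd part


def parse_folder_name(name):
    """Split a "subjectID__location__date" folder name into its parts."""
    parts = name.split("__")
    if len(parts) != 3:
        raise ValueError(f"expected subjectID__location__date, got {name!r}")
    subject_id, location, yymmdd = parts
    if location not in ("0", "1"):
        raise ValueError(f"location must be 0 or 1, got {location!r}")
    recorded_on = datetime.strptime(yymmdd, "%y%m%d").date()
    return RawFolder(subject_id, location == "1", recorded_on)


# Hypothetical folder name, for illustration only.
folder = parse_folder_name("7__1__170314")
```

Rejecting names that do not match the pattern makes it easier to spot stray folders when walking the downloaded archive.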
Raw, Unprocessed EEG Data:
This data was collected from a Mindwave or a Muse. You can determine which device a file came from by checking the file type: Mindwave data is in a text file, while Muse data is in a CSV file.
To understand/parse Mindwave data, check out the links below:
- http://developer.neurosky.com/docs/doku.php?id=mindset_data_types
- http://developer.neurosky.com/docs/doku.php?id=thinkgear_communications_protocol
To understand/parse Muse data, check out the link below:
Note: the folder names for the raw stress datasets follow this structure: "subjectID__location__date". Location is either 1 (in the field) or 0 (in the lab). The date is in the format yymmdd. Also keep in mind that the subject IDs in the raw datasets (on Box) do not map directly to the participant IDs in the processed datasets (on GitHub). Take a look at the participant-id-mapping.csv file to understand the mapping.
iHealth BP
Participant IDs: 3
Devices Used: iHealth Wrist Blood Pressure Monitor
Data Collected: heart rate, rr interval, skin conductance, and temperature
Details: None
iHealth Pulse Ox
Participant IDs: 3
Devices Used: iHealth Finger Pulse Oximeter
Data Collected: SpO2 and heart rate
Details: None
Jawbone Dataset
Participant IDs: 4
Devices Used: Jawbone Up Move
Data Collected: Steps, Calories, Weight, etc
Details: None
Participant ID Metadata
These participant IDs refer to the IDs in our datasets here.
User ID | Sessions | Notes |
---|---|---|
1 | None | Gender: Male. |
2 | None | Gender: Male. |
3 | None | Gender: Female. |
4 | None | Gender: Female. |
5 | 2, 3 | None |
6 | 4, 5 | None |
7 | 6, 7 | None |
8 | 8, 9 | None |
9 | 10 | None |
10 | 11 | None |
Various IDs | None | These IDs belong to the CSV files inside fitbit-mturk. See here for more details: https://zenodo.org/record/53894 |
Additional Data
- Various Datasets
- Parkinsons Dataset