subject

Classes for managing data and protocol access and storage.

Currently named subject, but will likely be refactored to include other data models should the need arise.

Classes

Subject(name, dir, file, new, biography)

Class for managing one subject’s data and protocol.

class Subject(name: str = None, dir: str = None, file: str = None, new: bool = False, biography: dict = None)[source]

Bases: object

Class for managing one subject’s data and protocol.

Creates a PyTables HDF5 file in prefs.DATADIR with the general structure:

/ root
|--- current (tables.filenode) storing the current task as serialized JSON
|--- data (group)
|    |--- task_name  (group)
|         |--- S##_step_name
|         |    |--- trial_data
|         |    |--- continuous_data
|         |--- ...
|--- history (group)
|    |--- hashes - history of git commit hashes
|    |--- history - history of changes: protocols assigned, params changed, etc.
|    |--- weights - history of pre and post-task weights
|    |--- past_protocols (group) - stash past protocol params on reassign
|         |--- date_protocol_name - tables.filenode of a previous protocol's params.
|         |--- ...
|--- info - group with biographical information as attributes

Classes

Hash_Table()

Class to describe table for hash history

History_Table()

Class to describe parameter and protocol change history

Weight_Table()

Class to describe table for weight history

Methods

apply_along([along, step])

assign_protocol(protocol[, step_n])

Assign a protocol to the subject.

close_hdf(h5f)

Flushes & closes the open hdf file.

data_thread(queue)

Thread that keeps hdf file open and receives data while task is running.

ensure_structure()

Ensure that our h5f has the appropriate baseline structure as defined in self.STRUCTURE

flush_current()

Flushes the ‘current’ attribute in the subject object to the current filenode

get_step_history([use_history])

Gets a dataframe of step numbers, timestamps, and step names as a coarse view of training status.

get_timestamp([simple])

Makes a timestamp.

get_trial_data([step, what])

Get trial data from the current task.

get_weight([which, include_baseline])

Gets start and stop weights.

graduate()

Increase the current step by one, unless it is the last step.

new_subject_file(biography)

Create a new subject file and make the general filestructure.

open_hdf([mode])

Opens the hdf5 file.

prepare_run()

Prepares the Subject object to receive data while running the task.

save_data(data)

Equivalent alternative to calling Subject.data_queue.put(data) directly.

set_weight(date, col_name, new_value)

Updates an existing weight in the weight table.

stash_current()

Save the current protocol in the history group and delete the node

stop_run()

Puts ‘END’ in the data_queue, which causes data_thread() to end.

to_csv(path[, task, step])

Export trial data to .csv

update_biography(params)

Change or make a new biographical attribute, stored as attributes of the info group.

update_history(type, name, value[, step])

Update the history table when changes are made to the subject’s protocol.

update_weights([start, stop])

Store either a starting or stopping mass.

Variables
  • lock (threading.Lock) – manages access to the hdf5 file

  • name (str) – Subject ID

  • file (str) – Path to hdf5 file - usually {prefs.DATADIR}/{self.name}.h5

  • current (dict) – current task parameters. loaded from the ‘current’ filenode of the h5 file

  • step (int) – current step

  • protocol_name (str) – name of currently assigned protocol

  • current_trial (int) – number of current trial

  • running (bool) – Flag that signals whether the subject is currently running a task or not.

  • data_queue (queue.Queue) – Queue to dump data while running task

  • thread (threading.Thread) – thread used to keep file open while running task

  • did_graduate (threading.Event) – Event used to signal if the subject has graduated the current step

  • STRUCTURE (list) –

    list of tuples with order:

    • full path, eg. ‘/history/weights’

    • relative path, eg. ‘/history’

    • name, eg. ‘weights’

    • type, eg. Subject.Weight_Table or ‘group’

  • locations (node) – tables.IsDescription for tables.

Parameters
  • name (str) – subject ID

  • dir (str) – path where the .h5 file is located, if None, prefs.DATADIR is used

  • file (str) – load a subject from a filename. if None, ignored.

  • new (bool) – if True, a new file is made. A new file is also made if one does not already exist.

  • biography (dict) – If making a new subject file, a dictionary with biographical data can be passed
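The four-element tuples described for STRUCTURE can be derived from a full node path. The following is a minimal sketch, not the class's own code: `structure_entry` is a hypothetical helper, and the table types are written as plain strings where the real list would hold classes such as Subject.Weight_Table.

```python
import posixpath


def structure_entry(full_path, node_type="group"):
    """Build one (full path, relative path, name, type) tuple.

    Hypothetical helper for illustration; node_type is a placeholder string
    here rather than a table description class.
    """
    parent, name = posixpath.split(full_path)
    return (full_path, parent, name, node_type)


# A partial, illustrative layout following the tree shown above.
STRUCTURE = [
    structure_entry("/data"),
    structure_entry("/history"),
    structure_entry("/history/hashes", "Hash_Table"),
    structure_entry("/history/history", "History_Table"),
    structure_entry("/history/weights", "Weight_Table"),
    structure_entry("/history/past_protocols"),
    structure_entry("/info"),
]
```

Each entry carries enough information to create a missing node: the parent path to attach it to, the node name, and whether it is a group or a table.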

open_hdf(mode='r+')[source]

Opens the hdf5 file.

This should be called at the start of every method that accesses the h5 file, and close_hdf() should be called at the end. Otherwise the file may be left open and we risk file corruption.

See the PyTables docs for details.

Parameters

mode (str) – a file access mode, can be:

  • ‘r’: Read-only - no data can be modified.

  • ‘w’: Write - a new file is created (an existing file with the same name would be deleted).

  • ‘a’: Append - an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ‘r+’ (default): Similar to ‘a’, but the file must already exist.

Returns

Opened hdf file.

Return type

tables.File

close_hdf(h5f)[source]

Flushes & closes the open hdf file. Must be called whenever open_hdf() is used.

Parameters

h5f (tables.File) – the hdf file opened by open_hdf()
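One way to guarantee that every open_hdf() is paired with a close_hdf(), even when an exception is raised in between, is a small context manager. This is a sketch under stated assumptions: `hdf_session`, `FakeFile`, `fake_open`, and `fake_close` are hypothetical names introduced here, and the fake file stands in for tables.File so the example runs without PyTables.

```python
from contextlib import contextmanager


class FakeFile:
    """Hypothetical stand-in for tables.File, used only for this sketch."""

    def __init__(self):
        self.is_open = True
        self.flushed = False

    def flush(self):
        self.flushed = True

    def close(self):
        self.is_open = False


def fake_open(mode="r+"):
    """Stand-in for an open_hdf-style opener."""
    return FakeFile()


def fake_close(h5f):
    """Stand-in for a close_hdf-style closer: flush, then close."""
    h5f.flush()
    h5f.close()


@contextmanager
def hdf_session(open_fn, close_fn, mode="r+"):
    """Open a file, yield it, and always close it afterwards."""
    h5f = open_fn(mode)
    try:
        yield h5f
    finally:
        close_fn(h5f)


def demo():
    """Raise inside the block and return the file to show it was still closed."""
    seen = []
    try:
        with hdf_session(fake_open, fake_close) as h5f:
            seen.append(h5f)
            raise RuntimeError("simulated failure while the file is open")
    except RuntimeError:
        pass
    return seen[0]
```

With a real Subject instance, passing its open_hdf and close_hdf methods as `open_fn` and `close_fn` would presumably work the same way, though that usage is untested here.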

new_subject_file(biography)[source]

Create a new subject file and make the general filestructure.

If a file already exists, open it in append mode, otherwise create it.

Parameters

biography (dict) – Biographical details like DOB, mass, etc. Typically created by Biography_Tab.

ensure_structure()[source]

Ensure that our h5f has the appropriate baseline structure as defined in self.STRUCTURE

Checks that all groups and tables exist, and makes them if not.

update_biography(params)[source]

Change or make a new biographical attribute, stored as attributes of the info group.

Parameters

params (dict) – biographical attributes to be updated.

update_history(type, name, value, step=None)[source]

Update the history table when changes are made to the subject’s protocol.

The current protocol is flushed to the past_protocols group and an updated filenode is created.

Note

This only updates the history table, and does not make the changes itself.

Parameters
  • type (str) – What type of change is being made? Can be one of

    • ‘param’ - a parameter of one task stage

    • ‘step’ - the step of the current protocol

    • ‘protocol’ - the whole protocol is being updated.

  • name (str) – the name of either the parameter being changed or the new protocol

  • value (str) – the value that the parameter or step is being changed to, or the protocol dictionary flattened to a string.

  • step (int) – When type is ‘param’, changes the parameter at a particular step, otherwise the current step is used.

assign_protocol(protocol, step_n=0)[source]

Assign a protocol to the subject.

If the subject has a currently assigned task, stashes it with stash_current()

Creates groups and tables according to the data descriptions in the task class being assigned. eg. as described in Task.TrialData.

Updates the history table.

Parameters
  • protocol (str) – the protocol to be assigned. Can be one of

    • the name of the protocol (its filename minus .json) if it is in prefs.PROTOCOLDIR

    • filename of the protocol (its filename with .json) if it is in the prefs.PROTOCOLDIR

    • the full path and filename of the protocol.

  • step_n (int) – Which step is being assigned?
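The three accepted forms of the protocol argument can be resolved to a single file path roughly as follows. This is a sketch only: `resolve_protocol_path` is a hypothetical helper, and `protocol_dir` stands in for prefs.PROTOCOLDIR.

```python
import os


def resolve_protocol_path(protocol, protocol_dir):
    """Resolve a protocol name, filename, or full path to a .json path.

    Hypothetical helper; protocol_dir plays the role of prefs.PROTOCOLDIR.
    """
    # A value with a directory component is treated as a full path.
    if os.path.dirname(protocol):
        return protocol
    # A bare name gets the .json extension added.
    if not protocol.endswith(".json"):
        protocol = protocol + ".json"
    return os.path.join(protocol_dir, protocol)
```

For example, both `'tones'` and `'tones.json'` would resolve to the same file inside the protocol directory, while a path that already includes a directory is returned unchanged.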

flush_current()[source]

Flushes the ‘current’ attribute in the subject object to the current filenode in the .h5

Used to make sure the stored .json representation of the current task stays up to date with the params set in the subject object

stash_current()[source]

Save the current protocol in the history group and delete the node

Typically this is called when assigning a new protocol.

Stored as the date that it was changed followed by its name if it has one
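As a rough illustration of the two operations above, the current protocol can be serialized to JSON bytes before being written to a filenode, and a stash name can be built from the change date and the protocol name. Both functions are hypothetical, and the exact date format used for stash names is an assumption.

```python
import json


def serialize_current(current):
    """Serialize the current protocol to JSON bytes, ready to write to a file node."""
    return json.dumps(current).encode("utf-8")


def stash_name(date, protocol_name=None):
    """Name a stashed protocol as date_protocol_name, or just the date if unnamed."""
    return f"{date}_{protocol_name}" if protocol_name else date
```

Round-tripping through `json.loads` recovers the original parameters, which is what keeps the stored representation in sync with the subject object.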

prepare_run()[source]

Prepares the Subject object to receive data while running the task.

Gets information about current task, trial number, spawns Graduation object, spawns data_queue and calls data_thread().

Returns

The parameters for the current step, with subject id, step number, current trial, and session number included.

Return type

dict

data_thread(queue)[source]

Thread that keeps hdf file open and receives data while task is running.

Receives data through queue as dictionaries. Data can be partial-trial data (eg. each phase of a trial) as long as the task returns a dict with ‘TRIAL_END’ as a key at the end of each trial.

Each dict given to the queue should have the trial_num. If it does, this method can store data properly even when TRIAL_END is not passed. I recommend being explicit, however.

Checks graduation state at the end of each trial.

Parameters

queue (queue.Queue) – passed by prepare_run() and used by other objects to pass data to be stored.
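The queue protocol described above can be sketched with the standard library alone. This is not the method's actual implementation: `consume` is a hypothetical function that collects rows in a list instead of writing to an hdf5 table, and it omits the graduation check.

```python
import queue
import threading


def consume(data_queue, rows):
    """Merge partial-trial dicts into one row per trial until 'END' arrives."""
    current = {}
    while True:
        data = data_queue.get()
        if data == "END":
            # Save whatever is left of an unfinished trial, then stop.
            if current:
                rows.append(current)
            break
        trial_num = data.get("trial_num")
        # A new trial_num closes the previous trial even without TRIAL_END.
        if current and trial_num is not None and trial_num != current.get("trial_num"):
            rows.append(current)
            current = {}
        current.update({k: v for k, v in data.items() if k != "TRIAL_END"})
        if "TRIAL_END" in data:
            rows.append(current)
            current = {}


def run_demo():
    """Feed two trials through the consumer thread and return the stored rows."""
    data_queue = queue.Queue()
    rows = []
    thread = threading.Thread(target=consume, args=(data_queue, rows))
    thread.start()
    data_queue.put({"trial_num": 0, "target": "L"})
    data_queue.put({"trial_num": 0, "response": "L", "TRIAL_END": True})
    data_queue.put({"trial_num": 1, "target": "R"})
    data_queue.put("END")
    thread.join()
    return rows
```

In the demo, the first trial is closed explicitly with ‘TRIAL_END’, while the second is closed by the ‘END’ sentinel.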

save_data(data)[source]

Equivalent alternative to calling Subject.data_queue.put(data) directly.

Parameters

data (dict) – trial data. Each dict should have a ‘trial_num’, and a dictionary with key ‘TRIAL_END’ should be passed at the end of each trial.

stop_run()[source]

Puts ‘END’ in the data_queue, which causes data_thread() to end.

to_csv(path, task='current', step='all')[source]

Export trial data to .csv

Parameters
  • path (str) – output path of .csv

  • task (str, int) – not implemented, but in the future pull data from ‘current’ or other named task

  • step (str, int, list, tuple) – Step to select, see Subject.get_trial_data()

get_trial_data(step: Union[int, list, str] = -1, what: str = 'data')[source]

Get trial data from the current task.

Parameters
  • step (int, list, ‘all’) – Step that should be returned, can be one of

    • -1: most recent step

    • int: a single step

    • list of two integers eg. [0, 5], an inclusive range of steps.

    • string: the name of a step (excluding S##_)

    • ‘all’: all steps.

  • what (str) – What should be returned?

    • ‘data’ : Dataframe of requested steps’ trial data

    • ‘variables’: dict of variables without loading data into memory

Returns

DataFrame of requested steps’ trial data.

Return type

pandas.DataFrame
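The accepted forms of the step argument can be summarized by a small resolver that maps each form to a list of step indices. This is an illustrative sketch: `select_steps` is a hypothetical helper, `step_names` is assumed to hold step names in order without the S##_ prefix, and "most recent" is assumed to mean the last step in that list.

```python
def select_steps(step, step_names):
    """Translate a step selector into a list of step indices.

    Hypothetical helper; step_names is an ordered list of step names
    without the S##_ prefix.
    """
    n_steps = len(step_names)
    if isinstance(step, str):
        if step == "all":
            return list(range(n_steps))
        # Any other string is treated as a step name.
        return [step_names.index(step)]
    if isinstance(step, int):
        # -1 is assumed to select the last (most recent) step.
        return [n_steps - 1] if step == -1 else [step]
    if isinstance(step, (list, tuple)) and len(step) == 2:
        # Two integers form an inclusive range.
        return list(range(step[0], step[1] + 1))
    raise ValueError(f"unsupported step selector: {step!r}")
```

Note that the two-integer form is inclusive at both ends, unlike a Python slice.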

apply_along(along='session', step=-1)[source]

get_step_history(use_history=True)[source]

Gets a dataframe of step numbers, timestamps, and step names as a coarse view of training status.

Parameters

use_history (bool) – whether to use the history table or to reconstruct steps and dates from the trial table itself. Compatibility fix for old versions that didn’t stash step changes when the whole protocol was updated.

Returns

pandas.DataFrame

get_timestamp(simple=False)[source]

Makes a timestamp.

Parameters

simple (bool) – selects the timestamp format:

  • if True: returns as format ‘%y%m%d-%H%M%S’, eg. ‘190201-170811’

  • if False: returns in isoformat, eg. ‘2019-02-01T17:08:02.058808’

Returns

str
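The two timestamp formats can be reproduced with the standard datetime module. This sketch is not the method itself: `make_timestamp` is a hypothetical function that accepts an optional `now` argument so that the output can be checked against fixed values.

```python
from datetime import datetime


def make_timestamp(simple=False, now=None):
    """Return a timestamp string in the 'simple' or ISO format.

    Hypothetical helper; `now` lets the caller pass a fixed datetime.
    """
    if now is None:
        now = datetime.now()
    if simple:
        return now.strftime("%y%m%d-%H%M%S")
    return now.isoformat()


simple_stamp = make_timestamp(simple=True, now=datetime(2019, 2, 1, 17, 8, 11))
iso_stamp = make_timestamp(now=datetime(2019, 2, 1, 17, 8, 2, 58808))
```

Here `simple_stamp` is '190201-170811' and `iso_stamp` is '2019-02-01T17:08:02.058808', matching the two examples given for the simple parameter.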

get_weight(which='last', include_baseline=False)[source]

Gets start and stop weights.

Todo

add ability to get weights by session number, dates, and ranges.

Parameters
  • which (str) – if ‘last’, gets most recent weights. Otherwise returns all weights.

  • include_baseline (bool) – if True, includes baseline and minimum mass.

Returns

dict

set_weight(date, col_name, new_value)[source]

Updates an existing weight in the weight table.

Todo

Yes, I know this is bad. Merge with update_weights

Parameters
  • date (str) – date in the ‘simple’ format, %y%m%d-%H%M%S

  • col_name (‘start’, ‘stop’) – are we updating a pre-task or post-task weight?

  • new_value (float) – New mass.

update_weights(start=None, stop=None)[source]

Store either a starting or stopping mass.

start and stop can be passed in the same call, or start can be given in one call and stop in a later call, but stop should not be given before start.

Parameters
  • start (float) – Mass before running task in grams

  • stop (float) – Mass after running task in grams.
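The ordering constraint between start and stop can be illustrated with a list of row dictionaries standing in for the weight table. This is a sketch under assumptions, not the method's code: here a start mass opens a new row, and a stop mass fills in the most recent row, which is why stop cannot come first. The function name and the `date` and `session` arguments are hypothetical.

```python
def record_weights(rows, start=None, stop=None, date=None, session=None):
    """Append a row for a start mass and fill the latest row for a stop mass.

    Hypothetical stand-in for a weight table; rows is a list of dicts.
    """
    if start is not None:
        rows.append({"start": float(start), "stop": 0.0, "date": date, "session": session})
    if stop is not None:
        if not rows:
            raise ValueError("stop mass given before any start mass")
        rows[-1]["stop"] = float(stop)
    return rows


weights = []
record_weights(weights, start=21.5, date="190201-170811", session=1)  # before the task
record_weights(weights, stop=22.1)  # after the task, in a later call
```

After the two calls, `weights` holds a single row with both masses filled in.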

graduate()[source]

Increase the current step by one, unless it is the last step.

class History_Table

Bases: tables.description.IsDescription

Class to describe parameter and protocol change history

Variables
  • time (str) – timestamps

  • type (str) – Type of change - protocol, parameter, step

  • name (str) – Name - Which parameter was changed, name of protocol, manual vs. graduation step change

  • value (str) – Value - What was the parameter/protocol/etc. changed to, step if protocol.

columns = {'name': StringCol(itemsize=256, shape=(), dflt=b'', pos=None), 'time': StringCol(itemsize=256, shape=(), dflt=b'', pos=None), 'type': StringCol(itemsize=256, shape=(), dflt=b'', pos=None), 'value': StringCol(itemsize=4028, shape=(), dflt=b'', pos=None)}
class Weight_Table

Bases: tables.description.IsDescription

Class to describe table for weight history

Variables
  • start (float) – Pre-task mass

  • stop (float) – Post-task mass

  • date (str) – Timestamp in simple format

  • session (int) – Session number

columns = {'date': StringCol(itemsize=256, shape=(), dflt=b'', pos=None), 'session': Int32Col(shape=(), dflt=0, pos=None), 'start': Float32Col(shape=(), dflt=0.0, pos=None), 'stop': Float32Col(shape=(), dflt=0.0, pos=None)}
class Hash_Table

Bases: tables.description.IsDescription

Class to describe table for hash history

Variables
  • time (str) – Timestamps

  • hash (str) – Hash of the currently checked out commit of the git repository.

columns = {'hash': StringCol(itemsize=40, shape=(), dflt=b'', pos=None), 'time': StringCol(itemsize=256, shape=(), dflt=b'', pos=None)}