--------------------------------------------------- lines 6-155 of file: at_cascade/create_job_table.py ---------------------------------------------------
{xrst_begin create_job_table}
{xrst_spell
   bool
}

Table of Job Parent Child Relationships
#######################################

Prototype
*********
{xrst_literal ,
   # BEGIN_DEF, END_DEF
   # BEGIN_RETURN, END_RETURN
}

Summary
*******
This routine returns a list where each element corresponds to a job:

#. A job is a combination of a node and split reference value.
   For example, if the node is n0 and we are splitting on sex,
   some possible jobs are n0.female, n0.male.

#. All of the jobs that have
   :ref:`create_job_table@job_table@prior_only` false
   must be fit in order to fit all the jobs for the nodes
   in the *fit_goal_set*.

#. Each job has a *parent_job_id* for the job that needs to be fit before it,
   except for the start job, which corresponds to
   the start node and start split reference id.
   The *prior_only* field is false for any job that is a parent;
   i.e., all the parent jobs are fit.

#. Each job also has a list of which jobs need to be run after it
   (to fit the *fit_goal_set* ).

#. If a job has *prior_only* true,
   it does not need to be fit for this *fit_goal_set*,
   but its priors should be created
   (when the corresponding parent job is fit)
   so it could be the start job for a different *fit_goal_set* .

all_node_database
*****************
This is a Python ``str`` specifying the location of the
:ref:`all_node_db-name`
relative to the current working directory.
This argument can't be ``None``.

node_table
**********
This is a ``list`` of ``dict`` containing the node table for this cascade.
This argument can't be ``None``.

start_node_id
*************
This, together with *start_split_reference_id*,
corresponds to a completed fit that we are starting from.
We assume that the priors for this fit have been created;
see *prior_only* below.
The start node must be a descendant of the
:ref:`glossary@root_node` .
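As an informal illustration (not part of at_cascade), the sketch below shows
how the pair *start_node_id*, *start_split_reference_id* picks out one job,
and how a name such as n0.female can be formed for it.
The tables are hypothetical, the function ``start_job_name`` is made up for
this example, and the keys ``node_name`` and ``split_reference_name`` are
assumed column names.

```python
# Hypothetical tables for illustration only; the key names 'node_name' and
# 'split_reference_name' are assumptions about the table columns.
node_table = [
    {'node_name': 'n0'},
    {'node_name': 'n1'},
]
split_reference_table = [
    {'split_reference_name': 'female'},
    {'split_reference_name': 'male'},
]

def start_job_name(
    node_table, split_reference_table, start_node_id, start_split_reference_id
):
    """Return a name for the job identified by the start pair."""
    node_name = node_table[start_node_id]['node_name']
    if len(split_reference_table) == 0:
        # No splitting: the start split reference id is expected to be None
        assert start_split_reference_id is None
        return node_name
    # Splitting: the start split reference id indexes the split reference table
    assert start_split_reference_id is not None
    row = split_reference_table[start_split_reference_id]
    return node_name + '.' + row['split_reference_name']

# Start from node n0 with the female split reference value
print(start_job_name(node_table, split_reference_table, 0, 0))  # n0.female
```

With an empty split reference table, the same hypothetical function returns
just the node name; e.g., ``start_job_name(node_table, [], 0, None)`` is
``'n0'``.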

start_split_reference_id
************************
This, together with *start_node_id*,
corresponds to a completed fit that we are starting from.
Only jobs that depend on the start job's completion
will be included in the job table.
This is ``None`` if and only if
:ref:`split_reference_table-name` is empty.

fit_goal_set
************
This is a :ref:`glossary@fit_goal_set`.
In addition, each node in this set must be the start node,
or a descendant of the start node.

job_table
*********
The return value *job_table* is a ``list`` of ``dict`` :

job_id
======
We use *this_job_id* to denote the index of a row in the job_table list.
The value *job_table[this_job_id]* is a ``dict`` with the following keys:

job_name
========
This is a ``str`` containing the job name.
If the :ref:`split_reference_table-name` is empty,
*job_name* is equal to *node_name*
where *node_name* is the node name corresponding to *fit_node_id*.
Otherwise, *job_name* is equal to
*node_name*\ ``.``\ *split_reference_name*
where *split_reference_name* is the split reference name
corresponding to *split_reference_id*.

prior_only
==========
If this ``bool`` is false,
this job must be run to fit all the nodes in *fit_goal_set* .
It will be false if this is the start job;
i.e., the start job must be fit to fit the nodes in *fit_goal_set*.
If *prior_only* is true for this job,
it must be false for the corresponding parent job.
The priors for this job will be created if the parent job succeeds,
but this job will not be run and it will not have any children.
These priors are intended to be used by a subsequent call to
:ref:`continue_cascade-name` where this job is the start job
(and *prior_only* is false because we have a different *fit_goal_set* ).

fit_node_id
===========
This is an ``int`` containing the node_id for the
:ref:`glossary@fit_node` for job *this_job_id*.

split_reference_id
==================
If the split_reference table is empty, this is ``None``.
Otherwise it is an ``int`` containing the
:ref:`split_reference_table@split_reference_id` for job *this_job_id*;
i.e., the splitting covariate has this reference value.

parent_job_id
=============
This is an ``int`` containing the job_id corresponding to the parent job,
which must be less than the job_id for this row of the job table.
The parent job (and only the parent job) must have completed
before this job can be run.
The first row of the job table has *parent_job_id* equal to ``None``;
i.e., there is no parent for the first job.

start_child_job_id
==================
This is the job_id for the first job that can run
as soon as this job is completed.
The start_child_job_id is always greater than the job_id for the current row.
Because every parent comes before its children in the table,
the simplest way to run the jobs is in job table order (not in parallel).

end_child_job_id
================
This is the job_id plus one for the last job that can run
as soon as this job is completed.
If end_child_job_id is equal to start_child_job_id,
there are no jobs that require the results of this job.
Note that this job is the parent of each job with job_id
greater than or equal to start_child_job_id and less than end_child_job_id.

{xrst_end create_job_table}