Database Schema Documentation

Version: 2.2 (verified against production database)
Last Validated: 2025-11-09
Database: MongoDB, 11 collections
Total Documents: see the per-collection counts in Collections Summary

For application architecture and data flow, see Architecture. For configuration details, see Platform Overview.

Overview

This document provides comprehensive documentation of all MongoDB collections used in the SmartRunning Coach application. The database stores runner profiles, training sessions, workout libraries, training plans, and background job information.

Note: This schema reflects the ACTUAL production database structure, validated against live MongoDB data.

Collections Summary

Collection	Documents	Purpose	Key Relationships
`runner`	4	Athlete profiles and settings	Links to sessions, plans, history
`sessionsFIT`	556	Raw FIT file data	Links to runner, GridFS files
`sessionsParquet`	556	Processed session records	Links to runner, GridFS files
`sessionStatistics`	556	Aggregated session metrics	Links to runner, sessions
`runnerHistory`	1,256	Historical training load data	Links to runner
`workouts`	63	Workout library	Referenced by training plans
`trainingplanschema`	2	Training plan templates	Used to generate training plans
`trainingplan`	993	Training plan days with versioning	Links to runner, workouts, supports active plan management
`background_jobs`	13	Async job tracking	Links to runner, files
`fs.files`	1,112	GridFS file metadata	Links to sessions
`fs.chunks`	1,644	GridFS file chunks	Links to fs.files

Collection Schemas

1. runner

Purpose: Stores athlete profiles, physiological parameters, training metrics, and training plan configuration.

Document Count: 4 Indexes: _id_, runnerID_idx, marathon_date_idx

Fields (41 listed; the sample document below contains 38 of them):

Field	Type	Required	Description	Validation
`_id`	ObjectId	Yes	Unique identifier	MongoDB ObjectId
`runnerID`	String	Yes	Human-readable runner identifier	Unique string (e.g., "Marc_001")
`name`	String	Yes	Runner's full name	Non-empty string
`age`	Integer	Yes	Runner's age	Positive integer
Training Load Metrics
`CTL_starting`	Float	Yes	Initial Chronic Training Load	Non-negative float
`ATL_starting`	Float	Yes	Initial Acute Training Load	Non-negative float
`CTL_current`	Float	Yes	Current Chronic Training Load	Non-negative float
`ATL_current`	Float	Yes	Current Acute Training Load	Non-negative float
`TSB_current`	Float	Yes	Current Training Stress Balance	Float (can be negative)
`ACWR_current`	Float	Yes	Current Acute:Chronic Workload Ratio	Non-negative float
Banister Model Parameters
`banister_k1`	Float	Yes	Fitness gain coefficient	Positive float
`banister_k2`	Float	Yes	Fatigue gain coefficient	Positive float
`banister_tau_f`	Float	Yes	Fitness decay time constant (days)	Positive float
`banister_tau_fr`	Float	Yes	Fatigue decay time constant (days)	Positive float
Heart Rate Zones (Legacy - Being Migrated)
`hr_zone_1_max`	Float	No	Zone 1 upper limit (bpm)	0-220 (legacy)
`hr_zone_2_max`	Float	No	Zone 2 upper limit (bpm)	0-220 (legacy)
`hr_zone_3_max`	Float	No	Zone 3 upper limit (bpm)	0-220 (legacy)
`hr_zone_4_max`	Float	No	Zone 4 upper limit (bpm)	0-220 (legacy)
`hr_zone_5_max`	Float	No	Zone 5 upper limit (bpm)	0-220 (legacy)
`heart_rate_zones_history`	Array	Yes	Time series HR zones	See Time Series Schema
Personal Bests (Legacy - Being Migrated)
`pb_400m_seconds`	Float	No	400m personal best time	Positive float (legacy)
`pb_800m_seconds`	Float	No	800m personal best time	Positive float (legacy)
`pb_5000m_seconds`	Float	No	5000m personal best time	Positive float (legacy)
`personal_bests_history`	Object	Yes	Time series PBs by distance	See Time Series Schema
Training Stress Score Thresholds
`rtss_min`	Float	Yes	Minimum rTSS for workout	Non-negative float
`rtss_max`	Float	Yes	Maximum rTSS for workout	Non-negative float
Speed Metrics
`threshold_speed`	Float	Yes	Threshold speed (m/s)	Positive float
Training Plan Configuration
`blocks`	Integer	Yes	Number of training blocks	Positive integer
`weeks_per_block`	Integer	Yes	Weeks per training block	Positive integer
`taper_weeks`	Integer	Yes	Number of taper weeks	Positive integer
`weekly_increment`	Float	Yes	Weekly volume increase (%)	Positive float
`training_days`	Array	Yes	Scheduled training days	Array of integers (1-7)
`marathon_date`	DateTime	No	Target marathon date	ISO 8601 datetime
`prep_start`	DateTime	No	Preparation start date	ISO 8601 datetime or null
Data Collections
`sessions`	Array	Yes	Session references	Array (typically empty)
`schedules`	Array	Yes	Schedule references	Array (typically empty)
`critical_speed_history`	Array	No	Historical critical speed data	Array of objects (legacy)
`critical_speed_time_series`	Array	Yes	Time series critical speed	See Time Series Schema
Metadata
`created_at`	DateTime	Yes	Account creation timestamp	ISO 8601 datetime
`updated_at`	DateTime	Yes	Last modification timestamp	ISO 8601 datetime
`history_last_updated`	DateTime	No	Last history update	ISO 8601 datetime

Critical Speed History Schema:

{
  "critical_speed_history": [
    {
      "date": "2025-09-17T08:55:16.393Z",
      "critical_speed": 3.189,        // m/s
      "w_prime": 285.2,                // meters
      "r_squared": 0.9986,             // model fit quality
      "pb_400m_seconds": 65,
      "pb_800m_seconds": 131,
      "pb_5000m_seconds": 1480
    }
  ]
}
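The stored values are consistent with an ordinary least-squares fit of distance against personal-best time, d = CS * t + W', where the slope is critical speed (m/s) and the intercept is W' (meters). The sketch below reproduces the sample entry from its three PBs. Treating this as the application's exact method is an assumption, and `fitCriticalSpeed` is a hypothetical helper name.

```javascript
// Sketch: fit the linear critical speed model d = CS * t + W' by ordinary
// least squares. Assumption: this mirrors how critical_speed_history entries
// are derived; fitCriticalSpeed is a hypothetical helper.
function fitCriticalSpeed(efforts) {
  // efforts: [{ distance_m, time_s }, ...] with at least two distinct times
  const n = efforts.length;
  if (n < 2) throw new Error("need at least two efforts");
  const meanT = efforts.reduce((s, e) => s + e.time_s, 0) / n;
  const meanD = efforts.reduce((s, e) => s + e.distance_m, 0) / n;
  let sxx = 0, sxy = 0, syy = 0;
  for (const e of efforts) {
    const dt = e.time_s - meanT;
    const dd = e.distance_m - meanD;
    sxx += dt * dt;
    sxy += dt * dd;
    syy += dd * dd;
  }
  if (sxx === 0) throw new Error("efforts need distinct times");
  const criticalSpeed = sxy / sxx;              // slope, m/s
  const wPrime = meanD - criticalSpeed * meanT; // intercept, meters
  const rSquared = (sxy * sxy) / (sxx * syy);   // model fit quality
  return { critical_speed: criticalSpeed, w_prime: wPrime, r_squared: rSquared };
}

// PBs from the sample entry: 400 m in 65 s, 800 m in 131 s, 5000 m in 1480 s
const fit = fitCriticalSpeed([
  { distance_m: 400, time_s: 65 },
  { distance_m: 800, time_s: 131 },
  { distance_m: 5000, time_s: 1480 },
]);
console.log(fit.critical_speed.toFixed(3), fit.w_prime.toFixed(1), fit.r_squared.toFixed(4));
```

With the sample PBs this prints 3.189 285.2 0.9986, matching the stored critical_speed, w_prime and r_squared.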

Sample Document:

{
  "_id": ObjectId("68c3f9062a04875c129365d4"),
  "runnerID": "Marc_001",
  "name": "Marc De Reu",
  "age": 55,
  "CTL_starting": 45.0,
  "ATL_starting": 30.0,
  "CTL_current": 49.1,
  "ATL_current": 1.2,
  "TSB_current": 47.87,
  "ACWR_current": 0.025,
  "banister_k1": 1.0,
  "banister_k2": 2.0,
  "banister_tau_f": 42.0,
  "banister_tau_fr": 7.0,
  "hr_zone_1_max": 100.0,
  "hr_zone_2_max": 120.0,
  "hr_zone_3_max": 140.0,
  "hr_zone_4_max": 160.0,
  "hr_zone_5_max": 200.0,
  "pb_400m_seconds": 65.0,
  "pb_800m_seconds": 131.0,
  "pb_5000m_seconds": 1480.0,
  "rtss_min": 30.0,
  "rtss_max": 150.0,
  "threshold_speed": 2.56,
  "blocks": 4,
  "weeks_per_block": 4,
  "taper_weeks": 2,
  "weekly_increment": 10.0,
  "training_days": [1, 3, 5, 6],
  "marathon_date": "2026-01-01T00:00:00",
  "prep_start": null,
  "sessions": [],
  "schedules": [],
  "critical_speed_history": [
    {
      "date": "2025-09-17T08:55:16.393Z",
      "critical_speed": 3.189,
      "w_prime": 285.2,
      "r_squared": 0.9986,
      "pb_400m_seconds": 65,
      "pb_800m_seconds": 131,
      "pb_5000m_seconds": 1480
    }
  ],
  "created_at": "2025-09-12T10:42:14.274000",
  "updated_at": "2025-09-17T08:55:16.397000",
  "history_last_updated": "2025-09-12T11:19:12.011203"
}

2. sessionsFIT

Purpose: Stores raw FIT file data and metadata for uploaded workout sessions.

Document Count: 556 Indexes: _id_ only

Fields:

Field	Type	Required	Description	Validation
`_id`	ObjectId	Yes	Unique identifier	MongoDB ObjectId
`runner_id`	ObjectId	Yes	Reference to runner	Valid runner._id
`file_id`	ObjectId	Yes	GridFS file reference	Valid fs.files._id
`original_filename`	String	Yes	Original filename	Non-empty string
`file_size`	Integer	Yes	File size in bytes	Positive integer
`uploaded_at`	DateTime	Yes	Upload timestamp	ISO 8601 datetime
`processed`	Boolean	Yes	Processing status	true/false
`parquet_files_created`	Integer	No	Count of parquet files	Non-negative integer
`metadata`	Object	Yes	FIT file metadata	See Metadata Schema

Metadata Schema:

{
  "metadata": {
    "message_types": [
      "file_id_mesgs",
      "file_creator_mesgs",
      "activity_mesgs",
      "session_mesgs",
      "lap_mesgs",
      "record_mesgs",
      // ... additional message types
    ],
    "total_messages": 13671,
    "errors": [],
    "session_info": {
      "start_time": "2024-11-26T11:00:48",
      "total_elapsed_time": 2986.878,    // seconds
      "total_distance": 7861.87,          // meters
      "sport": "running",
      "sub_sport": "generic"
    }
  }
}

Sample Document:

{
  "_id": ObjectId("68c3f9122a04875c129365d6"),
  "runner_id": ObjectId("68c3f9062a04875c129365d4"),
  "original_filename": "tp-1581646.2024-11-26-11-58-13-283Z.GarminPing.AAAAAGdFt9Tjz2sX.FIT",
  "file_id": ObjectId("68c3f9122a04875c129365d7"),
  "file_size": 411633,
  "uploaded_at": "2025-09-12T10:42:26.856000",
  "processed": true,
  "parquet_files_created": 1,
  "metadata": {
    "message_types": ["file_id_mesgs", "file_creator_mesgs", "activity_mesgs", "..."],
    "total_messages": 13671,
    "errors": [],
    "session_info": {
      "start_time": "2024-11-26T11:00:48",
      "total_elapsed_time": 2986.878,
      "total_distance": 7861.87,
      "sport": "running",
      "sub_sport": "generic"
    }
  }
}

3. sessionsParquet

Purpose: Stores metadata for processed session records. The record-level data itself is stored as a Parquet file in GridFS and referenced by file_id.

Document Count: 556 Indexes: _id_ only

Fields:

Field	Type	Required	Description	Validation
`_id`	ObjectId	Yes	Unique identifier	MongoDB ObjectId
`session_id`	ObjectId	Yes	Reference to FIT session	Valid sessionsFIT._id
`runner_id`	ObjectId	Yes	Reference to runner	Valid runner._id
`original_fit_filename`	String	Yes	Source FIT filename	Non-empty string
`parquet_filename`	String	Yes	Parquet filename	Non-empty string
`file_id`	ObjectId	Yes	GridFS parquet file	Valid fs.files._id
`file_size`	Integer	Yes	Parquet file size (bytes)	Positive integer
`created_at`	DateTime	Yes	Creation timestamp	ISO 8601 datetime
`metadata`	Object	Yes	Parquet structure metadata	See Metadata Schema

Metadata Schema:

{
  "metadata": {
    "rows": 2987,                        // Number of data records
    "columns": 42,                       // Number of columns
    "column_names": [
      "cumulative_time",
      "timestamp",
      "position_lat",
      "position_long",
      "distance",
      "enhanced_speed",
      "enhanced_altitude",
      "heart_rate",
      "cadence",
      "temperature",
      // ... additional columns
    ],
    "data_types": {
      "cumulative_time": "int64",
      "timestamp": "datetime64[ns, UTC]",
      "position_lat": "float64",
      "position_long": "float64",
      "distance": "float64",
      "heart_rate": "float64",
      "cadence": "float64",
      // ... additional data types
    },
    "time_range": {
      "start": "2024-11-26T11:00:48+00:00",
      "end": "2024-11-26T11:50:34+00:00",
      "duration_seconds": 2987
    }
  }
}

Sample Document:

{
  "_id": ObjectId("68c3f9132a04875c129365dc"),
  "session_id": ObjectId("68c3f9122a04875c129365d6"),
  "runner_id": ObjectId("68c3f9062a04875c129365d4"),
  "original_fit_filename": "tp-1581646.2024-11-26-11-58-13-283Z.GarminPing.AAAAAGdFt9Tjz2sX.FIT",
  "parquet_filename": "tp-1581646.2024-11-26-11-58-13-283Z.GarminPing.AAAAAGdFt9Tjz2sX_record_mesgs.parquet",
  "file_id": ObjectId("68c3f9132a04875c129365da"),
  "file_size": 216719,
  "created_at": "2025-09-12T10:42:26.856000",
  "metadata": {
    "rows": 2987,
    "columns": 42,
    "column_names": ["cumulative_time", "timestamp", "position_lat", "..."],
    "data_types": {
      "cumulative_time": "int64",
      "timestamp": "datetime64[ns, UTC]",
      "heart_rate": "float64"
    },
    "time_range": {
      "start": "2024-11-26T11:00:48+00:00",
      "end": "2024-11-26T11:50:34+00:00",
      "duration_seconds": 2987
    }
  }
}

4. sessionStatistics

Purpose: Stores aggregated statistics and derived metrics for training sessions.

Document Count: 556 Indexes: _id_ only

Fields (65 total):

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`session_id`	ObjectId	Yes	Reference to session (sessionsFIT._id)
`runner_id`	ObjectId	Yes	Reference to runner
`processed_timestamp`	DateTime	Yes	Calculation timestamp
`sport`	String	Yes	Activity type (e.g., "running")
`sub_sport`	String	Yes	Sub-activity type (e.g., "generic")
Heart Rate Statistics
`hr_mean_bpm`	Float	No	Average heart rate
`hr_median_bpm`	Float	No	Median heart rate
`hr_std_bpm`	Float	No	Heart rate std deviation
`hr_min_bpm`	Float	No	Minimum heart rate
`hr_max_bpm`	Float	No	Maximum heart rate
`hr_max_session_bpm`	Float	No	Session max heart rate
`hr_iqr_bpm`	Float	No	Interquartile range
`hr_5th_percentile_bpm`	Float	No	5th percentile HR
`hr_95th_percentile_bpm`	Float	No	95th percentile HR
Heart Rate Zones
`hr_zone_1_recovery_pct`	Float	No	% time in zone 1
`hr_zone_1_recovery_seconds`	Integer	No	Seconds in zone 1
`hr_zone_2_endurance_pct`	Float	No	% time in zone 2
`hr_zone_2_endurance_seconds`	Integer	No	Seconds in zone 2
`hr_zone_3_tempo_pct`	Float	No	% time in zone 3
`hr_zone_3_tempo_seconds`	Integer	No	Seconds in zone 3
`hr_zone_4_threshold_pct`	Float	No	% time in zone 4
`hr_zone_4_threshold_seconds`	Integer	No	Seconds in zone 4
`hr_zone_5_vo2max_pct`	Float	No	% time in zone 5
`hr_zone_5_vo2max_seconds`	Integer	No	Seconds in zone 5
Speed Statistics
`speed_mean_m_s`	Float	No	Average speed (m/s)
`speed_median_m_s`	Float	No	Median speed (m/s)
`speed_std_m_s`	Float	No	Speed std deviation
`speed_min_m_s`	Float	No	Minimum speed
`speed_max_m_s`	Float	No	Maximum speed
`speed_iqr_m_s`	Float	No	Interquartile range
`speed_5th_percentile_m_s`	Float	No	5th percentile speed
`speed_95th_percentile_m_s`	Float	No	95th percentile speed
`speed_mean_ypm`	Float	No	Average speed (yards/min)
Pace Statistics
`pace_mean_min_km`	Float	No	Average pace (min/km)
`pace_median_min_km`	Float	No	Median pace (min/km)
`pace_min_min_km`	Float	No	Best pace (min/km)
`pace_max_min_km`	Float	No	Worst pace (min/km)
`pace_std_min_km`	Float	No	Pace std deviation
Efficiency Metrics
`efficiency_factor_mean`	Float	No	Average efficiency factor
`efficiency_factor_median`	Float	No	Median efficiency factor
`efficiency_factor_std`	Float	No	Efficiency std deviation
Session Totals
`session_duration_seconds`	Integer	Yes	Total duration (seconds)
`session_duration_minutes`	Float	Yes	Total duration (minutes)
`total_distance_m`	Float	Yes	Total distance (meters)
`total_distance_km`	Float	Yes	Total distance (kilometers)
Elevation
`elevation_gain_m`	Float	No	Total elevation gain
`elevation_loss_m`	Float	No	Total elevation loss
`min_altitude_m`	Float	No	Minimum altitude
`max_altitude_m`	Float	No	Maximum altitude
Cadence Statistics
`cadence_mean_spm`	Float	No	Average cadence (steps/min)
`cadence_median_spm`	Float	No	Median cadence
`cadence_std_spm`	Float	No	Cadence std deviation
`cadence_min_spm`	Float	No	Minimum cadence
`cadence_max_spm`	Float	No	Maximum cadence
Personal Bests
`fastest_400m_seconds`	Integer	No	Best 400m split
`fastest_800m_seconds`	Integer	No	Best 800m split
`fastest_1km_seconds`	Integer	No	Best 1km split
`fastest_5km_seconds`	Integer	No	Best 5km split
`pb_updates`	Object	No	Personal best updates
Training Metrics
`performance_index`	Float	No	Performance index
`rtss`	Float	Yes	Running Training Stress Score
`rtss_threshold_speed`	Float	No	Threshold speed for rTSS
`rtss_intensity_factor`	Float	No	Intensity factor for rTSS
`rtss_critical_speed`	Float	No	Critical speed for rTSS

Sample Document:

{
  "_id": ObjectId("68c3fa2a2a04875c12936b4f"),
  "session_id": ObjectId("68c3f9122a04875c129365d6"),
  "runner_id": ObjectId("68c3f9062a04875c129365d4"),
  "processed_timestamp": "2025-09-12T10:47:03.589066",
  "sport": "running",
  "sub_sport": "generic",
  "hr_mean_bpm": 145.48,
  "hr_median_bpm": 143.0,
  "hr_max_bpm": 166.0,
  "hr_zone_3_tempo_pct": 28.89,
  "hr_zone_3_tempo_seconds": 863,
  "hr_zone_4_threshold_pct": 57.88,
  "hr_zone_4_threshold_seconds": 1729,
  "speed_mean_m_s": 2.66,
  "pace_mean_min_km": 7.72,
  "efficiency_factor_mean": 1.20,
  "session_duration_seconds": 2986,
  "session_duration_minutes": 49.77,
  "total_distance_m": 7861.87,
  "total_distance_km": 7.86,
  "elevation_gain_m": 53.80,
  "cadence_mean_spm": 79.31,
  "fastest_1km_seconds": 351,
  "fastest_5km_seconds": 1882,
  "rtss": 89.74,
  "rtss_threshold_speed": 2.56,
  "rtss_intensity_factor": 1.04
}
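The sample's zone percentages are consistent with zone seconds divided by the 2,987 one-second records in the matching sessionsParquet metadata (863 / 2987 = 28.89 %, 1729 / 2987 = 57.88 %). The sketch below bins per-record heart rate into zones using the runner's hr_zone_*_max upper limits. The 1 Hz sampling, the "first zone whose upper limit is at least the reading" rule, and the helper name `timeInZones` are assumptions.

```javascript
// Sketch: time-in-zone from per-record heart rate. Assumptions: one record
// per second, and a reading falls in the first zone whose upper limit
// (hr_zone_N_max) is >= the reading; readings above the top limit count
// toward the last zone. timeInZones is a hypothetical helper.
function timeInZones(heartRates, zoneMaxes) {
  const seconds = new Array(zoneMaxes.length).fill(0);
  for (const hr of heartRates) {
    if (hr == null || Number.isNaN(hr)) continue; // skip missing readings
    let zone = zoneMaxes.findIndex((max) => hr <= max);
    if (zone === -1) zone = zoneMaxes.length - 1;
    seconds[zone] += 1;
  }
  // Percentages use the total record count as denominator, rounded to two
  // decimals, which is consistent with the sample values above.
  const total = heartRates.length;
  const pct = seconds.map((s) => (total ? Math.round((s / total) * 10000) / 100 : 0));
  return { seconds, pct };
}

// Zone upper limits from the runner sample document
const zoneMaxes = [100, 120, 140, 160, 200];
console.log(timeInZones([95, 110, 130, 150, 150, 170, 210, null], zoneMaxes));
```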

5. runnerHistory

Purpose: Stores historical training load progression and daily aggregated metrics.

Document Count: 1,256 Indexes: _id_ only

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`runner_id`	ObjectId	Yes	Reference to runner
`date`	DateTime	Yes	History date
`session_ids`	Array	No	Session ObjectIds for day
Training Load (Banister Model)
`rtss`	Float	Yes	Running Training Stress Score
`CTL`	Float	Yes	Chronic Training Load
`ATL`	Float	Yes	Acute Training Load
`TSB`	Float	Yes	Training Stress Balance
`ACWR`	Float	Yes	Acute:Chronic Workload Ratio
Daily Aggregates
`total_distance_km`	Float	No	Total distance for day
`session_duration_minutes`	Float	No	Total duration for day
`hr_mean_bpm`	Float	No	Average HR for day
`pace_mean_min_km`	Float	No	Average pace for day
`efficiency_factor_mean`	Float	No	Average efficiency for day
Calendar Info
`week_number`	Integer	No	ISO week number
`year`	Integer	No	Year
Metadata
`updated_at`	DateTime	Yes	Last update timestamp

Sample Document:

{
  "_id": ObjectId("68c4017c2a04875c12937249"),
  "runner_id": ObjectId("68c3fa4e2a04875c12936b59"),
  "date": "2024-09-07T00:00:00",
  "session_ids": [ObjectId("68c3fb7e2a04875c12936f31")],
  "rtss": 47.75,
  "CTL": 45.0,
  "ATL": 30.0,
  "TSB": 15.0,
  "ACWR": 0.67,
  "total_distance_km": 23.58,
  "session_duration_minutes": 409.8,
  "hr_mean_bpm": 134.30,
  "pace_mean_min_km": 11.24,
  "efficiency_factor_mean": 0.32,
  "week_number": 36,
  "year": 2024,
  "updated_at": "2025-09-12T11:18:20.076370"
}
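The stored samples satisfy TSB = CTL - ATL and ACWR = ATL / CTL (here 45 - 30 = 15 and 30 / 45 ≈ 0.67; the trainingplan samples below follow the same identities). The sketch pairs those identities with the common exponentially weighted daily update, using the runner's banister_tau_f (42 days) and banister_tau_fr (7 days) as time constants. That update rule and the function names are assumptions, not confirmed details of the application.

```javascript
// Derived balance metrics, computed from the same day's CTL and ATL
// (this matches the stored samples).
function deriveBalance(ctl, atl) {
  return {
    TSB: ctl - atl,
    ACWR: ctl > 0 ? atl / ctl : null, // undefined without chronic load
  };
}

// Assumption: exponentially weighted daily update with time constants
// tau_f (fitness) and tau_fr (fatigue); nextDay is a hypothetical helper.
function nextDay(prev, rtss, tauF = 42, tauFr = 7) {
  const CTL = prev.CTL + (rtss - prev.CTL) / tauF;  // slow-moving fitness
  const ATL = prev.ATL + (rtss - prev.ATL) / tauFr; // fast-moving fatigue
  return { rtss, CTL, ATL, ...deriveBalance(CTL, ATL) };
}

// One rest day (rTSS = 0) starting from the sample's CTL/ATL
const day = nextDay({ CTL: 45.0, ATL: 30.0 }, 0);
console.log(day.CTL.toFixed(2), day.ATL.toFixed(2), day.TSB.toFixed(2));
```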

6. trainingplan

Purpose: Stores training plans as individual days, one document per day. Versioning supports mid-plan regeneration and active plan management.

Document Count: 993 Indexes: _id_, date_idx, training_days_idx, week_idx, session_code_idx, date_training_idx, active_plan_lookup_idx, plan_list_view_idx

Important: This is a flat structure - one document per day, not nested blocks/weeks.

New Features (Version 2.2):

Active Plan Management: Each runner has exactly ONE active training plan (identified by is_active: true AND is_latest: true)
Generation Parameters: Stores complete parameters used to generate each plan for consistency during regeneration
Version Tracking: Full history of plan versions with parent-child relationships
Parameter Preservation: Regeneration pre-fills parameters from original plan
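A minimal sketch of the active plan rule: a filter that selects a runner's active plan, and an in-memory check that at most one distinct trainingplanID satisfies it. It assumes `is_active` is stored on each day document alongside `is_latest`; both function names are hypothetical.

```javascript
// Filter selecting a runner's active plan, latest version only.
// Assumption: is_active is stored on every day document, like is_latest.
function activePlanFilter(runnerID) {
  return { runnerID, is_active: true, is_latest: true };
}

// Distinct trainingplanIDs matching the filter; the "exactly ONE active
// plan" rule holds when this returns a single ID.
function activePlanIds(dayDocs, runnerID) {
  const filter = activePlanFilter(runnerID);
  const ids = new Set();
  for (const doc of dayDocs) {
    const matches = Object.entries(filter).every(([key, value]) => doc[key] === value);
    if (matches) ids.add(doc.trainingplanID);
  }
  return [...ids];
}
```

Against the database, the same filter object can be passed to `db.trainingplan.find(...)` and sorted by `date`, in the style of the query patterns below.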

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`trainingplanID`	String	Yes	Plan identifier (UUID) - same across versions
`runnerID`	String	Yes	Runner identifier
`date`	DateTime	Yes	Date for this day
`weekday`	Integer	Yes	Day of week (0=Monday, 6=Sunday)
`week`	Integer	Yes	Week number in plan
`phase`	String	Yes	Training phase (e.g., "('block', 1)")
`is_training`	Boolean	Yes	Is this a training day?
`recovery`	Boolean	Yes	Is this a recovery day?
`rTSS`	Float	Yes	Planned training stress score
`CTL`	Float	Yes	Projected Chronic Training Load
`ATL`	Float	Yes	Projected Acute Training Load
`TSB`	Float	Yes	Projected Training Stress Balance
`ACWR`	Float	Yes	Projected Acute:Chronic Workload Ratio
`session_code`	String	Yes	Reference to workout (empty if rest day)
`creationdate`	DateTime	Yes	When plan was created
`version`	Integer	Yes	Version number (1, 2, 3, ...)
`parent_version`	Integer	No	Previous version number (null for v1)
`is_latest`	Boolean	Yes	Is this the current version?
`regeneration_date`	DateTime	No	When this version was regenerated (null for v1)
`regeneration_reason`	String	No	Why regeneration occurred (high_acwr, illness, manual, other)
`is_active`	Boolean	Not verified	Marks the runner's active plan; used together with is_latest
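Note that `weekday` is 0-based (0 = Monday), while `runner.training_days` holds integers 1-7 and the trainingplanschema blocks use `day_in_week` with 1 = Monday. The sketch below derives `weekday` from a UTC date and checks it against `training_days`, assuming `training_days` follows the same 1 = Monday convention as `day_in_week`; the helper names are hypothetical.

```javascript
// Derive trainingplan.weekday (0 = Monday ... 6 = Sunday) from a UTC date.
// getUTCDay() returns 0 = Sunday, so shift by six days modulo seven.
function planWeekday(date) {
  return (date.getUTCDay() + 6) % 7;
}

// Assumption: runner.training_days uses 1 = Monday ... 7 = Sunday.
function isTrainingDay(date, trainingDays) {
  return trainingDays.includes(planWeekday(date) + 1);
}

const trainingDays = [1, 3, 5, 6]; // from the runner sample document
for (const iso of ["2025-09-01T00:00:00Z", "2025-09-02T00:00:00Z", "2025-10-15T00:00:00Z"]) {
  const d = new Date(iso);
  console.log(iso.slice(0, 10), planWeekday(d), isTrainingDay(d, trainingDays));
}
```

The two trainingplan sample dates give weekday 0 (2025-09-01, a Monday) and weekday 2 (2025-10-15, a Wednesday), matching the stored values.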

Query Patterns:

// Get latest version of a training plan (RECOMMENDED)
db.trainingplan.find({
  trainingplanID: "uuid",
  is_latest: true
}).sort({ date: 1 })

// Get all versions of a training plan
db.trainingplan.find({
  trainingplanID: "uuid"
}).sort({ version: -1, date: 1 })

// Get specific version
db.trainingplan.find({
  trainingplanID: "uuid",
  version: 2
}).sort({ date: 1 })

// Get specific week of latest version
db.trainingplan.find({
  trainingplanID: "uuid",
  is_latest: true,
  week: 1
}).sort({ date: 1 })

// Get training days only (latest version)
db.trainingplan.find({
  trainingplanID: "uuid",
  is_latest: true,
  is_training: true
})

// Get version metadata
db.trainingplan.aggregate([
  { $match: { trainingplanID: "uuid" } },
  { $group: {
      _id: "$version",
      parent_version: { $first: "$parent_version" },
      is_latest: { $first: "$is_latest" },
      regeneration_date: { $first: "$regeneration_date" },
      regeneration_reason: { $first: "$regeneration_reason" },
      total_days: { $sum: 1 }
    }
  },
  { $sort: { _id: -1 } }
])

Versioning Behavior:

Same trainingplanID across all versions of a plan
Only one version has is_latest: true at a time
version increments: 1, 2, 3, etc.
parent_version links to previous version (forms chain)
Historical versions preserved forever; their content is never modified, only is_latest is cleared when a newer version is created
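The versioning rules above can be sketched as a pure function that takes the latest version's day documents plus the newly computed days, and returns the new version's documents together with the `_id`s whose `is_latest` flag must be cleared. Field names follow the table above; the function name is hypothetical, and keeping the original `creationdate` follows the Version 2 sample below. In MongoDB the two results would typically be applied together, for example an `updateMany` and an `insertMany` inside a transaction.

```javascript
// Build version N+1 of a plan from the latest version's day documents.
// Superseded documents are only touched to clear is_latest.
function regeneratePlan(latestDocs, newDays, reason, now = new Date()) {
  if (latestDocs.length === 0) throw new Error("no latest version to regenerate");
  const allowed = ["high_acwr", "illness", "manual", "other"];
  if (!allowed.includes(reason)) throw new Error(`unknown reason: ${reason}`);

  const { trainingplanID, runnerID, version, creationdate } = latestDocs[0];
  const newDocs = newDays.map((day) => ({
    ...day,                   // date, weekday, week, rTSS, CTL, ... for this day
    trainingplanID,           // same ID across all versions
    runnerID,
    creationdate,             // original creation date is preserved
    version: version + 1,
    parent_version: version,  // links back to the previous version
    is_latest: true,
    regeneration_date: now,
    regeneration_reason: reason,
  }));
  const supersededIds = latestDocs.map((doc) => doc._id); // set is_latest: false
  return { newDocs, supersededIds };
}
```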

Sample Documents:

Version 1 (Original Plan):

{
  "_id": ObjectId("68c3f360bff7cc7967116ee8"),
  "trainingplanID": "145d05ec-1873-42ea-a5b4-8cf6b8e7a6a2",
  "runnerID": "HDL",
  "date": "2025-09-01T00:00:00",
  "weekday": 0,
  "week": 1,
  "phase": "('block', 1)",
  "is_training": false,
  "recovery": false,
  "rTSS": 0.0,
  "CTL": 12.8,
  "ATL": 2.4,
  "TSB": 10.4,
  "ACWR": 0.19,
  "session_code": "",
  "creationdate": "2025-09-12T10:16:37.825000",
  "version": 1,
  "parent_version": null,
  "is_latest": false,  // Superseded by v2
  "regeneration_date": null,
  "regeneration_reason": null
}

Version 2 (Regenerated After ACWR Spike):

{
  "_id": ObjectId("68c3f361bff7cc7967116ee9"),
  "trainingplanID": "145d05ec-1873-42ea-a5b4-8cf6b8e7a6a2",  // Same ID
  "runnerID": "HDL",
  "date": "2025-10-15T00:00:00",  // Later date
  "weekday": 2,
  "week": 8,
  "phase": "('block', 2)",
  "is_training": true,
  "recovery": false,
  "rTSS": 85.0,  // Re-optimized value
  "CTL": 58.3,   // Re-calculated
  "ATL": 42.1,   // Re-calculated
  "TSB": 16.2,   // Re-calculated
  "ACWR": 0.72,  // Re-calculated
  "session_code": "END4-W8D3-E4925C",
  "creationdate": "2025-09-12T10:16:37.825000",  // Original creation
  "version": 2,
  "parent_version": 1,
  "is_latest": true,  // Current version
  "regeneration_date": "2025-11-07T14:30:00.000000",
  "regeneration_reason": "high_acwr"
}

7. trainingplanschema

Purpose: Templates for generating periodized training plans with multiple training blocks. This is different from trainingplan - schemas are templates, trainingplan documents are actual scheduled days.

Document Count: 3 Indexes: Not verified

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`name`	String	Yes	Schema name
`description`	String	No	Schema description
`author`	String	No	Author name
`blocks`	Array	Yes	Training blocks array
`is_public`	Boolean	Yes	Public visibility flag (true = visible to all)
`created_by`	ObjectId	No	User ID of creator (null for grandfathered data)
`created_at`	DateTime	No	Creation timestamp
`last_modified`	DateTime	No	Last modification timestamp

Block Schema:

{
  "blocks": [
    {
      "block_type": "Volume 1",        // Block name/phase
      "workouts": [
        {
          "day_in_the_week": "Monday",
          "day_in_week": 1,            // 1-7
          "performed": false,
          "rtss": 50,
          "sequence_of_execution": 1,
          "session_code": "END4-W1D1-E4924B",
          "session_definition": "Interval session: 5 x 45\"",
          "session_type": "interval",   // interval, LSD, recovery, tempo
          "week_in_block": 1,
          "workout_id": null
        }
      ]
    }
  ]
}

Sample Document: See validation report for full example with nested blocks and workouts.

8. workouts

Purpose: Library of reusable workout templates in ZWO (Zwift Workout) XML format.

Document Count: 63 Indexes: Not verified

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`filename`	String	Yes	Generated filename (e.g., "Workout-Name-ABC123.zwo")
`author`	String	Yes	Workout author/creator name
`name`	String	Yes	Workout name
`description`	String	No	Workout description
`sport_type`	String	Yes	Sport type (e.g., "run")
`duration_type`	String	Yes	Duration measurement type (e.g., "time")
`tags`	String	No	Comma-separated tags
`workout_type`	String	No	Workout category (interval, LSD, recovery, tempo)
`session_id`	String	No	Session identifier
`rtss`	Float	No	Running Training Stress Score
`raw_xml`	String	Yes	Complete ZWO XML content
`file_size`	Integer	Yes	XML file size in bytes
`is_public`	Boolean	Yes	Public visibility flag (true = visible to all)
`created_by`	ObjectId	No	User ID of creator (null for grandfathered data)
`uploaded_at`	DateTime	Yes	Upload timestamp
`last_modified`	DateTime	Yes	Last modification timestamp

9. background_jobs

Purpose: Tracks asynchronous background tasks (file processing, analysis).

Document Count: 13 Indexes: _id_ only

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`job_id`	String	Yes	UUID job identifier
`job_type`	String	Yes	Job category (e.g., "file_processing")
`runner_id`	ObjectId	No	Associated runner
`parameters`	Object	Yes	Job input parameters
`status`	String	Yes	Current status
`progress`	Integer	Yes	Completion percentage (0-100)
`result`	Object	No	Job output results
`error`	String	No	Error details if failed
`created_at`	DateTime	Yes	Job creation time
`started_at`	DateTime	No	Processing start time
`completed_at`	DateTime	No	Completion time
`updated_at`	DateTime	Yes	Last update time

Status Values:

pending - Queued for processing
running - Currently executing
completed - Finished successfully
failed - Error occurred
cancelled - User cancelled
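The status values imply a simple lifecycle, but the allowed transitions are not documented here. The transition table below is an assumption inferred from the value descriptions, shown with a guard that also enforces the 0-100 progress range; `nextJobState` is a hypothetical helper.

```javascript
// Assumed lifecycle: pending -> running -> completed | failed,
// with cancellation possible before a terminal state.
const TRANSITIONS = {
  pending: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};
const TERMINAL = ["completed", "failed", "cancelled"];

// Returns the fields to $set on a background_jobs document, or throws.
function nextJobState(job, status, progress = job.progress, now = new Date()) {
  const allowed = TRANSITIONS[job.status] || [];
  if (!allowed.includes(status)) {
    throw new Error(`invalid transition ${job.status} -> ${status}`);
  }
  if (!Number.isInteger(progress) || progress < 0 || progress > 100) {
    throw new Error("progress must be an integer from 0 to 100");
  }
  const update = { status, progress, updated_at: now };
  // Assumption: start time recorded on entering "running" (the sample
  // below still shows started_at: null, so the real worker may not do this).
  if (status === "running") update.started_at = now;
  if (TERMINAL.includes(status)) update.completed_at = now;
  return update;
}
```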

Job Type Values:

file_processing - FIT file upload and processing

Sample Document:

{
  "_id": ObjectId("68c3f9112a04875c129365d5"),
  "job_id": "fbc90b6c-2443-4a1d-b4b1-83d5152312a8",
  "job_type": "file_processing",
  "runner_id": ObjectId("68c3f9062a04875c129365d4"),
  "parameters": {
    "temp_files": ["/app/data/uploads/temp_20250912_104224_941199_WorkoutFileExport.zip"],
    "file_count": 1
  },
  "status": "completed",
  "progress": 100,
  "result": {
    "total_files": 1,
    "processed_fit_files": 188,
    "processed_parquet_files": 0,
    "skipped_duplicates": 0,
    "failed_files": 0,
    "errors": [],
    "session_ids": ["68c3f9122a04875c129365d6", "..."]
  },
  "error": null,
  "created_at": "2025-09-12T10:42:25.039000",
  "started_at": null,
  "completed_at": "2025-09-12T10:47:02.465000",
  "updated_at": "2025-09-12T10:47:02.465000"
}

10. fs.files (GridFS)

Purpose: GridFS metadata for storing large binary files (FIT files, Parquet files).

Document Count: 4,260 Indexes: _id_, filename_1_uploadDate_1

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`filename`	String	Yes	Original filename
`contentType`	String	Yes	MIME type
`chunkSize`	Integer	Yes	Chunk size in bytes (default 261,120 = 255 KiB)
`length`	Integer	Yes	File size in bytes
`uploadDate`	DateTime	Yes	Upload timestamp
`metadata`	Object	Yes	File ownership and linkage information (added Oct 2025)

Metadata Schema:

Since October 2025, all GridFS files include a metadata field for direct file ownership tracking and linkage verification.

For FIT files:

{
  "runner_id": "68c3f9062a04875c129365d4",  // Links to runner._id (as string)
  "file_type": "fit",                      // Identifies file type
  "session_id": "68c3f9122a04875c129365d6", // Links to sessionsFIT._id (as string)
  "session_date": "2024-11-26T11:00:48",   // Session start time
  "uploaded_at": "2025-09-12T10:42:27.047000" // Upload timestamp
}

For Parquet files:

{
  "runner_id": "68c3f9062a04875c129365d4",  // Links to runner._id (as string)
  "file_type": "parquet",                  // Identifies file type
  "fit_session_id": "68c3f9122a04875c129365d6",  // Links to sessionsFIT._id
  "parquet_session_id": "68c3f9132a04875c129365da", // Links to sessionsParquet._id
  "created_at": "2025-09-12T10:42:27.047000" // Creation timestamp
}

Sample Documents:

FIT file:

{
  "_id": ObjectId("68c3f9122a04875c129365d7"),
  "filename": "tp-1581646.2024-11-26-11-58-13-283Z.GarminPing.AAAAAGdFt9Tjz2sX.FIT",
  "contentType": "application/octet-stream",
  "chunkSize": 261120,
  "length": 411633,
  "uploadDate": "2025-09-12T10:42:27.047000",
  "metadata": {
    "runner_id": "68c3f9062a04875c129365d4",
    "file_type": "fit",
    "session_id": "68c3f9122a04875c129365d6",
    "session_date": "2024-11-26T11:00:48",
    "uploaded_at": "2025-09-12T10:42:27.047000"
  }
}

Parquet file:

{
  "_id": ObjectId("68c3f9132a04875c129365da"),
  "filename": "tp-1581646.2024-11-26-11-58-13-283Z.GarminPing.AAAAAGdFt9Tjz2sX_record_mesgs.parquet",
  "contentType": "application/octet-stream",
  "chunkSize": 261120,
  "length": 216719,
  "uploadDate": "2025-09-12T10:42:27.047000",
  "metadata": {
    "runner_id": "68c3f9062a04875c129365d4",
    "file_type": "parquet",
    "fit_session_id": "68c3f9122a04875c129365d6",
    "parquet_session_id": "68c3f9132a04875c129365da",
    "created_at": "2025-09-12T10:42:27.047000"
  }
}

Benefits of Metadata:

Direct file ownership queries without joining through sessions
Orphaned file detection (files with no corresponding session records)
Redundancy protection if sessionsFIT/sessionsParquet get corrupted
Simplified runner deletion (can find all files directly)
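Because each file carries its own linkage, orphan detection reduces to a set difference between the session IDs referenced in `metadata` and the `_id`s that still exist in sessionsFIT and sessionsParquet. The sketch works on plain arrays of documents (as a driver query would return them) and compares IDs as hex strings, since `metadata` stores them as strings while the session collections use ObjectIds. `findOrphanedFiles` is a hypothetical helper.

```javascript
// Flag GridFS files whose metadata points at a session that no longer exists.
// IDs are compared as hex strings; String() of a driver ObjectId yields hex.
function findOrphanedFiles(files, fitSessions, parquetSessions) {
  const fitIds = new Set(fitSessions.map((s) => String(s._id)));
  const parquetIds = new Set(parquetSessions.map((s) => String(s._id)));
  return files.filter((f) => {
    const m = f.metadata || {};
    if (m.file_type === "fit") return !fitIds.has(m.session_id);
    if (m.file_type === "parquet") return !parquetIds.has(m.parquet_session_id);
    return true; // missing or unknown metadata: flag for manual review
  });
}
```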

Migration Status:

Retroactive migration completed: Oct 21, 2025
Files migrated: 4,218 files (2,109 FIT + 2,109 Parquet)
Orphaned files found: 42 files (no session records)
Future uploads: All new uploads automatically include metadata

11. fs.chunks (GridFS)

Purpose: GridFS chunks storing actual binary data for large files.

Document Count: 1,644 Indexes: _id_, files_id_1_n_1

Fields:

Field	Type	Required	Description
`_id`	ObjectId	Yes	Unique identifier
`files_id`	ObjectId	Yes	Reference to fs.files
`n`	Integer	Yes	Chunk sequence number
`data`	Binary	Yes	Binary chunk data

Sample Document:

{
  "_id": ObjectId("68c3f9122a04875c129365d8"),
  "files_id": ObjectId("68c3f9122a04875c129365d7"),
  "n": 0,
  "data": BinData(0, "...")  // Binary data chunk
}
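GridFS splits a file into ceil(length / chunkSize) chunks numbered from n = 0, so the FIT sample above (411,633 bytes, chunkSize 261,120) occupies two chunks and the Parquet sample (216,719 bytes) one. The sketch below reassembles a file from its chunk documents and checks the count and total length. It uses Node's Buffer and hypothetical helper names; in practice the driver's GridFSBucket download stream handles this.

```javascript
// Expected number of fs.chunks documents for one fs.files entry.
function expectedChunkCount(length, chunkSize) {
  return Math.ceil(length / chunkSize);
}

// Reassemble file bytes from fs.chunks documents (data assumed to be a
// Buffer or Uint8Array here; driver Binary values need converting first).
function reassemble(fileDoc, chunkDocs) {
  const chunks = [...chunkDocs].sort((a, b) => a.n - b.n);
  const expected = expectedChunkCount(fileDoc.length, fileDoc.chunkSize);
  if (chunks.length !== expected) {
    throw new Error(`expected ${expected} chunks, found ${chunks.length}`);
  }
  chunks.forEach((c, i) => {
    if (c.n !== i) throw new Error(`missing chunk n=${i}`);
  });
  const data = Buffer.concat(chunks.map((c) => Buffer.from(c.data)));
  if (data.length !== fileDoc.length) throw new Error("length mismatch");
  return data;
}
```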

Relationships Diagram

runner (1) ──────< (many) sessionsFIT
  │                         │
  │                         └──> fs.files (GridFS)
  │
  ├──────< (many) sessionsParquet
  │                         │
  │                         └──> fs.files (GridFS)
  │
  ├──────< (many) sessionStatistics
  │                         │
  │                         └──> sessionsFIT (via session_id)
  │
  ├──────< (many) runnerHistory
  │
  └──────< (many) trainingplan (by runnerID string)
                    │
                    └──> workouts (via session_code)

trainingplanschema ──> Used to generate trainingplan documents

background_jobs ──> runner (optional)
                 ──> fs.files (for file processing jobs)

Data Flow

1. Session Upload Flow

User uploads FIT file
    ↓
Store in GridFS (fs.files, fs.chunks)
    ↓
Create sessionsFIT record with metadata
    ↓
Background job processes FIT file
    ↓
Extract records → Store as Parquet in GridFS
    ↓
Create sessionsParquet record with metadata
    ↓
Calculate statistics → Create sessionStatistics record
    ↓
Update runnerHistory with new training load
    ↓
Update runner training load metrics

2. Training Plan Creation Flow

User selects trainingplanschema template
    ↓
System reads runner configuration (blocks, weeks_per_block, training_days, etc.)
    ↓
Generate individual day documents for entire plan duration
    ↓
For each day:
  - Calculate projected training load metrics
  - Assign workouts based on schema
  - Set is_training, recovery flags
    ↓
Insert one trainingplan document per day (roughly 100 or more per plan)
    ↓
Display calendar view by querying trainingplanID
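A skeleton of the day-generation step, producing one document per day with the fields that depend only on the calendar. It assumes the plan spans blocks * weeks_per_block + taper_weeks weeks starting on a Monday, which for the sample runner (4 * 4 + 2) gives 126 days. Load projections, phases and workout assignment are left out, and `planSkeleton` is a hypothetical helper.

```javascript
// Calendar skeleton for a plan: one document per day.
// Assumption: total weeks = blocks * weeks_per_block + taper_weeks,
// and startDate is a Monday at 00:00 UTC.
function planSkeleton(startDate, runner) {
  const totalWeeks = runner.blocks * runner.weeks_per_block + runner.taper_weeks;
  const days = [];
  for (let i = 0; i < totalWeeks * 7; i++) {
    const date = new Date(startDate.getTime() + i * 86400000); // UTC days
    const weekday = (date.getUTCDay() + 6) % 7;                  // 0 = Monday
    days.push({
      date,
      weekday,
      week: Math.floor(i / 7) + 1,
      // assumption: training_days uses 1 = Monday ... 7 = Sunday
      is_training: runner.training_days.includes(weekday + 1),
    });
  }
  return days;
}

const runner = { blocks: 4, weeks_per_block: 4, taper_weeks: 2, training_days: [1, 3, 5, 6] };
const days = planSkeleton(new Date("2025-09-01T00:00:00Z"), runner);
console.log(days.length, days.filter((d) => d.is_training).length);
```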

3. Personal Best Update Flow

User completes workout session
    ↓
Upload and process session (sessionsFIT → sessionsParquet → sessionStatistics)
    ↓
Calculate fastest splits across multiple distances
    ↓
Compare with runner personal best records
    ↓
If new PB: Update runner document
    ↓
If sufficient PBs exist: Recalculate performance metrics (e.g., critical_speed, w_prime, r_squared)
    ↓
Push new entry to runner performance history (e.g., critical speed history)
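The comparison step can be sketched as follows: each `fastest_*_seconds` split from sessionStatistics is compared with the runner's stored PB, and only improvements are returned. The mapping covers the three legacy PB fields; the function name and the shape of the returned object (not the documented `pb_updates` format) are assumptions.

```javascript
// Map sessionStatistics split fields to legacy runner PB fields.
const PB_FIELDS = {
  fastest_400m_seconds: "pb_400m_seconds",
  fastest_800m_seconds: "pb_800m_seconds",
  fastest_5km_seconds: "pb_5000m_seconds",
};

// Return { pbField: newTime } for every split that beats the stored PB.
function personalBestUpdates(runner, stats) {
  const updates = {};
  for (const [splitField, pbField] of Object.entries(PB_FIELDS)) {
    const split = stats[splitField];
    if (split == null) continue; // distance not covered in this session
    const current = runner[pbField];
    if (current == null || split < current) updates[pbField] = split;
  }
  return updates;
}
```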

Performance Considerations

Current Indexes

Actual production indexes:

runner: _id_, runnerID_idx, marathon_date_idx
trainingplan: _id_, date_idx, training_days_idx, week_idx, session_code_idx, date_training_idx
fs.chunks: _id_, files_id_1_n_1
fs.files: _id_, filename_1_uploadDate_1
All other collections: _id_ only

Performance Indexes

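The Current Indexes list shows only the default _id_ index on these collections. Read the following as recommended indexes for the common query patterns, not as indexes already in place: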

// Query sessions by runner
db.sessionsFIT.createIndex({ runner_id: 1, uploaded_at: -1 })
db.sessionsParquet.createIndex({ runner_id: 1, created_at: -1 })
db.sessionStatistics.createIndex({ runner_id: 1, processed_timestamp: -1 })

// Session statistics unique constraint
db.sessionStatistics.createIndex({ session_id: 1 }, { unique: true })

// Time series queries on runner history
db.runnerHistory.createIndex({ runner_id: 1, date: -1 })

// Background job queries
db.background_jobs.createIndex({ runner_id: 1, status: 1 })
db.background_jobs.createIndex({ created_at: -1 })

Query Patterns

Most common queries:

Get all sessions for runner: db.sessionsFIT.find({ runner_id: ObjectId(...) })
Get runner history: db.runnerHistory.find({ runner_id: ObjectId(...) }).sort({ date: -1 })
Get training plan days: db.trainingplan.find({ trainingplanID: "uuid" }).sort({ date: 1 })
Get session statistics: db.sessionStatistics.findOne({ session_id: ObjectId(...) })

Validation and Constraints

Business Logic Constraints

runner.marathon_date contains target race date
trainingplan.date falls within the date range of the plan identified by its trainingplanID
sessionStatistics.session_id is unique (one stats doc per session)
runnerHistory.date is unique per runner (one history doc per day per runner)
background_jobs.progress ranges from 0-100
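The Current Indexes list shows only _id_ on sessionStatistics, runnerHistory and background_jobs. These rules are therefore presumably enforced by application code rather than by MongoDB.

A sketch of an audit over documents already loaded into Python lists. It assumes runnerHistory.date holds a calendar day; dates with a time component would need truncating first.

```python
from collections import Counter

def find_constraint_violations(stats_docs, history_docs, job_docs):
    """Return human-readable violations of the business-logic constraints."""
    problems = []
    # sessionStatistics.session_id must be unique (one stats document per session).
    for sid, n in Counter(d["session_id"] for d in stats_docs).items():
        if n > 1:
            problems.append(f"sessionStatistics: {n} documents for session_id {sid}")
    # runnerHistory: at most one document per (runner_id, date).
    for (rid, day), n in Counter((d["runner_id"], d["date"]) for d in history_docs).items():
        if n > 1:
            problems.append(f"runnerHistory: {n} documents for runner {rid} on {day}")
    # background_jobs.progress must lie in [0, 100] when present.
    for job in job_docs:
        progress = job.get("progress")
        if progress is not None and not 0 <= progress <= 100:
            problems.append(f"background_jobs: progress {progress} out of range for job {job['_id']}")
    return problems
```

Running an audit like this before adding the unique index on sessionStatistics.session_id is worthwhile, because MongoDB refuses to build a unique index while duplicate values remain.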

Data Type Constraints

ObjectId: 12-byte BSON ObjectId, displayed as a 24-character hex string
Dates: ISO 8601 format (YYYY-MM-DDTHH:MM:SS)
Numbers: Within logical ranges (HR 0-220, speeds > 0, percentages 0-100)
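A sketch of the data type checks in plain Python.

Assumptions:
- The range keys are illustrative labels, not field names.
- The date check accepts exactly the YYYY-MM-DDTHH:MM:SS form listed above.
- Values read through PyMongo arrive as datetime objects and would not go through the string check.

```python
import re
from datetime import datetime

OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")

# Ranges quoted in the list above; the keys are illustrative labels, not field names.
RANGES = {"heart_rate": (0, 220), "percentage": (0, 100)}

def is_object_id_hex(value):
    """True for a 24-character hex string (the text form of a 12-byte ObjectId)."""
    return isinstance(value, str) and OBJECT_ID_HEX.fullmatch(value) is not None

def is_iso_datetime(value):
    """True only for strings in the exact YYYY-MM-DDTHH:MM:SS form."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        return True
    except (TypeError, ValueError):
        return False

def in_range(kind, value):
    low, high = RANGES[kind]
    return low <= value <= high

def is_valid_speed(value):
    return value > 0
```

When PyMongo is installed, bson.ObjectId.is_valid performs the ObjectId check directly.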

Referential Integrity

When deleting a runner:

Should cascade delete: sessionsFIT, sessionsParquet, sessionStatistics, runnerHistory, background_jobs
Should also delete the runner's trainingplan documents, matching on the runnerID string rather than runner_id
Should delete associated GridFS files

When deleting a workout from library:

Check whether it is referenced by trainingplanschema, or by trainingplan documents through session_code
If it is, prevent the deletion or nullify the references
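Deletion order matters: the GridFS file IDs must be collected from the session documents before those documents are removed. A sketch over a hypothetical in-memory database.

Assumptions:
- Session documents reference their GridFS file through a field called file_id. The actual field name is not shown in this section.
- fs.chunks uses the standard GridFS files_id field.

```python
def delete_runner(db, runner_object_id, runner_string_id):
    """Hypothetical cascade delete over an in-memory db (collection -> list of dicts).

    Most collections link to the runner by runner_id (an ObjectId); trainingplan
    uses the human-readable runnerID string. "file_id" is an assumed name for the
    GridFS reference on session documents. Returns the number of GridFS files removed.
    """
    # Collect GridFS files owned by the runner's sessions before deleting the sessions.
    file_ids = {
        s["file_id"]
        for name in ("sessionsFIT", "sessionsParquet")
        for s in db.get(name, [])
        if s.get("runner_id") == runner_object_id and "file_id" in s
    }
    for name in ("sessionsFIT", "sessionsParquet", "sessionStatistics",
                 "runnerHistory", "background_jobs"):
        # background_jobs.runner_id is optional, hence .get().
        db[name] = [d for d in db.get(name, []) if d.get("runner_id") != runner_object_id]
    db["trainingplan"] = [d for d in db.get("trainingplan", [])
                          if d.get("runnerID") != runner_string_id]
    # GridFS chunks reference their file through files_id.
    db["fs.chunks"] = [c for c in db.get("fs.chunks", []) if c["files_id"] not in file_ids]
    db["fs.files"] = [f for f in db.get("fs.files", []) if f["_id"] not in file_ids]
    db["runner"] = [r for r in db.get("runner", []) if r["_id"] != runner_object_id]
    return len(file_ids)
```

Against a live database, the same steps map onto a delete_many call per collection. PyMongo's gridfs.GridFS.delete(file_id) removes a file's fs.files document together with its chunks.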

Schema Version History

Version 2.2 (Current)

Training plan versioning support with parent-child relationships
Enhanced query patterns for version filtering
Support for mid-plan regeneration with reason tracking

Version 2.1 (Time Series Migration)

Added time series structures for heart rate zones, PBs, and critical speed
Legacy fields marked for migration
New standardized time series schema

Version 2.0 (2025-10-02)

Validated against production database
Updated runner schema to match actual 38 fields
Corrected trainingplan to flat structure (one doc per day)
Fixed sessionsFIT/sessionsParquet metadata organization
Updated background_jobs field names (parameters/result)
Removed non-existent metadata from fs.files
Added runnerHistory additional fields
Documented current database indexes

Version 1.0 (Initial)

Created from code analysis (had discrepancies with actual DB)

Time Series Schema

All time series data follows a consistent structure to track changes over time.

Generic Time Series Entry Structure

{
  "timestamp": ISODate("2025-01-01T00:00:00.000Z"),  // When this value is effective
  "value": <any>,                                      // The actual value (can be object)
  "source": "manual|calculated|imported|session",      // How this value was obtained (typical values; more specific tags such as "age_calculated" also occur)
  "notes": "Optional notes about this entry"           // Optional context
}

Heart Rate Zones Time Series

{
  "heart_rate_zones_history": [
    {
      "timestamp": ISODate("2025-01-01T00:00:00.000Z"),
      "value": {
        "hr_zone_1_max": 125,
        "hr_zone_2_max": 146,
        "hr_zone_3_max": 167,
        "hr_zone_4_max": 188,
        "hr_zone_5_max": 209
      },
      "source": "age_calculated",
      "notes": "Calculated from age using HUNT formula"
    }
  ]
}
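The example zone ceilings are 60, 70, 80, 90 and 100% of a maximum heart rate of 209 bpm, rounded to whole beats. The application's exact zone rule is not documented here.

Assumptions:
- The sketch uses those same fractions.
- Maximum heart rate comes from the HUNT Fitness Study estimate HRmax = 211 - 0.64 x age.
- zone_for is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assumed zone ceilings as fractions of HRmax; they reproduce the example values above.
ZONE_FRACTIONS = (0.60, 0.70, 0.80, 0.90, 1.00)

def hunt_max_hr(age_years):
    """HUNT Fitness Study estimate of maximal heart rate: 211 - 0.64 * age."""
    return 211 - 0.64 * age_years

def zones_from_max_hr(max_hr):
    """Zone ceilings in bpm, rounded to whole beats."""
    return {f"hr_zone_{i}_max": round(frac * max_hr)
            for i, frac in enumerate(ZONE_FRACTIONS, start=1)}

def zone_for(hr, zones):
    """First zone whose ceiling is >= hr; readings above the zone 5 ceiling count as zone 5."""
    for i in range(1, 6):
        if hr <= zones[f"hr_zone_{i}_max"]:
            return i
    return 5

def age_based_entry(age_years, when=None):
    """Time series entry in the heart_rate_zones_history format."""
    return {
        "timestamp": when or datetime.now(timezone.utc),
        "value": zones_from_max_hr(hunt_max_hr(age_years)),
        "source": "age_calculated",
        "notes": "Calculated from age using HUNT formula",
    }
```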

Personal Bests Time Series

{
  "personal_bests_history": {
    "400m": [
      {
        "timestamp": ISODate("2025-01-15T00:00:00.000Z"),
        "value": {
          "distance": "400m",
          "seconds": 65.5,
          "session_id": "optional_session_reference"
        },
        "source": "session",
        "notes": null
      }
    ],
    "800m": [...],
    "5000m": [...]
  }
}
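A sketch of the PB check that feeds personal_bests_history. A new entry is appended only when the time is strictly faster, so entries written this way get progressively faster.

Assumptions:
- Distances are keyed as in the example ("400m", "800m", ...).
- The current PB is the latest entry by timestamp.

```python
from datetime import datetime, timezone

def current_pb(history, distance):
    """Seconds of the latest PB entry (by timestamp) for a distance, or None."""
    entries = history.get(distance, [])
    if not entries:
        return None
    return max(entries, key=lambda e: e["timestamp"])["value"]["seconds"]

def record_pb(history, distance, seconds, session_id=None, when=None):
    """Append a new entry if `seconds` beats the current PB; return True if appended."""
    best = current_pb(history, distance)
    if best is not None and seconds >= best:
        return False
    history.setdefault(distance, []).append({
        "timestamp": when or datetime.now(timezone.utc),
        "value": {"distance": distance, "seconds": seconds, "session_id": session_id},
        "source": "session",
        "notes": None,
    })
    return True
```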

Critical Speed Time Series

{
  "critical_speed_time_series": [
    {
      "timestamp": ISODate("2025-01-15T00:00:00.000Z"),
      "value": {
        "critical_speed": 3.189,        // m/s
        "w_prime": 285.2,               // anaerobic capacity (meters)
        "r_squared": 0.9986,            // regression fit quality
        "pb_400m_seconds": 65.5,        // PBs used for calculation
        "pb_800m_seconds": 131.2,
        "pb_5000m_seconds": 1480
      },
      "source": "calculated",
      "notes": null
    }
  ]
}
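The example's critical_speed and r_squared match an ordinary least-squares fit of distance against time over the three listed PBs, using the linear model distance = CS x time + W'.

The sketch below assumes that model. Its W' for these PBs comes out near 284 m, so it need not equal the stored w_prime exactly; the application may fit slightly differently.

```python
def fit_critical_speed(pbs):
    """Least-squares fit of distance = CS * time + W' over (metres, seconds) PB pairs.

    Returns (critical_speed in m/s, w_prime in metres, r_squared). This is the
    classic linear distance-time model, not necessarily the application's method.
    """
    if len(pbs) < 2:
        raise ValueError("need at least two personal bests")
    d = [float(m) for m, _ in pbs]
    t = [float(s) for _, s in pbs]
    n = len(pbs)
    mean_t, mean_d = sum(t) / n, sum(d) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_td = sum((ti - mean_t) * (di - mean_d) for ti, di in zip(t, d))
    s_dd = sum((di - mean_d) ** 2 for di in d)
    if s_tt == 0:
        raise ValueError("PB times must not all be equal")
    cs = s_td / s_tt                       # slope: sustainable speed
    w_prime = mean_d - cs * mean_t         # intercept: distance above CS
    r_squared = s_td ** 2 / (s_tt * s_dd) if s_dd else 1.0
    return cs, w_prime, r_squared
```

Called as fit_critical_speed([(400, 65.5), (800, 131.2), (5000, 1480)]), it gives a critical speed of about 3.19 m/s and an r_squared of about 0.9986.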

Time Series Query Functions

The application provides utility functions for time series operations:

get_value_at_date() - Get the value that was active at a specific date
get_latest_value() - Get the most recent value
add_value() - Add a new value to the time series
get_value_history() - Get historical values within a date range
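The signatures of these functions are not given here. The following are hypothetical Python versions that operate on a list of entries in the generic structure above, and they do not assume the list is sorted.

```python
from datetime import datetime, timezone

# Hypothetical signatures: the application's real functions are only named above.
def add_value(series, value, source, timestamp=None, notes=None):
    """Append an entry in the generic time series format and return the series."""
    series.append({
        "timestamp": timestamp or datetime.now(timezone.utc),
        "value": value,
        "source": source,
        "notes": notes,
    })
    return series

def get_value_at_date(series, when):
    """Value of the latest entry whose timestamp is <= when, or None."""
    candidates = [e for e in series if e["timestamp"] <= when]
    return max(candidates, key=lambda e: e["timestamp"])["value"] if candidates else None

def get_latest_value(series):
    """Value of the most recent entry, or None for an empty series."""
    return max(series, key=lambda e: e["timestamp"])["value"] if series else None

def get_value_history(series, start, end):
    """Entries with start <= timestamp <= end, oldest first."""
    return sorted((e for e in series if start <= e["timestamp"] <= end),
                  key=lambda e: e["timestamp"])
```

PyMongo returns BSON dates as naive UTC datetime objects unless the client is created with tz_aware=True. Query dates should follow the same convention, since comparing naive and aware datetimes raises TypeError.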

Related Documentation

Architecture - System architecture and data flow
Platform Overview - Application features and API documentation
MongoDB Setup - MongoDB connection troubleshooting
Troubleshooting - Common issues and solutions

Schema Version: 2.2 Last Validated: 2025-11-09 Database: Production MongoDB deployment Validation Method: Direct MongoDB connection and document analysis Total Collections: 11 Total Documents: Production scale deployment
