Help Center
All Help Topics
Clinical Data
CBC (Complete Blood Count) Abbreviation
Common blood test that measures the main components of blood, including red blood cells, white blood cells, and platelets.
HCT (Hematocrit) Abbreviation
The percentage of red blood cells by volume in your blood.
HB (Hemoglobin) Abbreviation
The protein in red blood cells that carries oxygen throughout your body.
PLT (Platelet Count) Abbreviation
Number of platelets in your blood, which help with clotting.
RBC (Red Blood Cell Count) Abbreviation
Number of red blood cells in your blood, which carry oxygen.
WBC (White Blood Cell Count) Abbreviation
Number of white blood cells in your blood, which fight infection.
Reference Range Metric
The expected range of values for a healthy population, used to interpret test results.
CBC Units of Measurement Metric
Standard units used to report Complete Blood Count (CBC) test results.
Reference Range Status Metric
Status flags indicating whether a test result falls within, above, or below the expected reference range.
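As an illustration of how such a status flag can be derived, here is a minimal sketch. The function name, the flag labels ("below", "within", "above"), and the choice to treat the range limits as inclusive are assumptions made for this example; the flags actually shown in CMMI-DCC may differ.

```python
def reference_range_status(value, low, high):
    """Return a status flag for a result against a reference range.

    Assumption: both limits are inclusive, so a value equal to
    `low` or `high` counts as within range.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


# Illustrative numbers only, not clinical reference values.
print(reference_range_status(4.2, 4.0, 5.5))
```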
HLQ (Health and Lifestyle Questionnaire) Abbreviation
Comprehensive questionnaire collecting health and lifestyle information from study participants.
Data Management
Bulk Upload Process
Feature for uploading multiple records or files at once.
CSV Format Data type
Comma-Separated Values file format used for data import and export.
Metadata Data type
Data about data: descriptive information about datasets and samples.
Data Validation Process
Process of checking uploaded data for errors and consistency.
Data Dictionary Process
Catalog of data elements with their definitions, formats, and relationships.
Export Process
Feature to download data from CMMI-DCC for external analysis.
Downloads UI component
Access to pre-generated and previously exported data files.
Master Submissions UI component
Central record of all participant data submissions across studies.
Data Processing
Standard Scaling Method
Normalization technique that transforms each feature to have a mean of 0 and a standard deviation of 1.
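A minimal sketch of standard scaling for a single feature, using only the Python standard library. The function name is hypothetical, and the use of the population standard deviation is an assumption; the platform's own implementation is not described here.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Scale one feature to mean 0 and standard deviation 1.

    Assumption: the population standard deviation is used.
    A constant feature is mapped to all zeros.
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = standard_scale([2, 4, 4, 4, 5, 5, 7, 9])
print(scaled)
```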
MinMax Scaling Method
Normalization technique that transforms features to a fixed range, typically [0, 1].
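A minimal sketch of min-max scaling for a single feature. The function name and the `feature_range` parameter are hypothetical names chosen for this example.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale one feature so its minimum and maximum
    map to the ends of feature_range.

    A constant feature is mapped to the lower end of the range.
    """
    lo, hi = min(values), max(values)
    a, b = feature_range
    if hi == lo:
        return [a for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


print(min_max_scale([10, 20, 30]))
```

Because the minimum and maximum define the output range, a single extreme value can compress all other values into a narrow band.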
Iterative Imputer Method
Advanced missing value imputation method that models each feature with missing values as a function of other features.
Mean Imputation Method
Simple missing value imputation method that replaces missing values with the mean of the feature.
Robust Scaling Method
Normalization technique using median and quartiles, robust to outliers.
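A minimal sketch of robust scaling for a single feature: subtract the median and divide by the interquartile range (Q3 minus Q1). The function name is hypothetical, and the quartile convention (linear interpolation that includes the sample minimum and maximum) is an assumption; other conventions give slightly different results.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center one feature on its median and scale by its IQR.

    Assumption: quartiles use the "inclusive" interpolation method.
    A feature with zero IQR is mapped to all zeros.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    if iqr == 0:
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]


# The extreme value 100 does not distort the scaling of the other points.
print(robust_scale([1, 2, 3, 4, 100]))
```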
Median Imputation Method
Missing value imputation using the median, more robust to outliers than mean imputation.
Mode Imputation Method
Missing value imputation using the most frequent value, typically for categorical data.
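The mean, median, and mode strategies can be sketched with one small helper. The function name, the `strategy` parameter, and the use of `None` to mark a missing value are assumptions made for this example.

```python
from statistics import fmean, median, mode


def impute(values, strategy="mean"):
    """Replace missing entries (None) with a summary of the observed ones.

    strategy: "mean" or "median" for numeric data,
              "mode" for the most frequent value (often categorical data).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    summarize = {"mean": fmean, "median": median, "mode": mode}[strategy]
    fill = summarize(observed)
    return [fill if v is None else v for v in values]


data = [1.0, None, 3.0, 100.0]
print(impute(data, "mean"))    # fill value is pulled upward by 100.0
print(impute(data, "median"))  # fill value is 3.0
print(impute(["a", None, "b", "a"], "mode"))
```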
Outlier Detection Process
Process of identifying data points that differ significantly from the majority of observations.
IQR Method Method
Statistical method for outlier detection using the Interquartile Range.
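A minimal sketch of IQR-based outlier detection. It flags values outside the fences Q1 - k * IQR and Q3 + k * IQR. The multiplier k = 1.5 is a common convention and an assumption here, as are the function name and the quartile convention.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences.

    Assumptions: k = 1.5 by default, and quartiles use the
    "inclusive" interpolation method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print(iqr_outliers([1, 2, 3, 4, 100]))
```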
Z-Score Method Method
Outlier detection method based on standard deviations from the mean.
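A minimal sketch of z-score outlier detection. A value is flagged when its distance from the mean, measured in standard deviations, exceeds a threshold. The threshold of 3 is a common convention and an assumption here, as are the function name and the use of the population standard deviation.

```python
from statistics import fmean, pstdev


def z_score_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold.

    Assumptions: threshold = 3.0 by default, population standard deviation.
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(z_score_outliers([10] * 19 + [100]))
```

In very small samples this method may flag nothing, because an extreme value also inflates the mean and standard deviation it is compared against.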
Machine Learning
Random Forest Algorithm
Ensemble learning method that builds multiple decision trees and combines their predictions.
XGBoost Algorithm
Optimized gradient boosting framework based on decision trees, widely used for predictive modeling because of its speed and accuracy.
Feature Selection Process
Process of selecting a subset of relevant features for model construction.
Isolation Forest Algorithm
Unsupervised anomaly detection algorithm that identifies outliers by isolating observations in random decision trees.
Cross-Validation Method
Technique to evaluate machine learning models by splitting data into training and validation sets multiple times.
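One common form of cross-validation is k-fold: the data is divided into k parts, and each part serves once as the validation set while the rest is used for training. The sketch below only generates the index splits. The function name is hypothetical, and it does not shuffle the data, which real workflows often do first.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    receive one extra sample so every sample is validated exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, validation))
        start = stop
    return folds


for train, validation in k_fold_indices(5, 2):
    print("train:", train, "validation:", validation)
```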
Gradient Boosting Algorithm
Ensemble machine learning technique that builds models sequentially to correct previous errors.
Logistic Regression Algorithm
Statistical model for binary classification that predicts the probability of an outcome.
SVM (Support Vector Machine) Algorithm
Classification algorithm that finds the optimal hyperplane to separate classes with maximum margin.
KNN (K-Nearest Neighbors) Algorithm
Classification algorithm that predicts based on the majority class of the K closest training examples.
Neural Network Algorithm
Machine learning model inspired by the brain, using interconnected layers of nodes to learn patterns.
ML Pipeline Process
Automated workflow for building, training, and evaluating machine learning models.
Target Variable Metric
The outcome variable that a machine learning model is trained to predict.
Task Type (ML) Metric
The type of machine learning problem: Classification (categories) or Regression (continuous values).
F1 Score Metric
Harmonic mean of precision and recall, balancing both for classification evaluation.
Accuracy Metric
Percentage of correct predictions out of all predictions made.
Precision Metric
Proportion of positive predictions that were actually positive.
Recall Metric
Proportion of actual positives that were correctly identified.
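Accuracy, precision, recall, and F1 score can all be computed from four counts: true positives (tp), false positives (fp), true negatives (tn), and false negatives (fn). The sketch below uses a hypothetical function name and returns fractions between 0 and 1 rather than percentages. Returning 0.0 when a denominator is zero is an assumption; other tools may report that case differently.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for illustration only.
print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

With these made-up counts, accuracy is high because true negatives dominate, while recall is noticeably lower. This is why precision, recall, and F1 are reported alongside accuracy for imbalanced data.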
MAE (Mean Absolute Error) Abbreviation
Average of absolute differences between predictions and actual values.
MSE (Mean Squared Error) Abbreviation
Average of squared differences between predictions and actual values.
RMSE (Root Mean Squared Error) Abbreviation
Square root of MSE, in the same units as the target variable.
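The three regression error metrics are closely related, as this minimal sketch shows. The function name is hypothetical.

```python
from math import sqrt


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, sqrt(mse)


mae, mse, rmse = regression_errors([3, 5, 2], [2, 5, 4])
print(mae, mse, rmse)
```

Because MSE squares each error, large errors weigh more heavily in MSE and RMSE than in MAE.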
ROC-AUC Metric
Area Under the ROC Curve, measuring classifier performance across all thresholds.
Confusion Matrix Metric
Table showing counts of true positives, false positives, true negatives, and false negatives.
Feature Importance Metric
Ranking of how much each feature contributes to model predictions.
Hyperparameter Tuning Process
Process of finding optimal algorithm settings to maximize model performance.
Grid Search Method
Exhaustive search over specified parameter combinations to find optimal settings.
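A minimal sketch of grid search: every combination in the grid is scored and the best one is kept. The function names, the parameter names in the grid, and the toy scoring function are all hypothetical. In practice the score would usually come from evaluating a model, for example with cross-validation.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return (best_params, best_score).

    param_grid: dict mapping parameter name to a list of candidate values.
    score_fn:   callable taking a params dict and returning a score
                where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Stand-in for a real model evaluation; peaks at max_depth=4, learning_rate=0.1.
    return -((params["max_depth"] - 4) ** 2) - abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 1.0]}
print(grid_search(grid, toy_score))
```

The number of combinations is the product of the list lengths (9 here), which is why grid search becomes slow as more parameters or values are added.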
Random Search Method
Random sampling of hyperparameter combinations, often faster than grid search.
Omics Data
Metabolomics Data type
Study of small molecules (metabolites) within cells, biofluids, tissues, or organisms.
HMDB ID Identifier
Unique identifier from the Human Metabolome Database for metabolites.
LOD (Limit of Detection) Metric
The lowest quantity of a substance that can be distinguished from the absence of that substance.
Biofluid Data type
Biological fluid sample collected for analysis (e.g., blood, urine, plasma).
Date of Measurement Data type
The date when the sample was measured or analyzed in the laboratory.
Proteomics Data type
Large-scale study of proteins, particularly their structures and functions.
Experimental Method Data type
The analytical technique or procedure used to measure or analyze samples in the laboratory.
UniProt ID Identifier
Unique identifier from the UniProt database for proteins.
Abundance Metric
Relative quantity or concentration of a molecule (protein, metabolite, etc.) measured in omics experiments.
Fold Change Metric
Ratio of molecular abundance between two conditions, indicating up- or down-regulation.
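A minimal sketch of a fold change calculation. The function and argument names are hypothetical, and which condition goes in the numerator is an assumption that should match the study design. The log2 value is included because it is a common way to report fold change: positive values indicate higher abundance in the first condition, negative values lower.

```python
from math import log2


def fold_change(condition, reference):
    """Return (fold change, log2 fold change) of condition over reference.

    Both abundances must be positive for the ratio and its log to be defined.
    """
    if condition <= 0 or reference <= 0:
        raise ValueError("abundances must be positive")
    fc = condition / reference
    return fc, log2(fc)


print(fold_change(200, 50))  # higher in the first condition
print(fold_change(50, 200))  # lower in the first condition
```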
Sequencing Technology Technology
High-throughput DNA sequencing methods used to determine the order of nucleotides in genetic material.
Alpha Diversity Metric
Measure of microbial diversity within a single sample or community.
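Alpha diversity can be quantified with several indices. The sketch below uses the Shannon index as one widely used example; this page does not say which index CMMI-DCC reports, so the choice, the function name, and the use of the natural logarithm are assumptions.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    counts: per-taxon read or organism counts for one sample.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


print(shannon_index([10, 10, 10, 10]))  # four equally abundant taxa
print(shannon_index([37, 1, 1, 1]))     # one dominant taxon, lower diversity
```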
Sample Type Identifier
Classification of biological samples based on their source, collection method, or biological matrix.
Metagenomics Data type
Genomic analysis of microbial communities directly from environmental or clinical samples.
Microbiome Data type
The community of microorganisms (bacteria, viruses, fungi) living in a specific environment, such as the gut.
Microbial Diversity Metric
Measure of the variety and abundance of different microbial species in a sample.
16S rRNA Sequencing Technology
Targeted sequencing method for identifying and classifying bacteria based on the 16S ribosomal RNA gene.
WGS (Whole Genome Sequencing) Abbreviation
Complete sequencing of an organism's entire genome, providing comprehensive genetic information.
Instrument (Sequencing) Technology
The specific sequencing hardware used to generate genomic data.
Project & Study
CMMI-DCC Abbreviation
Canadian Microbiome Mapping Initiative - Data Coordination Centre, the central hub for CMMI data.
CMMI ID Identifier
Unique participant identifier used across all CMMI studies to link data from multiple sources.
Study ID Identifier
Study-specific identifier for organizing and grouping participant data.
Cohort Data type
A group of participants sharing common characteristics for research purposes.
BCGP (British Columbia Generations Project) Abbreviation
A longitudinal health study from British Columbia providing clinical and omics data for CMMI research.
Participant ID Identifier
Unique identifier assigned to each study participant, used to link data across different data types.
Sample ID Identifier
Unique identifier for biological samples collected from participants.
Statistical Analysis
Spearman Rank Correlation Statistical test
Non-parametric measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function.
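A minimal sketch of how Spearman rank correlation can be computed: replace each value with its rank (tied values share the average rank), then compute the Pearson correlation of the ranks. All function names are hypothetical, and this version returns only the coefficient, not a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic in x but not linear
print(spearman(x, y), pearson(x, y))
```

In this example the relationship is perfectly monotonic, so the Spearman coefficient is 1 (up to floating-point rounding), while the Pearson coefficient is below 1 because the relationship is not linear.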
P-Value Metric
Probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true.
Correlation Analysis Process
Statistical method to evaluate the strength and direction of the relationship between two variables.
T-Test Statistical test
Statistical test comparing means between two groups to determine if differences are significant.
ANOVA Statistical test
Analysis of Variance: statistical test that compares means across groups, typically three or more.
Chi-Square Test Statistical test
Test for association between categorical variables or goodness of fit.
Confidence Interval Metric
Range of values likely to contain the true population parameter at a stated confidence level (for example, 95%).
Pearson Correlation Statistical test
Measure of linear relationship between two continuous variables.