Help Center

Machine Learning

Random Forest Algorithm

Ensemble learning method that builds multiple decision trees and combines their predictions.

XGBoost Algorithm

Optimized gradient boosting library that adds regularization and parallelized tree construction, widely used for predictive modeling on tabular data.

Feature Selection Process

Process of selecting a subset of relevant features for model construction.

Isolation Forest Algorithm

Unsupervised anomaly detection algorithm that isolates observations using random splits in an ensemble of trees; outliers tend to be isolated in fewer splits than normal points.

Cross-Validation Method

Technique for evaluating machine learning models by repeatedly splitting the data into training and validation sets and averaging the results across splits.
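
As a rough illustration of the repeated splitting, here is a minimal sketch of k-fold index generation in plain Python. The function name `k_fold_indices` and the use of contiguous, unshuffled folds are assumptions made for this example; real workflows usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, roughly equal folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Each sample lands in the validation set exactly once, so every observation contributes to the final averaged score.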

Gradient Boosting Algorithm

Ensemble machine learning technique that builds models sequentially to correct previous errors.

Logistic Regression Algorithm

Statistical model for binary classification that predicts the probability of an outcome.

SVM (Support Vector Machine) Algorithm

Classification algorithm that finds the optimal hyperplane to separate classes with maximum margin.

KNN (K-Nearest Neighbors) Algorithm

Classification algorithm that predicts based on the majority class of the K closest training examples.
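
The majority-vote idea can be sketched in a few lines of plain Python. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions chosen for illustration, not a reference implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # Pair each training label with its Euclidean distance to the query point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "a", "b", "b"]
prediction = knn_predict(points, labels, query=(0.5, 0.5), k=3)
print(prediction)  # a
```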

Neural Network Algorithm

Machine learning model inspired by the brain, using interconnected layers of nodes to learn patterns.

ML Pipeline Process

Automated workflow for building, training, and evaluating machine learning models.

Target Variable Metric

The outcome variable that a machine learning model is trained to predict.

Task Type (ML) Metric

The type of machine learning problem: Classification (categories) or Regression (continuous values).

F1 Score Metric

Harmonic mean of precision and recall, balancing both for classification evaluation.

Accuracy Metric

Percentage of correct predictions out of all predictions made.

Precision Metric

Proportion of positive predictions that are actually positive: TP / (TP + FP).

Recall Metric

Proportion of actual positives that are correctly identified: TP / (TP + FN).
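
To make the relationship between accuracy, precision, recall, and F1 score concrete, the sketch below computes all four from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The function name `classification_metrics` and the sample labels are made up for this example; returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.625, precision 2/3, recall 0.5, f1 4/7
```

In this sample, accuracy (0.625) looks better than recall (0.5), which illustrates why a single metric rarely tells the whole story.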

MAE (Mean Absolute Error) Abbreviation

Average of absolute differences between predictions and actual values.

MSE (Mean Squared Error) Abbreviation

Average of squared differences between predictions and actual values.

RMSE (Root Mean Squared Error) Abbreviation

Square root of MSE, in the same units as the target variable.
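
The three regression error metrics above differ only in how the per-sample errors are aggregated. A minimal sketch, with the function name `regression_errors` and the sample values chosen purely for illustration:

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MSE, and RMSE for paired actual and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)  # back in the units of the target variable
    return mae, mse, rmse


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mae, mse, rmse = regression_errors(y_true, y_pred)
print(mae, mse, round(rmse, 4))  # 1.0 1.5 1.2247
```

Because MSE squares each error, the single error of 2 contributes more to MSE and RMSE than it does to MAE.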

ROC-AUC Metric

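Area under the receiver operating characteristic (ROC) curve, measuring classifier performance across all classification thresholds; 1.0 is a perfect ranking and 0.5 is no better than chance.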
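
One way to build intuition for ROC-AUC is its pairwise interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that equivalence rather than tracing the curve; the function name `roc_auc` is an assumption, and labels are assumed to be 0 or 1.

```python
def roc_auc(y_true, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # Count a win when the positive outranks the negative, half a win on ties.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

This pairwise form is quadratic in the number of samples, so it suits small examples rather than large datasets.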

Confusion Matrix Metric

Table showing counts of true positives, false positives, true negatives, and false negatives.
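
A confusion matrix can be tallied by counting (actual, predicted) pairs. In this sketch, rows are actual classes and columns are predicted classes, which is a common but not universal layout; the function name `confusion_matrix` and the sample labels are illustrative assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list where entry [i][j] counts actual labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(matrix)  # [[3, 1], [2, 2]]
```

With labels ordered `[0, 1]`, the result reads as `[[TN, FP], [FN, TP]]`.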

Feature Importance Metric

Ranking of how much each feature contributes to model predictions.

Hyperparameter Tuning Process

Process of finding optimal algorithm settings to maximize model performance.

Grid Search Method

Exhaustive search over specified parameter combinations to find optimal settings.

Random Search Method

Random sampling of hyperparameter combinations, often reaching good settings with fewer evaluations than grid search.
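
The difference between the two search strategies can be sketched side by side. Everything here is hypothetical: `score` stands in for a real validation score, and the parameter names `max_depth` and `learning_rate` are examples only.

```python
import itertools
import random


def score(params):
    # Hypothetical stand-in for a validation score; best at depth 4, rate 0.1.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}


def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    ]
    return max(candidates, key=score_fn)


def random_search(grid, score_fn, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=score_fn)


best_grid = grid_search(grid, score)               # evaluates all 9 combinations
best_random = random_search(grid, score, n_iter=4)  # evaluates only 4
print(best_grid, best_random)
```

Grid search is guaranteed to find the best combination within the grid, while random search trades that guarantee for a fixed evaluation budget.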

Omics Data

Metabolomics Data type

Study of small molecules (metabolites) within cells, biofluids, tissues, or organisms.

HMDB ID Identifier

Unique identifier from the Human Metabolome Database for metabolites.

LOD (Limit of Detection) Metric

The lowest quantity of a substance that can be reliably distinguished from the absence of that substance (a blank sample).

Biofluid Data type

Biological fluid sample collected for analysis (e.g., blood, urine, plasma).

Date of Measurement Data type

The date when the sample was measured or analyzed in the laboratory.

Proteomics Data type

Large-scale study of proteins, particularly their structures and functions.

Experimental Method Data type

The analytical technique or procedure used to measure or analyze samples in the laboratory.

UniProt ID Identifier

Unique identifier from the UniProt database for proteins.

Abundance Metric

Relative quantity or concentration of a molecule (protein, metabolite, etc.) measured in omics experiments.

Fold Change Metric

Ratio of molecular abundance between two conditions, indicating up- or down-regulation.
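
Fold change is often reported on a log2 scale so that increases and decreases are symmetric around zero. A minimal sketch, with function names and abundance values chosen for illustration:

```python
import math


def fold_change(treated, control):
    """Ratio of abundance in the treated condition to the control condition."""
    return treated / control


def log2_fold_change(treated, control):
    """Log2 of the fold change: positive means up, negative means down."""
    return math.log2(treated / control)


print(fold_change(200.0, 50.0), log2_fold_change(200.0, 50.0))    # 4.0 2.0
print(fold_change(25.0, 100.0), log2_fold_change(25.0, 100.0))    # 0.25 -2.0
```

A fourfold increase and a fourfold decrease give ratios of 4.0 and 0.25, but log2 values of 2.0 and -2.0, which are easier to compare.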

Sequencing Technology Technology

High-throughput DNA sequencing methods used to determine the order of nucleotides in genetic material.

Alpha Diversity Metric

Measure of microbial diversity within a single sample or community.
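
Alpha diversity can be quantified in several ways. The sketch below uses the Shannon index as one common example; this glossary does not specify which index applies, so treat that choice, the function name `shannon_index`, and the sample counts as assumptions.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


even = shannon_index([25, 25, 25, 25])   # four equally abundant taxa
skewed = shannon_index([97, 1, 1, 1])    # one dominant taxon
print(round(even, 4), even > skewed)     # 1.3863 True
```

Both samples contain four taxa, but the evenly distributed one scores higher because the index reflects evenness as well as richness.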

Sample Type Identifier

Classification of biological samples based on their source, collection method, or biological matrix.

Metagenomics Data type

Genomic analysis of microbial communities directly from environmental or clinical samples.

Microbiome Data type

The community of microorganisms (such as bacteria, archaea, viruses, and fungi) living in a specific environment, such as the gut.

Microbial Diversity Metric

Measure of the variety and abundance of different microbial species in a sample.

16S rRNA Sequencing Technology

Targeted sequencing method for identifying and classifying bacteria based on the 16S ribosomal RNA gene.

WGS (Whole Genome Sequencing) Abbreviation

Complete sequencing of an organism's entire genome, providing comprehensive genetic information.

Instrument (Sequencing) Technology

The specific sequencing hardware used to generate genomic data.