Analysis Guide

This guide provides detailed workflows for running phylogenetic analyses in PhyloForester.

Overview

PhyloForester supports three main approaches to phylogenetic tree reconstruction:

Parsimony: Finds the trees that minimize the number of character state changes
Maximum Likelihood: Finds the tree that maximizes the probability of the observed data under a substitution model
Bayesian Inference: Estimates the posterior probability distribution of trees

Each method has strengths and is suited for different datasets and research questions.

Choosing an Analysis Method

Parsimony

Best for:

Morphological data
Small to medium datasets (<100 taxa)
Discrete character states
Pedagogical purposes

Advantages:

Fast computation
No explicit substitution model required
Easy to interpret

Limitations:

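Can be misled when evolutionary rates differ strongly among branches
Can be statistically inconsistent when long branches are present (long-branch attraction)
No explicit probabilistic model; support relies on resampling such as the bootstrap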

Maximum Likelihood

Best for:

Molecular data (DNA/protein sequences)
Large datasets (100+ taxa)
Model-based inference

Advantages:

Statistical framework
Model flexibility
Bootstrap support values

Limitations:

Computationally intensive
Requires model selection
Less suited for morphology

Bayesian Inference

Best for:

Complex evolutionary models
Integrating prior information
Uncertainty quantification

Advantages:

Full probabilistic framework
Posterior probabilities
Handles complex models well

Limitations:

Very computationally intensive
Convergence assessment required
Prior specification needed

Parsimony Workflow

Step 1: Prepare Data

Ensure your datamatrix has:

Clear character definitions
Minimal missing data (?)
Proper inapplicable coding (-)

Step 2: Create Parsimony Analysis

Right-click datamatrix → New Analysis → Parsimony

Configure parameters:

Replicates: 100
Hold: 1000
TBR: Yes (tree bisection-reconnection)
Mult: 10 (random addition sequences)

Step 3: Run Analysis

Click Start Analysis
TNT performs heuristic search
Progress shown as percentage
Typical runtime: seconds to minutes

Step 4: Examine Results

Check the Log tab for:

Number of trees found
Tree length (total character changes)
Consistency Index (CI)
Retention Index (RI)

Trees Tab:

Strict consensus tree shown
Bootstrap values (if requested)
Branch lengths (steps)
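The Consistency Index and Retention Index listed for the Log tab follow standard definitions, so they can be checked by hand. The sketch below assumes the usual ensemble formulas, CI = M / S and RI = (G - S) / (G - M), where M is the minimum conceivable number of steps, S the observed tree length, and G the maximum conceivable number of steps. The numeric values are hypothetical, not output from PhyloForester or TNT.

```python
def consistency_index(min_steps: int, observed_steps: int) -> float:
    """CI = M / S: minimum conceivable steps over observed steps."""
    return min_steps / observed_steps


def retention_index(min_steps: int, max_steps: int, observed_steps: int) -> float:
    """RI = (G - S) / (G - M), where G is the maximum conceivable steps."""
    if max_steps == min_steps:
        raise ValueError("RI is undefined when G equals M")
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Hypothetical totals summed over all characters in a matrix.
M, S, G = 20, 32, 50
ci = consistency_index(M, S)
ri = retention_index(M, G, S)
print(f"CI = {ci:.3f}, RI = {ri:.3f}")  # CI = 0.625, RI = 0.600
```

Values closer to 1 indicate less homoplasy; a CI of 1 means every character changes the minimum number of times on the tree.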

Maximum Likelihood Workflow

Step 1: Prepare Data

For DNA sequences:

Aligned sequences required
IUPAC ambiguity codes supported
Gap coding as missing (?) or 5th state

Step 2: Model Selection

IQ-TREE can auto-detect the best-fitting model:

Enable Auto-detect model
IQ-TREE tests a set of candidate substitution models
Best model selected by an information criterion (AIC/BIC)

Or specify model manually:

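JC69: Jukes-Cantor (equal substitution rates, equal base frequencies)
K80/K2P: Kimura 2-parameter (separate transition and transversion rates, equal base frequencies)
HKY: Hasegawa-Kishino-Yano (separate transition and transversion rates, unequal base frequencies)
GTR: General time reversible (six exchange rates, unequal base frequencies; the most parameters)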
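To make the AIC/BIC trade-off concrete, here is a minimal sketch that scores the four models above with the textbook formulas AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL. The log-likelihoods, taxon count, and alignment length are hypothetical, and counting 2N - 3 branch lengths for an unrooted tree of N taxa as free parameters is an assumption of this sketch; the model-selection tool may count parameters differently.

```python
import math

# Free substitution-model parameters (rates plus base frequencies).
MODEL_PARAMS = {"JC69": 0, "K80": 1, "HKY": 4, "GTR": 8}


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    return n_params * math.log(n_sites) - 2 * log_likelihood


def score_models(log_likelihoods: dict, n_taxa: int, n_sites: int) -> dict:
    """Return {model: (AIC, BIC)}; lower scores are better."""
    branch_params = 2 * n_taxa - 3  # branch lengths of an unrooted binary tree
    scores = {}
    for model, lnl in log_likelihoods.items():
        k = MODEL_PARAMS[model] + branch_params
        scores[model] = (aic(lnl, k), bic(lnl, k, n_sites))
    return scores


# Hypothetical log-likelihoods for a 10-taxon, 1000-site alignment.
hypothetical_lnl = {"JC69": -5230.0, "K80": -5180.0, "HKY": -5150.0, "GTR": -5145.0}
scores = score_models(hypothetical_lnl, n_taxa=10, n_sites=1000)
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])
print(best_aic, best_bic)  # GTR HKY
```

In this made-up case the two criteria disagree: BIC penalizes the extra GTR parameters more heavily than AIC does, so it prefers the simpler HKY model.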

Step 3: Create ML Analysis

Right-click datamatrix → New Analysis → Maximum Likelihood

Configure:

Model: Auto-detect
Bootstrap: 1000
Algorithm: Standard
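For readers who want to reproduce a run outside the GUI, the settings above correspond roughly to an IQ-TREE 2 command line. The sketch below only assembles the argument list and does not launch anything. The flags `-s`, `-m MFP`, `-b` (standard bootstrap), and `-B` (ultrafast bootstrap) are IQ-TREE 2 options; that PhyloForester passes exactly these flags, the `iqtree2` executable name, and the file name `alignment.phy` are assumptions.

```python
import shlex


def build_iqtree_command(alignment, model="MFP", bootstrap=1000,
                         ultrafast=False, executable="iqtree2"):
    """Assemble an IQ-TREE 2 argument list (hypothetical helper, not PhyloForester code)."""
    cmd = [executable, "-s", alignment, "-m", model]
    if bootstrap:
        # -b: standard nonparametric bootstrap; -B: ultrafast bootstrap
        cmd += ["-B" if ultrafast else "-b", str(bootstrap)]
    return cmd


cmd = build_iqtree_command("alignment.phy")
print(shlex.join(cmd))  # iqtree2 -s alignment.phy -m MFP -b 1000
```

Passing a specific model name such as `GTR+G` instead of `MFP` would skip the model-testing phase.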

Step 4: Run Analysis

Click Start Analysis
Model testing phase (if auto-detect)
Tree search phase
Bootstrap phase (if enabled)
Typical runtime: minutes to hours

Step 5: Interpret Results

Best Tree:

Maximum likelihood tree topology
Branch lengths (substitutions/site)
Log-likelihood score

Bootstrap Support:

Values 0-100 on nodes
≥70 is commonly treated as moderate support (a rule of thumb, not a test of statistical significance)
≥95 is treated as strong support
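A bootstrap value is simply the percentage of replicate trees that contain a given clade. The following sketch shows that counting step under a simplifying assumption: each replicate tree is represented as a set of clades, and each clade as a frozenset of taxon names. The taxa, the replicates, and the `describe` helper are hypothetical.

```python
from collections import Counter


def clade_support(replicate_trees):
    """Percentage of replicate trees containing each clade."""
    counts = Counter()
    for clades in replicate_trees:
        counts.update(set(clades))
    n = len(replicate_trees)
    return {clade: 100.0 * c / n for clade, c in counts.items()}


def describe(value):
    """Apply the rule-of-thumb thresholds from this guide."""
    if value >= 95:
        return "strong"
    if value >= 70:
        return "moderate"
    return "weak"


AB, CD, ABC = frozenset({"A", "B"}), frozenset({"C", "D"}), frozenset({"A", "B", "C"})
replicates = [{AB, CD}, {AB, CD}, {AB, CD}, {AB, ABC}]  # four hypothetical replicates

support = clade_support(replicates)
for clade in (AB, CD, ABC):
    print(sorted(clade), support[clade], describe(support[clade]))
```

Real analyses use hundreds or thousands of replicates; four are used here only to keep the arithmetic visible.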

Bayesian Workflow

Step 1: Prepare Data

Similar to ML, but Bayesian is more flexible:

Can handle complex partitions
Morphology + molecules combined
Clock models for dating

Step 2: Set Priors

Substitution Model:

Often use GTR+Γ for DNA
Mk model for morphology

Tree Prior:

Uniform (default)
Birth-death process
Yule model

Branch Length Prior:

Exponential distribution
Compound Dirichlet

Step 3: Configure MCMC

Right-click datamatrix → New Analysis → Bayesian

Set parameters:

Generations: 1,000,000
Sample frequency: 1000
Burnin: 0.25 (25%)
Chains: 4 (1 cold, 3 heated)
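These MCMC settings map onto MrBayes commands. As an illustration, the sketch below generates a MrBayes block for a GTR+Γ analysis. The commands and options used (`lset nst=6 rates=gamma`, `mcmcp`, `mcmc`, `sump`, `sumt`) are standard MrBayes syntax, but whether PhyloForester writes an equivalent block is an assumption, as are `nruns=2` and the helper name.

```python
def mrbayes_block(ngen=1_000_000, samplefreq=1000, burninfrac=0.25,
                  nchains=4, nruns=2):
    """Build a MrBayes command block as text (hypothetical helper)."""
    lines = [
        "begin mrbayes;",
        "  set autoclose=yes nowarn=yes;",
        "  lset nst=6 rates=gamma;",  # GTR with gamma-distributed rates
        f"  mcmcp ngen={ngen} samplefreq={samplefreq} nchains={nchains} nruns={nruns};",
        f"  mcmcp relburnin=yes burninfrac={burninfrac};",
        "  mcmc;",
        "  sump;",  # summarize parameters (ESS, PSRF)
        "  sumt;",  # summarize trees (consensus, posterior probabilities)
        "end;",
    ]
    return "\n".join(lines)


block = mrbayes_block()
print(block)
```

Appending such a block to a NEXUS data file lets MrBayes run the analysis non-interactively.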

Short run for testing:

Generations: 100,000
Sample: 100
Burnin: 0.25

Standard run:

Generations: 10,000,000
Sample: 1000
Burnin: 0.25
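The three settings together determine how many trees remain for summarizing. A quick calculation, assuming the burn-in fraction is applied to the stored samples and ignoring the initial sample at generation 0:

```python
def retained_samples(generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Approximate number of samples kept per run after discarding burn-in."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total - discarded


print(retained_samples(100_000, 100, 0.25))      # short test run -> 750
print(retained_samples(1_000_000, 1000, 0.25))   # default settings -> 750
print(retained_samples(10_000_000, 1000, 0.25))  # standard run -> 7500
```

The short test run stores as many samples as the default run, but they are spaced only 100 generations apart, so they are more strongly autocorrelated and cover far less of the chain.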

Step 4: Run Analysis

Click Start Analysis
MrBayes runs MCMC chains
Monitor:
- Average standard deviation of split frequencies (should fall below 0.01)
- Potential Scale Reduction Factor (should approach 1.0)
Typical runtime: hours to days
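The Potential Scale Reduction Factor compares variation within runs to variation between runs for a single parameter. Below is a minimal sketch of the basic Gelman-Rubin form; the exact computation in MrBayes may differ in detail, and the sample values are invented.

```python
import math
import statistics


def psrf(chains):
    """Basic Gelman-Rubin PSRF for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between_over_n = statistics.variance(chain_means)                    # B / n
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


# Hypothetical post-burn-in samples of one parameter from two runs.
agreeing = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
disagreeing = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

print(round(psrf(agreeing), 3))
print(round(psrf(disagreeing), 3))
```

When the runs sample the same distribution, the between-run term is small and the value stays near 1; runs stuck in different regions produce a value well above 1.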

Step 5: Assess Convergence

Check the Log for:

ASDSF < 0.01: Independent runs have converged on similar split frequencies
ESS > 200: Effective sample size is sufficient for each parameter
Stable log-likelihood traces
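ASDSF summarizes how closely independent runs agree on clade frequencies. The sketch below is a simplified version: for every split, take the standard deviation of its frequency across runs, then average over splits. Ignoring splits that stay below 10% in every run and using the sample standard deviation are assumptions of this sketch, and the split labels and frequencies are hypothetical.

```python
import statistics


def asdsf(split_freqs_per_run, min_freq=0.10):
    """Average standard deviation of split frequencies across runs."""
    all_splits = set().union(*split_freqs_per_run)
    deviations = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in split_freqs_per_run]
        if max(freqs) < min_freq:
            continue  # rare splits are ignored
        deviations.append(statistics.stdev(freqs))
    return statistics.fmean(deviations) if deviations else 0.0


run1 = {"AB|CDE": 0.98, "ABC|DE": 0.60, "AD|BCE": 0.02}
run2 = {"AB|CDE": 0.97, "ABC|DE": 0.64}

value = asdsf([run1, run2])
print(f"ASDSF = {value:.4f}", "converged" if value < 0.01 else "keep running")
```

Here the two runs still disagree about one split by four percentage points, which is enough to keep the average above the 0.01 threshold.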

If not converged:

Run more generations
Increase sample frequency if too few samples were stored (this does not fix poor mixing)
Simplify model

Step 6: Examine Posterior

Consensus Tree:

50% majority rule consensus
Posterior probabilities on nodes
≥0.95 generally considered strong support

Credible Sets:

95% credible set of trees
Topology uncertainty quantified
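A 95% credible set is the smallest collection of topologies whose posterior probabilities sum to at least 0.95. A minimal sketch, assuming each post-burn-in sample has already been reduced to a canonical topology string so that identical topologies compare equal; the Newick strings and counts are hypothetical.

```python
from collections import Counter


def credible_set(sampled_topologies, level=0.95):
    """Most frequent topologies whose cumulative frequency reaches `level`."""
    counts = Counter(sampled_topologies)
    total = len(sampled_topologies)
    chosen, cumulative = [], 0
    for topology, count in counts.most_common():
        chosen.append((topology, count / total))
        cumulative += count
        if cumulative >= level * total:
            break
    return chosen


samples = (["((A,B),(C,D));"] * 60 + ["((A,C),(B,D));"] * 25
           + ["((A,D),(B,C));"] * 12 + ["(((A,B),C),D);"] * 3)

for topology, probability in credible_set(samples):
    print(topology, probability)
```

With these counts the first three topologies cover 97% of the samples, so the set contains three trees; a set containing only one tree would indicate very little topological uncertainty.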

Character Mapping

After obtaining trees, map characters to visualize evolution.

Fitch Parsimony Mapping

PhyloForester uses Fitch’s algorithm for ancestral state reconstruction.

Open analysis with trees
Select a tree in Trees tab
Click Map Character
Select character from list
Tree shows:
- Ancestral states at nodes
- State changes on branches (synapomorphies)
- Coloring by state
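To illustrate what Fitch's algorithm computes, here is a minimal sketch of its bottom-up pass for one character on a rooted binary tree. It returns the candidate state set at the root and the number of changes required. This is a textbook version, not PhyloForester's implementation: the nested-tuple tree format, the taxon names, and the state codings are assumptions, and missing data would need to be expanded to all states before use.

```python
def fitch(tree, leaf_states):
    """Return (candidate state set, step count) for a nested-tuple binary tree."""
    if isinstance(tree, str):
        # A leaf: "0" gives {"0"}; a polymorphic coding "01" gives {"0", "1"}.
        return set(leaf_states[tree]), 0
    left, right = tree
    left_set, left_steps = fitch(left, leaf_states)
    right_set, right_steps = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps  # no change needed here
    return left_set | right_set, left_steps + right_steps + 1  # one change


tree = ((("A", "B"), "C"), ("D", "E"))

clean = {"A": "0", "B": "0", "C": "1", "D": "1", "E": "1"}
print(fitch(tree, clean))      # ({'1'}, 1)

homoplastic = {"A": "1", "B": "0", "C": "0", "D": "1", "E": "0"}
print(fitch(tree, homoplastic))  # ({'0'}, 2)
```

The second character needs two changes for a binary character that could fit some tree with one, which is the signature of homoplasy on this topology. Assigning final states to every internal node requires an additional top-down pass that is omitted here.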

Interpreting Mapped Trees

Node labels: Reconstructed ancestral states
Branch annotations: Character changes
Colors: Different states
Ambiguous: Multiple optimal reconstructions shown

Use cases:

Identify evolutionary transitions
Locate homoplasy (parallel/convergent evolution)
Support morphological hypotheses

Comparing Analyses

It’s valuable to compare results across methods.

Topology Comparison

Run multiple analysis types on same datamatrix
Compare tree topologies visually
Note areas of agreement/disagreement

Key questions:

Do methods agree on major clades?
Where do topologies differ?
Are differences in weakly supported regions?
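Visual comparison can be complemented with a simple count of the clades on which two trees disagree. The sketch below computes a clade-based (rooted) Robinson-Foulds distance. The nested-tuple tree representation is an assumption made for illustration; for unrooted trees the same idea applies to bipartitions rather than clades.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    root = visit(tree)
    found.discard(root)  # the clade of all taxa is present in every tree
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


parsimony_tree = ((("A", "B"), "C"), ("D", "E"))
ml_tree = ((("A", "C"), "B"), ("D", "E"))

print(rf_distance(parsimony_tree, ml_tree))                 # 2
print(sorted(map(sorted, clades(parsimony_tree) & clades(ml_tree))))  # shared clades
```

A distance of 0 means identical topologies. Here the trees agree on the clades A+B+C and D+E and differ only in how A, B, and C are resolved.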

Support Value Comparison

Parsimony: Bootstrap (if run)
ML: Bootstrap percentages
Bayesian: Posterior probabilities

Generally:

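As a rough rule of thumb, Bayesian PP ≥ 0.95 is often treated as comparable to ML bootstrap ≥ 70%
Bayesian posterior probabilities tend to be higher than bootstrap values for the same node
ML bootstrap is the more conservative measure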

Troubleshooting Analyses

Analysis Won’t Start

Check:

External software path set correctly (Preferences)
Software executable has execute permission
Datamatrix not empty
No special characters in names
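Two of these checks are easy to script before launching an analysis. The sketch below uses the standard library to test whether an executable can be found and run, and flags taxon names containing anything other than letters, digits, underscores, dots, and hyphens. That character set is a conservative assumption; the names PhyloForester and the external programs actually accept may differ.

```python
import re
import shutil

SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")  # assumed safe character set


def executable_available(path_or_name: str) -> bool:
    """True if the file exists and is executable (searches PATH for bare names)."""
    return shutil.which(path_or_name) is not None


def unsafe_names(names):
    """Return the taxon names that contain characters outside the safe set."""
    return [name for name in names if not SAFE_NAME.match(name)]


taxa = ["Homo_sapiens", "Pan troglodytes", "Mus(musculus)"]
print(unsafe_names(taxa))
print(executable_available("definitely-not-a-real-binary-xyz"))
```

Replacing spaces with underscores and removing parentheses or quotes from taxon names avoids the most common parsing problems in tree files.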

Analysis Fails Immediately

Check:

Log tab for error messages
Datamatrix format correct
Missing data not excessive
Character definitions valid

Analysis Runs Forever

For Bayesian:

May take days - check convergence diagnostics
Consider reducing generations for testing

For ML:

Large datasets take time
Consider reducing bootstrap replicates temporarily

For Parsimony:

Usually fast; if slow, reduce Hold parameter

Poor Support Values

Common reasons:

Insufficient data
Conflicting signal
Model misspecification
Need more bootstrap replicates

Solutions:

Add more characters/taxa
Try different models
Increase replicates
Partition data

Best Practices

Data Preparation

Carefully define characters
Minimize missing data
Check for typos in taxon names
Validate alignment (for sequences)

Parameter Selection

Start with default/recommended values
Do quick test runs first
Increase rigor for final analyses
Document all parameters used

Quality Control

Always check log files
Verify convergence (Bayesian)
Compare multiple runs
Examine support values critically

Publication

When publishing, report:

Software versions
All parameter settings
Run statistics (length, likelihood, etc.)
Support measures
Convergence diagnostics (Bayesian)

Next Steps

See User Guide for general PhyloForester usage
See Troubleshooting for specific issues
See Developer Guide for advanced customization