Analysis Guide
This guide provides detailed workflows for running phylogenetic analyses in PhyloForester.
Overview
PhyloForester supports three main approaches to phylogenetic tree reconstruction:
Parsimony: Finds trees minimizing character state changes
Maximum Likelihood: Finds trees maximizing probability of observed data
Bayesian Inference: Estimates posterior probability distribution of trees
Each method has strengths and is suited for different datasets and research questions.
Choosing an Analysis Method
Parsimony
Best for:
Morphological data
Small to medium datasets (<100 taxa)
Discrete character states
Pedagogical purposes
Advantages:
Fast computation
No explicit substitution model required
Easy to interpret
Limitations:
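Implicitly assumes that change is rare and rates are similar across branches
Can be statistically inconsistent when long branches are present (long-branch attraction)
No explicit probabilistic model; support relies on resampling such as bootstrap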
Maximum Likelihood
Best for:
Molecular data (DNA/protein sequences)
Large datasets (100+ taxa)
Model-based inference
Advantages:
Statistical framework
Model flexibility
Bootstrap support values
Limitations:
Computationally intensive
Requires model selection
Less suited for morphology
Bayesian Inference
Best for:
Complex evolutionary models
Integrating prior information
Uncertainty quantification
Advantages:
Full probabilistic framework
Posterior probabilities
Handles complex models well
Limitations:
Very computationally intensive
Convergence assessment required
Prior specification needed
Parsimony Workflow
Step 1: Prepare Data
Ensure your datamatrix:
Has clear character definitions
Has minimal missing data (?)
Uses proper inapplicable coding (-)
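Before running a search, it can help to quantify how much of the matrix is missing or inapplicable. The following is a minimal sketch, assuming the matrix is available as a plain Python dict of taxon name to list of state strings; this layout and the function name `missing_data_summary` are illustrative and not part of PhyloForester.

```python
def missing_data_summary(matrix):
    """Return per-taxon fractions of missing (?) and inapplicable (-) cells.

    `matrix` is a hypothetical dict mapping taxon name -> list of state strings.
    """
    summary = {}
    for taxon, states in matrix.items():
        n = len(states)
        if n == 0:
            # An empty row has nothing to count; report zeros.
            summary[taxon] = {"missing": 0.0, "inapplicable": 0.0}
            continue
        summary[taxon] = {
            "missing": states.count("?") / n,
            "inapplicable": states.count("-") / n,
        }
    return summary


# Illustrative data only.
example = {
    "Taxon_A": ["0", "1", "?", "-"],
    "Taxon_B": ["1", "1", "0", "0"],
}
for taxon, fractions in missing_data_summary(example).items():
    print(taxon, fractions)
```

Taxa with a high missing fraction are candidates for re-scoring or exclusion before the final analysis.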
Step 2: Create Parsimony Analysis
Right-click datamatrix → New Analysis → Parsimony
Configure parameters:
Replicates: 100
Hold: 1000
TBR: Yes (tree bisection-reconnection)
Mult: 10 (random addition sequences)
Step 3: Run Analysis
Click Start Analysis
TNT performs heuristic search
Progress shown as percentage
Typical runtime: seconds to minutes
Step 4: Examine Results
Check the Log tab for:
Number of trees found
Tree length (total character changes)
Consistency Index (CI)
Retention Index (RI)
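For reference, the two indices reported in the log follow standard definitions: CI is the minimum possible number of steps divided by the observed tree length, and RI is (maximum steps - observed steps) / (maximum steps - minimum steps). The sketch below computes both from step counts; the function names and example numbers are illustrative, not values taken from TNT output.

```python
def consistency_index(min_steps, observed_steps):
    """CI = minimum possible steps / observed steps on the tree."""
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    return min_steps / observed_steps


def retention_index(min_steps, observed_steps, max_steps):
    """RI = (max - observed) / (max - min)."""
    if max_steps == min_steps:
        raise ValueError("RI is undefined when max_steps equals min_steps")
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Illustrative step counts summed over all characters.
print(consistency_index(10, 20))
print(retention_index(10, 20, 40))
```

A CI or RI of 1.0 means no homoplasy is implied by the tree; lower values indicate more homoplasy.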
Trees Tab:
Strict consensus tree shown
Bootstrap values (if requested)
Branch lengths (steps)
Maximum Likelihood Workflow
Step 1: Prepare Data
For DNA sequences:
Aligned sequences required
IUPAC ambiguity codes supported
Gap coding as missing (?) or 5th state
Step 2: Model Selection
IQTree can auto-detect the best-fitting model:
Enable Auto-detect model
IQTree tests all standard models
Best model selected by AIC/BIC
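To make the selection criteria concrete, the sketch below applies the standard formulas AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters and n the number of alignment sites. The log-likelihood values are invented for illustration, and the parameter counts cover only the substitution model (branch lengths, which add the same amount to every candidate on a fixed tree, are left out as a simplifying assumption). This is not how IQTree is invoked.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def select_best(candidates, n_sites, criterion="BIC"):
    """Return the model name with the lowest score.

    `candidates` maps model name -> (log_likelihood, n_params).
    """
    def score(item):
        log_likelihood, n_params = item[1]
        if criterion == "AIC":
            return aic(log_likelihood, n_params)
        return bic(log_likelihood, n_params, n_sites)

    return min(candidates.items(), key=score)[0]


# Hypothetical log-likelihoods; free substitution-model parameters only.
candidates = {
    "JC69": (-5200.0, 0),
    "HKY": (-5100.0, 4),
    "GTR": (-5098.0, 8),
}
print(select_best(candidates, n_sites=1000, criterion="AIC"))
print(select_best(candidates, n_sites=1000, criterion="BIC"))
```

BIC penalizes extra parameters more heavily than AIC for all but very short alignments, so the two criteria can disagree on parameter-rich models.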
Or specify model manually:
JC69: Jukes-Cantor (equal rates)
K80/K2P: Kimura 2-parameter (transitions/transversions)
HKY: Hasegawa-Kishino-Yano
GTR: General time reversible (most parameters)
Step 3: Create ML Analysis
Right-click datamatrix → New Analysis → Maximum Likelihood
Configure:
Model: Auto-detect
Bootstrap: 1000
Algorithm: Standard
Step 4: Run Analysis
Click Start Analysis
Model testing phase (if auto-detect)
Tree search phase
Bootstrap phase (if enabled)
Typical runtime: minutes to hours
Step 5: Interpret Results
Best Tree:
Maximum likelihood tree topology
Branch lengths (substitutions/site)
Log-likelihood score
Bootstrap Support:
Values 0-100 on nodes
≥70 is often treated as meaningful support (a rule of thumb, not a formal significance test)
≥95 strong support
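Bootstrap values come from re-running the tree search on pseudo-replicate alignments built by sampling columns with replacement; the support for a clade is the percentage of replicates in which it appears. The sketch below shows only the resampling step, assuming the alignment is a dict of taxon name to aligned sequence string. The function name is illustrative, and this is not a description of IQTree's internals.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by resampling columns with replacement.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    `rng` is a random.Random instance, so that runs can be reproduced.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices; the same column may be drawn repeatedly.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


alignment = {"Taxon_A": "ACGTAC", "Taxon_B": "ACGTTC", "Taxon_C": "TCGTTC"}
replicate = bootstrap_alignment(alignment, random.Random(42))
for taxon, seq in replicate.items():
    print(taxon, seq)
```

Because whole columns are resampled, each site in the replicate keeps the same states across taxa that it had in the original alignment.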
Bayesian Workflow
Step 1: Prepare Data
Similar to ML, but Bayesian is more flexible:
Can handle complex partitions
Morphology + molecules combined
Clock models for dating
Step 2: Set Priors
Substitution Model:
Often use GTR+Γ for DNA
Mk model for morphology
Tree Prior:
Uniform (default)
Birth-death process
Yule model
Branch Length Prior:
Exponential distribution
Compound Dirichlet
Step 3: Configure MCMC
Right-click datamatrix → New Analysis → Bayesian
Set parameters:
Generations: 1,000,000
Sample frequency: 1000
Burnin: 0.25 (25%)
Chains: 4 (2 heated)
Short run for testing:
Generations: 100,000
Sample: 100
Burnin: 0.25
Standard run:
Generations: 10,000,000
Sample: 1000
Burnin: 0.25
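The number of posterior samples that survive burn-in follows directly from these settings. The sketch below does the arithmetic; the function name is illustrative, and the exact count reported by MrBayes may differ by one depending on whether the starting state is recorded.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Number of samples per run left after discarding the burn-in fraction."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total - discarded


settings = {
    "default": (1_000_000, 1000, 0.25),
    "short test": (100_000, 100, 0.25),
    "standard": (10_000_000, 1000, 0.25),
}
for name, (generations, sample_every, burnin) in settings.items():
    print(name, retained_samples(generations, sample_every, burnin))
```

The default and short test settings both retain about 750 samples per run, while the standard run retains about 7,500. The short run is faster but gives the chain far fewer generations to converge.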
Step 4: Run Analysis
Click Start Analysis
MrBayes runs MCMC chains
Monitor:
Average standard deviation of split frequencies (should fall below 0.01)
Potential Scale Reduction Factor (should approach 1.0)
Typical runtime: hours to days
Step 5: Assess Convergence
Check the Log for:
ASDSF (average standard deviation of split frequencies) < 0.01: Independent runs have likely converged on similar topologies
ESS > 200: Sufficient sampling
Stable log-likelihood traces
If not converged:
Run more generations
Increase sample frequency
Simplify model
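To show what the ASDSF diagnostic measures, the sketch below averages, over all splits, the standard deviation of each split's frequency across independent runs. The input layout (one dict of split to frequency per run), the use of the sample standard deviation, and the 0.1 minimum-frequency cutoff are assumptions made for illustration and may not match MrBayes's exact calculation.

```python
from statistics import stdev


def asdsf(split_freqs_per_run, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    `split_freqs_per_run` is a list with one dict per run, mapping a split
    label to its frequency in that run's post-burn-in sample.
    Splits below `min_freq` in every run are ignored (assumed cutoff).
    """
    if len(split_freqs_per_run) < 2:
        raise ValueError("at least two runs are needed")
    all_splits = set().union(*split_freqs_per_run)
    deviations = []
    for split in all_splits:
        # A split absent from a run has frequency 0 in that run.
        freqs = [run.get(split, 0.0) for run in split_freqs_per_run]
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


# Illustrative frequencies from two runs.
run1 = {"AB|CDE": 1.00, "ABC|DE": 0.62}
run2 = {"AB|CDE": 1.00, "ABC|DE": 0.58}
print(asdsf([run1, run2]))
```

Values near zero indicate that the runs sample clades at similar frequencies; larger values mean the runs still disagree about the topology.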
Step 6: Examine Posterior
Consensus Tree:
50% majority rule consensus
Posterior probabilities on nodes
≥0.95 generally considered strong support
Credible Sets:
95% credible set of trees
Topology uncertainty quantified
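A 95% credible set is the smallest collection of topologies whose posterior probabilities sum to at least 0.95. A minimal sketch, assuming each sampled tree has already been reduced to a canonical topology label (for example, a sorted Newick string without branch lengths); the labels and function name here are hypothetical.

```python
from collections import Counter


def credible_set(topologies, level=0.95):
    """Return [(topology, posterior_probability), ...] covering `level`.

    `topologies` is a list of canonical topology labels, one per
    post-burn-in sample.
    """
    counts = Counter(topologies)
    total = sum(counts.values())
    selected = []
    cumulative = 0
    # Add topologies from most to least frequent until the level is reached.
    for topology, count in counts.most_common():
        selected.append((topology, count / total))
        cumulative += count
        if cumulative / total >= level:
            break
    return selected


# Illustrative sample of 100 trees over four topologies.
samples = ["T1"] * 60 + ["T2"] * 30 + ["T3"] * 6 + ["T4"] * 4
for topology, probability in credible_set(samples):
    print(topology, probability)
```

A credible set containing only a few topologies indicates a well-resolved posterior; a large set indicates substantial topological uncertainty.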
Character Mapping
After obtaining trees, map characters to visualize evolution.
Fitch Parsimony Mapping
PhyloForester uses Fitch’s algorithm for ancestral state reconstruction.
Open analysis with trees
Select a tree in Trees tab
Click Map Character
Select character from list
Tree shows:
Ancestral states at nodes
State changes on branches (synapomorphies)
Colored by state
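The core of Fitch's algorithm is a bottom-up pass: each internal node takes the intersection of its children's state sets if that is non-empty, and otherwise takes their union and counts one change. The sketch below implements only this downpass for a single character on a binary tree written as nested tuples. The tree encoding and function name are illustrative, missing data is not handled, and this is not PhyloForester's own implementation.

```python
def fitch_downpass(tree, tip_states):
    """Return (state_set, n_changes) for the subtree rooted at `tree`.

    `tree` is either a tip name (str) or a 2-tuple of subtrees.
    `tip_states` maps tip name -> observed state for one character.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_downpass(left, tip_states)
    right_set, right_changes = fitch_downpass(right, tip_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # Children disagree: keep both possibilities and count one change.
    return left_set | right_set, left_changes + right_changes + 1


tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "0", "B": "0", "C": "1", "D": "1", "E": "1"}
root_states, changes = fitch_downpass(tree, states)
print(root_states, changes)
```

A node whose set holds more than one state is ambiguous after the downpass; a second, top-down pass is needed to assign final states and place changes on specific branches.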
Interpreting Mapped Trees
Node labels: Reconstructed ancestral states
Branch annotations: Character changes
Colors: Different states
Ambiguous: Multiple optimal reconstructions shown
Use cases:
Identify evolutionary transitions
Locate homoplasy (parallel/convergent evolution)
Support morphological hypotheses
Comparing Analyses
It’s valuable to compare results across methods.
Topology Comparison
Run multiple analysis types on same datamatrix
Compare tree topologies visually
Note areas of agreement/disagreement
Key questions:
Do methods agree on major clades?
Where do topologies differ?
Are differences in weakly supported regions?
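Visual comparison can be supplemented with a simple count of shared and conflicting clades. The sketch below treats trees as rooted nested tuples, collects the tip set under every internal node, and reports the symmetric difference, which is the clade-based form of the Robinson-Foulds distance. The tree encoding and function names are assumptions for illustration; unrooted trees would need bipartitions rather than clades.

```python
def clades(tree):
    """Return the set of non-root clades (frozensets of tip names)."""
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(collect(child) for child in node))
        found.add(tips)
        return tips

    all_tips = collect(tree)
    found.discard(all_tips)  # the root clade is present in every tree
    return found


def compare_topologies(tree_a, tree_b):
    """Summarize agreement between two rooted trees on the same taxa."""
    a, b = clades(tree_a), clades(tree_b)
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        "rf_distance": len(a ^ b),
    }


parsimony_tree = ((("A", "B"), "C"), ("D", "E"))
ml_tree = ((("A", "C"), "B"), ("D", "E"))
result = compare_topologies(parsimony_tree, ml_tree)
print("shared clades:", len(result["shared"]))
print("RF distance:", result["rf_distance"])
```

Clades listed under `only_a` or `only_b` are the places to check support values: disagreement confined to weakly supported nodes is less worrying than conflict between well-supported clades.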
Support Value Comparison
Parsimony: Bootstrap (if run)
ML: Bootstrap percentages
Bayesian: Posterior probabilities
Generally:
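As a rough rule of thumb, Bayesian PP ≥ 0.95 is often compared with ML bootstrap ≥ 70%, although the two measures are not directly equivalent
Bayesian posterior probabilities tend to be higher than bootstrap values for the same clade
ML bootstrap is generally more conservative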
Troubleshooting Analyses
Analysis Won’t Start
Check:
External software path set correctly (Preferences)
Software executable has permissions
Datamatrix not empty
No special characters in names
Analysis Fails Immediately
Check:
Log tab for error messages
Datamatrix format correct
Missing data not excessive
Character definitions valid
Analysis Runs Forever
For Bayesian:
May take days - check convergence diagnostics
Consider reducing generations for testing
For ML:
Large datasets take time
Consider reducing bootstrap replicates temporarily
For Parsimony:
Usually fast; if slow, reduce Hold parameter
Poor Support Values
Common reasons:
Insufficient data
Conflicting signal
Model misspecification
Need more bootstrap replicates
Solutions:
Add more characters/taxa
Try different models
Increase replicates
Partition data
Best Practices
Data Preparation
Carefully define characters
Minimize missing data
Check for typos in taxon names
Validate alignment (for sequences)
Parameter Selection
Start with default/recommended values
Do quick test runs first
Increase rigor for final analyses
Document all parameters used
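One lightweight way to document parameters is to save a small structured record next to each analysis. The sketch below builds such a record and serializes it as JSON. The field names, the function `analysis_record`, and the `x.y.z` version placeholder are hypothetical and do not reflect a PhyloForester export format.

```python
import json
from datetime import date


def analysis_record(method, software, version, parameters, run_date=None):
    """Build a plain dict describing one analysis run."""
    return {
        "method": method,
        "software": software,
        "version": version,
        "date": (run_date or date.today()).isoformat(),
        "parameters": dict(parameters),
    }


record = analysis_record(
    method="Bayesian",
    software="MrBayes",
    version="x.y.z",  # placeholder: fill in the version actually used
    parameters={
        "generations": 1_000_000,
        "sample_frequency": 1000,
        "burnin": 0.25,
        "chains": 4,
    },
)
print(json.dumps(record, indent=2))
```

Keeping one record per run makes it straightforward to assemble the methods section later.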
Quality Control
Always check log files
Verify convergence (Bayesian)
Compare multiple runs
Examine support values critically
Publication
When publishing, report:
Software versions
All parameter settings
Run statistics (length, likelihood, etc.)
Support measures
Convergence diagnostics (Bayesian)
Next Steps
See User Guide for general PhyloForester usage
See Troubleshooting for specific issues
See Developer Guide for advanced customization