Managing Trained Species
The funannotate2 species command allows you to view and manage trained species models in the funannotate2 database. These trained models are used by the predict command to improve gene prediction accuracy.
Basic Usage
To list all trained species in the database:
funannotate2 species
Optional Arguments
-l, --load: Load a new species with a *.params.json file
-d, --delete: Delete a species from database
-f, --format: Format to show existing species in (default: table)
Viewing Trained Species
By default, the species command lists all trained species in the database in a tabular format:
funannotate2 species
You can change the output format to JSON or YAML:
funannotate2 species -f json
funannotate2 species -f yaml
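The JSON and YAML formats are convenient when the listing needs to be kept on disk or read by another tool. The following is a minimal sketch that saves both listings to a directory. Only the `funannotate2 species -f json` and `-f yaml` invocations come from this page; the function name `save_species_listing` and the output file names are hypothetical.

```shell
# Sketch: save the trained-species listing in both machine-readable formats.
# Assumption: the function name and the species.<fmt> file names are
# illustrative only and are not part of funannotate2.
save_species_listing() {
    outdir="${1:-.}"
    mkdir -p "$outdir" || return 1
    for fmt in json yaml; do
        # Write each listing to <outdir>/species.<fmt>; stop on the first failure.
        funannotate2 species -f "$fmt" > "$outdir/species.$fmt" || return 1
    done
}

# Usage:
#   save_species_listing species_snapshot
```

Keeping a dated snapshot of this kind makes it easier to see which models were present when a given annotation run was made.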
Loading a New Species
After training a species using the train command, you can add it to the database:
funannotate2 species -l /path/to/params.json
The params.json file is typically found in the output directory of the train command:
funannotate2 species -l train_results/params.json
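Because the exact file name can vary (the option help mentions `*.params.json`), a small wrapper can locate the file before loading it. This is a sketch under assumptions: the function name `load_trained_species` is hypothetical, and the search order (a plain `params.json` first, then any `*.params.json`) is a guess rather than documented behavior.

```shell
# Sketch: find a params file in a training output directory and load it.
# Assumption: the params file sits at the top level of the directory and is
# named either params.json or <something>.params.json.
load_trained_species() {
    train_dir="$1"
    if [ ! -d "$train_dir" ]; then
        echo "error: '$train_dir' is not a directory" >&2
        return 1
    fi
    # Prefer a plain params.json, then fall back to the first *.params.json.
    params=""
    for candidate in "$train_dir"/params.json "$train_dir"/*.params.json; do
        if [ -f "$candidate" ]; then
            params="$candidate"
            break
        fi
    done
    if [ -z "$params" ]; then
        echo "error: no params.json found in '$train_dir'" >&2
        return 1
    fi
    echo "Loading species parameters from $params" >&2
    funannotate2 species -l "$params"
}

# Usage:
#   load_trained_species train_results
```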
Deleting a Species
You can remove a species from the database:
funannotate2 species -d species_name
For example:
funannotate2 species -d aspergillus_fumigatus
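This page does not describe a confirmation step for deletion, so a guard against typos may be worth adding in scripts. The sketch below asks for confirmation before calling `funannotate2 species -d`; the function name `delete_species` and the prompt are hypothetical and not part of funannotate2.

```shell
# Sketch: ask for confirmation before deleting a trained species.
# Assumption: this wrapper is illustrative; funannotate2 itself may or may
# not prompt before deleting.
delete_species() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: delete_species <species_name>" >&2
        return 2
    fi
    printf 'Delete trained species "%s" from the database? [y/N] ' "$name" >&2
    read -r reply
    case "$reply" in
        y|Y) funannotate2 species -d "$name" ;;
        *)   echo "Aborted; nothing deleted." >&2
             return 1 ;;
    esac
}

# Usage:
#   delete_species aspergillus_fumigatus
```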
How Species Models are Used
When you run the predict command, you can specify a trained species model:
funannotate2 predict -f genome.fasta -o predict_results -p species_name -s "Species name"
For example:
funannotate2 predict -f genome.fasta -o predict_results -p aspergillus_fumigatus -s "Aspergillus fumigatus"
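In the example above, the model name passed to `-p` looks like the species name in lowercase with the space replaced by an underscore. Assuming that convention holds for your models (it is inferred from this single example, not from a documented rule), the two arguments can be derived from one value. The function names `species_slug` and `predict_with_model` are hypothetical.

```shell
# Sketch: derive the -p model name from the -s species name.
# Assumption: model names are the lowercase species name with spaces turned
# into underscores, as in the aspergillus_fumigatus example.
species_slug() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '_'
}

# Run predict with the same flags shown in the example above.
predict_with_model() {
    genome="$1"
    outdir="$2"
    species="$3"
    funannotate2 predict -f "$genome" -o "$outdir" \
        -p "$(species_slug "$species")" -s "$species"
}

# Usage:
#   predict_with_model genome.fasta predict_results "Aspergillus fumigatus"
```

If a model was loaded under a different name, pass that name to `-p` directly instead of relying on this derivation.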
The trained species model provides parameters for ab initio gene predictors:
Augustus: Species-specific parameters for splice sites, start/stop codons, etc.
SNAP: Species-specific HMM parameters
GlimmerHMM: Species-specific parameters for gene structure
Using a species model that is closely related to your target organism can significantly improve gene prediction accuracy.
Pretrained Species
Funannotate2 comes with several pretrained species models for common organisms. You can see the list of available pretrained species with the species command.
If your organism is not in the list, you can:
Use a closely related species model
Train a new model using the train command
Load the new model into the database using the species -l command
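The choice between your own species and a closely related one can be scripted by checking the listing first. This is a rough sketch: it assumes each model name appears as a whole word somewhere in the default `funannotate2 species` output, which is a guess about the table layout. The function names `species_is_trained` and `pick_species_model` are hypothetical.

```shell
# Sketch: pick the first candidate model name that is present in the database.
# Assumption: a trained species name appears as a whole word in the output of
# `funannotate2 species`; the exact table layout is not relied on.
species_is_trained() {
    listing=$(funannotate2 species) || return 2
    printf '%s\n' "$listing" | grep -qwF -- "$1"
}

# Arguments are candidate names in order of preference, for example the
# target species first and a closely related species second.
pick_species_model() {
    for name in "$@"; do
        if species_is_trained "$name"; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    echo "no trained model found for: $*; train one and load it with 'species -l'" >&2
    return 1
}

# Usage:
#   model=$(pick_species_model aspergillus_niger aspergillus_fumigatus) &&
#       funannotate2 predict -f genome.fasta -o predict_results -p "$model" -s "Aspergillus niger"
```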
Species Model Storage
Species models are stored in the funannotate2 database directory, which is specified by the $FUNANNOTATE2_DB environment variable. Each species model includes:
Parameters for Augustus
Parameters for SNAP
Parameters for GlimmerHMM
Metadata about the training process
The models are stored in a structured format that allows them to be easily loaded and used by the predict command.
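Since every species operation depends on $FUNANNOTATE2_DB, it can help to verify the variable before running other commands. The following sketch only checks that the variable is set and points to an existing directory; it makes no assumption about the layout inside the directory. The function name `check_f2_db` is hypothetical.

```shell
# Sketch: confirm that FUNANNOTATE2_DB is set and points to a directory.
# Assumption: nothing is checked about the contents of the directory, since
# its internal layout is not described here.
check_f2_db() {
    if [ -z "${FUNANNOTATE2_DB:-}" ]; then
        echo "error: FUNANNOTATE2_DB is not set" >&2
        return 1
    fi
    if [ ! -d "$FUNANNOTATE2_DB" ]; then
        echo "error: FUNANNOTATE2_DB='$FUNANNOTATE2_DB' is not a directory" >&2
        return 1
    fi
    echo "funannotate2 database directory: $FUNANNOTATE2_DB"
}

# Usage:
#   check_f2_db && funannotate2 species
```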