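HUMAnN 3.0 from the bioBakery Lab, a tool for profiling the metabolic function of microbial communities from metagenomes: installation, configuration, usage, and installation pitfalls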

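Introduction to HUMAnN 3.0:

HUMAnN 3.0 is a tool for metagenomic data analysis that infers the metabolic function of a microbial community from metagenomic sequencing data. It identifies the metabolic pathways present in the community and quantifies their abundance. To do this, HUMAnN 3.0 relies on several tools and databases, including MetaPhlAn 3, DIAMOND, and UniRef90.

Official site: humann3 – The Huttenhower Lab (harvard.edu)

Repository: github.com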

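HUMAnN 3.0 installation steps:

Installing with conda or mamba

1. Create and activate a new environment (optional)

First, you can create a new environment for HUMAnN 3. Inside it, HUMAnN 3 and its dependencies are managed independently of everything else on the system.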

# Installing humann through conda also sets up MetaPhlAn by default, so the environment is named after bioBakery
conda create --name biobakery3 python=3.7
# or
mamba create --name biobakery3 python=3.7

Next, activate the newly created environment:

conda activate biobakery3
# or
mamba activate biobakery3
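
Before installing anything, it can be worth confirming that the new environment really is the active one. The following is only a minimal sketch: it assumes that activation has exported the CONDA_DEFAULT_ENV variable in the current shell, and the helper name in_env is made up for this post.

```shell
# in_env NAME: succeed only when the active conda environment is NAME.
# Assumption: activating an environment exports CONDA_DEFAULT_ENV.
in_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_env biobakery3; then
    echo "biobakery3 is active"
else
    echo "biobakery3 is NOT active; run: conda activate biobakery3"
fi
```

If the second message appears, packages installed in the next steps would land in whichever environment happens to be active instead.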

Set up the conda channels

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery
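
Since the channel configuration turns out to matter later in this post, here is a small sketch for checking that the channels were actually recorded. It assumes that `conda config --show channels` prints a YAML-style list with one `- name` entry per line; the helper name missing_channels is hypothetical, and a channel configured as a full URL would be reported as missing by this simple match.

```shell
# missing_channels: read `conda config --show channels` output on stdin and
# print every channel used in this post that is not configured.
missing_channels() {
    listing=$(cat)
    for ch in biobakery conda-forge bioconda; do
        printf '%s\n' "$listing" |
            grep -Eq "^[[:space:]]*-[[:space:]]*$ch[[:space:]]*\$" ||
            printf '%s\n' "$ch"
    done
}

# Only query conda when it is actually on PATH.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels | missing_channels
fi
```

No output means all three channels are present.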

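2. Install HUMAnN 3

Now you can install HUMAnN 3 with conda or mamba. Note that HUMAnN 3 is also distributed as a Python package, so it can be installed directly with pip as well.

conda install -c bioconda humann
# or
mamba install -c bioconda humann

This installs HUMAnN 3 and its dependencies from the bioconda channel.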

Install any missing dependencies by hand, following the error messages:

# Error: bowtie2 is missing
mamba install bowtie2 -c bioconda
# Error: diamond is missing
mamba install diamond -c bioconda

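Installing humann3 with mamba or conda failed for me. I tried installing the dependencies from bioconda, but MetaPhlAn still would not install, so in the end I installed from PyPI, which worked. The current version is 3.8.

Use the following command to install HUMAnN 3.0 from PyPI:

# Officially recommended method:
pip install humann --no-binary :all:
### This downloads the humann package, then builds and unpacks it automatically. It installed successfully for me.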

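After installing with pip this way, HUMAnN reports that MetaPhlAn cannot be found, so MetaPhlAn still has to be installed and configured separately. Otherwise later runs fail with: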

CRITICAL ERROR: The metaphlan executable can not be found. Please check the install.

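This error is in fact the result of an incomplete install: the channels set earlier for mamba or conda did not take effect. The correct way to install is:

### Pass every required channel on the command line so that the dependencies resolve completely.
mamba install humann -c biobakery -c bioconda -c conda-forge
##### Unbelievable: bioBakery's own channel does not provide the complete set of dependencies by itself. This pitfall is a big one!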
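
To catch an incomplete install before a long run fails with the "executable can not be found" error, the executables named in this post can be checked up front. This is a rough sketch: the helper name missing_tools is made up here, and it only tests whether each program is on PATH, not whether its version is compatible.

```shell
# missing_tools TOOL...: print each named executable that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The executables mentioned in this post.
missing=$(missing_tools humann metaphlan bowtie2 diamond)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "all four executables found"
fi
```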

3. Download the databases

HUMAnN 3.0 uses several databases, and their files need to be downloaded.

First, list the available databases:

humann_databases --available
HUMAnN Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/DEMO_chocophlan.v201901_v31.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref50_annotated_v201901b_full.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901b_subset.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901b.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
### Why are these still the 2019 builds? Have updates stopped?

Download specific databases:

humann_databases --download chocophlan full $DIR_TO_STORE_DB
humann_databases --download uniref uniref90_diamond $DIR_TO_STORE_DB
# $DIR_TO_STORE_DB is the path where you want the database files stored.
humann_databases --download chocophlan full /path/to/databases --update-config yes
humann_databases --download uniref uniref90_diamond /path/to/databases --update-config yes
humann_databases --download utility_mapping full /path/to/databases --update-config yes
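
The three downloads differ only in the database and build names, so they can be generated from a short list. The sketch below only prints the commands instead of running them, because the downloads are large; the function name print_db_downloads is hypothetical, while the database names, build names, and the --update-config option are the ones used above.

```shell
# print_db_downloads DIR: print one humann_databases command per database,
# all storing their files under DIR.
print_db_downloads() {
    db_dir=$1
    for spec in "chocophlan full" "uniref uniref90_diamond" "utility_mapping full"; do
        printf 'humann_databases --download %s %s --update-config yes\n' "$spec" "$db_dir"
    done
}

# Review the commands first; append `| sh` to actually run them.
print_db_downloads /path/to/databases
```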

To download the databases by hand, use the corresponding links from the humann_databases listing above and extract each archive into its own folder:

wget -c http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
wget -c http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
wget -c http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
mkdir chocophlan_v296_201901
mkdir uniref90_v201901
mkdir mapping_v201901
# The archive names passed to tar must match the files downloaded above
tar -zxvf full_chocophlan.v201901_v31.tar.gz -C ./chocophlan_v296_201901/
tar -zxvf uniref90_annotated_v201901b_full.tar.gz -C uniref90_v201901
tar -zxvf full_mapping_v201901b.tar.gz -C ./mapping_v201901/
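
When downloading by hand it is easy for the archive name given to tar to drift away from the name wget actually saved. One way to avoid that is to derive the archive name from the URL. This is a sketch under the assumption that wget saves the file under the last path component of the URL; the function names archive_name and fetch_and_unpack are invented for illustration.

```shell
# archive_name URL: the file name wget saves the URL under by default.
archive_name() {
    basename "$1"
}

# fetch_and_unpack URL DIR: download URL (resuming if interrupted) and
# extract it into DIR, creating DIR first.
fetch_and_unpack() {
    url=$1
    dest=$2
    wget -c "$url" &&
        mkdir -p "$dest" &&
        tar -zxf "$(archive_name "$url")" -C "$dest"
}

# Example call (commented out because the download is large):
# fetch_and_unpack http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz mapping_v201901
```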

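Database settings. First check the current settings:

# List the existing databases
humann_databases --list
### This command is wrong. Better to look at the data directory directly.
# Default database directory; if you set your own directory earlier, look there instead
/miniconda3/envs/biobakery3/lib/python3.7/site-packages/humann
### The command to use is humann_config
humann_config
HUMAnN Configuration ( Section : Name = Value )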
database_folders : nucleotide = /path/to/databases/chocophlan_v296_201901
database_folders : protein = /path/to/databases/uniref90_v201901/
database_folders : utility_mapping = /path/to/databases/mapping_v201901/
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False
############################################################
humann_config --help
usage: humann_config [-h] [--print] [--update <section> <name> <value>]

HUMAnN Configuration

optional arguments:
  -h, --help            show this help message and exit
  --print               print the configuration
  --update <section> <name> <value>
                        update the section : name to the value provided

Pointing the configuration at databases that are already prepared

## Update format: humann_config --update <section> <name> <value>
humann_config --update database_folders nucleotide /path/to/databases/chocophlan_v296_201901
humann_config --update database_folders protein /path/to/databases/uniref90_v201901/
humann_config --update database_folders utility_mapping /path/to/databases/mapping_v201901/
## Check the settings after updating
humann_config
# You can also change other defaults
# For example, my servers all have more than 30 threads, so I set the default thread count to 30; adjust this to your own server
humann_config --update run_modes threads 30
#######################
# HUMAnN configuration file updated: run_modes : threads = 30
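
After changing the configuration, a quick sanity check is to read the values back and confirm that the database folders exist. The sketch below assumes the `Section : Name = Value` line format shown in the humann_config output above; the helper name config_value is hypothetical.

```shell
# config_value SECTION NAME: read humann_config output on stdin and print
# the value stored for "SECTION : NAME".
config_value() {
    awk -F' = ' -v key="$1 : $2" '$1 == key { print $2 }'
}

# Only query the configuration when humann_config is actually installed.
if command -v humann_config >/dev/null 2>&1; then
    for name in nucleotide protein utility_mapping; do
        dir=$(humann_config --print | config_value database_folders "$name")
        [ -d "$dir" ] || echo "missing folder for $name: $dir"
    done
fi
```

No output means all three configured folders exist on disk.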

Running HUMAnN 3.0

The full help text with every option:

(biobakery3) [root@mgmt ~]# humann --help
usage: humann [-h] -i <input.fastq> -o <output> [--threads <1>] [--version]
              [-r] [--bypass-nucleotide-index] [--bypass-nucleotide-search]
              [--bypass-prescreen] [--bypass-translated-search]
              [--taxonomic-profile <taxonomic_profile.tsv>]
              [--memory-use {minimum,maximum}]
              [--input-format {fastq,fastq.gz,fasta,fasta.gz,sam,bam,blastm8,genetable,biom}]
              [--search-mode {uniref50,uniref90}] [-v]
              [--metaphlan <metaphlan>]
              [--metaphlan-options <metaphlan_options>]
              [--prescreen-threshold <0.01>] [--bowtie2 <bowtie2>]
              [--bowtie-options <bowtie_options>]
              [--nucleotide-database <nucleotide_database>]
              [--nucleotide-identity-threshold <0.0>]
              [--nucleotide-query-coverage-threshold <90.0>]
              [--nucleotide-subject-coverage-threshold <50.0>]
              [--diamond <diamond>] [--diamond-options <diamond_options>]
              [--evalue <1.0>] [--protein-database <protein_database>]
              [--rapsearch <rapsearch>]
              [--translated-alignment {usearch,rapsearch,diamond}]
              [--translated-identity-threshold <Automatically: 50.0 or 80.0, Custom: 0.0-100.0>]
              [--translated-query-coverage-threshold <90.0>]
              [--translated-subject-coverage-threshold <50.0>]
              [--usearch <usearch>] [--gap-fill {on,off}] [--minpath {on,off}]
              [--pathways {metacyc,unipathway}]
              [--pathways-database <pathways_database.tsv>] [--xipe {on,off}]
              [--annotation-gene-index <3>] [--id-mapping <id_mapping.tsv>]
              [--remove-temp-output]
              [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
              [--o-log <sample.log>] [--output-basename <sample_name>]
              [--output-format {tsv,biom}] [--output-max-decimals <10>]
              [--remove-column-description-output]
              [--remove-stratified-output]

HUMAnN : HMP Unified Metabolic Analysis Network 3

optional arguments:
  -h, --help            show this help message and exit

[0] Common settings:
  -i <input.fastq>, --input <input.fastq>
                        input file of type {fastq,fastq.gz,fasta,fasta.gz,sam,bam,blastm8,genetable,biom}
                        [REQUIRED]
  -o <output>, --output <output>
                        directory to write output files
                        [REQUIRED]
  --threads <1>         number of threads/processes
                        [DEFAULT: 1]
  --version             show program's version number and exit

[1] Workflow refinement:
  -r, --resume          bypass commands if the output files exist
  --bypass-nucleotide-index
                        bypass the nucleotide index step and run on the indexed ChocoPhlAn database
  --bypass-nucleotide-search
                        bypass the nucleotide search steps
  --bypass-prescreen    bypass the prescreen step and run on the full ChocoPhlAn database
  --bypass-translated-search
                        bypass the translated search step
  --taxonomic-profile <taxonomic_profile.tsv>
                        a taxonomic profile (the output file created by metaphlan)
                        [DEFAULT: file will be created]
  --memory-use {minimum,maximum}
                        the amount of memory to use
                        [DEFAULT: minimum]
  --input-format {fastq,fastq.gz,fasta,fasta.gz,sam,bam,blastm8,genetable,biom}
                        the format of the input file
                        [DEFAULT: format identified by software]
  --search-mode {uniref50,uniref90}
                        search for uniref50 or uniref90 gene families
                        [DEFAULT: based on translated database selected]
  -v, --verbose         additional output is printed

[2] Configure tier 1: prescreen:
  --metaphlan <metaphlan>
                        directory containing the MetaPhlAn software
                        [DEFAULT: $PATH]
  --metaphlan-options <metaphlan_options>
                        options to be provided to the MetaPhlAn software
                        [DEFAULT: "-t rel_ab"]
  --prescreen-threshold <0.01>
                        minimum percentage of reads matching a species
                        [DEFAULT: 0.01]

[3] Configure tier 2: nucleotide search:
  --bowtie2 <bowtie2>   directory containing the bowtie2 executable
                        [DEFAULT: $PATH]
  --bowtie-options <bowtie_options>
                        options to be provided to the bowtie software
                        [DEFAULT: "--very-sensitive"]
  --nucleotide-database <nucleotide_database>
                        directory containing the nucleotide database
                        [DEFAULT: /path/to/databases/chocophlan_v296_201901]
  --nucleotide-identity-threshold <0.0>
                        identity threshold for nuclotide alignments
                        [DEFAULT: 0.0]
  --nucleotide-query-coverage-threshold <90.0>
                        query coverage threshold for nucleotide alignments
                        [DEFAULT: 90.0]
  --nucleotide-subject-coverage-threshold <50.0>
                        subject coverage threshold for nucleotide alignments
                        [DEFAULT: 50.0]

[3] Configure tier 2: translated search:
  --diamond <diamond>   directory containing the diamond executable
                        [DEFAULT: $PATH]
  --diamond-options <diamond_options>
                        options to be provided to the diamond software
                        [DEFAULT: "--top 1 --outfmt 6"]
  --evalue <1.0>        the evalue threshold to use with the translated search
                        [DEFAULT: 1.0]
  --protein-database <protein_database>
                        directory containing the protein database
                        [DEFAULT: /path/to/databases/uniref90_v201901/]
  --rapsearch <rapsearch>
                        directory containing the rapsearch executable
                        [DEFAULT: $PATH]
  --translated-alignment {usearch,rapsearch,diamond}
                        software to use for translated alignment
                        [DEFAULT: diamond]
  --translated-identity-threshold <Automatically: 50.0 or 80.0, Custom: 0.0-100.0>
                        identity threshold for translated alignments
                        [DEFAULT: Tuned automatically (based on uniref mode) unless a custom value is specified]
  --translated-query-coverage-threshold <90.0>
                        query coverage threshold for translated alignments
                        [DEFAULT: 90.0]
  --translated-subject-coverage-threshold <50.0>
                        subject coverage threshold for translated alignments
                        [DEFAULT: 50.0]
  --usearch <usearch>   directory containing the usearch executable
                        [DEFAULT: $PATH]

[5] Gene and pathway quantification:
  --gap-fill {on,off}   turn on/off the gap fill computation
                        [DEFAULT: on]
  --minpath {on,off}    turn on/off the minpath computation
                        [DEFAULT: on]
  --pathways {metacyc,unipathway}
                        the database to use for pathway computations
                        [DEFAULT: metacyc]
  --pathways-database <pathways_database.tsv>
                        mapping file (or files, at most two in a comma-delimited list) to use for pathway computations
                        [DEFAULT: metacyc database ]
  --xipe {on,off}       turn on/off the xipe computation
                        [DEFAULT: off]
  --annotation-gene-index <3>
                        the index of the gene in the sequence annotation
                        [DEFAULT: 3]
  --id-mapping <id_mapping.tsv>
                        id mapping file for alignments
                        [DEFAULT: alignment reference used]

[6] More output configuration:
  --remove-temp-output  remove temp output files
                        [DEFAULT: temp files are not removed]
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        level of messages to display in log
                        [DEFAULT: DEBUG]
  --o-log <sample.log>  log file
                        [DEFAULT: temp/sample.log]
  --output-basename <sample_name>
                        the basename for the output files
                        [DEFAULT: input file basename]
  --output-format {tsv,biom}
                        the format of the output files
                        [DEFAULT: tsv]
  --output-max-decimals <10>
                        the number of decimals to output
                        [DEFAULT: 10]
  --remove-column-description-output
                        remove the description in the output column
                        [DEFAULT: output column includes description]
  --remove-stratified-output
                        remove stratification from output
                        [DEFAULT: output is stratified]

Main humann utility scripts

humann_barplot
humann_strain_profiler
humann_benchmark
humann_genefamilies_genus_level
humann_reduce_table
humann_rna_dna_norm
humann_build_custom_database
humann_humann1_kegg
humann_regroup_table
humann_split_stratified_table
humann_unpack_pathways
humann_associate
humann_infer_taxonomy
humann_split_table

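Use the following commands to run HUMAnN 3.0:

Running samples one at a time

# The command is plain humann; unlike humann2, it carries no version number
humann --input input.fastq.gz --output output_dir --threads NUM_THREADS
# In this command, input.fastq.gz is the metagenomic data file, output_dir is the output directory, and NUM_THREADS is the number of threads to use.
# Paired-end reads: humann takes a single input file (see -i in the help above), so concatenate the forward and reverse files first and pass the merged file
# Note: write file names and paths out in full; do not abbreviate the parts they have in common
# It is also best to state the input file type with --input-format
cat input_forward.fastq input_reverse.fastq > input_merged.fastq
humann -i input_merged.fastq --output <output_directory> --input-format fastq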
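
Running the samples one by one gets repetitive once there are more than a few. The sketch below generates one humann command per sample, each with its own output folder. It rests on several assumptions of mine: one `.fastq.gz` file per sample in a single input directory, paths without spaces, and the hypothetical function name humann_commands. It prints the commands instead of running them so they can be reviewed first.

```shell
# humann_commands IN_DIR OUT_DIR THREADS: print one humann command for each
# *.fastq.gz file in IN_DIR, writing to OUT_DIR/<sample name>.
humann_commands() {
    in_dir=$1
    out_dir=$2
    threads=$3
    for fq in "$in_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue          # skip the unmatched glob pattern
        sample=$(basename "$fq" .fastq.gz)
        printf 'humann --input %s --output %s/%s --threads %s\n' \
            "$fq" "$out_dir" "$sample" "$threads"
    done
}

# Print the commands for every sample under ./reads (nothing is printed if
# the directory has no matching files); append `| sh` to run them in turn.
humann_commands ./reads ./humann_out 30
```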