Genome analysis of Endozoicomonas species

Genome assembly of OTU5

Preprocessing

Read preprocessing (adapter and quality trimming) was performed with fastp.

fastp \
   -i ${READ1}.fastq \
   -I ${READ2}.fastq \
   -w 20 \
   -h ${report}.html \
   -j ${report}.json \
   -o ${READ1}.QC.fastq \
   -O ${READ2}.QC.fastq

De novo assembly

De novo assembly was performed with SPAdes.

spades.py \
   -1 ${READ1}.QC.fastq \
   -2 ${READ2}.QC.fastq \
   -k auto \
   --careful \
   -t 20 \
   -o assembly_results
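
Before annotation, it can help to check the assembly quickly. The following is a minimal sketch, not part of the original pipeline: a hypothetical helper `assembly_stats` that reports the contig count, total length, and N50 of a FASTA file using only awk and sort. It assumes the contigs are in `assembly_results/contigs.fasta`, the file SPAdes writes to its output directory.

```shell
# assembly_stats FASTA
# Hypothetical helper (not part of the original pipeline): prints the number
# of contigs, the total assembly length and the N50 of a FASTA file.
assembly_stats() {
    local fasta="$1"
    # Step 1: print one length per sequence (multi-line records are summed).
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$fasta" |
    # Step 2: sort lengths from longest to shortest.
    sort -nr |
    # Step 3: N50 is the length of the contig at which the running sum
    # first reaches half of the total assembly length.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 cum += lens[i]
                 if (cum >= half) { n50 = lens[i]; break }
             }
             printf "contigs=%d total_bp=%d N50=%d\n", NR, total, n50
         }'
}
```

Example call: `assembly_stats assembly_results/contigs.fasta`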

Gene annotation

Gene prediction

Coding sequences (CDSs) were predicted from the genome sequences with Prokka.

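# ${id} is the genome FASTA file (positional argument, given last);
# --addgenes is a switch and takes no value.
prokka \
   --outdir ../Prokka/${id}_prokka \
   --kingdom Bacteria \
   --gcode 11 \
   --gram neg \
   --cpus 20 \
   --centre X \
   --compliant \
   --prefix ${id}_prokka \
   --addgenes \
   ${id}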

KAAS

Genes were annotated against the KEGG database with the KEGG Automatic Annotation Server (KAAS; https://www.genome.jp/tools/kaas/).

NCBI nr

A similarity search against NCBI nr was performed to identify the taxonomy of the most similar sequences.

  1. Preparation of DB

    git clone https://github.com/tmaruy/blast2taxonomy.git
    chmod +x prepare_taxdb.sh
    ./prepare_taxdb.sh
    
  2. Similarity search with Diamond

    diamond blastx \
       -q ${input} \
       -d ${db} \
       -o ${input}.nr_diamond.txt \
       -f 6 \
       -p 8
    
  3. Parse results

    python parse_nr.py \
       -i ${input}.nr_diamond.txt \
       -o ${input}.nr_diamond.parse.txt \
       -threshold 1e-5
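
As a quick sanity check on the DIAMOND output, the best hit per query can be extracted without the parser. This is a minimal sketch, not a replacement for `parse_nr.py` (it does no taxonomy lookup). The helper name `best_hits` is hypothetical. It assumes the default 12-column tabular layout of `-f 6`, in which column 11 is the e-value and column 12 is the bit score.

```shell
# best_hits FILE [MAX_EVALUE]
# Hypothetical helper: keeps, for each query, the single hit with the highest
# bit score among hits whose e-value is at or below MAX_EVALUE (default 1e-5).
# Assumes DIAMOND/BLAST tabular format 6 with the default 12 columns.
best_hits() {
    local tsv="$1" max_evalue="${2:-1e-5}" tab
    tab=$(printf '\t')
    # Keep rows passing the e-value cut-off (column 11).
    awk -F'\t' -v max="$max_evalue" '($11 + 0) <= (max + 0)' "$tsv" |
    # Group by query (column 1), highest bit score (column 12) first.
    LC_ALL=C sort -t "$tab" -k1,1 -k12,12nr |
    # Keep only the first (best) row seen for each query.
    awk -F'\t' '!seen[$1]++'
}
```

Example call: `best_hits ${input}.nr_diamond.txt 1e-5 > ${input}.nr_diamond.best.txt` (the output file name is only an illustration).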
    

Comparative genome analysis

Orthogroup identification

Orthologous gene groups (orthogroups) were determined with OrthoFinder2.

# ${faa_dir}: directory containing the protein (amino acid) FASTA files
orthofinder \
   -f ${faa_dir} \
   -t 16 \
   -a 3
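
The orthogroup table can be summarised in a few lines. This is a minimal sketch, not part of the original analysis: a hypothetical helper `orthogroup_summary` that counts all orthogroups, core orthogroups (present in every genome), and single-copy core orthogroups. It assumes the layout of OrthoFinder's `Orthogroups.GeneCount.tsv`: a header row, the orthogroup ID in the first column, one gene-count column per genome, and a final `Total` column.

```shell
# orthogroup_summary GENECOUNT_TSV
# Hypothetical helper: summarises a per-genome gene-count table.
# Assumed layout: header row; column 1 = orthogroup ID;
# columns 2..NF-1 = gene counts per genome; last column = row total.
orthogroup_summary() {
    awk -F'\t' 'NR == 1 { next }   # skip the header row
        {
            total++
            core = 1; single = 1
            for (i = 2; i < NF; i++) {
                if ($i == 0) { core = 0 }     # missing from at least one genome
                if ($i != 1) { single = 0 }   # not exactly one copy everywhere
            }
            n_core += core; n_single += single
        }
        END {
            printf "orthogroups=%d core=%d single_copy_core=%d\n",
                   total, n_core, n_single
        }' "$1"
}
```

Example call: `orthogroup_summary Orthogroups.GeneCount.tsv`. The path depends on the OrthoFinder results directory, which is date-stamped.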

Genome alignment

Whole-genome alignment of the Prokka-formatted genome sequences (.fna) was performed with progressiveMauve.

progressiveMauve \
   --output=progressiveMauve.all.xmfa \
   ${indir}/Endozoicomonas_acroporae_Acr-14.fasta_prokka.fna \
   ${indir}/Endozoicomonas_arenosclerae_ab112.fasta_prokka.fna \
   ${indir}/Endozoicomonas_ascidiicola_AVMART05.fasta_prokka.fna \
   ${indir}/Endozoicomonas_ascidiicola_KASP37.fasta_prokka.fna \
   ${indir}/Endozoicomonas_atrinae_WP70.fasta_prokka.fna \
   ${indir}/Endozoicomonas_culture.decontamination.fasta_prokka.fna \
   ${indir}/Endozoicomonas_elysicola_DSM_22380.fasta_prokka.fna \
   ${indir}/Endozoicomonas_montiporae_CL-33.fasta_prokka.fna \
   ${indir}/Endozoicomonas_montiporae_LMG_24815.fasta_prokka.fna \
   ${indir}/Endozoicomonas_numazuensis_DSM_25634.fasta_prokka.fna \
   ${indir}/Endozoicomonas_sp_OPT23.fasta_prokka.fna
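
The input list above is written out by hand. As a minimal sketch, not the command that was actually run, the same list could be built from the directory contents. The helper name `list_genomes` is hypothetical, and it assumes that every genome to be aligned is stored in `${indir}` with the `_prokka.fna` suffix.

```shell
# list_genomes DIR
# Hypothetical helper: prints every non-empty *_prokka.fna file in DIR, one
# per line, and fails if fewer than two are found (nothing to align).
list_genomes() {
    local dir="$1" n=0 f
    for f in "$dir"/*_prokka.fna; do
        # Skip empty files; this also skips the literal pattern
        # when the glob matches nothing.
        [ -s "$f" ] || continue
        printf '%s\n' "$f"
        n=$((n + 1))
    done
    [ "$n" -ge 2 ] || { echo "need at least two genomes in $dir" >&2; return 1; }
}
```

In bash, the list can be read into an array with `mapfile -t genomes < <(list_genomes "${indir}")` and passed on as `progressiveMauve --output=progressiveMauve.all.xmfa "${genomes[@]}"`.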

Downstream analysis