Dryad Data Access
The data for the first manuscript has been deposited in the Dryad repository for long-term preservation.
The data can be accessed at the following link: https://doi.org/10.5061/dryad.ns1rn8q2k
Data Description
The included databases are described here.
One difference is that the full SocialGene RefSeq Neo4j database and the Actinomycetota database had to be split into smaller parts for hosting on Dryad. The resulting split files follow the naming convention neo4j_db_refseq_base.dump_split_00, neo4j_db_refseq_base.dump_split_01, ..., neo4j_db_refseq_base.dump_split_23.
Before using either of these two database dumps, download all of its parts and concatenate them, in order, into a single file.
For example:
cat neo4j_db_refseq_base.dump_split_* > neo4j_db_refseq_base.dump
md5sum -c md5checksums.txt
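The merge step above can be wrapped in a small shell function that refuses to build a dump when parts are missing. This is only a sketch: `merge_parts` is a hypothetical helper, not something shipped with the dataset, and it assumes every part of a dump sits in the current directory and is named `<dump name>_split_NN` with zero-padded numbers.

```shell
# Hypothetical helper (not part of the Dryad dataset or SocialGene).
# Usage: merge_parts <dump name> [expected number of parts]
merge_parts() {
    base="$1"
    expected="${2:-}"
    # Collect the parts; glob results are sorted, so zero-padded
    # suffixes (00, 01, ...) are concatenated in the right order.
    set -- "${base}"_split_*
    # When nothing matches, the pattern is left as a literal string.
    if [ ! -e "$1" ]; then
        echo "no split files found for ${base}" >&2
        return 1
    fi
    # Optionally refuse to merge an incomplete download.
    if [ -n "${expected}" ] && [ "$#" -ne "${expected}" ]; then
        echo "expected ${expected} parts for ${base}, found $#" >&2
        return 1
    fi
    cat "$@" > "${base}"
}
```

For example, `merge_parts neo4j_db_refseq_base.dump 24` checks for the 24 RefSeq parts (split_00 through split_23) before concatenating them. The same call should work for the Actinomycetota dump if its parts follow the same naming pattern, which is an assumption here.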
Dryad only allows a flat file structure, so all of the files are in the same "directory". The included files are:
- md5 checksums
md5checksums.txt
- HMM files (for use with any of the 2023_v0.4.1 databases)
socialgene_nr_hmms_file_with_cutoffs_1_of_1.hmm.gz
socialgene_nr_hmms_file_without_cutoffs_1_of_1.hmm.gz
hmminfo.tsv.gz
- Neo4j database dump files
- All RefSeq
neo4j_db_refseq_base.dump_split_00
neo4j_db_refseq_base.dump_split_01
neo4j_db_refseq_base.dump_split_02
neo4j_db_refseq_base.dump_split_03
neo4j_db_refseq_base.dump_split_04
neo4j_db_refseq_base.dump_split_05
neo4j_db_refseq_base.dump_split_06
neo4j_db_refseq_base.dump_split_07
neo4j_db_refseq_base.dump_split_08
neo4j_db_refseq_base.dump_split_09
neo4j_db_refseq_base.dump_split_10
neo4j_db_refseq_base.dump_split_11
neo4j_db_refseq_base.dump_split_12
neo4j_db_refseq_base.dump_split_13
neo4j_db_refseq_base.dump_split_14
neo4j_db_refseq_base.dump_split_15
neo4j_db_refseq_base.dump_split_16
neo4j_db_refseq_base.dump_split_17
neo4j_db_refseq_base.dump_split_18
neo4j_db_refseq_base.dump_split_19
neo4j_db_refseq_base.dump_split_20
neo4j_db_refseq_base.dump_split_21
neo4j_db_refseq_base.dump_split_22
neo4j_db_refseq_base.dump_split_23
- All RefSeq Actinomycetota
neo4j_db_actinomycetota_base.dump
params_actinomycetota.json
- All RefSeq Streptomyces
neo4j_db_streptomyces_base.dump
params_streptomyces.json
- All RefSeq Micromonospora
neo4j_db_micromonospora_base.dump
params_micromonospora.json
- All RefSeq antiSMASH-7.0 BGCs
neo4j_db_refseq_antismash_bgcs_base.dump
params_refseq_antismash_bgcs.json
- Three genomes used for protein similarity method comparisons
methods_comparison.dump
methods_comparison.json
Checksums
A single md5checksums.txt file contains the md5 checksums for all of the files in the dataset. It can be used to verify the integrity of any or all of the files after downloading. It also includes the expected md5sum of the concatenated neo4j_db_refseq_base.dump and neo4j_db_actinomycetota_base.dump files.
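When only some of the files have been downloaded, it can be convenient to check one file at a time. The function below is a sketch of one way to do that. `verify_file` is a hypothetical helper, not part of the dataset, and it assumes md5checksums.txt uses the usual md5sum layout (hash, whitespace, file name) and that file names contain no spaces.

```shell
# Hypothetical helper (not part of the Dryad dataset or SocialGene).
# Usage: verify_file <file name> [checksum file, default md5checksums.txt]
verify_file() {
    file="$1"
    checksums="${2:-md5checksums.txt}"
    # Keep only the line whose file name field matches exactly;
    # a leading "*" (md5sum's binary-mode marker) is ignored.
    line=$(awk -v f="${file}" \
        '{ name = $2; sub(/^\*/, "", name) } name == f' "${checksums}")
    if [ -z "${line}" ]; then
        echo "no checksum listed for ${file}" >&2
        return 1
    fi
    # md5sum reads the checksum line from standard input and
    # returns a non-zero status when the file does not match.
    printf '%s\n' "${line}" | md5sum -c -
}
```

For example, `verify_file neo4j_db_refseq_base.dump` checks the merged RefSeq dump against its expected md5sum. With GNU coreutils, `md5sum -c --ignore-missing md5checksums.txt` is an alternative that checks every listed file that is present and skips the rest.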