Database Backups
Create a full database dump/backup
The following takes a database named "neo4j" found in /database/location/socialgene_neo4j and creates a single dump file at /database/location/socialgene_neo4j/backups/neo4j.dump. While the dump will be smaller than the space occupied by the /database/location/socialgene_neo4j directory, the file can still be quite large.
sg_neoloc='/database/location/socialgene_neo4j'
# mkdir because the docker image will create dirs as root if they don't exist
mkdir -p $sg_neoloc/backups
docker run \
--user=$(id -u):$(id -g) \
--interactive \
--tty \
--rm \
--volume=$sg_neoloc/data:/data \
--volume=$sg_neoloc/backups:/backups \
--env NEO4J_AUTH=neo4j/test \
neo4j/neo4j-admin:5.16.0 \
neo4j-admin database dump \
--to-path=/backups \
neo4j
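Once the dump finishes, it can be worth confirming that the file was written and recording a checksum before copying it elsewhere. The following is a minimal sketch, not part of SocialGene or Neo4j: the function name check_dump is hypothetical, and it assumes sha256sum (GNU coreutils) is installed.

```shell
# Hypothetical helper (not part of SocialGene or Neo4j): confirm that a dump
# file exists and is non-empty, then print its size and SHA-256 checksum.
check_dump() {
    if [ ! -s "$1" ]; then
        echo "missing or empty dump file: $1" >&2
        return 1
    fi
    # First output line: size in bytes
    wc -c < "$1"
    # Second output line: "<sha256>  <path>" (assumes GNU coreutils sha256sum)
    sha256sum "$1"
}
```

With the variables from the backup command, this could be called as check_dump "$sg_neoloc/backups/neo4j.dump". Running it again after transferring the file and comparing the checksums is one way to catch a truncated copy.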
Restore from a full database dump/backup
Given a Neo4j database dump file at path $dump_path, rehydrate the database inside the directory $sg_neoloc.
dump_path='/path/to/neo4j.dump'
sg_neoloc='/path/to/new/db/directory'
pipeline_version='latest'
# mkdir because the docker image will create dirs as root if they don't exist
mkdir -p $sg_neoloc/data
mkdir -p $sg_neoloc/logs
mkdir -p $sg_neoloc/plugins
mkdir -p $sg_neoloc/conf
mkdir -p $sg_neoloc/import
docker run \
--user=$(id -u):$(id -g) \
--interactive \
--tty \
--rm \
--volume=$sg_neoloc/data:/opt/conda/bin/neo4j/data \
--volume=$sg_neoloc/plugins:/opt/conda/bin/neo4j/plugins \
--volume=$sg_neoloc/logs:/opt/conda/bin/neo4j/logs \
--volume=$dump_path:/opt/conda/bin/neo4j/neo4j.dump \
--env NEO4J_AUTH=neo4j/test \
chasemc2/sgnf-sgpy:$pipeline_version \
neo4j-admin database load \
--from-path=. \
neo4j
Note
The restore command creates a database named "neo4j", no matter what the $dump_path file name is. To change the database name, you would have to modify both the --volume=$dump_path:/opt/conda/bin/neo4j/neo4j.dump line and the final neo4j argument in the Docker command. Unless you are familiar with Neo4j and want to load multiple databases, you should probably leave it as "neo4j".
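To make the two places that must agree easier to see, here is a small sketch that prints them for a given database name. The function names are hypothetical and introduced only for illustration; the sketch assumes, as the note implies, that the dump file inside the container must be named after the database being loaded.

```shell
# Hypothetical helpers for illustration only; nothing here runs Docker.

# Container-side path for the dump file, named after the database
# (assumption: the load step expects "<database>.dump" in --from-path).
dump_target_for() {
    printf '/opt/conda/bin/neo4j/%s.dump' "$1"
}

# Print the two arguments of the restore command that depend on the
# database name: the --volume line for the dump and the final argument.
#   $1 = database name, $2 = host path of the dump file
show_load_args() {
    printf '%s\n' "--volume=$2:$(dump_target_for "$1")"
    printf '%s\n' "$1"
}
```

For example, show_load_args mydb "$dump_path" prints the --volume line ending in mydb.dump followed by mydb, which would replace the corresponding line and the last argument of the restore command.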
Restore faster please
The rehydration step is quite I/O intensive. For larger database dumps, and if you have enough spare RAM, it may be beneficial to copy the dump file into RAM first and then load/rehydrate from there, so that reads and writes don't occur on the same drive. On Ubuntu Linux, where /dev/shm is backed by RAM, that would look something like this:
dump_path='/path/to/neo4j.dump'
sg_neoloc='/path/to/new/db/directory'
pipeline_version='latest'
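# copy the dump file to RAM, naming the copy neo4j.dump so the new $dump_path is valid
mkdir -p /dev/shm/social_gene_dump
cp "$dump_path" /dev/shm/social_gene_dump/neo4j.dump
# Change the $dump_path to point at the copy in RAM
dump_path='/dev/shm/social_gene_dump/neo4j.dump'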
# mkdir because the docker image will create dirs as root if they don't exist
mkdir -p $sg_neoloc/data
mkdir -p $sg_neoloc/logs
mkdir -p $sg_neoloc/plugins
mkdir -p $sg_neoloc/conf
mkdir -p $sg_neoloc/import
docker run \
--user=$(id -u):$(id -g) \
--interactive \
--tty \
--rm \
--volume=$sg_neoloc/data:/opt/conda/bin/neo4j/data \
--volume=$sg_neoloc/plugins:/opt/conda/bin/neo4j/plugins \
--volume=$sg_neoloc/logs:/opt/conda/bin/neo4j/logs \
--volume=$dump_path:/opt/conda/bin/neo4j/neo4j.dump \
--env NEO4J_AUTH=neo4j/test \
chasemc2/sgnf-sgpy:$pipeline_version \
neo4j-admin database load \
--from-path=. \
neo4j
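Because the copy in /dev/shm consumes RAM, it may help to check that the dump fits before copying it and to delete the copy once the load has finished. The sketch below is one way to do the check; the function name fits_in is hypothetical, and it relies only on wc, df -Pk and awk.

```shell
# Hypothetical helper: succeed only if the file ($1) is smaller than the free
# space on the filesystem that holds the directory ($2).
# The body runs in a subshell "( ... )" so its variables do not leak.
fits_in() (
    # File size in KiB, rounded up
    need_kb=$(( ($(wc -c < "$1") + 1023) / 1024 ))
    # Available KiB: 4th column of the 2nd line of POSIX-format df output
    free_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    [ "$need_kb" -lt "$free_kb" ]
)
```

A possible use is fits_in "$dump_path" /dev/shm || echo "not enough free RAM for the dump", run before the cp step. After the database has been loaded, rm -r /dev/shm/social_gene_dump releases the memory held by the copy.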
More info
https://neo4j.com/docs/operations-manual/current/backup-restore/offline-backup/
Ultraquickstart Example
A database dump generated from the ultraquickstart example can be found at https://github.com/socialgene/sgnf/releases/download/v0.2.4/ultraquickstart.dump
Launch the Database
After hydrating a database dump, you can launch it using the directions in Database Launch.