.. _quickstart9steps:

=======================
Quickstart on a cluster
=======================

This is a condensed document that summarizes the steps necessary to
install and use the |project| software on a cluster. Few details are given
here, since this document is only intended as a brief reference. If you
need more details on a step, please follow the provided links.


Goal
====

Starting with a PacBio sequencing file (bam file) and a reference sequence
(fasta file), the `sm-analysis` tool from |project| generates a csv file
(a so-called :ref:`methylation report `). Each row corresponds to one
molecule (*hole number* or *ZMW* in PacBio parlance), with columns
containing properties of each molecule that passed the quality filters.

In addition, a :ref:`summary-report` is generated containing some basic
statistics about the input, the process and the output files.


Steps
=====

1. Create a cluster account. This step depends strongly on the cluster, so
   details cannot be given here (but see :ref:`using-a-cluster` if you plan
   to use the `Goethe-HLR cluster`_).

2. Open a terminal and log in to the cluster (see :ref:`using-a-cluster`).

3. Install Python 3.9 (or above) on the cluster (see the
   :ref:`installation` document).

4. Create a virtual environment (see the :ref:`installation` document).

5. Install the external dependencies :ref:`pbindex `, :ref:`pbmm2 `,
   :ref:`kineticsTools` and :ref:`ccs ` (see the :ref:`using-a-cluster`
   document).

6. Install |project| (see the :ref:`installation` document).

7. Copy the input files to the cluster. Assuming that you want to process
   a file called ``pbsequencing.bam`` and that your reference is stored in
   a file called ``reference.fasta`` (with its companion index
   ``reference.fasta.fai``), run the following command in a terminal:

   .. code-block:: console

      scp pbsequencing.bam reference.fasta{,.fai} dave@goethe.hhlr-gu.de:/scratch/fuchs/darmstadt/dave/myproject/

   YMMV: the paths depend on the name of your account and on the
   destination directory. The destination directory must exist. Recent
   versions of ``rsync`` accept a ``--mkpath`` option to create missing
   components of the destination path; don't count on having recent
   versions of software by default on a cluster ;-)

   .. note::

      Cluster administrators tend to be very concerned about proper usage
      of the filesystems available on a cluster. Quite often they provide
      several filesystems with different properties (speed, size, etc.)
      along with suggestions and policies on how to use them. Find out
      what applies in your case and stick to the policy as much as you can
      to minimize performance problems. On the `Goethe-HLR cluster`_
      website you can learn about the filesystems in the
      `Goethe-HLR storage`_ or `FUCHS-CSC storage`_ sections.

8. Prepare and submit a job (see :ref:`using-a-cluster`). This is the step
   where the analysis done by |project| is carried out.

9. Copy the output files to your personal computer:

   .. code-block:: console

      scp dave@goethe.hhlr-gu.de:/scratch/fuchs/darmstadt/dave/[file to transfer] .

   where the trailing ``.`` (*dot*), which means *current working
   directory*, can be replaced by any other *local path*.

   Or, to synchronize the remote location with your current working
   directory:

   .. code-block:: console

      rsync -avz dave@goethe.hhlr-gu.de:/scratch/fuchs/darmstadt/dave/myproject/ ./


.. _Goethe-HLR cluster: https://csc.uni-frankfurt.de/
.. _Goethe-HLR storage: https://csc.uni-frankfurt.de/wiki/doku.php?id=public:usage:goethe#storage
.. _FUCHS-CSC storage: https://csc.uni-frankfurt.de/wiki/doku.php?id=public:usage:fuchs#storage
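
Step 7 requires the destination directory to exist before the copy, and
``rsync --mkpath`` may not be available. One possible workaround, shown here
only as a minimal sketch, is to create the directory over ``ssh`` first and
then run ``scp``. The helper names ``run`` and ``upload_inputs`` and the
``DRY_RUN`` switch are hypothetical and not part of |project|; the account
and path are the example values used in step 7 and must be adapted.

```shell
#!/bin/sh
# Sketch: make sure the remote destination exists, then copy the input files.
# REMOTE and DEST default to the example account and path from step 7.
REMOTE="${REMOTE:-dave@goethe.hhlr-gu.de}"
DEST="${DEST:-/scratch/fuchs/darmstadt/dave/myproject}"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# upload_inputs FILE...: create $DEST on the cluster (mkdir -p also creates
# missing parent directories), then copy the given files into it.
upload_inputs() {
    run ssh "$REMOTE" "mkdir -p '$DEST'" &&
    run scp "$@" "$REMOTE:$DEST/"
}

# Example call (commented out so that sourcing this file has no side effects):
# upload_inputs pbsequencing.bam reference.fasta reference.fasta.fai
```

Setting ``DRY_RUN=1`` before calling ``upload_inputs`` prints the two
commands instead of running them, which is a cheap way to check the account
name and paths before transferring large files.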