HTSlib

PacBio Data Processing uses pysam, a wrapper around HTSlib, to read BAM files. Although pysam is installed automatically together with PacBio Data Processing, HTSlib must be installed independently; otherwise PacBio Data Processing will fail at runtime.

Installing HTSlib

In the sections below, I briefly explain two ways to install HTSlib.

Standard installation

Probably the easiest way to install HTSlib is through your system's package manager. It can also be installed from source; see the HTSlib webpage for instructions.
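
As a rough illustration, the shell sketch below prints an install command for whichever common package manager it finds first. The package names used here (libhts-dev on Debian/Ubuntu, htslib on Fedora, Homebrew and Bioconda) are assumptions that may differ on your system, so check your distribution's package index before running the printed command.

```shell
# Print a suggested HTSlib install command for the first package manager found.
# Package names are assumptions; verify them for your distribution.
suggest_htslib_install() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install libhts-dev"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install htslib"
    elif command -v brew >/dev/null 2>&1; then
        echo "brew install htslib"
    elif command -v conda >/dev/null 2>&1; then
        echo "conda install -c bioconda htslib"
    else
        echo "No known package manager found; install HTSlib from source (see the HTSlib webpage)."
    fi
}

suggest_htslib_install
```

The function only prints a suggestion and installs nothing, so it is safe to run on a shared machine.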

Spack

Another particularly simple way to install HTSlib is through Spack, especially if you work on a cluster where using the system package manager is cumbersome or even impossible, and installing from source does not appeal to you. In that case, the installation with Spack goes as follows.

  1. (Optional) Choosing the compiler. HTSlib will be compiled from source code by Spack. You might need to choose an up-to-date compiler (clusters tend to have very stable, i.e. old, default compilers). See Using Spack for details.

  2. Installing HTSlib itself. With the default compiler it would be:

    $ spack install htslib
    

    or if we want to install it with a specific compiler, say gcc-11.3:

    $ spack install htslib%gcc@11.3
    

    The result will be a module. In our case, the name of the module is htslib-1.14-gcc-11.3.0-22tiwx3.

  3. Using HTSlib. As mentioned above, PacBio Data Processing depends on HTSlib at runtime. This means that, after a successful installation, the created module must be loaded whenever HTSlib is needed:

    $ module load htslib-1.14-gcc-11.3.0-22tiwx3
    

    Warning

    Remember to add the line:

    module load htslib-1.14-gcc-11.3.0-22tiwx3
    

    at the beginning of the Slurm batch scripts used to submit any executable from PacBio Data Processing.
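
To make the warning above concrete, here is a minimal sketch that writes such a batch script to a file. The file name run_pbdp.sh, the #SBATCH values, the executable name my-pbdp-executable and the input file input.bam are all hypothetical placeholders, not names taken from PacBio Data Processing. Only the module load line comes from the steps above.

```shell
# Write a minimal Slurm batch script (hypothetical name: run_pbdp.sh).
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > run_pbdp.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=pbdp-example
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Load HTSlib before anything from PacBio Data Processing runs.
module load htslib-1.14-gcc-11.3.0-22tiwx3

# Placeholder: replace with the actual executable and its arguments.
my-pbdp-executable input.bam
EOF

# Check the generated script for syntax errors without executing it.
bash -n run_pbdp.sh
```

The script would then be submitted with sbatch run_pbdp.sh. The module name will differ with the HTSlib version, compiler and hash on your system; module avail htslib should list the one Spack created.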