Installation

Pre-requisites

Python

To install PacBio Data Processing a Python interpreter is needed in your system since the package is written in Python. The recommended version of Python is 3.9. Strictly speaking the code should work with a less recent version, but some dependencies will require anyway Python-3.9.

Note

Along this document we will show commands to be typed in a terminal. You will notice that the commands are preceeded by a $ symbol that represents the prompt. The $ SHOULD NOT be typed. Its purpose is to distinguish between commands to be typed and output, since outputs will NOT have a preceeding $ symbol.

Note

During the installation process you will type several shell commands. The typical behaviour of shell commands is quite ingrate: loud complains and quiet celebrations. What can you do to know if a command worked correctly or not? You can find the exit status of the latest issued command with echo $?. For instance:

$ make
... lots of output ...
$ echo $?
0

The displayed exit status of 0 means that the program make ran successfully.

If you are using Linux, it is likely that Python is already present in your system. Open a terminal and check it out with:

$ python --version

or

$ python3 --version

You know that Python is in your system if you get as output something like (your mileage may vary):

Python 3.9.13

Installing Python

If you don’t have Python, or you have an old version, have a look at the section Installing Python, and the references therein.

Other dependencies

PacBio Data Processing delegates some tasks to external tools. Therefore, the next is a list of external dependencies:

These dependencies are required to be present in your system in order to use some tools provided by PacBio Data Processing. You need to install them if they are absent in your system.

Virtual environment

It is optional but highly recommended to use a virtual environment (or a variant thereof) to install PacBio Data Processing. In this document we will use the standard library’s venv module.

A virtual environment (or venv for short) allows us to have the required set of packages independently of the system-wide packages installed. This has several advantages. First, it will help you produce an isolated mess in case something goes wrong, but it also allows us to decide the version of any package we are interested in. irrespective of what other venv’s need, or what the system needs.

A venv can be created like follows:

$ python3.9 -m venv PDP-py39

this line will create a folder called PDP-py39 containing the venv. You can choose another name if you like. After the installation one can activate the venv to start using it with:

$ source PDP-py39/bin/activate

From that point on, the management of and access to Python packages happens within the venv. For example, installing a new package will be done inside the venv.

Afterwards you can proceed with the installation of PacBio Data Processing.

For more information on venv’s, consult the documentation of that module in the standard library venvs, and references therein.

Note

To stop using a venv, type deactivate in the same terminal where the venv was activated.

Installing the stable release of PacBio Data Processing

The latest stable release of PacBio Data Processing can be installed by executing this command in your terminal:

$ pip install PacbioDataProcessing

or, optionally, if you want to enable the sm-analysis-gui program, i.e. the GUI to the single molecule analysis, running this:

$ pip install PacbioDataProcessing[gui]

However, be aware that the installation including the GUI will fail if your system does not have [wxpython] installed.

Note

In the rare case that you don’t have pip installed, this Python installation guide can guide you through the process of installing pip.

Note

Typically, after you use pip for the first time in your venv you receive a warning message saying that your version of pip is too old:

WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/path/to/your/venv/bin/python -m pip install --upgrade pip' command.

That happens because the pip bundled with the specific version of Python you used to create the venv is older than the newest version available. You can update pip by following the command provided. Or, if the venv is active, equivalently with:

$ pip install -U pip

that will upgrade pip and make the warning messages disappear.

Alternative: Installing PacBio Data Processing from a file

It is also possible to install PacBio Data Processing from a file: a tarball or a wheel.

You simply need the file and run pip on it. For instance, using as an example a tarball corresponding to version 1.0.0, it would be:

$ pip install PacbioDataProcessing-1.0.0.tar.gz

From a wheel it would be:

$ pip install PacbioDataProcessing-1.0.0-py3-none-any.whl

Of course, you could also choose to install optional dependencies as usual:

$ pip install PacbioDataProcessing-1.0.0-py3-none-any.whl[gui]

Alternative: Installing PacBio Data Processing from the repository

Warning

The instructions in this section are not necessary for end users. If you are simply interested in using PacBio Data Processing to analyze some BAM file or you need to use some functionality provided by PacBio Data Processing from within your code, you don’t necessarily need this section. But if you want to have access to the source code keep reading.

The sources of PacBio Data Processing can be downloaded from its GitLab repo.

You can either clone the public repository:

$ git clone git://gitlab.com/dvelazquez/pacbio-data-processing

and install it with:

$ pip install ./pacbio-data-processing

Or download the tarball:

$ curl -JL https://gitlab.com/dvelazquez/pacbio-data-processing/-/archive/master/pacbio_data_processing-master.zip  --output pacbio-data-processing-master.zip

and install it with:

$ pip install pacbio-data-processing-master.zip

Or simply run:

$ pip install git+https://gitlab.com/dvelazquez/pacbio-data-processing