Installation¶
Pre-requisites¶
Python¶
To install PacBio Data Processing, a Python interpreter is needed
on your system, since the package is written in Python. The recommended
version of Python is 3.9. Strictly speaking, the code should work
with an older version, but some dependencies require Python 3.9 anyway.
Note
Throughout this document we will show commands to be typed in a terminal.
You will notice that the commands are preceded by a $ symbol
that represents the prompt. The $ SHOULD NOT be typed. Its
purpose is to distinguish between commands to be typed and output,
since output lines will NOT have a preceding $ symbol.
Note
During the installation process you will type several shell commands.
Shell commands tend to be rather ungrateful: loud complaints
and quiet celebrations. How can you tell whether a command worked
correctly? You can display the exit status of the most recently
issued command with echo $?. For instance:
$ make
... lots of output ...
$ echo $?
0
The displayed exit status of 0 means that the program make
ran successfully. A non-zero exit status signals an error.
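The same exit-status check can be done from Python, which may be handy if you script the installation. The following is only a minimal sketch: the helper name exit_status is made up for illustration, and a child Python process stands in for a real command such as make.

```python
import subprocess
import sys


def exit_status(command):
    """Run a command (given as a list of strings) and return its exit status."""
    completed = subprocess.run(command, capture_output=True)
    return completed.returncode


# A child Python process stands in for a real command here.
ok = exit_status([sys.executable, "-c", "pass"])
failed = exit_status([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, failed)
```

As in the shell, a status of 0 means success and any other value signals a failure.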
If you are using Linux, it is likely that Python is already present in your system. Open a terminal and check it out with:
$ python --version
or
$ python3 --version
You know that Python is in your system if you get as output something like (your mileage may vary):
Python 3.9.13
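If you prefer to check the version from within Python itself, something along these lines should work. It is a sketch, not part of PacBio Data Processing: the function name is_recent_enough and the constant MINIMUM are illustrative, and the threshold of 3.9 is simply the recommended version mentioned above.

```python
import sys

MINIMUM = (3, 9)  # recommended version stated in this document


def is_recent_enough(version_info, minimum=MINIMUM):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Check the interpreter that is running this snippet.
print(is_recent_enough(sys.version_info))
```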
Installing Python
If you don’t have Python, or you have an old version, have a look at the section Installing Python, and the references therein.
Other dependencies¶
PacBio Data Processing delegates some tasks to external tools. The following is a list of external dependencies:
These dependencies are required to be present in your system in order to use some tools provided by PacBio Data Processing. You need to install them if they are absent in your system.
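One possible way to check whether such tools are present is to look them up on the PATH. The sketch below assumes that the external dependencies are executables reachable through the PATH. The function missing_tools and the tool names passed to it are placeholders, not the real dependency names.

```python
import shutil


def missing_tools(names):
    """Return the names that cannot be found as executables on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Placeholder names: replace them with the executables of the actual dependencies.
print(missing_tools(["some-tool", "another-tool"]))
```

An empty list means that every tool you asked about was found.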
Virtual environment¶
It is optional but highly recommended to use a virtual environment
(or a variant thereof) to install PacBio Data Processing. In this
document we will use the standard library’s venv module.
A virtual environment (or venv for short) allows us to have
the required set of packages independently of the system-wide packages
installed. This has several advantages. First, it keeps any mess
isolated in case something goes wrong. Second, it allows us to
decide the version of any package we are interested in, irrespective
of what other venvs need, or what the system needs.
A venv can be created as follows:
$ python3.9 -m venv PDP-py39
This line will create a folder called PDP-py39 containing the venv.
You can choose another name if you like.
After creating the venv, you can activate it to start using it with:
$ source PDP-py39/bin/activate
From that point on, the management of and access to Python packages
happens within the venv. For example, installing a new package
will be done inside the venv.
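If you are unsure whether a venv is currently active, you can ask the interpreter. This is a minimal sketch based on how the standard venv module sets sys.prefix. The function name in_virtual_environment is only illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the Python installation used to create it.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
```

Run it with the python command of the terminal where you activated the venv.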
Afterwards you can proceed with the installation of PacBio Data Processing.
For more information on venvs, consult the documentation of the venv module
in the Python standard library, and the references therein.
Note
To stop using a venv, type deactivate in the same
terminal where the venv was activated.
Installing the stable release of PacBio Data Processing¶
The latest stable release of PacBio Data Processing can be installed by executing this command in your terminal:
$ pip install PacbioDataProcessing
or, optionally, if you want to enable the sm-analysis-gui program,
i.e. the GUI for the single molecule analysis, by running this:
$ pip install "PacbioDataProcessing[gui]"
However, be aware that the installation including the GUI will fail if your system does not have wxPython installed.
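To confirm that the installation worked, you can query the installed version from Python. The sketch below uses the standard library’s importlib.metadata. The helper name installed_version is made up for illustration, and the distribution name is the one used in the pip commands above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed in the current
# environment, or None otherwise.
print(installed_version("PacbioDataProcessing"))
```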
Note
In the rare case that you don’t have pip installed, this Python installation guide can walk you through the process of installing pip.
Note
Typically, after you use pip for the first time in your venv
you receive a warning message saying that your version of pip is too old:
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the '/path/to/your/venv/bin/python -m pip install --upgrade pip' command.
That happens because the pip bundled with the specific version of Python
you used to create the venv is older than the newest version available.
You can update pip by running the command provided in the warning or,
if the venv is active, equivalently with:
$ pip install -U pip
That will upgrade pip and make the warning message disappear.
Alternative: Installing PacBio Data Processing from a file¶
It is also possible to install PacBio Data Processing from a file: a tarball or a wheel.
You simply need to get the file and run pip on it. For instance, with
a tarball corresponding to version 1.0.0, it would be:
$ pip install PacbioDataProcessing-1.0.0.tar.gz
From a wheel it would be:
$ pip install PacbioDataProcessing-1.0.0-py3-none-any.whl
Of course, you could also choose to install optional dependencies as usual:
$ pip install "PacbioDataProcessing-1.0.0-py3-none-any.whl[gui]"
Alternative: Installing PacBio Data Processing from the repository¶
Warning
The instructions in this section are not necessary for end users. If you are simply interested in using PacBio Data Processing to analyze some BAM file or you need to use some functionality provided by PacBio Data Processing from within your code, you don’t necessarily need this section. But if you want to have access to the source code keep reading.
The sources of PacBio Data Processing can be downloaded from its GitLab repo.
You can either clone the public repository:
$ git clone https://gitlab.com/dvelazquez/pacbio-data-processing
and install it with:
$ pip install ./pacbio-data-processing
Or download a zip archive of the master branch:
$ curl -JL https://gitlab.com/dvelazquez/pacbio-data-processing/-/archive/master/pacbio_data_processing-master.zip --output pacbio-data-processing-master.zip
and install it with:
$ pip install pacbio-data-processing-master.zip
Or simply run:
$ pip install git+https://gitlab.com/dvelazquez/pacbio-data-processing