STRPsearch is a specialized tool designed for rapid and precise identification and mapping of structured tandem repeats in proteins (STRPs).
If you find STRPsearch useful for your research, please cite:
Mozaffari S, Arrías PN, Clementel D, Piovesan D, Ferrari C, Tosatto SCE, Monzon AM. STRPsearch: fast detection of structured tandem repeat proteins. Bioinformatics. 2024;40(12):btae690. https://doi.org/10.1093/bioinformatics/btae690.
To get started with the project, first, extract the contents of data/databases.zip by running the following command:
cd data && unzip databases.zip && cd ..
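If `unzip` is not available on your system, the same extraction step can be sketched with Python's standard library. The helper name `extract_databases` is illustrative and not part of STRPsearch; it assumes only that the archive sits at data/databases.zip, as in the command above.

```python
import zipfile
from pathlib import Path


def extract_databases(zip_path="data/databases.zip"):
    """Extract the archive into its own directory, like `cd data && unzip databases.zip`."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"Archive not found: {zip_path}")
    with zipfile.ZipFile(zip_path) as archive:
        # Extract next to the archive so the files end up under data/.
        archive.extractall(zip_path.parent)
        return archive.namelist()
```

Calling `extract_databases()` from the main directory of the project returns the names of the extracted archive members.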
Then you can choose one of the following methods to set up the software:
- Install all the dependencies listed in the requirements.txt file:
pip install -r requirements.txt
Note: Inside the requirements.txt file, you'll find a commented section that includes dependencies which cannot be installed with pip. To install these dependencies, you can use Conda by running the following commands:
conda install -c conda-forge -c bioconda foldseek
conda install -c bioconda usalign
- Navigate to the main directory of the project and run the software with the following command:
python3 ./bin/strpsearch.py [OPTIONS] COMMAND [ARGS]...
- Create and activate the Conda environment from the environment.yml file:
conda env create -f environment.yml
conda activate strpsearch_env_ch_before
- Navigate to the main directory of the project and run the software with the following command:
python3 ./bin/strpsearch.py [OPTIONS] COMMAND [ARGS]...
- Build the Docker image using the provided Dockerfile:
docker build -t strpsearch .
- To run the container in an interactive mode, use the following command:
docker run -it --entrypoint /bin/bash -v /mount/directory/:/app strpsearch
Note that the -v /mount/directory/:/app option mounts the specified host directory (/mount/directory/) to the working directory of the container. This enables the container to read and write files on the host machine.
- Navigate to the main directory of the project and run the software with the following command:
python3 ./bin/strpsearch.py [OPTIONS] COMMAND [ARGS]...
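Whichever setup method you choose, it can help to confirm that the external tools are reachable before the first run. The following is a minimal sketch, not part of STRPsearch: the executable names `foldseek` and `USalign` are assumptions about what the Conda packages install, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names for the Foldseek and US-align packages;
# adjust them if your installation uses different names.
REQUIRED_TOOLS = ("foldseek", "USalign")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means every listed executable was found on PATH.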
The tool has three query commands, each with its own positional arguments and options, plus a version command.
To list the available commands, run:
python3 bin/strpsearch.py --help
This returns the following commands:
| Command | Description |
|---|---|
| query-file | Query an existing PDB/CIF formatted structure file by providing the file path |
| download-pdb | Download CIF file and query a structure from PDB by providing the PDB ID and the specific Chain of interest |
| download-model | Download CIF file and query an AlphaFold model by providing the UniProt ID and the AlphaFold version of interest |
| version | Show the version and exit |
query-file arguments:
- input_file (TEXT): Path to the input structure file to query (PDB/mmCIF). This argument is required. Default: None
- out_dir (TEXT): Path to the output directory. This argument is required. Default: None

query-file options:
- --chainsaw / --no-chainsaw (BOOL): Whether to use the Chainsaw segmentation tool. Default: no-chainsaw
- --chain (TEXT): Specific chain to query from the structures. Default: all
- --db (TEXT): Path to the databases to use. Default: data/databases
- --temp-dir (TEXT): Path to the temporary directory. Default: /tmp
- --max-eval (FLOAT): Maximum E-value of the targets to prefilter. Default: 0.01
- --min-height (FLOAT): Minimum height of TM-score signals to be processed. Default: 0.4
- --keep-temp / --no-keep-temp: Whether to keep the temporary directory and files. Default: no-keep-temp
- --pymol-pse / --no-pymol-pse: Whether to create and output PyMOL session files. Default: no-pymol-pse
- --help: Show this message and exit
download-pdb arguments:
- pdb_id (TEXT): PDB ID of the experimental structure to download and query. This argument is required. Default: None
- out_dir (TEXT): Path to the output directory. This argument is required. Default: None

download-pdb options:
- --chainsaw / --no-chainsaw (BOOL): Whether to use the Chainsaw segmentation tool. Default: no-chainsaw
- --chain (TEXT): Specific chain to query from the structures. Default: all
- --db (TEXT): Path to the databases to use. Default: data/databases
- --temp-dir (TEXT): Path to the temporary directory. Default: /tmp
- --max-eval (FLOAT): Maximum E-value of the targets to prefilter. Default: 0.01
- --min-height (FLOAT): Minimum height of TM-score signals to be processed. Default: 0.4
- --keep-temp / --no-keep-temp: Whether to keep the temporary directory and files. Default: no-keep-temp
- --pymol-pse / --no-pymol-pse: Whether to create and output PyMOL session files. Default: no-pymol-pse
- --help: Show this message and exit
download-model arguments:
- uniprot_id (TEXT): UniProt ID of the AlphaFold-predicted model to download and query. This argument is required. Default: None
- af_version (TEXT): Version of AlphaFold to download predicted models from. This argument is required. Default: None
- out_dir (TEXT): Path to the output directory. This argument is required. Default: None

download-model options:
- --chainsaw / --no-chainsaw (BOOL): Whether to use the Chainsaw segmentation tool. Default: no-chainsaw
- --db (TEXT): Path to the databases to use. Default: data/databases
- --temp-dir (TEXT): Path to the temporary directory. Default: /tmp
- --max-eval (FLOAT): Maximum E-value of the targets to prefilter. Default: 0.01
- --min-height (FLOAT): Minimum height of TM-score signals to be processed. Default: 0.4
- --keep-temp / --no-keep-temp: Whether to keep the temporary directory and files. Default: no-keep-temp
- --pymol-pse / --no-pymol-pse: Whether to create and output PyMOL session files. Default: no-pymol-pse
- --help: Show this message and exit
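When STRPsearch is driven from another script, the options above can be assembled into a command line programmatically. The sketch below is one possible wrapper and not part of the tool: `build_command` and its keyword names are hypothetical, and only the flags documented above are emitted.

```python
QUERY_COMMANDS = ("query-file", "download-pdb", "download-model")


def build_command(command, positional, *, chainsaw=False, chain=None, db=None,
                  temp_dir=None, max_eval=None, min_height=None,
                  keep_temp=False, pymol_pse=False,
                  script="./bin/strpsearch.py"):
    """Build an argv list for one STRPsearch run.

    Options left at None/False are omitted, so the tool's defaults apply.
    """
    if command not in QUERY_COMMANDS:
        raise ValueError(f"Unknown command: {command}")
    if chain is not None and command == "download-model":
        # --chain is not among the options listed for download-model.
        raise ValueError("--chain is not available for download-model")

    argv = ["python3", script, command, *(str(p) for p in positional)]
    if chainsaw:
        argv.append("--chainsaw")
    # Value-taking options, in the order they are documented.
    for flag, value in (("--chain", chain), ("--db", db),
                        ("--temp-dir", temp_dir), ("--max-eval", max_eval),
                        ("--min-height", min_height)):
        if value is not None:
            argv += [flag, str(value)]
    if keep_temp:
        argv.append("--keep-temp")
    if pymol_pse:
        argv.append("--pymol-pse")
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the main directory of the project. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.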
To generate and output PyMOL sessions using the --pymol-pse option, you must have PyMOL installed. You can install PyMOL using conda with one of the following commands, or compile it from source (https://www.pymol.org/):
conda install -c conda-forge -c schrodinger pymol-bundle=2.6
conda install -c conda-forge pymol-open-source
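Before passing --pymol-pse, a script can check whether PyMOL is importable in the active environment. This sketch assumes that PyMOL is used through its Python module `pymol` in the same environment as STRPsearch; the helper names are hypothetical.

```python
import importlib.util


def pymol_available():
    """Return True if the `pymol` Python module can be found in this environment."""
    return importlib.util.find_spec("pymol") is not None


def pymol_flag():
    """Return ['--pymol-pse'] when PyMOL is importable, otherwise an empty list."""
    return ["--pymol-pse"] if pymol_available() else []
```

The result of `pymol_flag()` can be appended to the argument list of any of the three query commands.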
If you already have a PDB/CIF formatted structure file and want to query all the chains in the structure while keeping the temporary directory and files:
python3 ./bin/strpsearch.py query-file /input/file output/directory --keep-temp
If you want to automatically download and query a specific experimental structure from PDB (e.g. chain B of PDB structure 1A0R), without keeping temporary directory and files:
python3 ./bin/strpsearch.py download-pdb 1a0r output/directory --chain B
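If you want to automatically download and query a predicted model from AlphaFold (e.g. UniProt ID: Q9HXJ7) using Chainsaw segmentation, replace AF_VERSION with the AlphaFold version to download, which is a required argument:
python3 ./bin/strpsearch.py download-model Q9HXJ7 AF_VERSION output/directory --chainsaw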
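To process many local structure files, the query-file example above can be repeated once per file. The sketch below only plans the commands. `plan_batch` is a hypothetical helper, the .pdb and .cif suffixes are assumed, and writing each result to its own subdirectory is a design choice of this example rather than a requirement of the tool.

```python
from pathlib import Path

# Assumed file extensions for PDB/mmCIF inputs; extend the set if yours differ.
STRUCTURE_SUFFIXES = {".pdb", ".cif"}


def plan_batch(structure_dir, out_root, script="./bin/strpsearch.py"):
    """Return one query-file argv list per structure file in structure_dir."""
    commands = []
    for path in sorted(Path(structure_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in STRUCTURE_SUFFIXES:
            # One output directory per input file, named after the file stem.
            out_dir = Path(out_root) / path.stem
            commands.append(
                ["python3", script, "query-file", str(path), str(out_dir)]
            )
    return commands
```

Each planned command can then be executed with `subprocess.run(argv, check=True)` from the main directory of the project. Whether the output directory must exist beforehand is not stated here, so create it first if needed.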
You can test out the tool on the official website: https://strpsearch.biocomputingup.it/
You must provide either a PDB ID or a UniProt accession number, or upload a structure file. You can choose to use Chainsaw for domain fragmentation, adjust the Foldseek and US-align thresholds, add a job description, and optionally receive a completion alert via email.
After you click "Submit", you are taken to the Results page.
At the top of the Results page, you can find the job details along with the protein structure information. If no specific protein chain is selected, the prediction is performed on all chains. Results for each chain are displayed separately in expandable/collapsible sections, each composed of three main components. All prediction outputs and execution logs are available for download.

