For your convenience, we have provided a written guide to this tutorial below.
This video will help us get started using PatternHunter. The PatternHunter download contains executables and some extras. The demo includes a compiled .exe file, whereas the full version uses Java JAR files. You must have Java 1.4 or later to run the program. Create an easily accessible folder for PatternHunter and extract the files there.
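If you are unsure which Java version is installed, `java -version` reports it. The Python sketch below parses that report and compares it against the Java 1.4 requirement. It assumes only the two common banner styles: legacy releases print something like `java version "1.4.2_05"` (major version 4), and newer ones print something like `openjdk version "17.0.2"` (major version 17). The function names are hypothetical helpers, not part of PatternHunter.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Legacy "1.x" numbering (e.g. "1.4.2_05") maps to x; newer numbering
    (e.g. "17.0.2" or "21") maps to its first number.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: "1.4" means Java 4
    return int(first)


def java_is_new_enough(minimum: int = 4) -> bool:
    """Run `java -version` and check it against the minimum (1.4 -> 4).

    Raises FileNotFoundError if no `java` executable is on the PATH.
    """
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr) >= minimum
```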
Open a command prompt from the Accessories section of your Start menu and change to your PatternHunter directory. Typing "dir" lists its contents. Run PatternHunter by typing "java -jar phn.jar". Before you can use PatternHunter, you must register the product. Registration begins the first time you run PatternHunter, and you will need the registration key that you received by email.
Follow the onscreen instructions to complete registration. Registering also requires an internet connection and administrator or root privileges. If you are not connected to the internet, type "N" to be directed to a web-based registration form.
If all is well, PatternHunter will display "Registration Successful!" and then list its parameters.
Registration creates a folder "ver", containing the file "license.lcs", inside PatternHunter's folder. If we remove or rename this file or folder, PatternHunter will ask us to register again the next time it runs. Now we are ready to run PatternHunter!
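When scripting around PatternHunter, it can help to check for the registration file before launching a search. The sketch below assumes only the layout described above (a `ver` folder holding `license.lcs` inside the PatternHunter folder). Finding the file does not prove the license is valid, since PatternHunter itself makes that decision. `license_path` and `is_registered` are hypothetical helper names.

```python
from pathlib import Path


def license_path(ph_dir) -> Path:
    """Location of the registration file inside the PatternHunter folder."""
    return Path(ph_dir) / "ver" / "license.lcs"


def is_registered(ph_dir) -> bool:
    """True if ver/license.lcs exists; PatternHunter still validates it itself."""
    return license_path(ph_dir).is_file()


# Example: warn before a search if registration appears to be missing.
# if not is_registered("C:/PatternHunter"):
#     print("Run PatternHunter once interactively to register it.")
```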
We will, of course, need some data for our homology search. It is easiest to place this data in the PatternHunter folder. Here we've chosen three small sample sequences, which we will compare with PatternHunter. Open a command prompt, go to the PatternHunter directory, and run PatternHunter as before to see a list of its parameters.
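The sample files used here (.fna, .fasta, .fas) are assumed to be in FASTA format, where each sequence begins with a header line starting with `>`. A quick Python sketch like the following shows how many sequences a file holds before you search with it. The helper names are hypothetical.

```python
def fasta_headers(text: str) -> list:
    """Return the header of every record in FASTA text, without the '>'."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def count_fasta_records(text: str) -> int:
    """Count sequences in FASTA text: one '>' header line per record."""
    return len(fasta_headers(text))


def count_fasta_file(path) -> int:
    """Count the sequences in a FASTA file on disk."""
    with open(path) as handle:
        return count_fasta_records(handle.read())


# Example: count_fasta_file("vector.fna")
```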
To avoid out-of-memory errors when comparing large sequences, we should raise Java's memory limit by adding -Xmx512m to the command, before -jar. It is a good habit to do this every time we run PatternHunter. The value doesn't have to be 512 MB; set it according to how much memory is available on the system.
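The `-Xmx` option is a standard JVM flag that takes a size with a unit suffix (`m` for megabytes). It must come before `-jar` because anything after the jar name is passed to PatternHunter rather than to Java. The sketch below formats the flag. The "total memory minus a reserve for the operating system" rule and its 512 MB default are hypothetical rules of thumb, not PatternHunter recommendations.

```python
def xmx_flag(megabytes: int) -> str:
    """Format a JVM maximum-heap option, e.g. 512 -> '-Xmx512m'."""
    if megabytes <= 0:
        raise ValueError("heap size must be a positive number of megabytes")
    return f"-Xmx{int(megabytes)}m"


def suggest_heap_mb(total_mb: int, reserve_mb: int = 512, floor_mb: int = 64) -> int:
    """Hypothetical rule of thumb: give Java all memory except an OS reserve."""
    return max(floor_mb, total_mb - reserve_mb)


# Example: on a machine with 4096 MB of RAM this suggests -Xmx3584m.
# print(xmx_flag(suggest_heap_mb(4096)))
```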
Next, type -i followed by the filename of the query sequence. We can search with many query sequences, either by listing multiple files or by using files that contain many sequences. Then type -j followed by the filename of the subject sequence; multiple subject sequences can be searched as well. Type -o followed by the name of the output text file to create. Users familiar with BLAST output should add the -b parameter for BLAST-style output. Press Enter, and PatternHunter will begin its search.
java -Xmx512m -jar ph.jar -i pneumoniae.fna genitalium.fna -j vector.fna -o output.txt -b
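To run the same search from a script, the command can be assembled as an argument list and launched with Python's `subprocess` module. The sketch below uses only the flags shown above (`-Xmx`, `-jar`, `-i`, `-j`, `-o`, `-b`). The jar name is a parameter because this guide refers to both `phn.jar` and `ph.jar`, so use whichever your download contains. `ph_command` and `run_ph` are hypothetical helpers. It is also an assumption that PatternHunter exits with a non-zero status on failure.

```python
import subprocess


def ph_command(queries, subjects, output, jar="ph.jar", heap_mb=512, blast=True):
    """Build a PatternHunter command line as a list of arguments.

    JVM options (-Xmx) come before -jar; everything after the jar name
    is passed to PatternHunter.
    """
    cmd = ["java", f"-Xmx{heap_mb}m", "-jar", jar,
           "-i", *queries,      # one or more query files
           "-j", *subjects,     # one or more subject files
           "-o", output]        # output text file
    if blast:
        cmd.append("-b")        # BLAST-style output format
    return cmd


def run_ph(cmd, workdir="."):
    """Run PatternHunter from its folder; check=True raises CalledProcessError
    if the process exits with a non-zero status (assumed to signal failure)."""
    subprocess.run(cmd, cwd=workdir, check=True)


# Example, reproducing the command above:
# run_ph(ph_command(["pneumoniae.fna", "genitalium.fna"], ["vector.fna"], "output.txt"))
```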
Let's have a look at what PatternHunter found. Open the output file in a text editor.
Significantly aligned sequences are summarized at the top, each with its score; a higher score represents a better match. The Expect value is the number of alignments scoring at least this well that would be expected to occur by random chance, so values closer to zero indicate more significant matches. Scrolling down, we can examine the alignments that PatternHunter has found. Information about each alignment, such as name, score and strand polarity, precedes the sequence. Matching letters are connected with lines. Note that long alignments wrap onto the next line: here the first line ends at position 80 and the next begins at 81.
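The layout described above can be imitated in a few lines. This is a simplified sketch, not PatternHunter's actual formatter. It assumes that `|` is the connecting character, that each display line is 80 columns wide as in this example, and that the alignment is ungapped, so every column corresponds to one sequence position. The function names are hypothetical.

```python
def midline(top: str, bottom: str) -> str:
    """'|' under columns where the aligned bases are identical, space elsewhere."""
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))


def wrap_alignment(top: str, bottom: str, start: int = 1, width: int = 80):
    """Split an ungapped alignment into display blocks of `width` columns.

    Returns (first_pos, last_pos, top_row, mid_row, bottom_row) tuples with
    1-based positions, so with width 80 the first block ends at 80 and the
    second begins at 81.
    """
    mid = midline(top, bottom)
    blocks = []
    for offset in range(0, len(top), width):
        row = top[offset:offset + width]
        first = start + offset
        last = first + len(row) - 1
        blocks.append((first, last, row,
                       mid[offset:offset + width], bottom[offset:offset + width]))
    return blocks


# Example: print each block with its coordinates.
# for first, last, t, m, b in wrap_alignment(query_row, subject_row):
#     print(f"{first:>6} {t} {last}\n       {m}\n{first:>6} {b} {last}")
```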
In addition to DNA-DNA comparisons, PatternHunter can compare protein sequences against protein sequences, translated DNA against protein, and translated DNA against translated DNA.
For these types of searches, we must use translated PatternHunter (tPH, run from tph.jar). In tPH, -ip and -jp indicate that the query or subject sequence is a protein, while -i and -j indicate that it is DNA, which will be translated before comparison.
protein vs. protein: -ip protein.fasta -jp protein2.fasta
tDNA vs. protein: -i dna.fas -jp protein.fasta
protein vs. tDNA: -ip protein.fasta -j dna.fas
tDNA vs. tDNA: -i dna.fas -j dna2.fas
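The four combinations above follow one rule: the `p` suffix marks a protein input, and its absence marks DNA that tPH will translate. The sketch below encodes that rule so a script can choose the flags from the sequence types. `tph_args` and `SEARCH_TYPES` are hypothetical names.

```python
# Search names keyed by (query_is_protein, subject_is_protein).
SEARCH_TYPES = {
    (True, True): "protein vs. protein",
    (False, True): "tDNA vs. protein",
    (True, False): "protein vs. tDNA",
    (False, False): "tDNA vs. tDNA",
}


def tph_args(query: str, subject: str, query_is_protein: bool, subject_is_protein: bool):
    """Choose tPH input flags: -ip/-jp mark protein, -i/-j mark DNA to translate."""
    return [
        "-ip" if query_is_protein else "-i", query,
        "-jp" if subject_is_protein else "-j", subject,
    ]


# Example: the yeast (protein) vs. E. coli (DNA) search.
# ["java", "-Xmx700m", "-jar", "tph.jar",
#  *tph_args("yeast.fasta", "ecoli.fas", True, False), "-o", "output4.txt", "-b"]
```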
As an example of a protein vs. translated DNA search, we search yeast protein sequences against E. coli DNA:
java -Xmx700m -jar tph.jar -ip yeast.fasta -j ecoli.fas -o output4.txt -b
The output file from tPH is slightly different. The scores and alignment information are the same, but the sequences are printed as protein. The top line of amino acids is the query protein; had the query been translated DNA, each amino acid would appear below its codon. The bottom line of amino acids is the subject sequence; because it comes from translated DNA, each amino acid is shown above its codon. Between the two lines, an exact match is shown by repeating the amino acid letter, and a positive-scoring substitution by a plus sign.
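To see where the translated residues come from, the sketch below translates DNA with the standard genetic code and places each amino acid under the middle base of its codon. It then builds a tPH-style match line: the letter for an identical residue, `+` for a positive substitution. Two assumptions apply. The set of positive pairs is a small illustrative subset of substitutions that score positively in BLOSUM62, because the scoring matrix tPH uses is not stated here. The choice of reading frame is also left to the caller. All names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Illustrative subset of pairs with positive BLOSUM62 scores (assumed scoring).
POSITIVE_PAIRS = {frozenset(p) for p in ["IV", "IL", "LM", "KR", "DE", "FY", "ST", "QE"]}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at `frame`; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[p:p + 3] for p in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def under_codons(protein: str) -> str:
    """Lay out amino acids so each sits under the middle base of its codon."""
    return "".join(f" {aa} " for aa in protein)


def protein_midline(top: str, bottom: str, positive=POSITIVE_PAIRS) -> str:
    """Letter for identical residues, '+' for positive pairs, space otherwise."""
    out = []
    for a, b in zip(top, bottom):
        if a == b:
            out.append(a)
        elif frozenset((a, b)) in positive:
            out.append("+")
        else:
            out.append(" ")
    return "".join(out)
```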
There are many ways to configure PatternHunter for a specific application; see the documentation for a full description of its parameters. Additionally, see "advanced PatternHunter usage" for tips on adjusting speed and sensitivity and on searching large databases.
If you encounter any difficulties, feel free to Contact Us.