tacg (1) | Version 4.3 | tacg (1) |
In the following summary,
[-c h H l L q Q s S v] [-b begin] [-e end] [--clone #_#,#x#..] [-C 0-16] [--cost units/$] [--dam] [--dcm] [-D 0-4] [--example] [-f 0|1] [-F 0-3] [-g LoCutOff(,HiCutOff)] [-G bin_size,X|Y|L] [-i (--idonly) 0-2] [-m min_hits] [-M Max_hits] [-n 3-8] [--notics] [--numstart] [-o0|1|3|5] [-O 1-6(x),minORF] [--orfmap] [-p Name,Pattern,Err] [-P NameA,(+|-)(l|g)Dist_Lo(-Dist_Hi),NameB] [--ps] [--pdf] [--logdegens] [-r (--regex) 'Label:RegexPat' || 'FILE:FileOfRegexPatterns'] [-R alt_Rebase | alt_Matrix] [--raw] [--rev] [--comp] [--revcomp] [--rules 'NameA:min:Max[&|]NameB:min:Max[&|]..'] [--rulefile /path/to/rulefile] [--silent] [--strands] [--tmppath /path/to/tmp/dir] [-T 0|1|3|6,1|3] [-w 1|width] [-V 1-3] [-W #] [-x (--explicit) 'NameA(,=),NameB..(,C)'] [-X (--eXtract) b,e,[0|1]] [-# %_Match_Cutoff]
NB: Most flags are the same as in earlier versions with the exception of these changes:
and these additions:
Unless requested (by -V1-3), it no longer sends errors to stderr (except failure errors) and it no longer emits default output (except for one special case; see -p): you have to request all output. Most of the internals use dynamic memory, so there are few limits on sequence input size and pattern number. I've generated >6000 patterns and searched 230Mb of input sequence. It's ~5-35x faster than the comparable routines in the GCG pkg and, being written in ANSI C, is portable to all unix variants. It has been ported to Linux (Intel, Alpha, PPC), MacOSX, SunOS/Solaris, Compaq Tru64 Unix (aka DEC Unix aka TUFKAO (the Unix formerly known as OSF)), Ultrix, IRIX, NeXTStep, ConvexOS, and HP/UX. But it likes Linux best.
Unless told not to via the --raw flag, tacg now automagically translates most ASCII formats (Genbank, FASTA, etc) via Jim Knight's SEQIO library and now handles multiple sequences at one time, internally converting 'u's to 't's. It considers both strands at the same time, so you don't have to manually reverse complement the sequence, and will by default accept all IUPAC degeneracies (y r m k w s b d h v), performing all possible operations on that sequence. It treats degeneracies in the input sequence in different ways depending on the -D flag (see below). It either strips all letters other than a c g t and analyzes the sequence as 'pure' using a fast incremental hashing algorithm, or it treats it as degenerate and analyzes it via a slower algorithm. By default, it treats it as 'pure' unless it detects an IUPAC degeneracy, in which case it will adaptively switch back and forth between the fast and slow hashing routines. See also RELATED PROGRAMS at bottom.
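The strand handling described above can be sketched in a few lines of Python. This is only an illustration, not tacg's C code: the function names are hypothetical, and the complement table is the standard IUPAC one. The inputs reuse the 'tacg' example given for the --rev, --comp, and --revcomp flags.

```python
# Illustration only; the helper names are hypothetical, not tacg internals.
# Standard IUPAC complement table (lowercase), including degenerate codes.
COMPLEMENT = str.maketrans("acgtrymkswbdhvn", "tgcayrkmswvhdbn")


def normalize(seq):
    """Lowercase the sequence and convert RNA 'u' to DNA 't'."""
    return seq.lower().replace("u", "t")


def reverse(seq):
    """Reverse the sequence without complementing it."""
    return seq[::-1]


def complement(seq):
    """Complement each base, leaving the order unchanged."""
    return seq.translate(COMPLEMENT)


def revcomp(seq):
    """Reverse-complement: the bottom strand read 5' to 3'."""
    return complement(seq)[::-1]


def is_degenerate(seq):
    """True if any base is something other than a, c, g, or t."""
    return any(base not in "acgt" for base in seq)


print(reverse("tacg"), complement("tacg"), revcomp("tacg"))
```

A hit for a pattern on the bottom strand is equivalent to a hit for the pattern's reverse complement on the top strand, which is why a single pass can report both.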
NB: tacg can produce lots of output; while it's possible to pipe direct to lp/lpr, you'll probably regret it.
csh %> setenv TACGLIB /usr/local/lib/tacg [csh/tcsh]
bash #>export TACGLIB=/usr/local/lib/tacg [bash]
Flag | Value | Explanation |
---|---|---|
-b | select the beginning of a subsequence from a larger sequence file; 1* for 1st base of sequence. In the Linear Map output, the upper label indicates numbering from beginning of subsequence; the lower label indicates numbering from the beginning of the entire sequence. The SMALLEST SEQUENCE that tacg can handle is 4 bases (10 for the ladder map (-l)). This allows analysis of primers and linkers. | |
-e | select the end of a subsequence from a larger sequence file; 0* for last base of sequence. This subsequence can also be made circular via the -f flag. The largest sequence that tacg can handle depends on how much memory you have, although for practical purposes, assume 1 billion bases. | |
-c | order the output by # of cuts/fragments by each RE (Strider style) and thence alphabetically; otherwise output is by order of appearance in the REBASE file. | |
--clone | (all integers) |
Clone finds sequence ranges which either MUST NOT be cut (#_#) or that MUST be cut (#x#), up to a maximum of 15 at once. Ranges not specified can be either cut or not cut. The output first lists all REs (if any) which match ALL the rules, then all REs which match SOME rules as long as all NO-CUT rules are respected. The same filters that work in other RE selections (-n, -o, -m, -M, --cost, --dam/dcm) can be applied here to fine-tune the selection. |
-C | Codon Usage table to use for translation:
0 Standard        6 Echino_Mito         12 Blepharisma
1 Vert_Mito       7 Euplotid_Nuclear    13 Chloro_mito
2 Yeast_Mito      8 Bacterial           14 Trematode_mito
3 Mold_Mito       9 Alt_Yeast           15 Scenedes_mito
4 Invert_Mito     10 Ascidian_Mito      16 Thrausto_mito
5 Ciliate_Mito    11 Alt_Flatworm_mito |
|
--cost | select REs by their cost (units/$ - >100 is cheap; <10 is v. expensive) | |
--dam | simulate cutting in the presence of Dam methylase (GmATC). rebase.dam contains all REs that are Dam-sensitive. | |
--dcm | simulate cutting in the presence of Dcm methylase (CmCWGG). rebase.dcm contains all REs that are Dcm-sensitive. | |
-D | Degeneracy flag - controls input and analysis of degenerate
sequence input where:
0 FORCES excl'n of degens in seq; only 'acgtu' accepted
1* cut as NONdegen unless degen's found; then cut as '-D3'
2 degen's OK; ignore in KEY, but match outside of KEY
3 degen's OK; expand in KEY, find only EXACT matches
4 degen's OK; expand in KEY, find ALL POSSIBLE matches
The pattern matching is adaptive; given a small window of nondegenerate sequence, the algorithm will match very fast; if degenerate sequence is detected, it will switch to a slower, iterative approach. This results in speed that is proportional to degeneracy for most cases. If you have long sequences of 'n's (inserted as placekeepers, for instance), -D2 may be a better choice. In all cases, as soon as degeneracy of the KEY hexamer exceeds a compiled-in limit (usually 256-fold degeneracy), the KEY is skipped. |
|
--example | example code to show how to add your own flags and functions. Search for 'EXAMPLE' in 'SetFlags.c' and 'tacg.c' for the code. | |
-f | form (or topology) of DNA - 0 (zero) for circular; 1 for linear. This flag also operates on subsequences. | |
-F | print/sort Fragments; 0*-omit; 1-unsorted; 2-sorted; 3-both. | |
-g | specify if you want a pseudo-graphic gel map, with a low end cutoff of Lo# bases (converted to an integer multiple of 10), and (if present), a high end cutoff of Hi#. In Ver <2, the Lo# was restricted to 10 or 100; now it can be any integer power of 10 (10, 100, 1000, etc), as can the Hi#. If Hi# is omitted or is larger than the sequence length, it takes the value of the sequence length. See examples below. | |
-G | Graphic data output, so (mis)named for its original use, where:
binsize = # bases for which hits should be pooled
X|Y|L indicates whether the BaseBins should be on the X or Y axis or in 'Long' form, where Basebins (as X) and Name data (as Y) are reiterated in 2 columns for all the Named patterns:
X: BaseBins 1000 2000 3000 ..
NameA 0 4 0 ..
NameB 22 57 98 .. (#s = matches per bin)
NameC 1 0 0 ..
.
Y: BaseBins NameA NameB NameC ..
1000 0 22 1 ..
2000 4 57 0 ..
3000 0 98 0 ..
.
L: Basebins NameA
1000 0
2000 4
.
.
Basebins NameB
1000 22
2000 57
.
.
This addresses some missing features: it allows the export of hit data for the selected Names so that you can manipulate it as you wish. Like other output, it is streamed to stdout, so it's not wise to mix -G with other analyses; the lines generated (esp. w/ the X option) can be quite long and are NOT governed by the -w flag. |
|
-h --help |
brief help page (condensed man page). | |
-H --HTML |
generates HTML tags for inclusion into Web pages.
0 - (default) makes standalone HTML page, with header, footer, and Table of Contents.
1 - does not generate HTML page headers, only TOC, to embed in other HTML pages. |
|
-i (--idonly) |
controls output for sequences that have no hits
0 - ID line and normal output printed regardless of hits
1 - (default) ID line and normal output are printed ONLY IF there are hits.
2 - ONLY ID line is printed if there are hits. |
|
--infile |
allows those wanting to specify a file by commandline flag to do so (helps in some kinds of GUI wrapping functions) | |
-l (el) |
specify if you want a ladder map of selected enzymes, much like the GCG MAPPLOT output. Also appends a summary of those enzymes that match few times. This last # is length-sensitive in the distributed source code, but it is easy to set another default as a '#define' in 'tacg.h'. | |
-L | specify if you WANT a Linear map a la Strider or GCG's MAP (but better - tacg indicates the actual CUT site as opposed to the 1st base in the pattern as do other mapping programs). In Ver 3.x, the Linear Map only includes those REs or patterns which pass the filtering criteria set via the -n, -o, -m, -M, --cost, etc. | |
--strands | specifies how many strands get printed in the linear map. Allows you to slightly compact the linear map, especially when used with the --notics flag below | |
--notics | do NOT print the tic marks below the DNA in the linear map. Allows you to slightly compact the linear map, especially when used with the --strands flag above | |
--numstart | (-|+) |
the value given with this flag is the beginning number in the Linear Map (-L) output. This can be used to force a particular numbering scheme on the output or to force upstream (negative) numbering for promoter sequences. If a negative number is used, the zero position is omitted at the transition from - to +. |
-m | select enzyme by minimum # cuts in the whole sequence. Default is no minimum (ie ALL). Affects the number of enzymes displayed by the sites (-s), fragments (-F), Linear map (-L), and ladder map (-l) flags. | |
-M | select enzyme by Maximum # cuts in the whole sequence. Default is 32,000. Affects the number of enzymes displayed by the sites (-s), fragments (-F), Linear map (-L), and ladder map (-l) flags. | |
-n | select enzymes by magnitude of recognition site; 3 = all, 5 = 5,6,7,8... n's don't count, other degeneracies are summed ie: tgca=4, tgyrca=5, tgcnnngca=6, tannnnnnnnnnta=4 | |
-o | select enzymes by overhang generated; 5 = 5', 3 = 3', 0 for blunt, 1 for all | |
-O | ORF analysis where any frame combination can be specified ('126'
or '45' or '13456') along with the minimum ORF Size you want to detect. Produces
either a single line (if -w1 is specified)
or a block, (with the Amino Acids wrapped at the specified width) for each
ORF including:
NB: Because the output can be in a single line for each ORF, other line-oriented pattern-matching tools (grep, perl, awk) can examine the ORF generated for matching regular expressions (see the GNU grep man page for an explanation of regular expressions). In this way you can search all 6 frames of >=MinSize AAs for whatever pattern interests you. Examples:
-O 145x,25 (search frames 1,4,5 with extended AA information on all ORFs > 25 AAs)
-O 2,66 (search frame 2 with a min ORF size of 66 AAs) |
|
--orfmap | in conjunction with -O (above) this option draws a pseudographic character-based ORF map showing all the frames specified with -O and all the ORFs larger than the minimum size mapped to their relative positions on the map. It also prints a similar map of all the METs and STOP codons. | |
-p | allows entry of search patterns from the command line, where
Name = name by which pattern is labeled (<=10 chars)
Pat = <30 IUPAC characters (ie. gryttcnnngt)
Err = max # of errors that are tolerated (<=5)
Also logs the patterns you've entered into a file tacg.patterns in the correct format for later copying to a REBASE file. Can enter up to 10 of these at a time. Patterns should consist of <=30 IUPAC bases. Long patterns with large error counts will cause SUBSTANTIAL cpu and memory usage in validating the patterns. |
|
-P | [+-][lg] Dist_Lo [-Dist_Hi], NameB MBQ |
Pattern proximity matching to search for spatial relationships
between factors, 2 at a time (up to a total of 10).
NameA and NameB must be in a REBASE file, either the default rebase.data or another specified by the -R flag, and are case INsensitive. NameA/B patterns can be composed of any IUPAC bases, and ERRORs can be specified in the REBASE entry, ie:
Pit1 5 WWTATNCATW 0 2 ! a Pit1 site with 2 errors
Tataa 4 TATAAWWWW 0 1 ! a Tataa site with 1 error
+ NameA is DOWNSTREAM of NameB (default is either)
Examples:
-PPit1,-30-2500,Tataa
|
--ps | generates a postscript plasmid map (and multiple pages with the same parameters if fed a multi-sequence file). The output file is named tacg_Map.ps and additional plots will be appended to it if it exists in the same directory. REs to be plotted can be selected with the usual parameters (-m -M --cost -n -x -p), but you'll usually want to use -M1 or -M2. Degeneracies are plotted along the rim as grayscale arcs (remember tacg can tolerate degeneracies in sequence, so you can compose accurate plasmid maps by connecting known sequences with N's). ORFs from any and all frames can be plotted internal to the sequence ring by using the -O flag. | |
--pdf | Invokes --ps above and automatically converts the Postscript output to Adobe's Portable Document Format, which is considerably more compact. You'll need a PDF viewer to view the results: Adobe's Acrobat Reader, xpdf, gv, or functional equivalent. Requires a working local Ghostscript installation, with gs installed at or linked to /usr/bin/gs. NB: If the standard Type1 fonts aren't installed, it will fail. | |
--logdegens | (off by default) Using this flag forces the logging of every degeneracy in the sequence, trivial if a short sequence (<1Mb), but of concern for chromosome-sized chunks. This info will be used for drawing graphic maps of the sequence and shading degeneracies differently (invoked by --ps & --pdf above). It is quite memory intensive as it marks the beginning and end of every degeneracy run. No external data is produced, but could be as it's just a simple 2-step array. | |
-q | REMOVED |
Be quiet. DISallows sending diagnostic udp info back to author, now the default behavior (so unless you TELL the program to send data back, it won't). |
-Q | REMOVED |
Be UNquiet. Allows the program to send diagnostic udp info back
to author. In version 2.x, this was the default behavior, but it has
served its purpose, so unless you WANT me to log your usage, I won't.
Allows sending diagnostic UDP info back to author's machine. Report stream includes this info:
Date Time IP# UID hardware OS OS_version TACG_version [tacg commandline] < # bases analyzed >
ie. 1996-03-08 17:02:26 128.23.4.24:[uid=502 hw=i486 os=Linux osver=1.2.6] [TACG Version 1.33F] tacg -t 3 -n 6 < 434 bp > |
--raw | tells tacg to consider ALL input as valid sequence (as with version 2) instead of using SEQIO to parse the input as a standard sequence format. Useful for analyzing file fragments or editor buffers, which may be missing valid format. Note that specifying this flag will tell tacg to eat headers, comments, etc as well as sequence, if it encounters them. ALL IUPAC degeneracies will be analyzed. | |
--rev | tells tacg to reverse all input sequences before analyzing them. tacg -> gcat | |
--comp | tells tacg to complement all input sequences before analyzing them. tacg -> atgc | |
--revcomp | tells tacg to reverse-complement all input sequences before analyzing them. tacg -> cgta | |
--rules | ruleB[&|^] ruleC[&|^]..,i#' MBQ |
--rules allows you to specify arbitrarily complex logical
associations of characteristics to detect the patterns that interest
you. Admittedly, that phrase is incomprehensible, so let me
give an example:
Say you wanted to search for an enhancer that you suspected might be involved in the
transcriptional regulation of a pituitary-specific gene. You knew that you were looking
for a sequence about 1000 bp long in which there were at least 2 Pit1 sites and
3-5 Estrogen response elements, but NO TATAA boxes. Given these pattern definitions:
Pit1 0 WWTATNCATW 0 1 ! Pit1 site w/ 1 error
ERE 0 GGTCAGCCTGACC 0 1 ! ERE site w/ 1 error
TATAA 0 tataawwww 0 0 ! TATAA site, no errors allowed
you could specify this search by:
tacg --rules '((Pit1:2:7&ERE:3:5)&(TATAA:0:0),1000)' \
This query searches a sliding window of 1000 bps (',1000') for ((2-7 Pit1 AND 3-5 ERE sites) AND (0 TATAA sites)). These combinations can be as large as your OS allows your command-line to be, with arbitrarily complex relations represented with logical AND (&), OR (|), and XOR (^) as conjunctions. Parens enforce groupings; otherwise it's evaluated left to right. |
--rulefile | This option allows you to read in a complete file of the kind of complex rules described above and have them all evaluated. The file format is described in the example data file supplied rules.data | |
-r (--regex) |
or 'FILE:RegexFile' MBQ |
searches for regular expressions entered from the commandline
using the 1st option or searches for the regular expressions read from
a file using the 2nd option. The regular expression syntax can be formal
regex patterns or the IUPAC'ed version thereof; the translation from one
to the other is handled automatically. ie:
gy(tt|gc)nc{2,3}m -> g[ct]\(tt\|gc\).c\{2,3\}[ca]
When trying to specify a file, the term FILE must be in CAPs (so don't use 'FILE' as a pattern name). Specific regex patterns from the file can be specified by using the -x flag to name them explicitly. |
-R | MATRIX file |
specifies an alternative Restriction Enzyme file (in GCG format), regular expression file, or
Matrix file (in TRANSFAC format) to use. The latest REBASE files are available
via FTP or via WWW.
The latest TRANSFAC files are also available via FTP or WWW. There are several such files included in the std distribution:
|
-s | prints the summary of site information, describing how many times each enzyme or pattern matches the sequence. Those that cut zero times are shown first. In Ver >=2, only those that match at least once are shown in the second part (the 0-matchers are not reiterated) | |
-S | prints the actual match Sites in tabular form. | |
--silent | requests that the NA sequence be translated starting at the 1st base, in frame 1 (use -b to shift the starting base), according to the Codon Translation table specified with -C, then reverse translated, using the same table and all the possible degeneracies, then restricts that (quite) degenerate sequence and shows all the REs that will match it. You should use the -L and -T flags to generate the linear map, which shows both the REs and the cotranslated sequence, to verify that all is as it should be. NB: Depending on Codon Table, some AAs are not reversibly translatable. Using the standard table, Arg (=mgn), Leu (=ytn), and Ser (=wsn) cannot be Forward translated from their Reverse translation. | |
--tmppath | passes the path to tacg to cooperate with CGIs or other programs that need to tell tacg where to place the ps/pdf files for access by other processes. | |
-T | In the Linear map, beneath the DNA sequence, include the translated protein in
0*, 1, 3(= frames 123), or 6 (=123456) frames of Translation
with 1 or 3 letter codes.
ie. -T 3,3 (includes frames 1,2,3 with 3 letter labels)
-T 6,1 (includes frames 1,2,3,4,5,6, with 1 letter labels) |
|
-v | asks for program version (there may be multiple versions of the same functional program to track its migration). Also reports build date, kernel version, and GCC version used. | |
-V | Verbose - requests all kinds of diagnostic info to be spat to the screen. May be useful in diagnosing why tacg did not behave as expected, but maybe not. Higher numbers mean more output and are generally downwardly inclusive. | |
-w | output width in bp's (must be between 60* and 210, truncated to a # exactly divisible by 15; '-w 100' will be interpreted as '-w 90'). Actual printed output will be about 20 characters wider. Also applies to output of the ladder and gel maps, so if you're trying to get more accuracy and your output device can display small fonts, you may want to use this flag to widen the output. If you want as much output on one line as possible for external parsing/analysis, specify -w 1. | |
-x | NameB, NameC, NameD,...(,C)' |
used to explicitly name those enzymes or patterns to be used
in the analysis (up to a maximum of 15). Case INsensitive (HindIII=hindiii=HinDiIi),
but the name HAS to be spelled exactly like the entry in the REBASE or
MATRIX file with no spaces (HindIII != Hind III != Hind3).
The ',=' tag appended to a name indicates that it is the tagged RE in an AFLP analysis; only those fragments that have at least one end generated by the tagged RE will be shown. This has been shown to be useful in AFLP analysis. The trailing ',C', if added, requests a combined digestion using all the REs specified with this flag. Examples:
-xHindIII,BamHI,NruI,C requests data for these REs both individually, and combined.
-x EcoRI=,MseI,Hinf
|
-X (--extract) |
aka "--extract" eXtracts the sequence around the pattern matched, from b bases preceding, to e bases following the MIDDLE of pattern if a normal pattern, the START of the pattern if a regular expression. If the pattern is found in the bottom strand AND the last field = 1, sequence is rev-compl'ed before it's extracted so all patterns are in same orientation; if last field = 0, it is NOT reverse compl'ed. In any event, the sequences are FASTA-formatted on output. | |
-# | The percentage of the optimal matrix score that you will accept as
a match. ie. if the matrix (as below) was 10 bases long, and had a maximum
score of 69 (scoring a 100% match at each position as '1'), then if you
indicated a -# 75, you would accept a score of 51.75 (69 x .75)
as a match.
    a  t  g  g  c  y  t  r  g  g    Consensus
    1  2  3  4  5  6  7  8  9  10   Position
a   8  0  1  1  1  0  1  4  0  0
c   1  3  1  0  9  6  0  0  2  0    Sum of Max (bold) = 69
g   1  0  8  7  0  0  0  6  7  10
t   0  7  0  2  0  4  9  0  1  0 |
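The cutoff arithmetic described for -# can be sketched as follows. This is a minimal illustration under stated assumptions, not tacg's implementation: the function names are hypothetical, and the three-position TOY matrix is made up for the example. A window's score is taken to be the sum of the matrix entries for its bases, and a match is accepted when that score reaches the given percentage of the best possible score.

```python
# Illustration only; names and the TOY matrix are hypothetical.

def score_cutoff(max_score, percent):
    """Smallest score accepted as a match, e.g. 75% of the maximum."""
    return max_score * percent / 100.0


def best_possible(matrix):
    """Sum over positions of the largest entry in each column."""
    length = len(next(iter(matrix.values())))
    return sum(max(row[i] for row in matrix.values()) for i in range(length))


def window_score(matrix, window):
    """Sum of the matrix entries selected by each base of the window."""
    return sum(matrix[base][i] for i, base in enumerate(window))


# The arithmetic quoted in the text: 75% of a maximum score of 69.
print(score_cutoff(69, 75))

# A made-up 3-position count matrix (rows = bases, columns = positions).
TOY = {"a": [8, 0, 1], "c": [1, 3, 1], "g": [1, 0, 8], "t": [0, 7, 0]}
cutoff = score_cutoff(best_possible(TOY), 75)
for window in ("atg", "ctg"):
    print(window, window_score(TOY, window), window_score(TOY, window) >= cutoff)
```

Raising the percentage tightens the search toward the consensus; lowering it admits more distant matches at the cost of more hits to sift through.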
However, if an external program IS needed for format interconversion, I also strongly recommend Don Gilbert's excellent readseq program (available in source or executable via FTP). Why recommend readseq when I've used SEQIO? SEQIO is a great library of functions to use in other programs, but readseq is easier to use for stand-alone, interactive use, chiefly due to a more std interface. Both are scriptable; for scripting use, it's a toss-up.
You can also use the paging utility less to move thru your sequence file and use its marking and piping facility to punt the sequence of interest to 'tacg'. Many editors also allow piping a selection of text to an external program and inclusion of the result into another window, especially nedit (a .nedit extract is available that adds some tacg functions to the nedit Background (aka right-click) menu system), as well as crisp and the ubiquitous, omnipotent emacs and its gui doppelganger xemacs.
Much of tacg's output benefits from wider-than-normal printing. The '-w#' flag allows output up to about 230 characters wide; however, to print this without wrapping, you need to print in landscape mode, using very small fonts. A number of unix printing utilities allow you to do this, notably genscript aka GNU Enscript, residing in the GNU repository.
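The width rule behind -w (a value between 60 and 210, truncated to a multiple of 15, with printed lines about 20 characters wider) can be sketched as below. The function names are hypothetical, and rejecting out-of-range values with an error is an assumption made for the sketch; the text does not say how tacg itself reacts to them.

```python
# Illustration only; names are hypothetical and the out-of-range
# behaviour (raising an error) is an assumption, not documented.

EXTRA_COLUMNS = 20  # "about 20 characters wider", per the -w description


def effective_width(requested):
    """Width in bases that a '-w requested' setting would resolve to."""
    if requested == 1:
        return 1  # special case: as much output on one line as possible
    if not 60 <= requested <= 210:
        raise ValueError("width must be 1 or between 60 and 210")
    return (requested // 15) * 15  # truncate to a multiple of 15


def approx_printed_width(requested):
    """Rough number of columns the printer or terminal must provide."""
    return effective_width(requested) + EXTRA_COLUMNS


print(effective_width(100), approx_printed_width(210))
```

With the maximum setting this gives roughly 230 columns, which is why landscape mode and a small font are needed to avoid wrapping.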
Used alone:
tacg -f0 -n5 -T3,1 -sL -F3 -g 100 <input.seq.file >output.seq.file
Translation: read sequence from input.seq.file and analyze it as circular (-f0), with 5+ cutters (-n5), returning both site info and Linear map (-sL) as well as sorted and unsorted fragment data (-F3) and do 1,2,3 frame translation w/ 1 letter codes (-T3,1) on the linear map, writing the output to output.seq.file. Also, include a pseudo gel diagram for those enzymes that pass the filtering, with a low end cutoff of 100 bases (-g100).
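How the topology flag (-f) changes the fragment data (-F) can be illustrated with a short sketch. It is not tacg's code: the function name is hypothetical, and a cut position is assumed to mean "the cut falls immediately after this base". On a linear molecule n cuts give n+1 fragments; on a circular one they give n fragments, because the two end pieces are joined.

```python
# Illustration only; the function name and the cut-position convention
# (cut falls just after the given base number) are assumptions.

def fragment_sizes(cuts, seq_len, circular):
    """Fragment lengths in order along the sequence."""
    cuts = sorted(set(cuts))
    if not cuts:
        return [seq_len]  # uncut: a single piece of full length
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The piece spanning the origin joins the last and first cuts.
        sizes.append(seq_len - cuts[-1] + cuts[0])
    else:
        sizes = [cuts[0]] + sizes + [seq_len - cuts[-1]]
    return sizes


unsorted_linear = fragment_sizes([400, 100], 1000, circular=False)
unsorted_circular = fragment_sizes([400, 100], 1000, circular=True)
print(unsorted_linear, sorted(unsorted_linear))
print(unsorted_circular, sorted(unsorted_circular))
```

In both topologies the fragment sizes add up to the sequence length, which is a quick sanity check on any fragment table.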
Used to search for Matrix Matches:
tacg -# 75 -R yeast.matrices -sS < yeast.chr_4 | less
Translation: search the file yeast.chr_4 for all the matrix definitions in the file 'yeast.matrices', with a cutoff of 75% of the maximum score possible, listing also the summary and the Sites information, piping the output to the pager less.
tacg -p Pit1,tatwcata,1 -p ap2,tgygcatw,1 -w90 -sSL < rprlPromo.seq > promo.map
Translation: search the sequence from the file rprlPromo.seq for the patterns labeled Pit1 and ap2, allowing 1 error each, and print the results (summary (-s), Sites (S), and the Linear Map (L)) 90 characters wide (-w90) to the file promo.map.
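What "a pattern with 1 error" means can be sketched with a simple mismatch-counting scan. This is a hedged illustration rather than tacg's algorithm: the function name is hypothetical, an "error" is assumed to be a base mismatch (no insertions or deletions), and only the top strand is scanned. The IUPAC table is the standard one.

```python
# Illustration only; assumes errors are mismatches and scans one strand.

IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "m": "ac", "k": "gt", "s": "cg", "w": "at",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def find_with_errors(seq, pattern, max_err):
    """Return (1-based start, mismatches) for every acceptable window."""
    seq, pattern = seq.lower(), pattern.lower()
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        mismatches = 0
        window = seq[start:start + len(pattern)]
        for base, code in zip(window, pattern):
            if base not in IUPAC[code]:
                mismatches += 1
                if mismatches > max_err:
                    break  # too many errors; abandon this window
        else:
            hits.append((start + 1, mismatches))
    return hits


# The Pit1 pattern from the example above, against two made-up sequences.
print(find_with_errors("ggtatacatagg", "tatwcata", 1))
print(find_with_errors("ggtatacgtagg", "tatwcata", 1))
```

The cost of this kind of scan grows with pattern length and with the number of errors allowed, which is in keeping with the warning under -p about long patterns with large error counts.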
Used to search for a Regular Expression:
tacg --regex 'yadda:gm(tt|ag)ggn{3,5}tgy' -SL < some.seq | less
Translation: search the file some.seq for the regular expression gm(tt|ag)ggn{3,5}tgy, piping the information about Sites (-S) and the Linear Map (L) to the pager 'less'.
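The IUPAC-to-regex translation that tacg performs automatically can be approximated in a few lines. This sketch is not tacg's translator: the function name is hypothetical, it emits Python regex syntax rather than the backslash-escaped form shown under -r, it maps 'n' to an explicit [acgt] class, and it assumes the pattern does not already contain bracketed character classes.

```python
import re

# Illustration only; hypothetical helper, Python regex syntax, and it
# assumes the input has no existing [...] character classes.
IUPAC_CLASSES = {
    "r": "[ag]", "y": "[ct]", "m": "[ac]", "k": "[gt]", "s": "[cg]",
    "w": "[at]", "b": "[cgt]", "d": "[agt]", "h": "[act]", "v": "[acg]",
    "n": "[acgt]",
}


def iupac_to_regex(pattern):
    """Expand degenerate IUPAC letters; leave a/c/g/t and operators alone."""
    return "".join(IUPAC_CLASSES.get(ch, ch) for ch in pattern.lower())


regex = iupac_to_regex("gm(tt|ag)ggn{3,5}tgy")
print(regex)

# A made-up test sequence containing one match starting at offset 2.
match = re.search(regex, "ccgattggacgtgcaa")
print(match.start(), match.group())
```

Only the top strand is searched here; covering the bottom strand as well would mean also searching the reverse complement of the sequence.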
Used to search the entire yeast 500bp Upstream Regulatory sequences (a file containing 6226 500 bp sequences) for matches to the MATa1 binding site (from TRANSFAC) :
tacg -R TRANSFAC.data -sScw1 -xMATa1 -#85 < utr5_sc_500.fasta > yeast.summary
Translation: translate each of the FASTA formatted entries in the input file utr5_sc_500.fasta into usable sequence, and after finding the MATa1 (-x MATa1) matrix description from the database (-R TRANSFAC.data), search the sequences for matches at 85% of the maximum score that it has in the TRANSFAC database (-# 85), returning the summary (-s), the sites (S) sorted in Strider order (c) with results printed on 1 line (w1), directing the output into the file yeast.summary.
- tacg will not currently cut sequence shorter than 4 bases; if you need to analyze sequences shorter than this, perhaps you're using the wrong program.
- tacg has been made re-entrant for the inclusion of SEQIO and as such a number of memory leaks have been plugged (with the use of Gray Watson's excellent dmalloc library). tacg's not perfect yet but it's a lot more robust.
- the command line handling has been completely re-written, using the getopt() and getopt_long() functions, so the flags are considerably less sensitive to spacing and order.
- translation in 6 frames assumes circular sequence regardless of '-f' flag, so that the last amino acids in frames 5 and 6 in the 1st output block are obviously incorrect if you are assuming linear sequence.