Buske OJ, Hoffman MM, Ponts N, Le Roch KG, Noble WS. Oct 2011.
Exploratory analysis of genomic segmentations with Segtools.
BMC Bioinformatics, 12:415; doi:10.1186/1471-2105-12-415
Segtools is a Python package for analyzing genomic segmentations. The software efficiently calculates a variety of summary statistics and produces corresponding publication-quality visualizations. The overall goal of Segtools is to provide a bird's-eye view of complex genomic data sets, allowing researchers to easily generate and confirm hypotheses.
Segmentations should be in BED4+ or GFF format, with the 'name' field of each line specifying the segment label of that line. The Segtools commands allow you to compare the properties of the segment labels with one another.
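To make the input convention concrete, here is a minimal sketch of how the fourth ('name') column of a BED4+ file acts as the segment label. It is not part of Segtools: the function name bases_per_label and the sample lines are hypothetical, and it assumes whitespace-separated fields with 0-based, half-open coordinates, as in standard BED.

```python
from collections import defaultdict


def bases_per_label(bed_lines):
    """Sum the number of bases covered by each segment label.

    Hypothetical helper for illustration only; it is not a Segtools function.
    """
    totals = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # BED4+: chrom, start, end, name, then any optional extra columns.
        chrom, start, end, label = fields[:4]
        # BED coordinates are 0-based and half-open, so length = end - start.
        totals[label] += int(end) - int(start)
    return dict(totals)


# Made-up sample segmentation with two labels, "A" and "B".
example = [
    "chr1\t0\t100\tA",
    "chr1\t100\t250\tB",
    "chr1\t250\t300\tA",
]
print(bases_per_label(example))
```

Any columns after the fourth are ignored here, which is why a BED4+ file with additional fields can be read in the same way.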
Segtools requires the following prerequisites:
Python 2.7, with packages:
R 2.10.0+, with packages:
Once these prerequisites are properly installed, install Segtools with:
pip install segtools
To upgrade an existing Segtools installation to the latest version, type the following command at the shell prompt:
pip install -U segtools
* We have only tested this software on the following platforms. We would love to extend our support to other systems in the future, and we would gladly accept any contributions toward this end.
As a last resort, or for situations in which you want to try Segtools without dealing with the hassles of heterogeneous system configurations, we have created a VirtualBox Virtual Machine, loaded with Segtools, some sample data, and all necessary prerequisites.
Warning, this is a large (551 MB) file: VM for Segtools 1.1.6
The application's documentation is available in two formats:
To stay informed of new releases and other important information, please subscribe to the segtools-announce mailing list.
There is also a segtools-users mailing list for general discussion and questions about the use of Segtools.
If you want to report a bug or request a feature, please do so using the Segtools issue tracker.
* segtools-aggregation: fix bug: --mode gene option resulted in "ufunc 'rint' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''"
* segtools-aggregation: fix bug described in Issue #44: Segtools aggregate in gene mode does not necessarily pick the longest transcript. (Mickael Mendez)
* replaced deprecated rpy2 calls to "set_writeconsole"
* segtools-signal-distribution: Switched to using "longdouble" precision, in order to provide more accurate mean and variance estimates. (Max Libbrecht)

* add "narrowPeak" as synonym format for "bed"
* segtools-overlap: fix bug where totals above 2**31 could not be calculated on a 32-bit machine
* segtools-signal-distribution: fix bug: --indir option resulted in "can't multiply sequence by non-int of type 'str'" error
* segtools-gmtk-parameters: add R transcripts
* segtools-gmtk-parameters: unescape escaped tracknames from newer versions of Segway
* segtools-relabel: add color support
* add segtools.get_r_dirname() function
* R transcripts: add #!/usr/bin/env Rscript at top
* R transcripts now get source files using get_r_dirname()
* move from use of reshape to reshape2
* signal-distribution now takes a transformation as an option and applies it to the data (Habil Zare)
* Fixed a bug in signal-distribution that resulted in incorrect labeling. (Habil Zare)
* support for Cairo (http://www.cairographics.org/) has been added for systems with the R Cairo package. Cairo does not require X11. (Jay Hesselberth & Paul Ellenbogen)

* segtools-signal-distribution: cleaned up code and arguments to only support accurate computation of statistics (and not histogram approximation)
* segtools-signal-distribution: rewrote calculation loop to improve speed (now takes ~2hrs for chr1 with 93 tracks)
* progress bars now include ETA.

* segtools-nucleotide-transition: significant (several-hundred-fold) speedup
* segtools-transition: added R transcript
* segtools-aggregation: added R transcript
* segtools-overlap: Fixed bug in argument parsing that caused R plotting to fail
* docs: Added high-level structured summary of the output of each command
* requirements: Genomedata package now only required to use segtools-nucleotide-transition and segtools-signal-distribution, not for unrelated commands
* docs: Unified usage terminology to use "annotation" and "feature" (instead of "annotations" and "entries", for example)
* segtools-*: Added -R option to allow command-line specification of R options to segtools commands that plot using R.

* docs: automatically add --help output to every command
* __init__.py: add gzipped pickles
* __init__.py: _from_pickle: fix UnpickleError message
* segtools-overlap: add R transcript
* segtools-overlap: add --max-contrast option
* segtools-length-distribution: allows more generic ANNOTATIONS as input
* segtools-length-distribution: added --no-segments and --no-bases flags to control display on size summary plot
* segtools-nucleotide-frequency: improved speed by caching whole chromosome sequence
* segtools-relabel: added command to relabel a segmentation
* segtools-feature-distance: added histogram visualization output
* segtools-preprocess: if OUTFILE is specified, the .pkl.gz extension is still added
* common.R: fix a comment character-related bug

* aggregation.R: fixed syntax error that caused segtools-aggregation to fail

* common.R: print.image: create the filepath's parent directory, if it doesn't already exist
* fix bug related to: allow comment character of # in mnemonics files, and automatically add a comment to generated mnemonic files
* segtools-gmtk-parameters: fix bug related to: doesn't generate hierarchical mnemonics when Segway subseg is used but has cardinality 1

* allow comment character of # in mnemonics files, and automatically add a comment to generated mnemonic files
* segtools-html-report: fix some problem with os.path.samefile() (maybe related to Python 2.7+)
* segtools-gmtk-parameters: doesn't generate hierarchical mnemonics when Segway subseg is used but has cardinality 1
* add requirement of numpy>=1.3 (because histogram semantics change in that version)

* segtools-signal-distribution: added --order-tracks and --order-labels options
* install-script: now sets R_HOME environment variable which fixes some issues
* docs: fixed dead mnemonic file reference
* bugfix: removed 'new' argument to 'histogram' for compatibility with newer versions of numpy
* bugfix: segtools-html-report no longer crashes when the mnemonic file is already in the place where it would be copied
* bugfix: fixed segtools-flatten unpacking error (when --filter option was not specified)

* Docs: added workflow flowchart
* segtools-flatten: added --filter option
* Bug fix: segtools-flatten now works when some files specify strand information and others don't.
* Install script: now searches for R a little harder, program versions are printed when found, and more errors are caught.
* eliminated exclamation marks
* new autogen stuff from Sphinx to allow man page generation

* Plotting: No longer plots to screen. This makes plotting cleaner, less error-prone, but slightly slower.
* Documentation: Updated to include command syntax

* Filled in large holes in documentation
* Improved robustness of installation script
* Renamed many of the Segtools commands for simplicity
* Made BED/GFF files interchangeable for most arguments
* Added ability to pre-process segmentations with segtools-preprocess
* Made aggregation significance non-default (since it is not yet mathematically sound).
* Cleaned up command-line interfaces
* Sped up aggregation