[Bioperl-l] Bioperl Developer snapshot 1.3.03

Heikki Lehvaslaiho heikki at ebi.ac.uk
Fri Nov 21 11:20:25 EST 2003


   Bioperl developer snapshot 1.3.03
   ---------------------------------


This is the third developer snapshot from the BioPerl CVS head
that will eventually lead to release 1.4.

http://bioperl.org/DIST/current_core_unstable.tar.gz
http://bioperl.org/DIST/bioperl-1.3.03.tar.gz

Changes since 1.3.02
--------------------

A month is far too long between snapshots, but I found it difficult
to find the time to write an overview of what has happened. Waiting
made it harder, of course, so I can only skim the top of the changes
here. See the latter part of this message for the details collected
from the commit emails.

Bio::LocatableSeq now gives reasonable values to start() and end()
without manually setting them if the values can be derived from the
sequence only.

Sequence database parsers now treat virus Bio::Species entries
differently from other taxa. Since virus nomenclature does not
follow the standard genus + species format, calling binomial() on
viruses is not advisable: the output merges the group name and the
species name, which is usually not what you want. This might need
more work in the future.

Bio::SimpleAlign has new methods. Help is appreciated there, too
(see below).

If you really want, you can now add custom translation tables to
Bio::Tools::CodonTable and create Martian proteins.

Stefan has continued fine-tuning his Bio::Matrix::PSM modules.

A number of fixes have been added to the Bio::Graphics modules. Work
is under way to add SVG support.

Bio::Tools::SeqWords has a new method: count_overlap_words()

Remember: BPlite is being superseded by SearchIO.


On behalf of the bioperl core team,

	-Heikki

NEW DIRECTORIES and FILES
=========================

* AlignIO now supports the MAF format
* SeqIO knows about KEGG and TIGR formats
* Bio/Tools/Analysis/Protein::ELM for

  documentation

* two texts converted into SGML: Flat_Databases.sgml
* new HOWTO: SimpleWebAnalysis.sgml
* bioperl-live/doc/howto/txt - New directory for text-only versions of
  howtos

  examples

* sirna/rnai_finder.cgi
* db/bioflat_index.pl

  models

* popgen.dia

CHANGES
=======

* Lots of fixes to tests
* Tests now fail cleanly when run without network access



------------------------- details ---------------------------


Bio::Align::DNAStatistics 
  code alignment formatting

Bio::AlignIO::bl2seq
  Johnathan Segal's fixes for bug #1541 - problem with reverse
  complement alignments in bl2seq

Bio::DB::Flat::BinarySearch 
  More detail on secondary namespaces

Bio::DB::Flat 
  An -index value must be passed; it is required

Bio::DB::GFF::Adaptor::biofetch 
  changes making genbank2gff.pl use SOFA terms for type names in
  generated GFF3

Bio::DB::GFF::Aggregator 
  fixed errors in the high-mag sequence alignments shown by the
  segments glyph

Bio::DB::GFF::Feature
  - reworked the following methods to more closely resemble the
    corresponding Bio::SeqFeatureI methods:
      - all_tags (alias get_all_tags)
      - gff_string
      - get_tag_values
      - aliased sub_SeqFeature to get_SeqFeatures
  - silenced the "uninitialized value" warning

Bio::DB::Registry 
  The HOWTO says that one should be able to use 1 or more
  seqdatabase.ini files. This is right, since the administrator could
  put one in /etc/bioinformatics and I might want my own in
  /home/bosborne/.bioinformatics. The old code was reading 1 *ini file
  and skipping the rest in OBDA_SEARCH_PATH, now it reads all the
  files specified in OBDA_SEARCH_PATH, as well as the standard
  locations.

  ActiveState has no getpwuid() so AS users can use /home/bosborne
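  To make the "read every file, not just the first" change concrete,
  here is a rough Python sketch (not the BioPerl code). It assumes that
  OBDA_SEARCH_PATH holds ';'-separated directories, each of which may
  contain a seqdatabase.ini, and that the standard locations are
  ~/.bioinformatics and /etc/bioinformatics searched after the path.
  The separator, the search order, the function name and the is_file
  hook are all assumptions made for illustration.

```python
import os


def registry_files(search_path, home=None, etc_dir="/etc/bioinformatics",
                   is_file=os.path.isfile):
    """Collect every seqdatabase.ini found, instead of stopping at the first.

    Hypothetical helper: assumes OBDA_SEARCH_PATH is a ';'-separated list
    of directories. is_file is injectable so the lookup can be tested.
    """
    dirs = [d for d in (search_path or "").split(";") if d]
    # Standard locations; the per-user one is skipped when no home
    # directory is known (e.g. on a platform without getpwuid()).
    if home:
        dirs.append(os.path.join(home, ".bioinformatics"))
    dirs.append(etc_dir)

    found = []
    for d in dirs:
        path = os.path.join(d, "seqdatabase.ini")
        if is_file(path) and path not in found:
            found.append(path)  # keep search order, drop duplicates
    return found
```

  With an administrator file in /etc/bioinformatics and a personal one
  in the home directory, both are returned rather than only the first.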

Bio::Graphics::FeatureFile 
  - added a symbol to access a feature's primary ID (e.g. database PK)
  - removed an uninitialized variable warning when calling features()
    without arguments
  - fixed the frend web-based feature renderer to accommodate recent
    changes in the FeatureFile API

Bio::Graphics::Glyph::diamond 
  converted line-based outline to polygon calls

Bio::Graphics::Glyph::Factory 
  preliminary support for SVG output using GD::SVG

Bio::Graphics::Panel 
  preliminary support for SVG output using GD::SVG

Bio::Graphics::Glyph
  - fixed errors in the high-mag sequence alignments shown by the
    segments glyph
  - preliminary support for SVG output using GD::SVG
  - polygon-based approach in filled_arrow to support SVG

Bio::Graphics::Glyph::generic 
  - generalized some code to support SVG output

Bio::Graphics::Glyph::segments 
  - added additional documentation for displaying multiple alignments
    with the segments glyph
  - fixed errors in the high-mag sequence alignments shown by the
    segments glyph
  - added a new "canonical_strand" option to the segments glyph

Bio::Graphics::Glyph::graded_segments 

  Fixed Bio::SeqFeature::Generic so that it will accept a score of 0;
  modified Bio::Graphics::Glyph::graded_segments so that it draws a
  foreground box around each segment by default (the previous behavior
  can be restored with -vary_fg=>1)

Bio::Graphics::Glyph::triangle 

  - more range checking on triangle glyph before fillToBorder call
  - try to fix GD buffer overrun in triangle glyph

Bio::Graphics::Glyph::xyplot 
  removed function-oriented GD calls for compatibility with SVG output

Bio::Graphics::Pictogram 
  support lowercase

Bio::LocatableSeq 
  - start() and end() now return undef if there is no sequence string
  - silenced a spurious warning arising from an unset strand
  - fixed trunc() when strand is -1.
    Also made end() calculate its value from the length of the
    sequence and start. There is no need to set end() explicitly any more.
  - Johnathan Segal's fixes for bug #1541 - problem with reverse
    complement alignments in bl2seq
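  The start()/end() behaviour above boils down to simple arithmetic:
  end = start + (number of non-gap residues) - 1. A minimal Python
  sketch, assuming '-' and '.' are the only gap characters and that an
  unset start defaults to 1 (both are assumptions, not taken from the
  BioPerl source):

```python
GAP_CHARS = "-."  # assumed gap symbols


def locatable_coords(seq, start=None):
    """Return (start, end) for a possibly gapped aligned sequence string."""
    if not seq:
        return None, None  # no sequence string: both undefined
    residues = sum(1 for c in seq if c not in GAP_CHARS)
    if start is None:
        start = 1  # assumed default
    return start, start + residues - 1
```

  For example, "AC--GT" starting at 10 covers residues 10..13.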

Bio::SimpleAlign 
  added a parser and tests for the UCSC MAF (Multiple Alignment
  Format).

Bio::Matrix::PSM::InstanceSite PsmHeader 
  synopsis and doc fixes

Bio::Matrix::PSM::IO::mast 
  doc formatting fixes

Bio::Matrix::PSM::SiteMatrix SiteMatrixI 
  get/set method added to access accession_number

Bio::Matrix::PSM::SiteMatrix 

  Fixed a bug, pointed out by Heikki, in the constructor when no input
  data for the vectors (A, G, C, T) is supplied. This is still a
  temporary solution.

Bio::Matrix::PSM::IO::mast 

  The sequence is unknown but the width is, so we supply it as 'NNN..'.
  The accession number should be supplied as -accession_number.

Bio::Matrix::PSM::InstanceSite 
  Bug fix: start method was overriding LocatableSeq method, and it
  shouldn't, fixed.

Bio::Matrix::PSM::IO::transfac 
  Throw exception if a position is not defined

Bio::Matrix::PSM::IO::mast meme transfac 
  Capitalization fixed when rearranging in new

Bio::OntologyIO::dagflat 
  - fixes to the ontology regex to parse a greater subset of DAG-Edit
    files. I have tracked down the files where DAG-Edit IDs are
    validated:

    GOFlatFileAdapter.java

    The regex still only matches a subset of the allowed characters in
    an identifier. Identifiers can be any non-whitespace, non-;$,:!\?
    characters on either side of a : separator. I've opted to match
    \w+:\w+; hopefully we don't need to go beyond this.

    Adding escape of SGML and newlines/tabs. Is there a generic SGML
    escape module we want to add as a dependency?
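  To see which identifiers the \w+:\w+ pattern leaves out, here is a
  small Python comparison against a broader pattern written from one
  reading of the rule quoted above. The broad pattern is illustrative
  only; it is not what dagflat uses.

```python
import re

# Pattern adopted in dagflat: word characters on both sides of ':'.
NARROW_ID = re.compile(r"\w+:\w+")
# Illustrative broader pattern: one or more characters that are not
# whitespace or any of ; $ , : ! \ ? on either side of ':'.
BROAD_ID = re.compile(r"[^\s;$,:!\\?]+:[^\s;$,:!\\?]+")


def classify(token):
    """Return (matches_narrow, matches_broad) for a whole token."""
    return (NARROW_ID.fullmatch(token) is not None,
            BROAD_ID.fullmatch(token) is not None)
```

  A GO-style ID such as "GO:0008150" matches both, while an ID with a
  hyphen in its prefix ("my-onto:term") only matches the broad pattern.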

Bio::OntologyIO
  adding escape of SGML and newlines/tabs.  is there a generic SGML
  escape module we want to add as a dependency?

Bio::Ontology::Term
  Bio::Phenotype::OMIM::OMIMentry, OMIMparser: finer parsing of the
  symptoms

Bio::PopGen Statistics
  updated LD so that it returns a pair of values, LD and chi-square.
  Also fixed composite_LD so that it calculates correctly with
  missing data.
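  For readers wondering what "a pair of values, LD and chi-square"
  looks like, here is the textbook two-locus version in Python: D from
  haplotype frequencies and the 1-d.f. test statistic N*r^2. This is a
  sketch of the standard haplotype-based measure, not of composite_LD
  (which works from genotypes), and dropping haplotypes with a missing
  allele is just one simple way to handle missing data.

```python
def ld_and_chisq(haplotypes, missing=None):
    """Return (D, chi-square) for two biallelic loci.

    haplotypes: list of (allele_at_locus1, allele_at_locus2) pairs.
    Pairs containing the `missing` marker are dropped before counting.
    """
    obs = [h for h in haplotypes if missing not in h]
    alleles1 = sorted({h[0] for h in obs})
    alleles2 = sorted({h[1] for h in obs})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both loci must be biallelic in the observed data")
    a, b = alleles1[0], alleles2[0]  # reference allele at each locus
    n = len(obs)
    p_a = sum(h[0] == a for h in obs) / n
    p_b = sum(h[1] == b for h in obs) / n
    p_ab = sum(h[0] == a and h[1] == b for h in obs) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, n * r2  # chi-square with 1 degree of freedom
```

  With only AB and ab haplotypes in equal numbers, D is 0.25 and r^2 is
  1, so the chi-square equals the number of complete haplotypes.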

Bio::PrimarySeqI 
  translate() can take in a custom codon table

Bio::RangeI
  Made the 'disconnected_ranges' sub run without warnings
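  As far as I understand it, disconnected_ranges() reduces a list of
  ranges to the minimal set of non-overlapping ranges that still covers
  every input. A Python sketch of that merge, assuming 1-based inclusive
  (start, end) pairs and that ranges which merely abut (end + 1 == next
  start) stay separate:

```python
def disconnected_ranges(ranges):
    """Merge overlapping (start, end) ranges; inclusive coordinates."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # overlaps the last merged range: extend it if needed
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```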

Bio::Restriction::Analysis 
  Apply fix for bug #1548

Bio::Root::IO 
  - a little cleanup of the debugging code for uniformity
  - In order for rmtree() to work in cygwin

Bio::SearchIO::blastxml 
  blastxml expected <!DOCTYPE> and <BlastOutput> on the same line.  my
  version of blastall puts them on different lines, which caused the
  parse to fail (from internal refactoring of <?xml> and <!DOCTYPE>
  tags).

  this change fixes the bug.  tests added to SearchIO.t and a test
  blastxml file added.

Bio::SearchIO::Writer::GbrowseGFF 
  Gbrowse now allows tstart and tend tags for alignment features to
  make it more like normal GFF.

Bio::Seq::EncodedSeq 
  fixed strandedness issues

Bio::SeqFeature::Generic 
  now accepts a score of 0;
  modified Bio::Graphics::Glyph::graded_segments so that it draws a
  foreground box around each segment by default (the previous behavior
  can be restored with -vary_fg=>1)

Bio::SeqFeature::Tools::Unflattener 
  reuses exons (i.e. the containment graph is not a tree)

  improved algorithm for matching mRNAs with CDSs

Bio::SeqIO 
  alternate ABI extension for newer versions of software (requested by
  Jan Aerts)

Bio::SeqIO::swiss
Bio::SeqIO::genbank
Bio::SeqIO::embl 
  resolving bugzilla #1519

  1. fixed sprintf bug sometimes leading to extra space after ID tag

  2. OS line output for viruses now contains all the information after
     the species name. The complex strain/abbreviation/common name list
     is stored in sub_species(), which was previously not in use for
     viruses. This is a hack, but the (first) OS line now makes a
     perfect round trip.

Bio::SeqUtils
   translate_6frames() failed on sequences where bioperl would guess
   that the sequence string is protein. Streamlined coding of the
   method to avoid guessing.
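  The idea behind the fix is to always treat the input as nucleotides
  rather than letting the alphabet be guessed. A Python sketch of
  six-frame translation with the standard genetic code (the NCBI
  64-letter table in TCAG order; codons not in the table, e.g. ones
  containing N, become X). The function names are illustrative, not
  BioPerl's.

```python
BASES = "TCAG"
# NCBI standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: STANDARD[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODONS.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def translate_6frames(dna):
    """Return the three forward, then the three reverse-strand translations."""
    dna = dna.upper().replace("U", "T")  # never guess: always nucleotides
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, rev) for f in range(3)]
```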

Bio::SimpleAlign 
  - offset location of new seq with features by location of original
    seq requested to build from.
  - added rudimentary key/value parsing for maf 'a' lines
  - run clean with -w on
  - cleaned up unit test spurious warnings.
  - bugfix in maf parser for detecting last record in file.

  - added functionality to trim gaps from a MSA for a given sequence
    to SimpleAlign.  trimming allowed implementation of exporting Seq
    and SeqFeatures from SimpleAlign.  the api here is still rough,
    comments appreciated.

  - added a method SimpleAlign::splice_by_seq_pos to allow splicing of
    all sequences based on the gap locations of one sequence within
    the alignment.  this could in principle be called repeatedly to
    remove all gaps from the MSA.
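  What splice_by_seq_pos does, as described above, amounts to keeping
  only the alignment columns where one chosen sequence has a residue. A
  Python sketch on a plain dict of equal-length aligned strings (a
  stand-in for Bio::SimpleAlign, with '-' and '.' assumed as the gap
  characters; the function name is hypothetical):

```python
GAP_CHARS = "-."  # assumed gap symbols


def splice_by_ref_gaps(aln, ref_id):
    """Drop every column in which aln[ref_id] has a gap.

    aln maps sequence names to aligned strings of equal length.
    """
    ref = aln[ref_id]
    keep = [i for i, c in enumerate(ref) if c not in GAP_CHARS]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}
```

  Calling it once for each sequence removes every column that contains
  a gap, which is the "called repeatedly" case mentioned above.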

Bio::Species 
  commented out internal calls to methods not doing anything

Bio::Taxonomy 
  clean up the rank sets

Bio::Tools::BPlite::Iteration 
  have be set to '' instead of undef - perhaps this is not entirely
  the best thing - are we screwing up in the parsing instead?  use
  Bio::SearchIO instead I guess

Bio::Tools::BPlite 
  bug #1542 - improper detection of end of Query regexp

Bio::Tools::CodonTable
  if you know what you are doing, you can add a custom codon table
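  A custom table is easiest to think of in the NCBI layout: a 64-letter
  amino-acid string with codons in TCAG order (TTT, TTC, TTA, TTG, TCT,
  ...). The Python sketch below, which is not the Bio::Tools::CodonTable
  API, copies the standard table and reassigns one codon; the helper
  names and the TGA-to-W "Martian" change are purely illustrative.

```python
BASES = "TCAG"
# NCBI standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_index(codon):
    """Position of a codon in a TCAG-ordered 64-letter table."""
    i, j, k = (BASES.index(base) for base in codon)
    return 16 * i + 4 * j + k


def with_codon(table, codon, aa):
    """Return a copy of `table` with one codon reassigned."""
    idx = codon_index(codon)
    return table[:idx] + aa + table[idx + 1:]


def translate(dna, table=STANDARD):
    """Translate complete codons with the given table; unknown -> 'X'."""
    dna = dna.upper().replace("U", "T")
    out = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        out.append(table[codon_index(codon)] if set(codon) <= set(BASES) else "X")
    return "".join(out)


MARTIAN = with_codon(STANDARD, "TGA", "W")  # stop codon now reads W
```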

Bio::Tools::GFF 
  - needed to move header parsing outside of next_feature, as it may be
    useful to handle sequences before sequence features (think database
    inserts).
  - adding support for parsing GFF ##sequence-region header lines.
    these are transformed into featureless Bio::LocatableSeq objects,
    available via the next_segment method.
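  A GFF3 ##sequence-region directive carries a sequence ID plus start
  and end coordinates (for example "##sequence-region ctg123 1 1497228").
  A minimal Python sketch of pulling those three values out, standing in
  for the Bio::LocatableSeq objects that next_segment returns; the
  function name is hypothetical.

```python
def parse_sequence_region(line):
    """Return (seqid, start, end) for a ##sequence-region line, else None."""
    fields = line.split()
    if not fields or fields[0] != "##sequence-region":
        return None  # not this directive (feature line, other header, ...)
    if len(fields) != 4:
        raise ValueError("expected: ##sequence-region seqid start end")
    _, seqid, start, end = fields
    return seqid, int(start), int(end)
```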

Bio::Tools::Phylo::PAML 
  silenced a warning reported in bugzilla #1560

Bio::Tools::Run::StandAloneBlast 
  Allow SearchIO to be used for all output format types now with
  _READMETHOD set

Bio::Tools::SeqWords 
 new method: count_overlap_words(), feature enhancement from bugzilla
 #1554
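  The difference between overlapping and non-overlapping word counts is
  easiest to see on a repetitive string. A Python sketch (not the
  Bio::Tools::SeqWords code; it assumes the older count_words() steps
  through the sequence one whole word at a time):

```python
from collections import Counter


def count_words(seq, k):
    """Non-overlapping k-words: step k positions at a time."""
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, k))


def count_overlap_words(seq, k):
    """Overlapping k-words: step one position at a time."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

  For "AAAA" and k = 2, the non-overlapping count sees AA twice, while
  the overlapping count sees it three times.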

Bio::Tools::Signalp 
  added the SignalP-HMM result:
    $feat->score;                               # signal peptide probability
    ($feat->get_tag_values('peptideProb'))[0];  # SignalP peptide probability
    ($feat->get_tag_values('anchorProb'))[0];   # SignalP anchor probability

/examples/biblio
  more biblio examples

INSTALL.WIN 
  Bug #1451: PPM3 documentation was wrong

scripts/Bio-DB-GFF/bp_genbank2gff.PLS
  changes making genbank2gff.pl use SOFA terms for type names in
  generated GFF3

scripts/Bio-DB-GFF/bulk_load_gff.PLS fast_load_gff.PLS pg_bulk_load_gff.PLS 
  fixed a minor gff3 bug

scripts/Bio-DB-GFF/bulk_load_gff.PLS 
  added support for DSN strings of the form
  "dbi:mysql:database=xxx;host=xxx"

scripts/Bio-DB-GFF/bulk_load_gff.PLS 
  added support for bulk loading from a local gff source to a remote db server

scripts/Bio-DB-GFF/fast_load_gff.PLS 
  added an option for setting MAX_BIN

scripts/Bio-DB-GFF/bulk_load_gff.PLS pg_bulk_load_gff.PLS 
  added option to set MAX_BIN, and updated the postgres loader to deal
  with gff3 (note that the gff3 stuff is completely untested though)

scripts/graphics/frend.PLS 
  Bio::Graphics::FeatureFile: removed an uninitialized variable warning
  when calling features() without arguments; fixed the frend web-based
  feature renderer to accommodate recent changes in the FeatureFile API

scripts/popgen/composite_LD.PLS 
  - print with new API
  - fix to deal with newer API

scripts/utilities/search2gff.PLS 
  output 'match' and 'component' lines for GFF dumping

