[Bioperl-l] Bioperl Developer snapshot 1.3.03
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Fri Nov 21 11:20:25 EST 2003
Bioperl developer snapshot 1.3.03
---------------------------------
This is the third developer snapshot from the BioPerl CVS head
that will eventually lead to release 1.4.
http://bioperl.org/DIST/current_core_unstable.tar.gz
http://bioperl.org/DIST/bioperl-1.3.03.tar.gz
Changes since 1.3.02
--------------------
A month is far too long a time between snapshots, but I found it
difficult to find time to write an overview of what has
happened. Waiting made it harder, of course, so I can only
skim the top of the changes made. See the latter part of the message for
the individual commit emails.
Bio::LocatableSeq now gives reasonable values to start() and end()
without manually setting them if the values can be derived from the
sequence only.
Sequence database parsers now treat virus Bio::Species entries
differently from other taxa. Since virus nomenclature does not
follow the standard genus + species format, calling binomial() on viruses
is not advisable. The output will merge the group name and species name,
which is usually not what you want. This might need more work in the
future.
Bio::SimpleAlign has new methods. Help is appreciated there too (see
below).
If you really want, you can now add custom translation tables to
Bio::Tools::CodonTable and create Martian proteins.
Stefan has continued fine-tuning his Bio::Matrix::PSM modules.
A number of fixes have been added to the Bio::Graphics modules. Work is
under way to add SVG support.
Bio::Tools::SeqWords has a new method: count_overlap_words()
Remember: BPlite is being superseded by SearchIO.
On behalf of the bioperl core team,
-Heikki
NEW DIRECTORIES and FILES
=========================
* AlignIO now supports the MAF format
* SeqIO knows about KEGG and TIGR formats
* Bio/Tools/Analysis/Protein::ELM for
documentation
* two texts converted into SGML: Flat_Databases.sgml
* new HOWTO: SimpleWebAnalysis.sgml
* bioperl-live/doc/howto/txt - New directory for text-only versions of
howtos
examples
* sirna/rnai_finder.cgi
* db/bioflat_index.pl
models
* popgen.dia
CHANGES
=======
+ Lots of fixes to tests
* tests now fail cleanly when run without network access
------------------------- details ---------------------------
Bio::Align::DNAStatistics
code alignment formatting
Bio::AlignIO::bl2seq
Johnathan Segal's fixes for bug #1541 - problem with reverse
complement alignments in bl2seq
Bio::DB::Flat::BinarySearch
More detail on secondary namespaces
Bio::DB::Flat
An -index value must be passed; it is required
Bio::DB::GFF::Adaptor::biofetch
changes making genbank2gff.pl use SOFA terms for type names in
generated GFF3
Bio::DB::GFF::Aggregator
fixed errors in the high-mag sequence alignments shown by the
segments glyph
Bio::DB::GFF::Feature
Reworked the following methods to more closely resemble the
corresponding Bio::SeqFeatureI methods:
- all_tags (alias get_all_tags)
- gff_string
- get_tag_values
- aliased sub_SeqFeature to get_SeqFeatures
Bio::DB::GFF::Feature
silence the uninitialized value error
Bio::DB::Registry
The HOWTO says that one should be able to use 1 or more
seqdatabase.ini files. This is right, since the administrator could
put one in /etc/bioinformatics and I might want my own in
/home/bosborne/.bioinformatics. The old code read only one *.ini file
and skipped the rest in OBDA_SEARCH_PATH; now it reads all the
files specified in OBDA_SEARCH_PATH, as well as the standard
locations.
ActiveState has no getpwuid() so AS users can use /home/bosborne
Bio::Graphics::FeatureFile
- adding a symbol to access a feature's primary ID (eg, database PK)
- removed an uninitialized variable warning when calling features() without
arguments
- fixed the frend web-based feature renderer to accommodate recent changes
in the FeatureFile API
Bio::Graphics::Glyph::diamond
converted line-based outline to polygon calls
Bio::Graphics::Glyph::Factory
preliminary support for SVG output using GD::SVG
Bio::Graphics::Panel
preliminary support for SVG output using GD::SVG
Bio::Graphics::Glyph
fixed errors in the high-mag sequence alignments shown by the segments glyph
Bio::Graphics::Glyph
- preliminary support for SVG output using GD::SVG
- polygon-based approach in filled_arrow to support SVG
Bio::Graphics::Glyph::generic
- generalized some code to support SVG output
Bio::Graphics::Glyph::segments
- added additional documentation for displaying multiple alignments
with the segments glyph
- fixed errors in the high-mag sequence alignments shown by the
segments glyph
- added a new "canonical_strand" option to the segments glyph
Bio::Graphics::Glyph::graded_segments
Fixed Bio::SeqFeature::Generic so that it will accept a score of 0;
modified Bio::Graphics::Glyph::graded_segments so that it draws a foreground
box around each segment by default (the previous default behavior can be
restored with -vary_fg=>1)
Bio::Graphics::Glyph::triangle
- more range checking on triangle glyph before fillToBorder call
- try to fix GD buffer overrun in triangle glyph
Bio::Graphics::Glyph::xyplot
removed function-oriented GD calls for compatibility with SVG output
Bio::Graphics::Pictogram
support lowercase
Bio::LocatableSeq
- start() and end() now return undef if there is no sequence string
- silence a spurious warning arising from unset strand
- fixed trunc() when strand is -1.
Also made end() calculate its value from start() and the length of the
sequence; there is no need to set end() explicitly any more.
- Johnathan Segal's fixes for bug #1541 - problem with reverse
complement alignments in bl2seq
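The derived end() described above boils down to simple arithmetic: end = start + (number of residues) - 1, where gap characters do not count as residues. The following is a minimal sketch of that arithmetic in plain Perl. It is not the Bio::LocatableSeq code. The helper name derive_end is hypothetical, and treating '-' and '.' as the only gap symbols is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::LocatableSeq API): derive the end
# coordinate of an aligned sequence from its start and its residue count.
# Assumes '-' and '.' are the only gap characters.
sub derive_end {
    my ($start, $aligned) = @_;
    # Mirror "return undef if there is no sequence string".
    return undef unless defined $aligned && length $aligned;
    (my $residues = $aligned) =~ s/[-.]//g;    # strip gaps, keep residues
    return $start + length($residues) - 1;
}

# 'AC--GT.A' holds 5 residues, so a start of 10 gives an end of 14.
print derive_end(10, 'AC--GT.A'), "\n";
```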
Bio::SimpleAlign
added a parser and tests for the UCSC MAF (multiple alignment
format).
added a method SimpleAlign::splice_by_seq_pos to allow splicing of
all sequences based on the gap locations of one sequence within the
alignment. This could in principle be called repeatedly to remove
all gaps from the MSA.
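The idea behind splice_by_seq_pos can be sketched in a few lines. The approach is to find every column in which the chosen reference sequence has a gap and to drop that column from all sequences. This is a plain-Perl illustration only, not the Bio::SimpleAlign implementation. The function name splice_by_reference is hypothetical, sequences are assumed to be equal-length strings, and '-' and '.' are assumed to be the gap characters.

```perl
use strict;
use warnings;

# Hypothetical sketch of the splice_by_seq_pos idea: remove every alignment
# column in which the reference sequence (given by index) has a gap.
sub splice_by_reference {
    my ($ref_index, @aligned) = @_;
    my $ref = $aligned[$ref_index];
    # Keep only columns where the reference carries a residue.
    my @keep = grep { substr($ref, $_, 1) !~ /[-.]/ } 0 .. length($ref) - 1;
    return map { my $s = $_; join '', map { substr($s, $_, 1) } @keep } @aligned;
}

my @msa = ('AC-GT', 'ACTGT', 'A--GA');
print "$_\n" for splice_by_reference(0, @msa);
# prints ACGT, ACGT and A-GA, one per line
```

Gaps in the other sequences survive, which is why repeated calls (one per reference sequence) would be needed to strip all gaps from the MSA.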
Bio::Matrix::PSM::InstanceSite PsmHeader
synopsis and doc fixes
Bio::Matrix::PSM::IO::mast
doc formatting fixes
Bio::Matrix::PSM::SiteMatrix SiteMatrixI
get/set method added to access accession_number
Bio::Matrix::PSM::SiteMatrix
Fixed the bug Heikki pointed out in the constructor when no input data for
the vectors (A, G, C, T) is supplied. This is still a temporary solution.
Bio::Matrix::PSM::IO::mast
sequence is unknown, but width is, so we supply it as 'NNN..'
Accession number should be supplied as -accession_number
Bio::Matrix::PSM::InstanceSite
Bug fix: the start() method was overriding the LocatableSeq method and
shouldn't have; fixed.
Bio::Matrix::PSM::IO::transfac
Throw exception if a position is not defined
Bio::Matrix::PSM::IO::mast meme transfac
Fixed capitalization when rearranging arguments in new()
Bio::OntologyIO::dagflat
- fixes to the ontology regex to parse a greater subset of DAG-Edit files.
I have tracked down the files where DAG-Edit IDs are validated:
GOFlatFileAdapter.java
The regex still only matches a subset of the allowed characters in
an identifier. Identifiers can be any non-whitespace, non ;$,:!\?
characters > length 1 on either side of a : separator. I've opted
to match \w+:\w+; hopefully we don't need to go beyond this.
Added escaping of SGML and newlines/tabs. Is there a generic SGML
escape module we want to add as a dependency?
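The gap between the chosen \w+:\w+ pattern and the full DAG-Edit rule can be shown with two regular expressions. This is a sketch only. The "loose" pattern is my reading of the rule quoted above, with two assumptions: "> length 1" is read as "one or more characters", and the excluded set is taken to be whitespace plus ; $ , : ! ?.

```perl
use strict;
use warnings;

# Pattern chosen for dagflat IDs.
my $strict_id = qr/^\w+:\w+$/;
# Looser pattern modelled on the quoted rule (character set is an assumption):
# no whitespace or ; $ , : ! ? on either side of a single ':' separator.
my $loose_id  = qr/^[^\s;\$,:!?]+:[^\s;\$,:!?]+$/;

for my $id ('GO:0005634', 'FOO-BAR:0001') {
    printf "%-12s strict=%s loose=%s\n", $id,
        ($id =~ $strict_id ? 'yes' : 'no'),
        ($id =~ $loose_id  ? 'yes' : 'no');
}
```

A GO-style ID matches both patterns. An ID containing a hyphen only matches the looser one, because \w does not include '-'.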
Bio::OntologyIO
Added escaping of SGML and newlines/tabs. Is there a generic SGML
escape module we want to add as a dependency?
Bio::Ontology::Term
Bio::Phenotype::OMIM::OMIMentry OMIMparser
finer parsing of the symptoms
Bio::PopGen Statistics
updated LD so that it returns a pair of values, LD and chiSQ.
Also fixed composite_LD so that it calculates correctly with
missing data
Bio::PrimarySeqI
translate() can take in a custom codon table
Bio::RangeI
Made the 'disconnected_ranges' sub stop causing warnings
Bio::Restriction::Analysis
Apply fix for bug #1548
Bio::Root::IO
- cleanup of debugging a little for uniformity
- changes so that rmtree() works in Cygwin
Bio::SearchIO::blastxml
blastxml expected <!DOCTYPE> and <BlastOutput> on the same line. My
version of blastall puts them on different lines, which caused the
parse to fail (from internal refactoring of the <?xml> and <!DOCTYPE>
tags).
This change fixes the bug. Tests were added to SearchIO.t, along with a test
blastxml file.
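The fix above comes down to not assuming a particular line layout in the XML header. The following minimal sketch shows that idea: the start of the report is examined as a single string, so a line break between the DOCTYPE and the root element does not matter. It is not the Bio::SearchIO::blastxml code, and the function name looks_like_blast_xml is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical check: treat the start of a BLAST XML report as one string so
# that <!DOCTYPE BlastOutput ...> and <BlastOutput> need not share a line.
sub looks_like_blast_xml {
    my ($header) = @_;
    # /s lets .*? cross newlines; \s* accepts any line break between the tags.
    return $header =~ /<!DOCTYPE\s+BlastOutput\b.*?>\s*<BlastOutput>/s ? 1 : 0;
}

my $split_header = join "\n",
    '<?xml version="1.0"?>',
    '<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">',
    '<BlastOutput>';
print looks_like_blast_xml($split_header), "\n";    # 1
```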
Bio::SearchIO::Writer::GbrowseGFF
Gbrowse now allows tstart and tend tags for alignment features to
make it more like normal GFF.
Bio::Seq::EncodedSeq
fixed strandedness issues
Bio::SeqFeature::Generic
It now accepts a score of 0;
modified Bio::Graphics::Glyph::graded_segments so that it draws a foreground
box around each segment by default (the previous default behavior can be
restored with -vary_fg=>1)
Bio::SeqFeature::Tools::Unflattener
reuses exons (i.e. the containment graph is not a tree)
improved algorithm for matching mRNAs with CDSs
Bio::SeqIO
alternate ABI extension for newer versions of software (requested by
Jan Aerts)
Bio::SeqIO::swiss
Bio::SeqIO::genbank
Bio::SeqIO::embl
resolving bugzilla #1519
1. fixed sprintf bug sometimes leading to extra space after ID tag
2. OS line output for viruses now contains all the information after the
species name. The complex strain/abbreviation/common name list is
stored in sub_species(), which was previously not in use for viruses.
This is a hack, but the (first) OS line now makes a perfect round
trip.
Bio::SeqUtils
translate_6frames() failed on sequences where bioperl would guess
that the sequence string is protein. Streamlined coding of the
method to avoid guessing.
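Six-frame translation is straightforward when the input is simply declared to be DNA rather than having its alphabet guessed. The sketch below takes frames 1-3 from the forward strand and frames 4-6 from the reverse complement. It is written in plain Perl under these assumptions: the standard genetic code only, and any codon containing a non-ACGT base is translated as 'X'. It is not the Bio::SeqUtils code.

```perl
use strict;
use warnings;

# Standard genetic code as 64 letters, codons ordered by bases T, C, A, G
# with the first base varying slowest (the layout used by NCBI tables).
my $STANDARD = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %BASE = (T => 0, C => 1, A => 2, G => 3);

# Translate whole codons; a codon with any non-ACGT base becomes 'X'.
sub translate_dna {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        my @b = split //, substr($dna, $i, 3);
        if (grep { !exists $BASE{$_} } @b) {
            $protein .= 'X';
            next;
        }
        $protein .= substr($STANDARD, 16 * $BASE{$b[0]} + 4 * $BASE{$b[1]} + $BASE{$b[2]}, 1);
    }
    return $protein;
}

# The input is always treated as DNA, so no alphabet guessing happens.
sub six_frames {
    my $dna = uc(shift);
    (my $rc = reverse $dna) =~ tr/ACGT/TGCA/;    # reverse complement
    my @frames;
    for my $strand ($dna, $rc) {
        for my $offset (0 .. 2) {
            push @frames, length($strand) > $offset ? translate_dna(substr($strand, $offset)) : '';
        }
    }
    return @frames;
}

print join(' ', six_frames('ATGGCCTAA')), "\n";    # MA* WP GL LGH *A RP
```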
Bio::SimpleAlign
- offset the location of a new seq with features by the location of the
original seq it was requested to be built from.
- added rudimentary key/value parsing for maf 'a' lines
- runs clean with -w on
- cleaned up spurious warnings in the unit tests.
- bugfix in the maf parser for detecting the last record in a file.
- added functionality to SimpleAlign to trim gaps from an MSA for a given
sequence. Trimming allowed the implementation of exporting Seq
and SeqFeatures from SimpleAlign. The API here is still rough;
comments appreciated.
Bio::Species
commented out internal calls to methods not doing anything
Bio::Taxonomy
clean up the rank sets
Bio::Tools::BPlite::Iteration
values have been set to '' instead of undef. Perhaps this is not entirely
the best thing; are we screwing up in the parsing instead? Use
Bio::SearchIO instead, I guess.
Bio::Tools::BPlite
bug #1542 - improper detection of end of Query regexp
Bio::Tools::CodonTable
if you know what you are doing you can add custom codon table
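The following sketch shows what a custom codon table amounts to: a mapping from the 64 codons to amino acids, where a few entries may differ from the standard code. It starts from the standard code and applies overrides. The sketch is in plain Perl and does not use the Bio::Tools::CodonTable API. The functions build_table and translate_with are hypothetical, and the "Martian" overrides are made up for illustration.

```perl
use strict;
use warnings;

# Build a codon => amino acid hash from the standard code (bases ordered
# T, C, A, G, first base varying slowest), then apply any overrides.
sub build_table {
    my (%override) = @_;
    my $aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my @bases = qw(T C A G);
    my %table;
    my $i = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $table{"$b1$b2$b3"} = substr($aa, $i++, 1);
            }
        }
    }
    @table{keys %override} = values %override;
    return \%table;
}

# Translate complete codons with a given table; unknown codons become 'X'.
sub translate_with {
    my ($table, $dna) = @_;
    my @codons = uc($dna) =~ /(...)/g;
    return join '', map { $table->{$_} // 'X' } @codons;
}

my $standard = build_table();
my $martian  = build_table(TGA => 'W', TAG => 'U');    # hypothetical "Martian" code
print translate_with($standard, 'ATGTGATAG'), "\n";    # M**
print translate_with($martian,  'ATGTGATAG'), "\n";    # MWU
```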
Bio::Tools::GFF
- needed to move header parsing outside of next_feature, as it may be
useful to handle sequences before sequence features (think database
inserts).
- adding support for parsing GFF ##sequence-region header lines.
these are transformed into featureless Bio::LocatableSeq objects,
available via the next_segment method.
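GFF3 header lines of the form "##sequence-region seqid start end" can be read before any feature lines. The following minimal sketch reads them into plain hashes rather than Bio::LocatableSeq objects so that it runs without BioPerl. The function name sequence_regions is hypothetical. As a simplification, the sketch consumes the first non-header line; a real parser would keep that line for next_feature.

```perl
use strict;
use warnings;

# Collect ##sequence-region header lines until the first non-comment line.
sub sequence_regions {
    my ($fh) = @_;
    my @regions;
    while (my $line = <$fh>) {
        last unless $line =~ /^#/;    # header ends (this line is consumed here)
        if ($line =~ /^##sequence-region\s+(\S+)\s+(\d+)\s+(\d+)/) {
            push @regions, { seq_id => $1, start => $2, end => $3 };
        }
    }
    return @regions;
}

my $gff = "##gff-version 3\n##sequence-region ctg123 1 1497228\n"
        . "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001\n";
open my $fh, '<', \$gff or die $!;
printf "%s:%d-%d\n", @{$_}{qw(seq_id start end)} for sequence_regions($fh);
```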
Bio::Tools::Phylo::PAML
silenced a warning reported in bugzilla #1560
Bio::Tools::Run::StandAloneBlast
SearchIO can now be used for all output format types when
_READMETHOD is set
Bio::Tools::SeqWords
new method: count_overlap_words(), feature enhancement from bugzilla
#1554
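Overlapping and non-overlapping word counts differ only in how far the window advances: one position at a time, or one whole word length at a time. The following plain-Perl sketch illustrates that difference. It is not the Bio::Tools::SeqWords implementation, and the function name count_words and its $overlap flag are hypothetical.

```perl
use strict;
use warnings;

# Count words of length $k; step 1 gives overlapping counts, step $k
# gives non-overlapping counts.
sub count_words {
    my ($seq, $k, $overlap) = @_;
    my %count;
    my $step = $overlap ? 1 : $k;
    for (my $i = 0; $i + $k <= length $seq; $i += $step) {
        $count{ substr($seq, $i, $k) }++;
    }
    return \%count;
}

my $plain   = count_words('AAAA', 2, 0);    # AA at positions 0 and 2
my $overlap = count_words('AAAA', 2, 1);    # AA at positions 0, 1 and 2
print "$plain->{AA} $overlap->{AA}\n";      # 2 3
```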
Bio::Tools::Signalp
added the SignalP-HMM results:
$feat->score; # signal peptide probability
($feat->get_tag_values('peptideProb'))[0]; # SignalP peptide probability
($feat->get_tag_values('anchorProb'))[0];  # SignalP anchor probability
/examples/biblio
more biblio examples
INSTALL.WIN
Bug 1451, PPM3 documentation wrong
scripts/Bio-DB-GFF/bp_genbank2gff.PLS
changes making genbank2gff.pl use SOFA terms for type names in generated
GFF3
scripts/Bio-DB-GFF/bulk_load_gff.PLS fast_load_gff.PLS pg_bulk_load_gff.PLS
fixed a minor gff3 bug
scripts/Bio-DB-GFF/bulk_load_gff.PLS
added support for dsn strings in the form of
"dbi:mysql:database=xxx;host=xxx"
scripts/Bio-DB-GFF/bulk_load_gff.PLS
added support for bulk loading from a local gff source to a remote db server
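A DSN string such as "dbi:mysql:database=xxx;host=xxx", whose support in bulk_load_gff.PLS is noted above, splits into a driver name and key/value attributes, and the host attribute is what points a load at a remote server. The following sketch shows one way to split such a string. It is illustrative only, not the loader's code. The function name parse_dsn is hypothetical, and the sketch assumes that a bare name ("dbi:mysql:xxx") is the database name.

```perl
use strict;
use warnings;

# Split a DBI-style DSN into the driver and a hash of attributes.
sub parse_dsn {
    my ($dsn) = @_;
    my ($scheme, $driver, $rest) = split /:/, $dsn, 3;
    die "not a DBI DSN: $dsn\n" unless defined $rest && lc $scheme eq 'dbi';
    my %attr;
    for my $part (split /;/, $rest) {
        my ($key, $value) = split /=/, $part, 2;
        if (defined $value) { $attr{$key} = $value }
        else                { $attr{database} = $key }    # bare database name
    }
    return ($driver, \%attr);
}

my ($driver, $attr) = parse_dsn('dbi:mysql:database=xxx;host=xxx');
print "$driver $attr->{database} $attr->{host}\n";    # mysql xxx xxx
```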
scripts/Bio-DB-GFF/fast_load_gff.PLS
added an option for setting MAX_BIN
scripts/Bio-DB-GFF/bulk_load_gff.PLS pg_bulk_load_gff.PLS
added option to set MAX_BIN, and updated the postgres loader to deal
with gff3 (note that the gff3 stuff is completely untested though)
scripts/graphics/frend.PLS
Bio::Graphics::FeatureFile: removed an uninitialized variable warning when
calling features() without arguments; fixed the frend web-based feature
renderer to accommodate recent changes in the FeatureFile API
scripts/popgen/composite_LD.PLS
- print with new API
- fix to deal with newer API
scripts/utilities/search2gff.PLS
output 'match' and 'component' lines for GFF dumping