From redmine at redmine.open-bio.org Thu Jul 5 13:29:35 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:29:35 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2733] (Closed) Running unit tests where Biopython wasn't built from source
References:
Message-ID:

Issue #2733 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

Looking at this afresh, I think the test suite should fail in this situation.

----------------------------------------
Bug #2733: Running unit tests where Biopython wasn't built from source
https://redmine.open-bio.org/issues/2733#change-15401

* Author: Bruce Southey
* Status: Closed
* Priority: Low
* Assignee: Biopython Dev Mailing List
* Category: Unit Tests
* Target version: Not Applicable
* URL:
----------------------------------------
If Biopython is not built from source and the tests are run from a different place than the installation, the tests that use C objects fail because these are not found (an example is below). Currently the test environment uses the Biopython in the build directory. It would be nice to be able to optionally specify some other Biopython, such as the installed version, using, say, a command-line argument.
Example of a failure:

======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py.orig", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py.orig", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/__init__.py", line 10, in
    from KDTree import KDTree
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/KDTree.py", line 20, in
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree
======================================================================

---Files--------------------------------
run_test.patch (659 Bytes)
bug2733.patch (813 Bytes)

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From redmine at redmine.open-bio.org Thu Jul 5 13:31:57 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:31:57 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2532] (Migrated) Using IUPAC alphabets in mixed case Seq objects
References:
Message-ID:

Issue #2532 has been updated by Peter Cock.
Description updated
Status changed from New to Migrated

Migrated to GitHub as https://github.com/biopython/biopython/issues/1716

----------------------------------------
Bug #2532: Using IUPAC alphabets in mixed case Seq objects
https://redmine.open-bio.org/issues/2532#change-15402

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
Bio.Alphabet.IUPAC defines a number of alphabets with defined lists of valid letters which are in upper case ONLY. Bio.Nexus and Bio.Sequencing.Phd create Seq objects which use these alphabets even with mixed case sequences. This contradicts how I think the alphabet's .letters property is intended to be used (although currently this is not enforced by the Seq object).

I suggest either:

(a) Bio.Nexus etc switch to using generic DNA/RNA alphabets for any Seq objects including lower case letters (or more simply, all Seq objects).
(b) We add lower case and mixed case variants of the alphabet objects, and use the mixed case IUPAC alphabets in Bio.Nexus etc for the Seq objects.

There is also the option of:

(c) Extend the existing upper case only IUPAC alphabets to include lower case too, but I fear this could have unexpected side effects (e.g. where people are looping over the expected set of letters).

---Files--------------------------------
phd_alpha.patch (759 Bytes)
eee.txt (1.29 KB)
nexus_alphabets.patch (1.83 KB)
mixed_case_IUPAC.patch (2.57 KB)
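To make the mismatch in Bug #2532 concrete, here is a minimal pure-Python sketch. It does not use Biopython; the letter string and both helper names are assumptions made up for illustration. It shows how a mixed case sequence falls outside an upper-case-only letter list, and how a mixed case variant along the lines of option (b) would accept it.

```python
# Illustrative only: this letter string is an assumption for the sketch,
# not a value read from Biopython.
UPPER_DNA_LETTERS = "GATC"


def letters_outside_alphabet(sequence, letters):
    """Return the sorted characters of sequence that are not in letters."""
    return sorted(set(sequence) - set(letters))


def mixed_case_letters(letters):
    """Option (b): derive a variant that accepts both upper and lower case."""
    return letters.upper() + letters.lower()


mixed = "ACGTacgt"
# Against the upper-case-only letters, the lower case characters are flagged.
print(letters_outside_alphabet(mixed, UPPER_DNA_LETTERS))
# Against the mixed case variant, nothing is flagged.
print(letters_outside_alphabet(mixed, mixed_case_letters(UPPER_DNA_LETTERS)))
```

Option (c) would amount to changing the base letter string itself, which is where code that loops over the expected letters could be surprised.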
From redmine at redmine.open-bio.org Thu Jul 5 13:38:38 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:38:38 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2526] (Migrated) SeqFeature's .id property is not preserved in BioSQL
References:
Message-ID:

Issue #2526 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated

Migrated to GitHub as https://github.com/biopython/biopython/issues/1717

----------------------------------------
Bug #2526: SeqFeature's .id property is not preserved in BioSQL
https://redmine.open-bio.org/issues/2526#change-15403

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: BioSQL
* Target version: Not Applicable
* URL:
----------------------------------------
As per the title, a SeqFeature's .id property is not preserved after a save/retrieve in BioSQL.

I found this while working on Bug 2235, where my modified "swiss" parser creates SeqRecord objects with SeqFeature objects which may have their .id set. Note that in GenBank and EMBL, the SeqFeature objects do not have their id property set, and so are not affected.

I need to review the BioSQL schema to see if there is a suitable field that Biopython is ignoring, and if there is, use it. If not, we can probably use a tagged qualifier - ideally with the same name as the other Bio* projects.

See also test_BioSQL_SeqIO.py revision 1.17 which includes a workaround to avoid this limitation.
From redmine at redmine.open-bio.org Thu Jul 5 13:40:27 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:40:27 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2489] (Closed) KDTree NN search without specifying radius
References:
Message-ID:

Issue #2489 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

I'm going to close this since the relevant code was re-written and there has been no action on this issue since.

----------------------------------------
Bug #2489: KDTree NN search without specifying radius
https://redmine.open-bio.org/issues/2489#change-15404

* Author: sam n
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.45
* URL:
----------------------------------------
All the current searches in the KDTree require specifying a radius. If you don't know what the radius is, you don't know how far to search without taking a typical estimate of the data set. I just added a function to find the nearest neighbor to a coordinate without specifying this radius up front. I made the changes on the C++ side of Biopython's KDTree.

It might be useful to other people so I will post the update, which is based on
http://www.google.com/codesearch?hl=en&q=+kdtree+show:b099E8j0eYY:M9X8aTw_p7E:Tn8Xj-OBPYY&sa=N&cd=4&ct=rc&cs_p=ftp://ftp.diku.dk/diku/users/martinz/tabu.tar.gz&cs_f=kdtree.c#first

However, I am not currently proficient in the Python C API, so someone else may be able to write the interface in 3 minutes...

---Files--------------------------------
KDTree.cpp (19.4 KB)
KDTree.h (4.03 KB)
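The idea in Bug #2489, a nearest-neighbour query that needs no radius, can be sketched in pure Python as below. This is not the attached C++ patch and not Biopython's KDTree API; `build` and `nearest` are hypothetical names, and the tree is a plain nested tuple. Instead of a user-supplied radius, the search uses the best distance found so far as a shrinking bound.

```python
import math


def build(points, depth=0):
    """Build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build(points[:mid], depth + 1),
        build(points[mid + 1:], depth + 1),
    )


def nearest(node, target, depth=0, best=None):
    """Return (distance, point) for the stored point closest to target."""
    if node is None:
        return best
    point, left, right = node
    dist = math.dist(point, target)
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Only descend into the far side if the splitting plane is closer
    # than the best match found so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))
```

`math.dist` needs Python 3.8 or later; on an empty tree `nearest` returns None.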
From redmine at redmine.open-bio.org Thu Jul 5 13:43:53 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:43:53 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2681] (Migrated) BioSQL: record annotations enhancements
References:
Message-ID:

Issue #2681 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated

Migrated to GitHub issue https://github.com/biopython/biopython/issues/1718

----------------------------------------
Bug #2681: BioSQL: record annotations enhancements
https://redmine.open-bio.org/issues/2681#change-15405

* Author: Cymon J.
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: BioSQL
* Target version: Not Applicable
* URL:
----------------------------------------
BioSQL storage and retrieval of record annotations. See also bug 2396.

Patch fixes 3 annotations:
1) Fixed date/dates typo.
2) Comments were being stored but not retrieved - fixed with test.
3) A 'reference' annotation, even if an empty list, was being retrieved in a DBSeqRecord. Fixed so that if there are no references there is no annotation in DBSeqRecord.

Other annotations:
'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not handling correctly in the test suite.

'date' can be ignored if present in DBSeqRecord but absent in SeqRecord because the current date is entered into the table if a date is not present in the record.

Annotation 'ncbi_taxid' will be present in the DBSeqRecords even when not present in the loaded SeqRecord as they are grabbed from the taxon table. We can therefore ignore this specific comparison: old record absent, new record present. Some swiss prot SeqRecords have ncbi_taxid and they are retrieved correctly by DBSeqRecord.
TODO: others have ncbi_taxid that is missing from the retrieved DBSeqRecord: sp012, sp014

Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation; retrieved DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in bioentry, if the gi annotation is missing, which is pulled as the gi annotation. So the swissprot, fasta, and embl DBSeqRecords return the accession as the gi (GenBank identifier). I think this is misleading; annotation 'gi' in the DBSeqRecord should really be named a more generic 'identifier'... What to do here?

'contig' is ignored by loader because it's a SeqFeature object. Is there any reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)

---Files--------------------------------
annotations1.patch (3.9 KB)

From redmine at redmine.open-bio.org Thu Jul 5 13:45:41 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:45:41 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2705] (Migrated) Nicer GC and AT content and skew functions
References:
Message-ID:

Issue #2705 has been updated by Peter Cock.
Description updated
Status changed from New to Migrated

Migrated to GitHub issue https://github.com/biopython/biopython/issues/1719

----------------------------------------
Bug #2705: Nicer GC and AT content and skew functions
https://redmine.open-bio.org/issues/2705#change-15406

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
This bug started out as a discussion on Bug 2671, based on some nucleotide scoring functions in GenomeDiagram which were used for plotting sequence properties along a sequence using a sliding window. The basic underlying functions could make a nice addition under Bio.SeqUtils (rather than hiding them under Bio.Graphics.GenomeDiagram).

In particular, GenomeDiagram's Utilities.py included the following (non-windowed) nucleotide composition functions:

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float [*]
calc_at_skew - returns a float [*]

[*] As discussed on Bug 2671, these currently give zero if there is no AT content, which was a reasonable shortcut given these functions were originally used for plotting only. They should instead raise an exception or return None or NaN.

Also, as implemented in GenomeDiagram, these functions do not cope with mixed case sequences (easily rectified). Also, for GC and AT content these do not deal with ambiguous nucleotides (where we could follow the existing Bio.SeqUtils convention).

Bio.SeqUtils already has several related functions including:

GC - returns a float (a percentage in the range 0 to 100)
GC123 - returns a tuple of four floats (percentages between 0 and 100)
GC_skew - returns a list of floats using a default window size of 100bp. Gives a floating point exception if there is no GC content in any window.
Personally I don't like the fact that the existing GC function returns a number between 0 and 100 (rather than 0 and 1). Leighton agreed.

I don't think the current GC_skew function is intuitive, and it doesn't cover the non-windowed use-case where you want the GC_skew of the whole sequence passed in. This is important if you want to do your own windowing (e.g. comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a case for adding the four non-windowed functions from GenomeDiagram's Utilities.py under Bio.SeqUtils. Each would take a single argument, a sequence (coping with a string, Seq object or MutableSeq object).

I have no particularly strong views on the naming of these functions. Perhaps they could be located under a sub module like Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? The existing GC functions in Bio.SeqUtils could be deprecated or at least declared obsolete.

This would also be a good opportunity to explicitly specify what we expect to get back for the GC content when there are ambiguous nucleotides. e.g. Following Bio.SeqUtils.GC, only count C, G and S (which means C or G) (in either case) and divide by the length giving a lower bound. Here GC("ACGTN") is 40%. An alternative approach might be to treat an N as 50% GC, and H (which is A, C or T) as 33.3% GC etc, meaning GC("ACGTN") gives 50%.

The same approach should be used for the AT percentage, for example the current lower bound approach would count only A, T and W characters (in either case).
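As a rough illustration of the two conventions discussed for Bug #2705, the sketch below computes GC content as a fraction between 0 and 1, once as a lower bound (counting only G, C and S) and once with ambiguity codes weighted by the share of their possible bases that are G or C. It also shows a non-windowed GC skew that returns NaN rather than zero when there is no G or C. The function names are hypothetical, not existing Bio.SeqUtils functions, and treating unrecognised characters as contributing nothing is an assumption of this sketch.

```python
# Fraction of each IUPAC nucleotide code's possible bases that are G or C.
GC_WEIGHT = {
    "G": 1.0, "C": 1.0, "S": 1.0,
    "A": 0.0, "T": 0.0, "U": 0.0, "W": 0.0,
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5, "N": 0.5,
    "B": 2 / 3, "V": 2 / 3,  # B = C/G/T, V = A/C/G
    "D": 1 / 3, "H": 1 / 3,  # D = A/G/T, H = A/C/T
}


def gc_lower_bound(seq):
    """GC fraction counting only G, C and S, in either case."""
    seq = str(seq).upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(letter) for letter in "GCS") / len(seq)


def gc_weighted(seq):
    """GC fraction with ambiguity codes weighted (unknown characters count 0)."""
    seq = str(seq).upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(GC_WEIGHT.get(base, 0.0) for base in seq) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C) over the whole sequence; NaN if there is no G or C."""
    seq = str(seq).upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return float("nan")
    return (g - c) / (g + c)


print(gc_lower_bound("ACGTN"))  # the lower bound convention
print(gc_weighted("ACGTN"))     # N treated as half GC
print(gc_skew("ggGC"))          # mixed case is handled by upper-casing
```

Calling `str()` on the argument is what lets the same functions accept a plain string or a Seq-like object; an AT version would mirror this with A, T and W.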
From redmine at redmine.open-bio.org Thu Jul 5 13:54:18 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:54:18 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2653] (Migrated) Bio.SeqUtils.CodonUsage is not translation table aware
References:
Message-ID:

Issue #2653 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1720

Migrated to GitHub as https://github.com/biopython/biopython/issues/1720

----------------------------------------
Bug #2653: Bio.SeqUtils.CodonUsage is not translation table aware
https://redmine.open-bio.org/issues/2653#change-15410

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL: https://github.com/biopython/biopython/issues/1720
----------------------------------------
Looking at Bio/SeqUtils/CodonUsage.py there is a hard coded dictionary SynonymousCodons, presumably for the standard genetic code. Ideally Bio.SeqUtils.CodonUsage should support any of the genetic code tables defined in Bio.Data.CodonTable, perhaps via an optional initialization argument to the CodonAdaptationIndex object.

From redmine at redmine.open-bio.org Thu Jul 5 13:48:55 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:48:55 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2560] (Closed) Adding BLAST support to Bio.AlignIO
References:
Message-ID:

Issue #2560 has been updated by Peter Cock.
Description updated
Status changed from New to Closed
% Done changed from 0 to 100

Superseded with the introduction of Bio.SearchIO, closing issue.

----------------------------------------
Bug #2560: Adding BLAST support to Bio.AlignIO
https://redmine.open-bio.org/issues/2560#change-15409

* Author: Peter Cock
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
I think it can sometimes be useful to regard a BLAST output file as a series of pairwise alignments - and therefore it makes sense to add it to Bio.AlignIO as another input file format. http://biopython.org/wiki/AlignIO

Note that the AlignIO API will not allow any "clumping" of the pairwise alignments (or HSPs in Blast terminology) according to the query or the target sequence - you just get them all one after the other.

I will attach a rough Bio/AlignIO/BlastIO.py file which attempts to mimic the naming conventions in the fasta-m10 parser. This currently uses Bio.Blast to do the actual parsing, and then just uses the Blast results to build alignment objects with two sequences each.

I suggest using the format names "blast" and "blastxml" for the plain text and XML output formats following BioPerl (although I would prefer "blast-xml" to "blastxml"), see http://www.bioperl.org/wiki/HOWTO:SearchIO#Design

---Files--------------------------------
BlastIO.py (6.28 KB)
From redmine at redmine.open-bio.org Thu Jul 5 13:47:37 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 13:47:37 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2677] (Closed) BioSQL seqfeature enhancements
References:
Message-ID:

Issue #2677 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

Most of this work seems to have been merged, or was not clearly defined in BioSQL itself. I'm going to close it but would be happy to have new issues logged on GitHub for anything specific still not implemented.

----------------------------------------
Bug #2677: BioSQL seqfeature enhancements
https://redmine.open-bio.org/issues/2677#change-15407

* Author: Cymon J.
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: BioSQL
* Target version: Not Applicable
* URL:
----------------------------------------
Cleaned-up (sub-)seqFeature locations, and strand.
Added location_operator storage and test.
Added remote location storage for sub-features, and test.

I've used the "Sequence Keys" ontology for the location operator and stored loc op in the location_qualifier_value table - not sure this is right...

Patches attached.

---Files--------------------------------
BioSQL.patch (9.42 KB)
test_BioSQL_SeqIO.patch (4.48 KB)
From redmine at redmine.open-bio.org Thu Jul 5 14:11:36 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:11:36 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2704] (Closed) Parser for the markx10 alignment format
References:
Message-ID:

Issue #2704 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

I'm going to close this due to lack of activity - hopefully we could parse the output from the later versions of EMBOSS.

----------------------------------------
Bug #2704: Parser for the markx10 alignment format
https://redmine.open-bio.org/issues/2704#change-15412

* Author: Osvaldo Zagordi
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
Hi,

I recently wrote some code to parse the Emboss alignment format markx10 (format explained at http://emboss.sourceforge.net/docs/themes/AlignFormats.html). Since it is slightly different from the Fasta m10 (not surprising, right?) I had to adapt FastaIO.py. I thought this might eventually be included in biopython.

Important: I noticed that if the alignment program exits for some reason and does not close the alignment file with two lines like these

#---------------------------------------
#---------------------------------------

bad things can happen (e.g., sucking all the memory of the system). Could it be that a similar issue applies to the FastaIO parser as well?

Best,
Osvaldo

---Files--------------------------------
m10test.tgz (5.64 KB)
From redmine at redmine.open-bio.org Thu Jul 5 14:17:27 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:17:27 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3066] (Migrated) Iterating/looping over columns/rows of a MultipleSeqAlignment
References:
Message-ID:

Issue #3066 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1722

Migrated to GitHub as https://github.com/biopython/biopython/issues/1722

----------------------------------------
Bug #3066: Iterating/looping over columns/rows of a MultipleSeqAlignment
https://redmine.open-bio.org/issues/3066#change-15413

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.54b
* URL: https://github.com/biopython/biopython/issues/1722
----------------------------------------
The new MultipleSeqAlignment object (like the old Alignment object it replaces) stores the rows of the alignment as SeqRecord objects. This means column based access is slow.

It can often be useful to be able to iterate over the columns, and a dedicated method to do this should be faster than repeatedly accessing columns by index (either via slicing with __getitem__ or the old get_column method).

A related question here is should the columns be returned as strings or as Seq objects?

Possible implementation to follow as a patch...

---Files--------------------------------
align_row_col_iter.patch (3.89 KB)
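A dedicated column iterator of the kind Bug #3066 asks for can be sketched in a few lines. This is not the attached patch: `iter_columns` is a hypothetical helper that works on plain row strings (for a real alignment one might first take `str(record.seq)` for each row), and it answers the "strings or Seq objects" question by yielding strings, purely to keep the sketch self-contained.

```python
def iter_columns(rows):
    """Yield each alignment column as a string, walking the rows only once."""
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError("all rows must have the same length")
    # zip(*rows) transposes the rows, so each tuple is one column.
    for column in zip(*rows):
        yield "".join(column)


rows = ["ACGT-A", "ACGTTA", "AC-TTA"]
for index, column in enumerate(iter_columns(rows)):
    print(index, column)
```

The transpose avoids indexing every row once per column, which is the cost the issue describes for repeated access by column index.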
From redmine at redmine.open-bio.org Thu Jul 5 14:19:12 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:19:12 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3009] (Closed) Check the FASTA m10 alignment parser works with FASTA36
References:
Message-ID:

Issue #3009 has been updated by Peter Cock.

Status changed from New to Closed
% Done changed from 0 to 100

I lost track of this and don't have those failing examples to hand.

----------------------------------------
Bug #3009: Check the FASTA m10 alignment parser works with FASTA36
https://redmine.open-bio.org/issues/3009#change-15414

* Author: Peter Cock
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
Bill Pearson has just announced the release of FASTA36:
http://faculty.virginia.edu/wrpearson/fasta/fasta36/

From his email,
> This version is a major update from FASTA version 35.
> It's main new feature is the ability to report all
> statistically significant alignments between a query
> and library sequence (equivalent to BLAST's multiple
> HSPs). All previous versions of the FASTA program
> reported only the best alignment between the query
> and library sequence, a serious shortcoming when
> comparing a query protein to a multi-exon gene or
> multi-domain protein.

We need to check the FASTA36 -m 10 output, add this to our unit tests, and update our parser as required.
From redmine at redmine.open-bio.org Thu Jul 5 14:26:08 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:26:08 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2929] (Migrated) NCBIXML PSI-Blast parser should gather all information from XML blastpgp output
References:
Message-ID:

Issue #2929 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1723

Migrated to GitHub as https://github.com/biopython/biopython/issues/1723

----------------------------------------
Bug #2929: NCBIXML PSI-Blast parser should gather all information from XML blastpgp output
https://redmine.open-bio.org/issues/2929#change-15415

* Author: Miguel empty
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.52
* URL: https://github.com/biopython/biopython/issues/1723
----------------------------------------
With the problems encountered while parsing plain text output from blastpgp, perhaps an answer would be to use the XML output of this program. The XML output seems to have evolved in recent versions of blastpgp and now all the info gets in a single proper XML file (not several concatenated files) and, in principle, it would seem that all the information in the plain text format can also be found in the XML one.

I will attach an XML output for a PSI-Blast search that converges after 3 passes.

---Files--------------------------------
blast.xml (59.2 KB)
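As a rough sketch of what reading a single-file PSI-BLAST XML result per pass could look like, the example below uses only the standard library, not Bio.Blast.NCBIXML. The element names (Iteration, Iteration_iter-num, Iteration_hits, Hit, Hit_id) follow the NCBI BLAST XML layout as I understand it; the embedded document is made up for illustration and is not the attached blast.xml, and the assumption that each Iteration element corresponds to one PSI-BLAST pass should be checked against real blastpgp output.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document; identifiers are placeholders.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical|A</Hit_id></Hit>
        <Hit><Hit_id>hypothetical|B</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical|A</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hits_per_round(xml_text):
    """Map each iteration number to the list of hit identifiers it reports."""
    root = ET.fromstring(xml_text)
    rounds = {}
    for iteration in root.iter("Iteration"):
        number = int(iteration.findtext("Iteration_iter-num"))
        rounds[number] = [hit.findtext("Hit_id") for hit in iteration.iter("Hit")]
    return rounds


print(hits_per_round(SAMPLE))
```

Grouping by iteration number is what would let a parser report which hits appear in each pass, something the concatenated-files output made awkward.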
From redmine at redmine.open-bio.org Thu Jul 5 14:29:03 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:29:03 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2964] (Closed) placing x-axis of graph track at the bottom or top of the track in GenomeDiagram
References:
Message-ID:

Issue #2964 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

I'm going to mark this as closed after years of no activity. It seemed changes within the current framework would not be easy.

----------------------------------------
Bug #2964: placing x-axis of graph track at the bottom or top of the track in GenomeDiagram
https://redmine.open-bio.org/issues/2964#change-15417

* Author: Daniel Nicorici
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Other
* Target version: 1.52
* URL:
----------------------------------------
By default, when one uses the graph track the axis is placed automatically in the middle of the track (at the mean of all the values which are plotted).

It would be great if the x-axis of the graph track could also be placed at the bottom of the track, with the values plotted accordingly. This would allow one to plot, for example, the short-read coverage in next-gen sequencing data.
From redmine at redmine.open-bio.org Thu Jul 5 14:39:26 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:39:26 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2819] (Migrated) Bio.SeqIO support for NCBI protein tables (*.ptt files)
References:
Message-ID:

Issue #2819 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1725

Migrated to GitHub as https://github.com/biopython/biopython/issues/1725

----------------------------------------
Bug #2819: Bio.SeqIO support for NCBI protein tables (*.ptt files)
https://redmine.open-bio.org/issues/2819#change-15419

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL: https://github.com/biopython/biopython/issues/1725
----------------------------------------
On their FTP site the NCBI provide a range of files for each genome/plasmid/chromosome, e.g. ftp://ftp.ncbi.nih.gov/genomes/Protozoa/Cryptosporidium_parvum/

The *.ptt files are simple tab separated tables listing all the proteins. They correspond to the CDS features in the GenBank file.

This enhancement bug is about adding "ptt" as an input file format in Bio.SeqIO (and potentially as an output format too), where a single ptt file gives a single SeqRecord object containing a SeqFeature object for each protein. The header line gives the sequence length, so an UnknownSeq can be used for the SeqRecord's seq property.

One example application of this would be to draw a GenomeDiagram showing the protein locations. This can be done using the SeqFeature objects from parsing a GenBank file, but using the ptt file will be much faster.
See earlier suggestions on the mailing list (part of the GFF thread):
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005725.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005745.html

Patch to follow...

---Files--------------------------------
ProteinTableIO.py (8.12 KB)
add_ptt.patch (486 Bytes)
test_ptt.patch (957 Bytes)

From redmine at redmine.open-bio.org Thu Jul 5 14:57:47 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:57:47 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #3057] (Migrated) Incremental parsing in Bio.Emboss.PrimerSearch
References:
Message-ID:

Issue #3057 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL set to https://github.com/biopython/biopython/issues/1727

Migrated to GitHub as https://github.com/biopython/biopython/issues/1727

----------------------------------------
Bug #3057: Incremental parsing in Bio.Emboss.PrimerSearch
https://redmine.open-bio.org/issues/3057#change-15423

* Author: Peter Cock
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.54b
* URL: https://github.com/biopython/biopython/issues/1727
----------------------------------------
The Bio.Emboss.PrimerSearch module has a single function "read" which loads and parses an entire output file from the EMBOSS tool primersearch into memory at once, returning what is essentially a dictionary keyed by primer name, whose values are lists of amplimer information objects.
Even though this still seems to work with "large" output files for thousands of primer pairs, I think it would be useful to provide an iterator function "parse" returning the amplimers for each primer. The current "read" function could be retained for backward compatibility.

The parsing code itself could be extended to extract information like the forward and reverse primer sequences, where they hit (location and strand), and with how many mismatches. This information is currently all held in a long string.

From redmine at redmine.open-bio.org Thu Jul 5 14:56:08 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:56:08 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2968] (Closed) Modifications to Emboss eprimer3 parser and associated files

Issue #2968 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

I'm going to mark this as closed - the original git commit has gone, and it seems this work had some influence on the main repository already. Thanks!

----------------------------------------
Bug #2968: Modifications to Emboss eprimer3 parser and associated files
https://redmine.open-bio.org/issues/2968#change-15422

* Author: Leighton Pritchard
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.52
* URL:
----------------------------------------
The existing Emboss primer3/eprimer3 code has a couple of issues, and some scope for improvement:

- The existing Primer3.py parser code can only parse output when eprimer3 is applied to a single sequence.
  When eprimer3 is applied to multiple sequence input, it groups all primers for all sequences into a single record, which may incorrectly associate primers with the wrong sequences in downstream analysis.
- The current parser lacks an iterator for iterating over multiple sequence output.
- The current parser creates 'ghost' primers for all primer pairs, with length zero and sequence as an empty string; it does not do this for internal oligos. A more intuitive solution might be to return None for absent primers/oligos.
- The current data model stores all primer data as individual attributes. It might be more useful to group the attributes of individual primers into their natural associations.

I have written new code for Emboss/Primer3.py that adds iterator/multiple sequence parsing functionality to the parser, and extensively revises the object model for the data. The Record and Primers objects are retained, but each primer/oligo is now represented by a Primer object that collects the relevant data together. The Record object has a new attribute that allows the sequence to be recorded directly, rather than having to be parsed from the comments attribute. The new data model retains the old attribute-based access for compatibility, but adds direct access to the Primer objects (where present) by .forward, .reverse and .oligo attributes, and by keywords.

One change was required to the unit test, to account for the reporting of absent primers as None, rather than having 'null' attributes. I've added two further test output files, which may be rather large for the distribution (60kb total), and doctests that use these.

The code can be inspected at my GitHub repository:
http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0

This enhancement request also relates to bug 2966.

From redmine at redmine.open-bio.org Thu Jul 5 14:53:51 2018
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 05 Jul 2018 14:53:51 +0000
Subject: [Biopython-dev] [Biopython (old issues only) - Bug #2643] (Migrated) Proposal: fastPhaseOutputIO for SeqIO

Issue #2643 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated
URL changed from http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py to https://github.com/biopython/biopython/issues/1726

Migrated to GitHub as https://github.com/biopython/biopython/issues/1726

----------------------------------------
Bug #2643: Proposal: fastPhaseOutputIO for SeqIO
https://redmine.open-bio.org/issues/2643#change-15421

* Author: Marco Dall'Olio
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL: https://github.com/biopython/biopython/issues/1726
----------------------------------------
Hi,
fastPHASE is software for haplotype reconstruction and missing genotype estimation from population genetic SNP data.
- http://stephenslab.uchicago.edu/software.html

It is commonly used by some population genetics bioinformaticians.

I had to convert the output from a fastPhase run to FASTA, so I wrote a module that reads a fastPhase output file and returns SeqRecord objects.

fastPhase output contains information about SNPs and genotyping, and would probably be supported by the PopGen module that is being written for biopython. However, my module is intended only to read the sequence information from the output file and to create SeqRecord objects, ignoring any other kind of information.
So, in the future we could have two fastPhaseOutputIterator-like modules: one that creates SeqRecord objects, and another to be used in PopGen.

The module has been tested with doctest. I'll attach a file with the tests along with the module.

---Files--------------------------------
fastPhaseOutputIO.py (4.19 KB)
test_fastPhaseOutputIO.py (6.92 KB)
biopython_seqIO_fastPhaseOutputIO_patch (698 Bytes)
fastPhaseOutputIO.py (5.44 KB)
test_fastPhaseOutputIO.py (6.47 KB)
fastphaseoutput (1.82 KB)
fastPhaseOutputIO.py (5.39 KB)
test_fastPhaseOutputIO.py (6.85 KB)
fastPhaseTestFiles.zip (2.67 KB)