[Bioperl-l] PMC and EUtilities, was bioperl::db
Chris Fields
cjfields at uiuc.edu
Mon Apr 30 15:15:16 UTC 2007
Bernd,
As a preface to this discussion: I am in the middle of refactoring
EUtilities. The next incarnation should have a similar API but will
likely set parameters via simpler methods (no need for all the
getters/setters).
You'll likely have to parse out the tags yourself. AFAIK there is no
BioPerl parser for PMC XML; a quick grep turns up nothing but PubMed
parsers. If you aren't familiar with XML parsing, you could try
XML::Simple to get at what you want. I would pass the XML in as small
chunks (maybe by retrieving the articles in batches of 100 or fewer)
and initially use Data::Dumper to inspect the data structure
XML::Simple returns (PMC XML has both attributes and elements, so the
returned structure will be nested and fairly complex). Then just
iterate through the articles and grab what you want.
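To make the steps above concrete, here is a minimal sketch of the same
workflow: batch the ids, look at the structure once, then iterate over
the articles. It is written in Python with only the standard library,
not with XML::Simple and Data::Dumper, so treat it as an illustration
of the idea and not as BioPerl code. The sample document, the element
names it relies on (pmc-articleset, article, article-id, article-title,
body, p) and the helper names are assumptions for illustration; check
them against what efetch actually returns for your query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one efetch batch from PMC.
# Real records are much larger; the element names here are assumptions.
SAMPLE_XML = """\
<pmc-articleset>
  <article>
    <front><article-meta>
      <article-id pub-id-type="pmc">111</article-id>
      <title-group><article-title>First title</article-title></title-group>
    </article-meta></front>
    <body><sec>
      <p>First <italic>body</italic> paragraph.</p>
      <p>Second paragraph.</p>
    </sec></body>
  </article>
  <article>
    <front><article-meta>
      <article-id pub-id-type="pmc">222</article-id>
      <title-group><article-title>Abstract only</article-title></title-group>
    </article-meta></front>
  </article>
</pmc-articleset>
"""


def batches(ids, size=100):
    """Yield successive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def dump_structure(elem, depth=0):
    """Return an indented outline of tags and attributes.

    Plays the role Data::Dumper has in the text: a one-off look at the
    shape of the data before writing the real extraction code.
    """
    label = elem.tag + (" " + str(elem.attrib) if elem.attrib else "")
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(dump_structure(child, depth + 1))
    return lines


def extract_articles(xml_text):
    """Return one dict per <article> with its id, title and plain text."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("article"):
        pmc_id = article.findtext(".//article-id[@pub-id-type='pmc']")
        title_el = article.find(".//article-title")
        title = None
        if title_el is not None:
            title = "".join(title_el.itertext()).strip()
        body = article.find("body")
        paragraphs = []
        if body is not None:
            # itertext() flattens inline markup such as <italic>;
            # split/join collapses runs of whitespace.
            paragraphs = [" ".join("".join(p.itertext()).split())
                          for p in body.iter("p")]
        records.append({
            "pmc_id": pmc_id,
            "title": title,
            "text": "\n\n".join(paragraphs),
            "has_full_text": body is not None,
        })
    return records


if __name__ == "__main__":
    print("\n".join(dump_structure(ET.fromstring(SAMPLE_XML))))
    for record in extract_articles(SAMPLE_XML):
        print(record["pmc_id"], record["has_full_text"], record["title"])
```

The has_full_text flag is only a guess at how an abstract-only record
might show up (no body element); it may help when sorting out results
that come back without the full article.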
I think most of the articles in PubMed Central, if not all, are
available as free full text:
http://www.pubmedcentral.nih.gov/about/faq.html#q9
You can also retrieve them via FTP:
ftp://ftp.ncbi.nlm.nih.gov/pub/pmc
which contains an index file listing all articles and their directory
locations (the README gives more info).
chris
On Apr 30, 2007, at 4:07 AM, Bernd Mueller wrote:
> Hello,
>
> I think so. The ids from my wanted articles are retrieved by
> Bio::DB::EUtilities::esearch. Afterwards I download the articles
> with Bio::DB::EUtilities::efetch. It is only possible to download
> in XML format from PMC. So post processing is actually needed
> because I want the articles in plain format.
>
> But I don't know why I have results of non-free articles, i.e.
> abstracts where full articles should be found with a query
> constraining to only free fulltext. In the query I limit the search
> with the filter "AND free fulltext[filter]". Probably this is a
> matter concerning not directly bioperl but the eutilities interface
> of PMC.
>
> Regards,
> Bernd