[Bioperl-l] Browsing the NCBI PubMed database

Martin Senger senger@ebi.ac.uk
Mon, 15 Jul 2002 13:31:10 +0100 (BST)


Hi Guiseppe,

>
> if I go to: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
> and I enter 'bernardi' (without the quotes) in text input field I get:
>
> 1) Heinemann MB, Fernandes-Matioli FM, Cortez A, Soares RM, Sakamoto SM, 
> Bernardi F, Ito FH, Madeira AM, Richtzenhain LJ.
> Related Articles
> Genealogical analyses of rabies virus strains from Brazil based on N gene 
> alleles.
> Epidemiol Infect. 2002 Jun;128(3):503-11.
> PMID: 12113496 [PubMed - in process]
> 
> and I enter 'bernardi' (without the quotes) in text input field I get:
>...  
> Is it possible to get almost the same with the script biblio.pl ?
>
   I think/hope so :-)
   What you have described is actually a two-step process. First you search
for 'bernardi', and then you choose which articles you want to see (in the
PubMed interface, you pick which 20-article page to display).
   The only (but substantial) difference compared with biblio.pl is that
PubMed gives you the results already formatted, whereas biblio.pl returns
them either as XML data (the default behaviour) or prints them more or less
as a big "dump".

   Let's see how to do this with biblio.pl, using your example:

Type this:
   biblio.pl -p -m3 - -find bernardi
which means:
   (-find...) find citations containing 'bernardi' anywhere
   (-m3)      give me back the first 3 results
   (-p)       print the ID of the newly created collection holding all
              citations that match 'bernardi'
We did not specify an output format, therefore we get XML. This is what
I got:
   Looking for 'bernardi'...       Found 3866
   <MedlineCitation Status="Completed">
   <MedlineID>21335095</MedlineID>
   <PMID>11442083</PMID>
   ... (many XML lines)
   </MedlineCitation>
   1026734874996

Note the number at the end. It is the collection ID, and we got it because
of the '-p' option.
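If you want to script around the default XML output, a few regular
expressions are enough for a quick look. This is a minimal Perl sketch,
assuming the output layout shown above (the "Found N" line, <PMID>
elements, and the collection ID alone on the last line); summarize_output
is a hypothetical helper, not part of biblio.pl or bioperl.

```perl
use strict;
use warnings;

# Summarize text captured from a run such as "biblio.pl -p -m3 - -find bernardi":
# the hit count, the PMIDs in the returned XML, and the trailing collection ID.
# Assumes the output layout shown above; regexes are fine for a quick look,
# but a real XML parser is safer for anything serious.
sub summarize_output {
    my ($text) = @_;
    my ($count) = $text =~ /Found\s+(\d+)/;
    my @pmids   = $text =~ m{<PMID>(\d+)</PMID>}g;
    # With -p the collection ID is printed last, on a line of its own.
    my ($coll)  = $text =~ /^\s*(\d+)\s*\z/m;
    return { count => $count, pmids => \@pmids, collection_id => $coll };
}

my $sample = <<'END';
Looking for 'bernardi'...       Found 3866
<MedlineCitation Status="Completed">
<MedlineID>21335095</MedlineID>
<PMID>11442083</PMID>
</MedlineCitation>
1026734874996
END

my $s = summarize_output($sample);
print "$s->{count} hits, PMIDs @{ $s->{pmids} }, collection $s->{collection_id}\n";
# prints: 3866 hits, PMIDs 11442083, collection 1026734874996
```

The sample text is a shortened copy of the output above; in practice you
would capture the real output, e.g. with backticks.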

Now we can ask for another three citations (this is equivalent to
selecting the next page on the PubMed web site) by typing:
   biblio.pl -m3 -i 1026734874996
And we get:
   <MedlineCitation Status="Completed">
   <MedlineID>21379327</MedlineID>
   <PMID>11487213</PMID>
   ...
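The paging can be scripted too. Below is a minimal Perl sketch;
pages_needed and page_through are hypothetical helpers, and it assumes
biblio.pl is on your PATH and that, as the example above suggests, each
repeated call with -i returns the next page of the same collection. It
also shows the arithmetic: 3866 hits fetched three at a time take 1289
calls, which is one good reason to narrow the search.

```perl
use strict;
use warnings;

# How many calls of $per_page citations cover $count hits (integer ceiling).
sub pages_needed {
    my ($count, $per_page) = @_;
    return int(($count + $per_page - 1) / $per_page);
}

# Ask for $pages more pages from an existing collection by repeating
# "biblio.pl -mN -i ID". Assumes biblio.pl is on PATH and that the server
# remembers how far into the collection you are, so each call returns the
# next page. $run defaults to system() and is a parameter only so the
# commands can be inspected without running them.
sub page_through {
    my ($coll_id, $per_page, $pages, $run) = @_;
    $run ||= sub { system(@_) == 0 or die "biblio.pl failed: $?\n" };
    $run->('biblio.pl', "-m$per_page", '-i', $coll_id) for 1 .. $pages;
    return $pages;
}

print pages_needed(3866, 3), "\n";      # prints 1289
# page_through('1026734874996', 3, 2);  # fetch the next two pages of three
```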

If you want non-XML output, you are somewhat out of luck with biblio.pl,
because the script does not offer many other choices. However, the bioperl
Bio::Biblio module can convert the citation XML data into a set of Perl
objects, and biblio.pl can print them as a "dump". If you take another
three citations and print them like that (using the option -Oo, where 'o'
stands for 'objects', by the way):
   biblio.pl -m3 -Oo -i 1026734874996
we get:
   $Citation = bless( {
                        '_authors' => [
                                        bless( {
                                              '_initials' => 'L',
                                              '_lastname' => 'Bernardi',
                                              '_type' => 'PersonalName',
                                              '_forename' => 'L',
                                              '_root_verbose' => 0
                                            }, 'Bio::Biblio::Person' ),
   ...(and so on)

I am afraid that bioperl (the Biblio modules) does not have any "prettifier"
for citations at the moment. But you can look at how biblio.pl gets the
citations as Perl objects, and from there add your own code for formatting
them. The point is that you do not need to parse the XML for that; you deal
only with Perl objects.
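To give an idea of what such formatting code could look like, here is a
minimal Perl sketch of a PubMed-like one-line formatter. It works on a
plain hash, not on real Bio::Biblio objects, and its keys (authors,
lastname, initials, title, journal, year) are hypothetical. You would fill
them from the objects biblio.pl creates (the dump above shows fields such
as _lastname and _initials), preferably through the accessor methods
documented in the Bio::Biblio modules rather than the raw hash keys. The
sample data uses two of the authors from the PubMed entry quoted above.

```perl
use strict;
use warnings;

# Keep only defined, non-empty strings.
sub nonempty { return grep { defined $_ && length $_ } @_ }

# Append a full stop unless the text already ends with punctuation.
sub with_stop {
    my ($s) = @_;
    return $s =~ /[.?!]\z/ ? $s : "$s.";
}

# Format one citation roughly in the PubMed style:
#   "Lastname Initials, ... Title. Journal Year."
# Input is a plain hash with HYPOTHETICAL keys, not a Bio::Biblio object.
sub format_citation {
    my ($c) = @_;
    my @names;
    for my $person (@{ $c->{authors} || [] }) {
        push @names, join ' ', nonempty($person->{lastname}, $person->{initials});
    }
    my $authors = join ', ', nonempty(@names);
    my $source  = join ' ', nonempty($c->{journal}, $c->{year});
    return join ' ', map { with_stop($_) } nonempty($authors, $c->{title}, $source);
}

my $line = format_citation({
    authors => [ { lastname => 'Heinemann', initials => 'MB' },
                 { lastname => 'Bernardi',  initials => 'F' } ],
    title   => 'Genealogical analyses of rabies virus strains from Brazil based on N gene alleles.',
    journal => 'Epidemiol Infect',
    year    => 2002,
});
print "$line\n";
# prints: Heinemann MB, Bernardi F. Genealogical analyses of rabies virus strains from Brazil based on N gene alleles. Epidemiol Infect 2002.
```

Missing fields are simply skipped, so partially filled citations still
produce a readable line.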

   Another remark: you can also be more specific with your search and
specify that 'bernardi' should be looked for only in 'authors'. The first
invocation of biblio.pl would then be:
   biblio.pl -p -m3 - -find bernardi -attrs authors
Now you get:
   Looking for 'bernardi' in attributes 'authors'...       Found 3722
   <MedlineCitation Status="Completed">
   <MedlineID>21335095</MedlineID>
   <PMID>11442083</PMID>
   ...(and so on)
Note that you now get 3722 citations (compared with 3866 before).

   Please let me know if you need more.
   With regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger