How to automate PubMed search using Perl, PHP or Java

1, Use PubCrawler, a Perl tool: http://pubcrawler.gen.tcd.ie/pubcrawler_pod.html

2, Or write your own program that requests the pages over HTTP, for example with curl from Perl or PHP, or with HttpUnit from Java.

3, Or use the E-utilities (EUtils), documented at http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

EFetch for Literature Databases (PubMed, PubMed Central (PMC), Journals, OMIM)

EFetch: Retrieves records in the requested format from a list of one or more UIs or the user’s environment.

EFetch documentation is also available for the sequence and other molecular biology databases, and for the Taxonomy database.


  • Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?


    URL parameters (NOTE: utility parameters may be case sensitive. Use lowercase characters in all parameters except for WebEnv.)

    Database

    db=[pubmed|pmc|journals|omim]

    pubmed - Journal publishers hold the copyright on the abstracts in PubMed. NLM provides no legal advice concerning distribution of copyrighted materials.
    pmc - PubMed Central contains a number of articles classified as “open access” for which you may download the full text as XML. For the remaining articles in PMC you may download only the abstracts as XML. See the PMC Open Access page for a description of open access terms and the list of open access journals in PMC.
    omim - The OMIM TM database including the collective data contained therein is the property of the Johns Hopkins University, which holds the copyright thereto. There are restrictions on use.

    Web Environment: Value previously returned in XML results from ESearch and EPost and used with EFetch in place of a primary UI result list.

    WebEnv=WgHmIcDG], etc.

    Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.

    query_key=6

    Note: WebEnv is similar to the cookie that is set on a user’s computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both a WebEnv (cookie string) value and a query_key (history number) value are returned in the results. Rather than passing the retrieved PMIDs in an EFetch or ESummary URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
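The history mechanism described in the note can be sketched as follows. This is only an illustration in Python (the same URLs can be requested from Perl, PHP or Java): it reads WebEnv and QueryKey out of an ESearch response and builds the matching EFetch URL. The sample XML is a hypothetical, shortened response, the function names are made up for this sketch, and the https scheme is an assumption about the current NCBI servers (the article's URLs use http).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: same host and path as in this article, but over https.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical, abbreviated ESearch response (usehistory=y).
# A real response carries more fields and a much longer WebEnv string.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV_STRING</WebEnv>
  <IdList><Id>11877539</Id><Id>11822933</Id></IdList>
</eSearchResult>"""


def history_from_esearch(xml_text):
    """Return (WebEnv, query_key) read from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url_from_history(web_env, query_key, retmode="xml", rettype="abstract"):
    """Build an EFetch URL that uses the history values instead of an id list."""
    params = {
        "db": "pubmed",
        "WebEnv": web_env,       # note the mixed case, unlike other parameters
        "query_key": query_key,
        "retmode": retmode,
        "rettype": rettype,
    }
    return EFETCH_URL + "?" + urlencode(params)


web_env, query_key = history_from_esearch(SAMPLE_ESEARCH_XML)
url = efetch_url_from_history(web_env, query_key)
print(url)
```

No id parameter appears in the resulting URL; the server looks the records up from the stored search.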

    Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant ‘tool’ argument for all requests using the utilities.

    tool=

    E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.

    email=

    Record Identifier: UIs required if WebEnv is not used.

    id=11877539,11822933,11871444
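As a rough sketch of how the tool, email and id parameters fit together, the Python snippet below assembles an EFetch URL from a list of PMIDs. The helper name efetch_url, the tool value my_script and the address user@example.org are placeholders invented for this example, and the https scheme is an assumption about the current NCBI servers.

```python
from urllib.parse import urlencode

# Assumption: same host and path as in this article, but over https.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids, tool, email, retmode="xml", rettype="abstract"):
    """Build an EFetch URL for a list of PubMed IDs (hypothetical helper)."""
    if any(ch.isspace() for ch in tool):
        # The tool string must have no internal spaces.
        raise ValueError("tool must not contain spaces")
    # Join the IDs with commas and no surrounding whitespace.
    ids = ",".join(str(p).strip() for p in pmids)
    params = {
        "db": "pubmed",
        "id": ids,
        "retmode": retmode,
        "rettype": rettype,
        "tool": tool,
        "email": email,
    }
    # Keep "," and "@" readable instead of percent-encoding them.
    return EFETCH_URL + "?" + urlencode(params, safe=",@")


print(efetch_url([11877539, 11822933, 11871444], "my_script", "user@example.org"))
```

Sending the same tool value on every request is what lets NCBI recognise a batch job as coming from one program.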

    Display Numbers: Used when the results from EPost or ESearch are maintained in the user’s environment. The maximum number of retrieved records is 10,000.

    retstart=x (x= sequential number of the first id retrieved - default=0 which will retrieve the first record)
    retmax=y (y= number of items retrieved - default=20)
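To walk through a large result set, retstart and retmax can be stepped in fixed-size windows. The Python sketch below only computes the (retstart, retmax) pairs and sends nothing over the network. It reads the 10,000 limit above as a cap on a single request, which is an assumption, and page_windows is a name made up for this example.

```python
MAX_RECORDS_PER_REQUEST = 10000  # assumed to be a per-request cap


def page_windows(total, page_size=20):
    """Yield (retstart, retmax) pairs that cover `total` records in order."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    size = min(page_size, MAX_RECORDS_PER_REQUEST)
    for start in range(0, total, size):
        # The last window may be shorter than the others.
        yield start, min(size, total - start)


for retstart, retmax in page_windows(45):
    print(f"retstart={retstart}&retmax={retmax}")
```

Each pair can be appended to an ESearch or EFetch URL that uses WebEnv and query_key, so the search itself runs only once.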

    Retrieval Mode:

    retmode=output format

    Current values:

    xml (not journals)
    html
    text
    asn.1 (not journals)

    Use your web browser’s View Page Source function to display results in xml retrieval mode.

    Retrieval Type:

    rettype=output types based on database

    Current values:

    uilist
    abstract (not omim)
    citation (not omim)
    medline (not omim)
    full (journals and omim)

    Not all Retrieval Modes are possible with all Retrieval Types.

    PubMed Options:

              uilist   abstract   citation   medline
    xml       x        x*         x*         x*
    text      x        x          x          x
    html      x        x          x          x
    asn.1     n/a      x*         x*         x

    x = retrieval mode available
    * = returned retrieval type is the complete record in the retrieval mode
    n/a = not available

    OMIM Options: (not case sensitive)

              uilist          docsum   synopsis              variants             detailed   ExternalLink
              (MIM numbers)            (Clinical synopsis)   (Allelic Variants)
    xml       x               x*       x*                    x*                   x*         x*
    text      x               x        x                     x                    x*         x*
    html      x               x        x                     x                    x*         x*
    asn.1     x*              x*       x*                    x*                   x*         x*

    x = retrieval mode available
    * = returned retrieval type is the complete record in the retrieval mode
    n/a = not available

    Examples:
    In PubMed display PMIDs 12345 and 9997 in html retrieval mode and abstract retrieval type:
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12345,9997&retmode=html&rettype=abstract

  • Get one article’s abstract from PubMed using its PMID:

    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=18324973&retmode=html&rettype=abstract

    In PubMed display PMIDs from history statement in html retrieval mode and medline retrieval type (where x is replaced by WebEnv and query_key values):
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&WebEnv=xxxx&query_key=x&retmode=html&rettype=medline

    In PubMed display PMIDs in xml retrieval mode:
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933,11700088&retmode=xml
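Once a response in xml retrieval mode has been downloaded, the PMID, title and abstract can be picked out with a standard XML parser. The Python sketch below works on a hypothetical, heavily shortened PubmedArticleSet document rather than a live response; real records contain many more elements, and parse_articles is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened EFetch XML. The PMID and text are invented.
SAMPLE_EFETCH_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>11111111</PMID>
      <Article>
        <ArticleTitle>A hypothetical title</ArticleTitle>
        <Abstract>
          <AbstractText>First part.</AbstractText>
          <AbstractText>Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_articles(xml_text):
    """Return a list of {pmid, title, abstract} dicts from PubMed XML."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue  # skip records without a citation block
        # An abstract may be split over several AbstractText elements.
        parts = [
            "".join(node.itertext()).strip()
            for node in citation.findall("Article/Abstract/AbstractText")
        ]
        records.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle", default=""),
            "abstract": " ".join(parts),
        })
    return records


for record in parse_articles(SAMPLE_EFETCH_XML):
    print(record["pmid"], record["title"])
```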

    In Journals display records for journal IDs 22682,21698,1490:
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=journals&id=22682,21698,1490&rettype=full

    In PubMed Central display xml (only retmode available for pmc) for ID 212403:
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=212403

    In OMIM show the full record for MIM number 601100 as XML:
    http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=omim&id=601100&retmode=xml&rettype=full

  • Search PubMed by author:

    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retstart=0&retmax=100&usehistory=y&retmode=xml&term=Sten%20H%20Vermund[author]

    Search PubMed by keyword:

    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retstart=0&retmax=100&usehistory=y&retmode=xml&term=polymerase
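The two search URLs above differ only in the term value, so a small helper can build both. This Python sketch percent-encodes the term (spaces become %20 and the square brackets of a field tag become %5B and %5D) and reads the PMIDs from the IdList of a response. The sample XML is a hypothetical, shortened response, the function names are invented here, and the https scheme is an assumption about the current NCBI servers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Assumption: same host and path as in this article, but over https.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical, abbreviated ESearch response.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>11877539</Id><Id>11822933</Id></IdList>
</eSearchResult>"""


def esearch_url(term, field=None, retstart=0, retmax=100):
    """Build an ESearch URL; `field` is an optional tag such as "author"."""
    full_term = f"{term}[{field}]" if field else term
    params = {
        "db": "pubmed",
        "retstart": retstart,
        "retmax": retmax,
        "usehistory": "y",
        "retmode": "xml",
        "term": full_term,
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return ESEARCH_URL + "?" + urlencode(params, quote_via=quote)


def pmids_from_esearch(xml_text):
    """Return the list of PMIDs (as strings) found in an ESearch response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("IdList/Id")]


print(esearch_url("Sten H Vermund", field="author"))
print(esearch_url("polymerase"))
print(pmids_from_esearch(SAMPLE_ESEARCH_XML))
```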

4, or SOAP: http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

References:

http://plindenbaum.blogspot.com/2007/06/i-will-not-spam-nature-network-with.html

http://java.sun.com/developer/EJTechTips/2004/tt0730.html

http://www.cpan.org/authors/id/V/VA/VALSALAM/bibAddPubMed-0.2
