How to automate PubMed search using Perl, PHP or Java
1, Use the Perl tool PubCrawler: http://pubcrawler.gen.tcd.ie/pubcrawler_pod.html
2, Or write a program that requests PubMed URLs directly, using curl from Perl/PHP or HttpUnit from Java:
- Search by PMID, example: http://www.ncbi.nlm.nih.gov/pubmed/18418893
- Search by Author, example: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=Search&Term=%22Bonnet%20JE%22%5BAuthor%5D
- Search by Keywords, example: http://www.ncbi.nlm.nih.gov/pubmed?term=antioxidant%20chocolate
- Get Abstract, example: http://www.ncbi.nlm.nih.gov/entrez/queryd.fcgi?db=pubmed&cmd=Retrieve&dopt=Abstract&list_uids=18461658&itool=pubmed_docsum
- Get Related Articles, example: http://www.ncbi.nlm.nih.gov/entrez/queryd.fcgi?itool=pubmed_Abstract&db=pubmed&cmd=Display&dopt=pubmed_pubmed&from_uid=18276748
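The search URLs above are easy to build in code. Here is a minimal Java sketch, assuming Java 10 or later for URLEncoder.encode(String, Charset). The class and method names (PubMedWebUrls, byPmid, byAuthor, byKeywords) are made up for illustration; only the URL patterns come from the examples above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the PubMed web URLs shown in the list above.
final class PubMedWebUrls {
    static final String PUBMED = "http://www.ncbi.nlm.nih.gov/pubmed";
    static final String ENTREZ = "http://www.ncbi.nlm.nih.gov/sites/entrez";

    // URLEncoder writes spaces as '+'; swap in %20 to match the example URLs.
    // A literal '+' in the input is already encoded as %2B, so this is safe.
    static String encode(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String byPmid(String pmid) {
        return PUBMED + "/" + pmid;
    }

    // Wraps the name in quotes and adds the [Author] field tag, as in the example.
    static String byAuthor(String author) {
        return ENTREZ + "?Db=pubmed&Cmd=Search&Term=" + encode("\"" + author + "\"[Author]");
    }

    static String byKeywords(String keywords) {
        return PUBMED + "?term=" + encode(keywords);
    }

    public static void main(String[] args) {
        System.out.println(byPmid("18418893"));
        System.out.println(byAuthor("Bonnet JE"));
        System.out.println(byKeywords("antioxidant chocolate"));
    }
}
```

Running main should print the same three search URLs as in the list. The pages returned are HTML meant for browsers, so a program still has to scrape them, which is one reason to prefer the E-utilities.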
3, Or use the E-utilities (EUtils); documentation is available at http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
EFetch for Literature Databases (PubMed, PubMed Central (PMC), Journals, OMIM)
EFetch: Retrieves records in the requested format from a list of one or more UIs or the user’s environment.
EFetch documentation is also available for the sequence and other molecular biology databases, and for the Taxonomy database.
-
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
URL parameters: (NOTE: Utility parameters may be case sensitive. Use lower case characters in all parameters except for WebEnv.)
Database
db=[pubmed|pmc|journals|omim]
pubmed - Journal publishers hold the copyright on the abstracts in PubMed. NLM provides no legal advice concerning distribution of copyrighted materials.
pmc - PubMed Central contains a number of articles classified as “open access” for which you may download the full text as XML. For the remaining articles in PMC you may download only the abstracts as XML. See the PMC Open Access page for a description of open access terms and the list of open access journals in PMC.
omim - The OMIM™ database, including the collective data contained therein, is the property of the Johns Hopkins University, which holds the copyright thereto. There are restrictions on use.
Web Environment: Value previously returned in XML results from ESearch and EPost and used with EFetch in place of a primary UI result list.
WebEnv=WgHmIcDG], etc.
Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.
query_key=6
Note: WebEnv is similar to the cookie that is set on a user’s computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both a WebEnv (cookie string) value and a query_key (history number) value will be returned in the results. Rather than using the retrieved PMIDs in an ESummary URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
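To make the history mechanism concrete, here is a rough Java sketch that pulls WebEnv and QueryKey out of an ESearch XML result and turns them into an EFetch URL. The class name HistoryFetch and the sample XML are invented for illustration (a real response has more elements and a DOCTYPE line), and the https base URL is an assumption about the current service address.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: ESearch history values -> EFetch URL.
final class HistoryFetch {
    static final String EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Returns { WebEnv, QueryKey } read from an ESearch XML result.
    static String[] parseHistory(String esearchXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download the DTD named in a real response's DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(esearchXml)));
            String webEnv = doc.getElementsByTagName("WebEnv").item(0).getTextContent().trim();
            String queryKey = doc.getElementsByTagName("QueryKey").item(0).getTextContent().trim();
            return new String[] { webEnv, queryKey };
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a usable ESearch history response", e);
        }
    }

    // WebEnv is URL-encoded because it may contain characters unsafe in a query string.
    static String efetchUrl(String webEnv, String queryKey) {
        return EFETCH + "?db=pubmed&WebEnv=" + URLEncoder.encode(webEnv, StandardCharsets.UTF_8)
                + "&query_key=" + queryKey + "&retmode=xml";
    }

    public static void main(String[] args) {
        // Made-up, abbreviated sample of what ESearch returns with usehistory=y.
        String sample = "<eSearchResult><Count>2</Count>"
                + "<QueryKey>1</QueryKey><WebEnv>SAMPLE_WEBENV</WebEnv></eSearchResult>";
        String[] history = parseHistory(sample);
        System.out.println(efetchUrl(history[0], history[1]));
    }
}
```

Since WebEnv changes with each ESearch query, the values should be read from the response every time rather than stored and reused.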
Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant ‘tool’ argument for all requests using the utilities.
tool=
E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
email=
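A small Java sketch of adding tool and email to a request URL, assuming the URL is already built. The class and method names (EutilsIdentity, withIdentity) and the email address are placeholders; "flybase" is just the tool name from the example above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: appends the tool and email parameters to an E-utilities URL.
final class EutilsIdentity {
    static String withIdentity(String url, String tool, String email) {
        // The tool string must have no internal spaces.
        if (tool.isEmpty() || tool.contains(" ")) {
            throw new IllegalArgumentException("tool must be non-empty and contain no spaces");
        }
        // Pick the right separator depending on how the URL currently ends.
        String sep;
        if (url.endsWith("?") || url.endsWith("&")) {
            sep = "";
        } else if (url.contains("?")) {
            sep = "&";
        } else {
            sep = "?";
        }
        return url + sep
                + "tool=" + URLEncoder.encode(tool, StandardCharsets.UTF_8)
                + "&email=" + URLEncoder.encode(email, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11877539";
        System.out.println(withIdentity(base, "flybase", "user@example.org"));
    }
}
```

Keeping the tool value constant across all requests from one program is what the note above asks for, so it is probably best defined once as a constant.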
Record Identifier: UIs required if WebEnv is not used.
id=11877539,11822933,11871444
Display Numbers: Used when the results from EPost or ESearch are maintained in the user’s environment. The maximum number of retrieved records is 10,000.
retstart=x (x= sequential number of the first id retrieved - default=0 which will retrieve the first record)
retmax=y (y= number of items retrieved - default=20)
Retrieval Mode:
retmode=output format
Current values:
xml (not journals)
html
text
asn.1 (not journals)
Use your web browser’s View Page Source function to display results in xml retrieval mode.
Retrieval Type:
rettype=output types based on database
Current values:
uilist
abstract (not omim)
citation (not omim)
medline (not omim)
full (journals and omim)
Not all Retrieval Modes are possible with all Retrieval Types.
PubMed Options:
         uilist   abstract   citation   medline
xml      x        x*         x*         x*
text     x        x          x          x
html     x        x          x          x
asn.1    n/a      x*         x*         x
x = retrieval mode available
*returned retrieval type is the complete record in the retrieval mode
n/a - not available
OMIM Options: (not case sensitive)
         uilist          docsum   synopsis              variants             detailed   ExternalLink
         (MIM numbers)            (Clinical synopsis)   (Allelic Variants)
xml      x               x*       x*                    x*                   x*         x*
text     x               x        x                     x                    x*         x*
html     x               x        x                     x                    x*         x*
asn.1    x*              x*       x*                    x*                   x*         x*
x = retrieval mode available
*returned retrieval type is the complete record in the retrieval mode
n/a - not available
Examples:
In PubMed display PMIDs 12091962 and 9997 in html retrieval mode and abstract retrieval type:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12091962,9997&retmode=html&rettype=abstract
-
Get one article’s abstract from PubMed using PMID:
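One possible way to do this in Java is sketched below, assuming Java 11 or later for java.net.http. The class name AbstractByPmid and its methods are made up for illustration, the https base URL is an assumption, and the text/abstract combination is taken from the PubMed options table above. The PMID is the one from the abstract link earlier in this post.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: fetch one PubMed abstract as plain text through EFetch.
final class AbstractByPmid {
    static final String EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    static String efetchAbstractUrl(String pmid) {
        // PMIDs are plain integers; reject anything else before building the URL.
        if (!pmid.matches("\\d+")) {
            throw new IllegalArgumentException("PMID must be numeric: " + pmid);
        }
        return EFETCH + "?db=pubmed&id=" + pmid + "&retmode=text&rettype=abstract";
    }

    // Performs the HTTP GET and returns the response body.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("EFetch returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = efetchAbstractUrl(args.length > 0 ? args[0] : "18461658");
        System.out.println(url);
        // The network call runs only when asked for: java AbstractByPmid 18461658 --fetch
        if (args.length > 1 && args[1].equals("--fetch")) {
            System.out.println(fetch(url));
        }
    }
}
```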
-
In PubMed display PMIDs from history statement in html retrieval mode and medline retrieval type (where x is replaced by WebEnv and query_key values):
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&WebEnv=xxxx&query_key=x&retmode=html&rettype=medline
In PubMed display PMIDs in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933,11700088&retmode=xml
In Journals display records for journal IDs 22682,21698,1490:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=journals&id=22682,21698,1490&rettype=full
In PubMed Central display xml (the only retmode available for pmc) for ID 212403:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=212403
In OMIM show the full record for MIM number 601100 as XML:
http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=omim&id=601100&retmode=xml&rettype=full
-
Search PubMed by author:
Search PubMed by keyword:
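Both searches can go through ESearch, which returns matching PMIDs as XML. Here is a minimal Java sketch that only builds the request URLs, assuming Java 10 or later. The class and method names (PubMedESearch, searchUrl, byAuthor, byKeywords) are invented for illustration; the [Author] field tag and the search terms are the ones used earlier in this post, and the https base URL is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds ESearch URLs for author and keyword searches.
final class PubMedESearch {
    static final String ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // retmax limits how many PMIDs come back in one response.
    static String searchUrl(String term, int retmax) {
        if (retmax < 0) {
            throw new IllegalArgumentException("retmax must not be negative");
        }
        return ESEARCH + "?db=pubmed&term="
                + URLEncoder.encode(term, StandardCharsets.UTF_8)
                + "&retmax=" + retmax;
    }

    // Quotes the author name and restricts the match to the Author field.
    static String byAuthor(String author, int retmax) {
        return searchUrl("\"" + author + "\"[Author]", retmax);
    }

    static String byKeywords(String keywords, int retmax) {
        return searchUrl(keywords, retmax);
    }

    public static void main(String[] args) {
        System.out.println(byAuthor("Bonnet JE", 20));
        System.out.println(byKeywords("antioxidant chocolate", 20));
    }
}
```

The PMIDs in the ESearch result can then be passed to EFetch in the id parameter to get the abstracts.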
4, Or use SOAP: http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html
References:
http://plindenbaum.blogspot.com/2007/06/i-will-not-spam-nature-network-with.html
http://java.sun.com/developer/EJTechTips/2004/tt0730.html
http://www.cpan.org/authors/id/V/VA/VALSALAM/bibAddPubMed-0.2