Using hit rates from one database to audit the completeness of another

Inositol hexakisphosphate should fire its publicist?

By Rick Jelliffe
May 25, 2009

Michael Koon's post "How good is Wikipedia's coverage of chemical compounds?" uses the hit rates for compounds in NIH's PubMed to list the most important compounds, or synonyms, that are missing from Wikipedia.

He notes that this is possible because the Wikipedia names match the NIH PubChem names.
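The post does not give code, so here is a minimal Python sketch of the general idea, under stated assumptions: you already have a hit count per compound (for example, the number of PubMed records that mention it), a synonym list per compound (for example, from PubChem), and the set of Wikipedia article and redirect titles. The function name `audit_coverage`, the case-insensitive matching rule and the sample data are all hypothetical illustrations, not Koon's actual method or figures.

```python
from typing import Dict, List, Set, Tuple


def normalise(name: str) -> str:
    """Assumed matching rule: ignore case and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def audit_coverage(
    hit_counts: Dict[str, int],
    synonyms: Dict[str, List[str]],
    covered_titles: Set[str],
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, str, int]]]:
    """Rank gaps in one database (titles) by hit counts from another.

    Returns (missing_compounds, missing_names):
      missing_compounds: (compound, hits) where no name matches any title.
      missing_names: (compound, name, hits) for covered compounds where
        this particular name (preferred name or synonym) has no title.
    Both lists are sorted by hits, highest first.
    """
    titles = {normalise(t) for t in covered_titles}
    missing_compounds: List[Tuple[str, int]] = []
    missing_names: List[Tuple[str, str, int]] = []

    for compound, hits in hit_counts.items():
        names = [compound] + synonyms.get(compound, [])
        matched = [n for n in names if normalise(n) in titles]
        if not matched:
            # No article under any name: a candidate missing compound.
            missing_compounds.append((compound, hits))
        else:
            # Covered, but some names lack an article or redirect.
            for n in names:
                if normalise(n) not in titles:
                    missing_names.append((compound, n, hits))

    # Highest hit rate first; sort is stable, so ties keep input order.
    missing_compounds.sort(key=lambda item: item[1], reverse=True)
    missing_names.sort(key=lambda item: item[2], reverse=True)
    return missing_compounds, missing_names


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    hits = {"compound A": 5000, "compound B": 1200, "compound C": 300}
    syns = {"compound A": ["A-ol"], "compound B": ["B acid"]}
    titles = {"Compound A", "B acid"}
    compounds, names = audit_coverage(hits, syns, titles)
    print("Missing compounds:", compounds)
    print("Missing names:", names)
```

Exact matching after normalisation is the simplest possible rule; real chemical names (Greek letters, stereochemistry prefixes, salts versus free acids) would probably need something fuzzier.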

Of course, hit rate is not a measure of importance but a measure of celebrity. And celebrity breeds celebrity, whether reinforced by Google-like effects or not. But as an audit, or a heuristic, it seems a pretty smart thing to do. Certainly, the possibility of having objective measures to test the completeness of a topical database is intriguing, especially as we rely more on topical portals like Wikipedia and Wolfram Alpha.

