Alice Fairfield

About Alice Fairfield

Alice is an Electronic Resources Librarian and Librarian for Gender and Sexual Minorities Studies.

When Everything Works But Still Fails: Federated Search Engines and Link Resolvers

As an Electronic Resources Librarian, I get to see all sorts of technical issues arising from our various resources. Federated search engines and OpenURL link resolvers offer up a lot of interesting cases. But sometimes, it’s not their fault. Everything is actually working as intended, and yet the system still fails to deliver a full text article to our users.

An interesting case in point recently crossed my computer. The user had searched our federated search engine and found an article called “Music: Hip-Hop Nation” in Time magazine. The federated search engine identified that we should have full text access and provided a link. But when the user followed the link, it led to an error page.

So what happened?

Thinking that perhaps there was just an error in the link, I manually checked the source where the full text was supposed to be found. And there was indeed an entry for the article. Except, the title was “Hip-Hop Nation. (cover story).” The link resolver was trying to pass through the wrong title! But why?

That gets to the nature of how a federated search works. In order to find an article, it needs metadata about the article that tells it that it exists and where to find it. However, the companies that provide the full text don’t always share their metadata with the companies that make the federated search engines. They may be competitors, they may want money, they may want metadata in return; whatever the reason, they may not want to share their metadata.

Federated search engines get around this by, where possible, getting the metadata about an article from some third party that is more amenable to working with them. But, as in this case, just because two different places have created metadata for an article doesn’t mean they both did it in an identical way. Which leads to a metadata mismatch; the federated search engine knows about the article, and the full text source knows about the article, but they don’t agree on the details, and we end up with a frustrated user.
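The mismatch here came down to an exact-title lookup failing. A minimal sketch, using the two titles from this example, of how even a normalized comparison can fail when two sources have described the same article differently (the normalization rule is an illustrative assumption, not how any particular link resolver works):

```python
def normalize(title: str) -> str:
    """Lowercase and strip punctuation so trivially different titles still match."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace()).strip()

# Title as the federated search engine's third-party metadata had it:
discovery_title = "Music: Hip-Hop Nation"
# Title as the full text source actually catalogued it:
fulltext_title = "Hip-Hop Nation. (cover story)"

# Even after normalizing away case and punctuation, the strings still differ,
# so an exact-match lookup on the title fails.
print(normalize(discovery_title) == normalize(fulltext_title))  # False
```

No amount of simple cleanup rescues the lookup when one source has prepended a column name (“Music:”) and the other has appended a note (“(cover story)”); the records genuinely disagree.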

But in this case, that’s not the end of the story. There was yet another reason why the user couldn’t get the full text of the article: it didn’t actually exist in the full text source!

So why did the link resolver think that the full text of the article would be there?

In this case, the full text source was what’s known as an aggregator database: a resource, distinct from the original publisher, that aggregates content from thousands of different sources. The link resolver knows that full text exists by checking the year, volume, and issue of an article against an internal database of coverage dates for the magazine or journal where the article was published.

The problem arises, particularly with newspapers and magazines, when the original publisher may not have secured the rights to sell a particular article to the aggregator. This can mean that almost all of the content in a particular magazine issue may be available as full text, but one or two articles may be missing.
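To make the logic concrete, here is a minimal sketch of that coverage check (the ISSN and date range are illustrative placeholders, not real knowledge-base data). Because the check operates at the level of the whole magazine title, an article-level gap is invisible to it:

```python
from dataclasses import dataclass

@dataclass
class Coverage:
    """Illustrative knowledge-base entry: what the link resolver knows about a title."""
    issn: str
    start_year: int
    end_year: int

# Hypothetical coverage entry for a magazine in the aggregator.
knowledge_base = {"0000-0000": Coverage("0000-0000", 1990, 2024)}

def full_text_predicted(issn: str, year: int) -> bool:
    """The resolver's question: does the aggregator cover this title for this year?"""
    cov = knowledge_base.get(issn)
    return cov is not None and cov.start_year <= year <= cov.end_year

# The check only knows title-level coverage dates, not article-level rights,
# so it predicts full text even when this one article was never licensed.
print(full_text_predicted("0000-0000", 1999))  # True, though the article is missing
```

The resolver isn’t malfunctioning; it simply has no data granular enough to know that a single article within an otherwise-covered issue was withheld.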

And that is likely what happened here.

Despite the federated search engine correctly identifying that the article existed, and the link resolver identifying the correct full text source, in the end, the user left empty-handed.

Of course, that’s not the end of the story, because libraries have other tools, and in this case the user was able to request the article through our document delivery/inter-library loan service, providing a (slightly delayed) happy ending.


Citation Metrics: Trust But Verify

Citation metric: a) A numerical measure of the influence of a journal or journal article based on the number of citations.
b) A simplistic way of evaluating the value of research based on flawed numbers, one that fails to take into account the actual complexities involved in evaluation, but is still seen as valuable because it provides a clear numerical way to rank things, since the people doing the evaluation very often have neither the time nor the background to evaluate the research personally.

Citation Counts

In this post, we’ll look at the most basic citation metric of all: citation counts. Simply put, a citation count is a count of citations that have been made to <something>. The <something> could be a book, a book chapter or essay, a journal article, a white paper, a technical report, a conference paper, a video, a webpage, a blog, a newspaper article, a magazine article, a preprint, a dataset, a patent, an unpublished communication, a corporate research report, a standard, an image, a performance, and so on and so forth.

The citation could also appear in almost any one of those (though perhaps not an image!).

Where to find citation counts

It seems simple enough. We have Article A, written by one of our many notable faculty. How many times has it been cited? But of course, not all citations are equal! If we want to evaluate Professor A’s work, we only want citations made by people who are peers of Professor A: fellow researchers, academics, or professionals. We also only want citations that appear in proper research, academic, or professional publications, not popular magazines, newspapers, and their ilk, since we want to evaluate what impact Professor A’s research has had on their field.

To that end, the Library offers two commercial tools, and one free one.

Citation Tools

Web of Knowledge (also known as the Web of Science, or the Science (or Social Science or Arts and Humanities) Citation Index). The oldest source of citation data (besides counting it yourself), with data going back to the 1970s in some disciplines. It has a fairly narrow focus, and only indexes citations made in what they consider the significant journals in particular fields. It doesn’t count citations made in anything that isn’t one of those significant journals. So if your article was cited in a book, a patent, a journal the WoK doesn’t consider significant, etc., the WoK doesn’t know about it.

Scopus. A newer database that has citation data for citations made in a selection of journal and conference papers published from 1996 onward. It covers many more journals than the WoK, and does cover some conference proceedings, but it focuses a bit more on science, technology, and engineering disciplines. It does include some coverage of the social sciences (business and psychology predominantly), and the arts and humanities, but those aren’t its strengths. Like the WoK, it doesn’t cover books, and doesn’t cover all journals.

Google Scholar. Google’s attempt to index all of the scholarly material on the Internet. It can generate citation counts, but it has a tendency to list duplicates if a particular research article is posted in multiple places, which throws off citation counts, and it can only index citations made in material that is a) on the Internet (it connects to Google Books so it’s one of the few places to find citations in books) and b) that allows Google to index the full text (or otherwise makes the list of references in an article available). The main drawback to Google Scholar is that, because it depends on publicly available data, it can be easy for the unscrupulous to manipulate.

The problems with citation counts

The core problem with citation counts is that the assumption on which their value depends is not entirely true. The assumption is that a work is cited because it is a worthwhile piece of research.

There are many reasons why an author may choose to cite a particular article that have nothing to do with approving of the article being cited.

Consider Fleischmann, Martin; Pons, Stanley (1989), “Electrochemically induced nuclear fusion of deuterium.” J. Electroanalytical Chemistry 261 (2A): 301–30. This article has, according to the Web of Knowledge, received over 790 citations. A naive reading of the citation count would lead one to declare that this must be a very influential article. And, in a sense, that’s correct; except for the fact that it’s one of the more famous cases of science performed badly, and that most of the citations are articles stating that they can’t duplicate the results, or that the results are wrong, or writing about it as an example of science done wrong.

Even if an article isn’t fraudulent, or wrong for more innocent reasons, there are plenty of reasons why a citing article may cite an article in order to disagree with its conclusions, particularly in the social sciences and humanities. These negative citations don’t necessarily diminish the worth of the article in question, but they don’t really support the idea of citation counts as a measure of worth.

There can also be political, or social reasons to cite an article, that have nothing to do with its value as a piece of research. Perhaps the author is a friend that is applying for tenure. Perhaps they did a favor for the author, and now the author is repaying them. Perhaps they want to borrow the legitimacy of a more famous author by citing them in their own work, even if it’s not particularly relevant. Perhaps they’re a grad student, and it’s been “suggested” that they cite some papers written by their advisor.

Sometimes the journals you’re publishing in can get in on the act. There are some less-than-scrupulous journals that, in order to boost their prestige, encourage authors to cite other articles within the same journal, or within another journal from the same publisher. Others inadvertently stumble into the same patterns of distortion without intending to (particularly in very niche research areas where there may only be a few people worldwide writing on the topic).

Other issues with citation counts can be more systematic. An article written for a small, niche audience will be cited less than one written for a broad audience, but may be (within that context) a very important, influential article. A review article (one summing up the current state of research on a topic) will usually be cited many more times than an average original research article. Obviously, self-citations by any author on a paper should be discounted (but aren’t by any of the tools listed above). Sometimes authors will cite an article incorrectly (getting the pages wrong is common, but sometimes they get something as basic as the journal wrong!), which means that, unless you catch that, it won’t get counted.
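None of the tools above makes that self-citation adjustment automatically, but the adjustment itself is simple. A minimal sketch (with entirely hypothetical author lists) of what discounting self-citations would look like:

```python
def independent_citation_count(cited_authors, citing_author_lists):
    """Count citing papers that share no author with the cited paper,
    i.e. discount self-citations. Names are compared case-insensitively."""
    cited = {name.lower() for name in cited_authors}
    return sum(
        1
        for authors in citing_author_lists
        if not cited & {name.lower() for name in authors}
    )

# Hypothetical example: one of three citing papers is a self-citation.
cited_authors = ["Author A", "Author B"]
citing_author_lists = [
    ["Author A", "Author C"],  # self-citation: Author A on both papers
    ["Author D"],              # independent citation
    ["Author E", "Author F"],  # independent citation
]
print(independent_citation_count(cited_authors, citing_author_lists))  # 2
```

In practice the hard part isn’t the counting but the matching: author names vary across publications (initials, transliterations, name changes), which is exactly the kind of messy verification work the tools don’t do for you.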

Using citation counts

So, we’ve covered the trust part of citation counts, which leaves us with the hard task of verifying them. Or, in other words, how do we use these flawed, easily misinterpreted numbers?

The answer is to not treat them as simple numbers with simple, obvious, meaning. There are no shortcuts when it comes to evaluating a work, whether it’s an article, or a book, or something else.

You can use the tools above to get the basic citation count for an article, but once you’ve done that, you have more work to do. You’ll need to look at the citing articles and see how they used the cited article, and then decide whether to count them. You’ll have to look at where the cited article and the citing articles were published, their acceptance policies, and other factors. You’ll have to see what a typical citation count is for articles on the same (or similar) topic to make sure you’re correctly comparing it to its peers.

Or, you can use them without all of the supplemental analysis, but with an understanding that they are not error free, and that they have fairly large error bars. How large are those error bars? It’s difficult to estimate.

Ultimately, the most accurate way to evaluate an article, or book, or whatever, is to have someone you trust (or yourself), who understands the subject material and methodology, read the article and give their opinion. Citation counts can act as a very crude filter to find works that may be significant, but in the end, you have to dig deeper.