The term megajournal is used to describe publication platforms, like PLOS ONE, that claim to incorporate peer review processes and web technologies that allow fast review and publishing. These platforms also publish without the constraints of periodic issues, releasing articles daily instead. We conducted a yearlong bibliometric profile of a sample of articles published in the first several months after the launch of PeerJ, a peer reviewed, open access publishing platform in the medical and biological sciences. The profile included a study of author characteristics, peer review characteristics, usage and social metrics, and a citation analysis. We found that about 43% of the articles were international collaborations, with coauthors from more than one nation. The median publication delay was 68 days. Almost 74% of the articles were coauthored by men and women, but less than a third were first authored by women. Usage and social metrics tended to be high shortly after publication but declined sharply over the course of a year. Citations increased as social metrics declined. Google Scholar and Scopus citation counts were highly correlated after the first year of data collection (Spearman rho = 0.86). An analysis of reference lists indicated that articles tended to cite a broad range of unique journal titles. The purpose of the study is not to generalize to other journals but to chart the origin of PeerJ in order to enable comparison with future analyses of other megajournals, which may play increasingly substantial roles in science communication.
“What is a journal?” (Garfield, 1977, p. 6)
The term megajournal describes a relatively new kind of publication platform, exemplified by PLOS ONE, that claims to pair peer review with web technologies that allow fast review and publishing, and that publishes continuously rather than in periodic issues.
With manuscript acceptance rates of around 70% for PLOS ONE, megajournals also depart from the selectivity of many traditional journals, reviewing manuscripts for scientific soundness rather than for perceived novelty or importance.
Megajournals also accentuate theoretical tensions between the normative view of science (Merton, 1973a; Merton, 1973b; Merton, 1973c) and the social constructivism view of science, as outlined in Bornmann (2008) and Bornmann & Marx (2012). Bornmann (2008) states that one assumption of the social constructivism view is that “scientific work is a social construction of the scientist under review and the reviewers” [emphasis original] (Bornmann, 2008, p. 31). Interestingly, authors and reviewers who participate in the open peer review process make that social construction publicly visible, since the published review histories document the exchange between the two parties.
The existence of megajournals tacitly suggests dramatic changes in scientific and scholarly communication. If this is so, then a number of issues that have been explored in traditional journals need to be explored in megajournals in order to understand how particular differences are expressed. These issues include open access impact or citation advantage (Björk & Solomon, 2012; Eysenbach, 2006), open access author publication fees (Solomon & Björk, 2012), submission and acceptance rates (Opthof, Coronel, & Janse, 2000), the underrepresentation of women in science (Ceci & Williams, 2011), alternative peer review models (Birukou et al., 2011), gender disparities in authorship (West, Jacquet, King, Correll, & Bergstrom, 2013) and author order (Larivière, Ni, Gingras, Cronin, & Sugimoto, 2013), gender bias in peer review (Lloyd, 1990; Paludi & Bauer, 1983) and other forms of bias (Lee, Sugimoto, Zhang, & Cronin, 2013), publication delays (Bornmann & Daniel, 2010; Björk & Solomon, 2013; Luwel & Moed, 1998; Pautasso & Schafer, 2010), journal internationality (Calver, Wardell-Johnson, Bradley, & Taplin, 2010), length or word count of reviews as a proxy for the quality of reviews and of journals (Bornmann, Wolf, & Daniel, 2012), inter-reviewer agreement, author and institutional prestige, and professional age (Petty, Fleming, & Fabrigar, 1999), manuscript corrections (Amat, 2008), and reviewer experience (Blackburn & Hakel, 2006).
Given that megajournals position themselves separately from traditional journals, studies of megajournals are needed to understand how their existence influences scholarly behavior and scholarly information use and to understand how they might function as disruptive forces (Cope & Kalantzis, 2009; Ewing, 2004), if they do. The increasing number of megajournals marks the present as a good time to document the beginnings of these platforms. In this spirit, the purpose of this study is (1) to examine the first articles published on PeerJ and (2) to chart the journal's origin as a baseline for comparison with future analyses of other megajournals.
On July 20, 2013, we took a small, random sample (N = 49) of the articles PeerJ published between its launch in February 2013 and July 2013.
The peer review histories included a document trail that contained the original manuscript submissions, the responses by the referees, the responses by the editors, the authors’ rebuttals, and the final, revised submissions. Only the peer review histories of accepted articles are public. Referees here are labeled first, second, and third by the order listed on the public peer review history pages. Additional data included the number of reviewers, the number of referees who remained anonymous, the number of revisions, the standing after the first round of reviews (i.e., minor or major revisions needed), the standing after the second round of reviews, and the dates of manuscript submission, acceptance, and publication.
Author data points included author affiliations and gender.
We collected usage data and referral data for each article in the sample. These data are displayed on each article’s web page and constitute the cumulative counts of downloads, unique visitors, page views, social referrals, and top referrals. They were collected throughout a one-year period, with a baseline count on August 20, 2013 and follow-up counts every three months thereafter: November 20, 2013, February 20, 2014, May 20, 2014, and August 20, 2014.
Reference lists for each of the sampled articles were harvested in order to analyze citing behavior. Each article’s web page was scraped using the scrapeR (Acton, 2010) library for the R programming language (R Core Team, 2014). The scraping process involved retrieving the articles in the sample and saving the results in an R list object. The list object was then parsed using XPath queries for journal and book titles in the reference lists. To parse these source types, we analyzed the source code of the sampled article web pages and noted that cited journal and book titles were wrapped in distinct markup, which allowed each source type to be extracted with its own XPath query.
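Because the parsing step depends on PeerJ's page markup, the following R sketch is only illustrative: the class name ref-journal is a hypothetical stand-in for whatever element actually distinguishes journal titles in the reference lists.

    # Illustrative harvesting sketch; "ref-journal" is a hypothetical class name
    library(scrapeR)  # loads its XML and RCurl dependencies

    urls <- c("https://peerj.com/articles/1/", "https://peerj.com/articles/2/")

    # scrape() retrieves the pages and returns a list of parsed XML documents
    pages <- scrape(url = urls, parse = TRUE)

    # Query each parsed page for cited journal titles
    journal_titles <- lapply(pages, function(doc) {
      xpathSApply(doc, "//span[@class='ref-journal']", xmlValue)
    })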
To acquire a conventional view of the impact of these articles, citation counts were retrieved for each of the articles in the sample from Google Scholar and Scopus.
As an exploratory profile, the statistical analysis mainly involved descriptive statistics, chi-square cross tabulations, and rank sum comparisons. The significance level for all tests is 0.05.
The majority of the analysis was conducted using the R programming language (R Core Team, 2014), with libraries that fall into three categories: data gathering and preparation, data analysis, and data visualization. To gather and prepare the data, we used the scrapeR library (Acton, 2010), among others.
3.1. Characteristics of Peer Review
The sample contained 33 articles with open peer review histories and 16 articles with closed peer review histories, and we can reject the null hypothesis that authors have no preference for the public status of their peer review histories.
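A test of this kind can be reproduced with a chi-square goodness-of-fit test on the observed counts; the equal-probability null below is our assumption, since the test statistic itself is not shown above.

    # 33 open vs. 16 closed peer review histories; null = no preference (50/50)
    histories <- c(open = 33, closed = 16)
    chisq.test(histories)  # X-squared ≈ 5.90, df = 1, p ≈ 0.015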
Among the publicly available peer review histories, referees varied in allowing their identities to be made public. Out of the 33 articles with publicly available review histories, 19 referees were entirely anonymous for nine of the articles and 14 referees were entirely attributed for seven of the articles. For the remaining 17 articles, the disclosure of 36 referee identities was mixed (both anonymous and public). However, we fail to reject the null hypothesis that no significant differences exist among reviewers’ preferences for attribution.
Each article with an open peer review history was attended to by at least one referee. Thirty-two of the articles were attended to by at least two referees, and five received responses from three referees. Most of the articles with open peer review histories underwent at least one round of revision, but one article was accepted without revision.
Again excluding as an outlier the single article that was accepted without revision, the remaining articles required one or two rounds of revision before acceptance.
Most articles with publicly available reviews had received comments from two referees, but five articles had received comments from three referees, and one had not received any comments even though its peer review history was accessible. For round one of the revisions, the mean word count among all reviewers was 488.3 words per review.
3.2. Speed of Review and Publication
To determine the publication delay, or the time between submission, acceptance, and publication, we used the time stamps from each article’s web page to set the interval between the date of submission and the date of publication, with the date of acceptance as a subinterval point.
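As a minimal sketch of the interval arithmetic (the dates below are invented, not taken from any sampled article):

    # Hypothetical time stamps for one article
    submitted <- as.Date("2013-01-15")
    accepted  <- as.Date("2013-03-03")
    published <- as.Date("2013-03-20")

    review_time     <- as.numeric(accepted  - submitted)  # submission to acceptance
    production_time <- as.numeric(published - accepted)   # acceptance to publication
    total_delay     <- as.numeric(published - submitted)  # full publication delay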
Overall, the subinterval from the date of submission to the date of acceptance comprised most of the post-submission time, indicating that the referee process took longer than the process of preparing the manuscript for publication. For all articles in the sample, the grand median time from submission to publication was 83 days.
There was no practical difference in the speed of review, or the time between submission and acceptance, between articles that were submitted before the launch date and articles that were submitted after it. For all articles in the sample, the grand median time from submission to acceptance was 47 days.
The buildup of submissions before the launch of PeerJ appears to have lengthened the total publication delay for the articles submitted before the launch date.
3.3. Author Characteristics

One component of the international status of a journal is whether a journal appeals to international collaborators (Calver, Wardell-Johnson, Bradley, & Taplin, 2010). Here we treat coauthorship as a proxy for collaboration. There was a median of four authors per article, and about 43% of the articles had coauthors from more than one nation.
There was a statistically significant difference between articles that were authored only by men and articles that were authored by both men and women: 13 (26.5%) of the articles were authored only by men, and 36 (73.5%) were coauthored by men and women.
Overall, mixed gender authorship outnumbered male-only authorship, and mixed gender authorship was slightly more common when authorship was multinational. It was more common for women to be first authors on articles affiliated with institutions in a single nation than on articles affiliated with institutions in multiple nations, but a chi-square test showed no statistically significant association between single or multiple institutional affiliation and the gender of the first author.
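The cross tabulation can be sketched as follows; the cell counts are hypothetical placeholders, since only the test's outcome is reported above.

    # Hypothetical 2 x 2 table: first-author gender by affiliation scope
    tab <- matrix(c(10,  6,
                    23, 10),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(first_author = c("female", "male"),
                                  affiliations = c("single nation", "multiple nations")))
    chisq.test(tab)  # tests independence of gender and affiliation scope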
3.4. Usage and Social Referrals
The data for these metrics were collected five times. A baseline count was collected on August 20, 2013. To measure growth throughout the year, quarterly counts were collected on November 20, 2013, February 20, 2014, May 20, 2014, and August 20, 2014. The exception is the download statistics, which did not appear on the site for the first three data collection events.
3.5. Downloads, Unique Visitors, and Page Views
As of May 2014, the grand median number of cumulative downloads was 342.
[Table 1.] Download statistics for articles published during the launch month (February 2013) and articles published after (March 2013 - July 2013). Levels indicate date of data collection. Percentage changes show the change between medians.
By August 20, 2014, the grand median number of unique visitors was 703.
As suggested by the download statistics, interest appears to be strongest just after articles are published and then declines sharply throughout the year. For the cumulative number of unique visitors, the median count increased by 71.86% from August 2013 to November 2013, by 22.12% from November 2013 to February 2014, by 18.12% from February 2014 to May 2014, and by 7.82% from May 2014 to August 2014.
Page view results were similar: the median count increased by 68.84% from August 2013 to November 2013, by 22.77% from November 2013 to February 2014, by 15.88% from February 2014 to May 2014, and by 9.00% from May 2014 to August 2014.
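These quarter-over-quarter figures are percentage changes between successive cumulative medians, as in this sketch (the median values are invented):

    # Percentage change between two cumulative medians
    pct_change <- function(earlier, later) 100 * (later - earlier) / earlier
    pct_change(earlier = 950, later = 1604)  # 68.84, i.e., a 68.84% increase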
The Spearman
3.6. Social Referrals and Top Referrals

Social referrals on PeerJ comprise article visits that arrive from social media sites.
Top referrals comprise article visits from any site, including social sites, blogs, web pages, and emails. In August 2013, a median of six sites contributed a median of 116.0 referrals. By May 2014, the median number of sites had doubled to 12 and the median number of referrals had tripled to 374. Three months later, as of August 2014, the median number of referral sites had nearly tripled to 34, but the median number of referrals from those sites increased only by a factor of 1.29, from a median of 374 to a median of 483. This does not reveal a surge of new referral sites; rather, it reflects that top link referrals are not displayed in full on PeerJ article pages.
[Table 2.] Median unique visitors and page views per day, controlling for days since publication
3.7. Bibliometric and Citation Analysis
3.7.1. Citing Sources
For all the sampled articles, published between February 2013 and July 2013, citation counts were collected from Google Scholar and Scopus.
After the first year of data collection, the Google Scholar and Scopus citation counts for the sampled articles were highly correlated (Spearman rho = 0.86).
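The correlation can be computed as in the sketch below, where the two vectors stand in for the per-article counts (values invented):

    # Hypothetical per-article citation counts from the two databases
    gs     <- c(5, 2, 0, 8, 1, 3, 12, 4)  # Google Scholar
    scopus <- c(3, 1, 0, 6, 1, 2,  9, 2)  # Scopus
    cor.test(gs, scopus, method = "spearman", exact = FALSE)  # Spearman's rho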
3.7.2. Cited Sources
The harvested reference lists contained 973 unique journal titles.
We applied the cited title density measure to these titles: for each unique journal title, the number of distinct reference lists in which the title appears divided by the total number of times the title is cited across the sample.
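On toy data, the measure works as in this sketch (article IDs and titles invented):

    # Each row is one citation of a journal title by one sampled article
    refs <- data.frame(
      article = c("a1", "a1", "a1", "a2", "a3"),
      title   = c("PLOS ONE", "PLOS ONE", "PeerJ", "PLOS ONE", "PeerJ")
    )

    # Cited title density: distinct citing articles / total citations per title
    density <- aggregate(article ~ title, data = refs,
                         FUN = function(x) length(unique(x)) / length(x))
    names(density)[2] <- "cited_title_density"
    density  # PLOS ONE: 2/3 ≈ 0.67; PeerJ: 2/2 = 1.00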
[Table 3.] Cited title density for the most cited journal titles in PeerJ articles
Figure 1 illustrates the distribution of unique journal titles among the references. The LOESS regression line shows that as the rank of a journal title decreases (that is, as the overall number of times the title appears in the reference lists decreases), the title’s citations are more likely to be spread across a greater number of different article reference lists. However, titles that appear only once (singletons) in the entire sample will have a cited title density of 1, which is also the density of titles that, for example, appear 25 times in 25 separate articles. Given the number of singletons in the sample, the high densities at the low end of the distribution should be read with this property in mind.
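A smoother of the kind used in Figure 1 can be drawn with base R; the vectors below are toy stand-ins for the real citation counts and densities:

    # Toy data: total citations per title and their cited title densities
    total_cites   <- c(1, 1, 1, 2, 2, 3, 4, 6, 9, 25)
    cited_density <- c(1, 1, 1, 1, 0.5, 0.67, 0.75, 0.5, 0.44, 0.32)

    # Scatterplot with a LOESS smoother
    scatter.smooth(total_cites, cited_density,
                   xlab = "Total citations to title",
                   ylab = "Cited title density")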
We examined a sample of articles published in the first months of PeerJ, a peer reviewed, open access megajournal in the medical and biological sciences.
In this study, we examined the characteristics of peer review, publication delay, and author characteristics. We found evidence that there is some preference for an open peer review history among authors publishing at PeerJ.
Although there was a greater tendency for reviews to be public, we did not find that referees possessed a strong preference for attribution, and they often stayed anonymous. There was no statistical association between the number of revisions articles required and the magnitude (minor/major) of those revisions. Publication delay was affected by the launch of the journal, and despite its overall speed, total delay remained a function of the peer review process (Amat, 2008). The reduced publication delay for the articles that were submitted after the launch date highlighted a buildup of submissions before the launch date and the shortening of the overall post-submission process once PeerJ settled into normal operation.
Most of the articles were coauthored by both men and women, but only a fraction were first authored by women, supporting the fractionalized authorship findings in the much larger study conducted by Larivière, Ni, Gingras, Cronin, and Sugimoto (2013).
As usage metrics declined, more conventional metrics increased. Downloads, unique visitors, and page views tended to be very high after an article’s publication, but rates of increase dropped sharply over the course of the year. However, as the rate of usage declined, the articles began receiving citations in both Google Scholar and Scopus.
Out of the 973 unique journal titles, we found that as unique journal titles were cited less overall, their cited title density increased. Inversely, unique journal titles that were cited multiple times were often cited multiple times by fewer articles; that is, their citations were concentrated in a smaller set of reference lists.
The purpose of this study was to conduct an exploratory case study of one megajournal, and therefore it does not lend itself to generalization to all journals or to journals limited to specific fields or topics. Future research should conduct cross comparisons among megajournals as well as traditional journals in order to tease out differences in article characteristics, and such studies should entail larger sample sizes and a limited number of variables. However, this study should be useful to those projects since it highlights some of the unique parameters of the megajournal.
Megajournals, as born digital scholarly publication platforms, provide an opportunity to understand more about scientific norms and values. That is, scientometricians can use megajournals as devices that help reveal whether issues in scholarly communication are a function of, for example, web-based publication technologies or of scientific norms and constructs. In particular, megajournals re-imagine scholarly publishing around present day technological affordances, whereas traditional journals often remain rooted in a print paradigm; as a device, megajournals may therefore help information scientists discover whether these technological advantages result in different patterns of authorship, peer review, and other characteristics.