Academic search engines: constraints, bugs, and recommendations

A. Rainer, Zhengyang Li

Abstract

Academic search engines (i.e., digital libraries and indexers) play an increasingly important role in systematic reviews; however, these engines do not seem to effectively support such reviews, e.g., researchers confront usability issues with the engines when conducting their searches. We investigate whether the usability issues are bugs (i.e., faults in the search engines) or constraints, and provide recommendations to search-engine providers and researchers on how to tackle these issues. Using snowball sampling from tertiary studies, we identify a set of 621 secondary studies in software engineering. By physically re-attempting the searches for all of these 621 studies, we effectively conduct regression testing for 42 search engines. We identify 13 bugs for eight engines, and also identify other constraints. We provide recommendations for tackling these issues. There is still a considerable gap between the search needs of researchers and the usability of academic search engines. It is not clear whether search-engine developers are aware of this gap. Also, the evaluation, by academics, of academic search engines has not kept pace with the development, by search-engine providers, of those search engines. Thus, the gap between evaluation and development makes it harder to properly understand the gap between the search needs of researchers and the search features of the search engines.

Related Scientific Articles

Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources

Neal R. Haddaway, Michael Gusenbauer

28 January 2020

Rigorous evidence identification is essential for systematic reviews and meta‐analyses (evidence syntheses) because the sample selection of relevant studies determines a review's outcome, validity, and explanatory power. Yet, the search systems allowing access to this evidence provide varying levels of precision, recall, and reproducibility and also demand different levels of effort. To date, it remains unclear which search systems are most appropriate for evidence synthesis and why. Advice on which search engines and bibliographic databases to choose for systematic searches is limited and lacking systematic, empirical performance assessments. This study investigates and compares the systematic search qualities of 28 widely used academic search systems, including Google Scholar, PubMed, and Web of Science. A novel, query‐based method tests how well users are able to interact and retrieve records with each system. The study is the first to show the extent to which search systems can effectively and efficiently perform (Boolean) searches with regards to precision, recall, and reproducibility. We found substantial differences in the performance of search systems, meaning that their usability in systematic searches varies. Indeed, only half of the search systems analyzed and only a few Open Access databases can be recommended for evidence syntheses without adding substantial caveats. Particularly, our findings demonstrate why Google Scholar is inappropriate as principal search system. We call for database owners to recognize the requirements of evidence synthesis and for academic journals to reassess quality requirements for systematic reviews. Our findings aim to support researchers in conducting better searches for better evidence synthesis.

Analysis of Academic Databases for Literature Review in the Computer Science Education Field

Dilma Da Silva, R. Furuta, A. M. Mariano, + 2 others

8 October 2022

Literature review is a fundamental part of a research process, and systematic protocols for this activity have been used for a long time, mainly in the field of health. Specifically in the Computer Science Education area, the use of systematic literature review has grown. One of the steps in a systematic literature review (SLR) is the selection of academic databases in which to search for articles. There are several databases with academic documents that may be relevant to an SLR, for example: Google Scholar, which indexes different types of documents, such as articles, dissertations, theses, and others; Scopus and Web of Science are large databases that index articles from different conferences and journals. ACM Digital Library and IEEE Xplore are also important sources of information in the field of Computer Education. These tools have different characteristics: some charge a fee, others have only information about the title and authors and do not have access to the full article, and others have advanced features, with many filters. In this context, this article presents the following research questions: RQ1) What metadata can be extracted automatically from the databases?; RQ2) What kind of visualization tools are available?; RQ3) Do the documents returned by the databases cover the research topic?; RQ4) Do the databases have papers from the main CSE venues?; and RQ5) How many databases are required to perform a literature review in CSE? To answer these questions we used five academic databases: Google Scholar, Scopus, Web of Science, ACM Digital Library, and IEEE Xplore. Regarding the results, Scopus and Web of Science have the best visualization of the documents and a robust query engine; however, those academic databases are not free. ACM Digital Library, IEEE Xplore, Scopus and Web of Science allow the automatic download of the papers’ metadata (author, title, abstract, affiliation and others). Specifically in the field of Computer Science Education, the ACM Digital Library and the IEEE Xplore have important papers from conferences (SIGCSE and FIE) and journals (ACM Transaction on Education and IEEE Transaction on Education). In this full paper, the results will be presented to help researchers to choose the most appropriate academic databases based on their requirements and available options.

Googling for Software Development: What Developers Search For and What They Find

André C. Hora

1 May 2021

Developers often search for software resources on the web. In practice, instead of going directly to websites (e.g., Stack Overflow), they rely on search engines (e.g., Google). Despite this being a common activity, we are not yet aware of what developers search for from the perspective of popular software development websites and what search results are returned. With this knowledge, we can understand real-world queries, developers’ needs, and the query impact on the search results. In this paper, we provide an empirical study to understand what developers search for on the web and what they find. We assess 1.3M queries to popular programming websites and we perform thousands of queries on Google to explore search results. We find that (i) developers’ queries typically start with keywords (e.g., Python, Android, etc.), are short (3 words), tend to omit functional words, and are similar to each other; (ii) minor changes to queries do not largely affect the Google search results, although some cosmetic changes may have a non-negligible impact; and (iii) search results are dominated by Stack Overflow, but YouTube is also a relevant source nowadays. We conclude by presenting detailed implications for researchers and developers.

SEGRESS: Software Engineering Guidelines for REporting Secondary Studies

D. Budgen, L. Madeyski, B. Kitchenham

1 March 2023

Context: Several tertiary studies have criticized the reporting of software engineering secondary studies. Objective: Our objective is to identify guidelines for reporting software engineering (SE) secondary studies which would address problems observed in the reporting of software engineering systematic reviews (SRs). Method: We review the criticisms of SE secondary studies and identify the major areas of concern. We assess the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement as a possible solution to the need for SR reporting guidelines, based on its status as the reporting guideline recommended by the Cochrane Collaboration whose SR guidelines were a major input to the guidelines developed for SE. We report its advantages and limitations in the context of SE secondary studies. We also assess reporting guidelines for mapping studies and qualitative reviews, and compare their structure and content with that of PRISMA 2020. Results: Previous tertiary studies confirm that reports of secondary studies are of variable quality. However, ad hoc recommendations that amend reporting standards may result in unnecessary duplication of text. We confirm that the PRISMA 2020 statement addresses SE reporting problems, but is mainly oriented to quantitative reviews, mixed-methods reviews and meta-analyses. However, we show that the PRISMA 2020 item definitions can be extended to cover the information needed to report mapping studies and qualitative reviews. Conclusions: In this paper and its Supplementary Material, we present and illustrate an integrated set of guidelines called SEGRESS (Software Engineering Guidelines for REporting Secondary Studies), suitable for quantitative systematic reviews (building upon PRISMA 2020), mapping studies (PRISMA-ScR), and qualitative reviews (ENTREQ and RAMESES), that addresses reporting problems found in current SE SRs.

Exploring Web Search Engines to Find Architectural Knowledge

Matthias Riebisch, Yikun Li, Mohamed Soliman, + 2 others

1 March 2021

Software engineers need relevant and up-to-date architectural knowledge (AK) in order to make well-founded design decisions. However, finding such AK is quite challenging. One pragmatic approach is to search for AK on the web using traditional search engines (e.g. Google); this is common practice among software engineers. Still, we know very little about what AK is retrieved, from where, and how useful it is. In this paper, we conduct an empirical study with 53 software engineers, who used Google to make design decisions using the Attribute-Driven Design method. Based on how the subjects assessed the nature and relevance of the retrieved results, we determined how effective web search engines are to find relevant architectural information. Moreover, we identified the different sources of AK on the web and their associated AK concepts.
