🚀 Building the Knowledge Base of Tomorrow: Our Systems Started Crawling the Scientific Landscape! One of the most significant challenges of combining scientific research and artificial intelligence is the sheer amount of data to be handled. To give you a sense of scale, around 3 million scientific articles are published each year; that is over 8,000 articles per day! At first sight, that might not seem like much, yet analyzing 8,000 articles daily is an impossible task for a single researcher. For us, it amounts to petabytes of raw data that need to be analyzed but, most importantly, discovered in the first place.

After extensive research (and loads of optimization), we have put our experimental "discovery cluster" into operation. After this test run, we plan to launch the final version of our cluster, which is expected to index the majority of scientific literature in a matter of weeks 🎉

💭 Current estimates say there are around 250 million scientific articles in existence. We ourselves are very curious how much larger (or smaller) the final number will turn out to be. Do you agree with the estimate, or do you have a different number in mind? Feel free to comment. 🔔 Follow us for more updates
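For anyone who wants to sanity-check those numbers, here is a quick back-of-envelope sketch in Python. The articles-per-year and total-corpus figures come from the post above; the average raw size per article is purely our own illustrative assumption.

```python
# Back-of-envelope scale estimate (assumptions marked below).
ARTICLES_PER_YEAR = 3_000_000      # ~3 million new articles per year (from the post)
TOTAL_ARTICLES = 250_000_000       # ~250 million articles in existence (post's estimate)
MB_PER_ARTICLE = 5                 # assumed average raw size per article; illustrative only

articles_per_day = ARTICLES_PER_YEAR / 365
total_petabytes = TOTAL_ARTICLES * MB_PER_ARTICLE / 1_000_000_000  # MB -> PB (decimal)

print(f"New articles per day: {articles_per_day:,.0f}")   # ~8,219
print(f"Raw corpus size: ~{total_petabytes:.2f} PB")       # ~1.25 PB at 5 MB per article
```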
Yubetsu’s Post
More Relevant Posts
-
Working on a problem that has been around for hundreds of years is like climbing a mountain. In part we do it for the challenge, no doubt. However, it could also be a perfect benchmark for the area of "precision machine learning": finding very accurate mathematical models that approximate unknown or hard-to-compute functions. Here we present a case study of a new technique on a problem once approached by Kepler, Gauss, Euler, Peano, Ramanujan, and other great mathematicians over the past centuries.
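The post does not describe the technique itself, so purely as a toy illustration of what "precision machine learning" aims for, here is a minimal sketch: fit Chebyshev polynomials of increasing degree to a target function and watch the worst-case error shrink. The target function (sin) and the degrees are placeholders, not the historical problem studied in the paper.

```python
import numpy as np

# Toy illustration of high-precision function approximation:
# fit a Chebyshev polynomial to a target function and measure the worst-case error.
target = np.sin
x = np.linspace(-np.pi, np.pi, 2000)

for degree in (5, 9, 13, 17):
    coeffs = np.polynomial.chebyshev.chebfit(x, target(x), degree)
    approx = np.polynomial.chebyshev.chebval(x, coeffs)
    max_err = np.max(np.abs(approx - target(x)))
    print(f"degree {degree:2d}: max error {max_err:.2e}")
```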
-
Artificial Intelligence Is the Crisis We Need: Raising questions about the future of citation metrics and the effectiveness of peer review in a world where authorship may not solely reside with humans. https://lnkd.in/eC7-2imx
Artificial Intelligence Is the Crisis We Need
https://cacm.acm.org
-
I want to learn generative AI in data science so that I can get a job where I can earn money easily.
-
When Maryam and I began exploring the emerging sociotechnical phenomenon of algorithmic bias from an information systems perspective about five years ago, we quickly realized that the concepts and theoretical foundations needed to be streamlined. To clarify these aspects for ourselves and to guide future research, we decided to conduct a comprehensive review first. We had the opportunity to participate in a fantastic paper development event hosted by the EJIS guest editors Patrick Mikalef, Kieran Conboy, Jenny Eriksson Lundström, and Aleš Popovič at ICIS 2019 in Munich. With invaluable feedback from participants, along with insights from the EJIS editors and reviewers, we refined the paper to provide theory-driven conceptualizations and an overarching framework to guide future scholarly work in this important area. It’s incredibly rewarding to see that this paper, now with over 400 citations (according to Google Scholar) and 20,000 views, has become a key driver in advancing research on algorithmic bias and fairness in information systems and beyond. I’m grateful for the continued interest in this timely and crucial topic! European Journal of Information Systems #algorithmicbias #algorithmicfairness #informationsystems #sociotechnicalresearch
Algorithmic bias: review, synthesis, and future research directions
tandfonline.com
-
When we encounter data that doesn't immediately align with our own results, I believe it's important not to discredit or dismiss it outright. Instead, there is value in considering that different researchers or perspectives might be observing the same underlying phenomenon, but from different angles or contexts. Just because two sets of results differ doesn't necessarily mean that one is wrong; rather, they might be revealing complementary aspects of a larger, more complex system.

For instance, in the natural sciences, and particularly in fields like quantum mechanics and gravity, it's common to see phenomena that manifest differently based on the scale or conditions in which they are observed. Much like how the behavior of light can be understood both as a wave and as a particle under different experiments, seemingly contradictory results might both be valid under their respective frameworks. They are not necessarily opposing truths but rather different dimensions of the same reality.

This opens the door to collaborative exploration. Instead of viewing differing data as a problem or invalidation, I see it as an opportunity to deepen our understanding. When we compare and contrast different data sets, we may find that they don't contradict each other as much as they provide unique insights into different layers or scales of the system we're studying. In this way, diverse perspectives can complement one another, leading to a more holistic understanding of the phenomena in question.

For me, it's less about proving who is right or wrong and more about discovering how our respective insights might converge or intersect. By embracing these differences, we can piece together a broader and more unified picture of the systems we are studying. After all, science thrives on multiple perspectives, and even apparent discrepancies can lead to new discoveries when viewed as part of the same intricate puzzle.
-
A major milestone in my career this week, as I discovered that my article with Elisabetta Petrucci has been published in the conference proceedings of 'Normalize', a workshop at the 18th International Conference on Recommender Systems. Woop woop!

So why is that such a big deal, you might ask? Well, in my field, the Arts and Humanities, we don't normally publish in proceedings; we present at conferences and publish afterwards in journals. Publishing in proceedings means that you have to get accepted at the conference to publish, which is difficult, but we did. A fantastic pre-conference workshop called 'Normalize', but still. And I went to Bari in Italy to present the paper, which is also a critical discussion of some of the papers previously published at this conference.

Secondly, I am not a data scientist, and it felt daunting to present to a room full of data scientists. Even the format that the article had to be submitted in nearly killed me. But I did it, and I am really pleased that the article is out! And the conference was a great experience and I understood nearly everything (and whenever there were too many formulas on the slides, I just sat back and enjoyed the ride).

Funding for the project came from the Villum Foundation Synergy Grant, and me standing there and the article being published is more synergy than I had ever hoped for. Special thanks to Alain Starke, who has been amazingly helpful along the way! And to the reviewers, who made the article much stronger with their feedback and encouragement. Find the article here:
NORMalize 2024
ceur-ws.org
-
⛳Delve into the insightful research published in #MathematicsMdpi #ProbabilityStatistics with the article “Sharper Sub-Weibull Concentrations”. The full paper is available for free at https://buff.ly/3Z7utBq This significant contribution has garnered 1913 views and 3 citations. #Distributiontheory #Parametricinference #MDPIOpenAccess #ComSciMathMdpi
Sharper Sub-Weibull Concentrations
mdpi.com
-
Thrilled to announce that my paper, "A Machine Learning Approach to Detect Collusion in Public Procurement with Limited Information," has been accepted for publication in the Journal of Computational Social Science.

This research proposes a new, flexible algorithm to identify potential collusion in public procurement using readily available data on bidding outcomes. Existing methods often require in-depth data that authorities may not have access to. My findings suggest that a significant portion (4.5%) of European public procurement contracts could be susceptible to collusion. Furthermore, unsupervised clustering analysis reveals a strong correlation between contracts with a high likelihood of collusion and inflated contract prices, up to 7% higher.

This research paves the way for more efficient detection of collusion and for promoting competition in public procurement. #PublicProcurement #MachineLearning #TED

Self-organizing map of one of the collusion measures and cost-effectiveness, measured as the ratio of contract price to estimated cost:
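As a rough sketch of what a self-organizing-map view of a collusion score against the price-to-estimated-cost ratio could look like, here is a minimal example using the MiniSom library. The synthetic data, feature names, and all parameters are illustrative assumptions and do not reproduce the paper's pipeline.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

# Synthetic stand-in for contract-level features; the real study uses
# bidding-outcome data, which is not reproduced here.
rng = np.random.default_rng(0)
collusion_score = rng.beta(2, 8, size=500)   # hypothetical collusion likelihood in [0, 1]
price_cost_ratio = 1.0 + 0.07 * collusion_score + rng.normal(0, 0.02, size=500)
data = np.column_stack([collusion_score, price_cost_ratio])

# Standardize the features, then train a small SOM grid.
data = (data - data.mean(axis=0)) / data.std(axis=0)
som = MiniSom(10, 10, input_len=2, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, num_iteration=5000)

# Each contract is assigned to its best-matching unit on the 10x10 grid;
# inspecting which cells collect high-score contracts reveals the clustering.
bmus = np.array([som.winner(row) for row in data])
print(bmus[:5])
```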
-
We are delighted to announce the publication of our chapter, “Synchronization-Driven Community Detection: Dynamic Frequency Tuning Approach,” authored by Abdelmalik Moujahid and Alejandro Cervantes. This work is part of the book “Advances in Data Clustering,” edited by F. Dornaika, D. Hamad, J. Constantin, and V.T. Hoang, and published by Springer. The chapter presents a novel methodology for detecting community structures in complex networks by leveraging synchronization dynamics and dynamic frequency tuning. This approach offers new perspectives and tools for understanding the organization and behavior of complex systems.
Advances in Data Clustering
link.springer.com
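The chapter's dynamic frequency tuning procedure is not described in the post, so the following is only a generic sketch of the synchronization idea such methods build on: run Kuramoto-style phase oscillators on a graph and observe that nodes within a community phase-lock while weakly connected communities keep drifting apart. The graph, frequencies, and coupling values are illustrative assumptions, not the chapter's algorithm.

```python
import numpy as np
import networkx as nx

# Classic Kuramoto-on-a-graph sketch: densely connected nodes phase-lock,
# while weakly connected communities drift relative to each other.
G = nx.planted_partition_graph(2, 20, 0.5, 0.01, seed=1)  # two planted communities
A = nx.to_numpy_array(G)
deg = np.maximum(A.sum(axis=1), 1.0)
n = A.shape[0]

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, n)   # initial phases
omega = rng.normal(0.0, 0.1, n)        # natural frequencies
omega[20:] += 0.5                      # offset one group so relative drift is visible
K, dt, steps = 1.5, 0.05, 800          # coupling, step size, iterations

coherence = np.zeros((n, n))
for step in range(steps):
    # dtheta_i/dt = omega_i + (K / deg_i) * sum_j A_ij * sin(theta_j - theta_i)
    pull = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    theta += dt * (omega + K * pull / deg)
    if step >= steps // 2:             # time-average phase coherence over the second half
        coherence += np.cos(theta[:, None] - theta[None, :])
coherence /= steps - steps // 2

print(f"intra-community coherence: {coherence[:20, :20].mean():.2f}")
print(f"inter-community coherence: {coherence[:20, 20:].mean():.2f}")
```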
-
Can we measure polarization in real time? Yes! And by doing so we can take actions to mitigate division, with the goal of having healthier and more productive conversations.

In this research (one of the chapters of my PhD thesis) I developed a technique to do it based on the texts of the discussion. Using the text to measure polarization enables two important things:
1. The possibility to implement the method in almost any context, as we only need the information about what is being discussed.
2. A broader analysis, such as: Who is on the semantic frontier? Who is at the extremes? Who is at the center of each side? Could those people serve as bridges or help us better understand the issue? How large is the semantic distance between the two sides?

I have spent many years researching polarization, and this is one of the works I am most proud of. At Data Voices we are trying to implement it in a concrete product, to release a tool to fight against division, but science should be open source and there are many people more intelligent than I am who can probably use it more effectively. So, if you are willing to fight division, the spread of misinformation, hate speech, and that kind of phenomenon, you can use this method or contact me for any other kind of help 🤗

Paper link: https://lnkd.in/eVVnMRki
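The exact measure from the thesis is not reproduced here, but a minimal sketch of the general idea (embed each message, locate each side's centre, and compare distances) might look like the following. The TF-IDF representation, the hard-coded sides, and the toy messages are all illustrative assumptions, not the method from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

# Toy discussion: two "sides" with a few messages each (illustrative only).
side_a = ["we must raise taxes to fund schools", "public schools need more funding now"]
side_b = ["taxes are already too high", "cut taxes and let families keep their money"]
texts = side_a + side_b
labels = np.array([0, 0, 1, 1])

# Embed every message with TF-IDF (a stand-in for whatever representation is used).
X = TfidfVectorizer().fit_transform(texts).toarray()

centroid_a = X[labels == 0].mean(axis=0, keepdims=True)
centroid_b = X[labels == 1].mean(axis=0, keepdims=True)

# Semantic distance between the two sides: distance between their centroids.
polarization = cosine_distances(centroid_a, centroid_b)[0, 0]

# Who sits closest to the opposing side (a rough "semantic frontier")?
dist_to_other = np.where(
    labels == 0,
    cosine_distances(X, centroid_b).ravel(),
    cosine_distances(X, centroid_a).ravel(),
)
print(f"distance between sides: {polarization:.2f}")
print("closest to the opposing side:", texts[int(dist_to_other.argmin())])
```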
-