
Ginormous New ‘Index’ Shares Data From 100 Million Science Papers For Free

Posted in computing, science

The General Index is a collection of more than 100 million scientific papers, totaling 38 terabytes of downloadable data. It is structured and can be searched with code.


There’s a vast amount of research out there, and the volume grows rapidly with each passing day. But there’s a problem.

Not only is a lot of the existing literature hidden behind a paywall, but it can also be difficult to parse and make sense of in a comprehensive, logical way. What’s really needed is a super-smart version of Google just for academic papers.

Enter the General Index, a new database of some 107.2 million journal articles, totaling 38 terabytes of data in its uncompressed form. It spans more than 355 billion rows of text, each featuring a key word or phrase plucked from a published paper.
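
To put those figures in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the three numbers quoted above and assumes decimal terabytes (10^12 bytes) and exactly 355 billion rows, which the article gives as a lower bound. The function name is made up for illustration, and the results are rough averages, not official statistics about the index.

```python
# Figures quoted in the article; the unit interpretation is an assumption.
ARTICLES = 107.2e6    # "some 107.2 million journal articles"
TOTAL_BYTES = 38e12   # 38 terabytes, assuming decimal TB (10**12 bytes)
ROWS = 355e9          # "more than 355 billion rows", treated as a lower bound


def rough_averages(articles, total_bytes, rows):
    """Return rough per-article and per-row averages (hypothetical helper)."""
    return {
        "bytes_per_article": total_bytes / articles,
        "rows_per_article": rows / articles,
        "bytes_per_row": total_bytes / rows,
    }


averages = rough_averages(ARTICLES, TOTAL_BYTES, ROWS)
for name, value in averages.items():
    print(f"{name}: {value:,.0f}")
```

Under those assumptions, each article contributes roughly 3,300 rows and about 350 kilobytes of uncompressed data, or a little over 100 bytes per row.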
