Docs

Public data dump

We were obliged to take down the public data dump service on Friday 26th January: a sudden spike in downloads was incurring unforeseen costs. We remain committed to providing a data dump service, but we need some time to investigate a more sustainable way of delivering this data. If you have questions, please email Dominic Mitchell, Operations Manager: [email protected].

Structure

The data dumps are structured as follows:

  1. When you unzip/untar the file, you will have a single directory of the form doaj_[type]_data_[date generated].
  2. Inside that directory, you will find a list of files with names of the form [type]_batch_[number].json.
    • For example, journal_batch_3.json or article_batch_27.json.
  3. Each file contains up to 100,000 records and is UTF-8 encoded. All files should contain the same number of records, apart from the last one, which may have fewer.
  4. Each file is structured as a single JSON list: [ { ... first record ... }, { ... second record ... }, { ... third record ... }, ... etc ... ] (see the parsing sketch after this list).
  5. Records are not explicitly ordered and the order is not guaranteed to remain consistent across data dumps produced on different days.
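
Because each batch file is one UTF-8 encoded JSON list, the records can be consumed with any standard JSON parser. The following Python sketch iterates over every record in an unpacked dump; the directory name doaj_article_data_2024-01-26 and the iter_records helper are illustrative assumptions, not part of the service.

```python
import json
from pathlib import Path


def iter_records(dump_dir):
    """Yield every record from the [type]_batch_[number].json files in dump_dir.

    Batches are visited in numeric order; note that record order is not
    guaranteed to be consistent across dumps produced on different days.
    """
    batch_files = sorted(
        Path(dump_dir).glob("*_batch_*.json"),
        key=lambda p: int(p.stem.rsplit("_", 1)[1]),  # sort by batch number
    )
    for path in batch_files:
        with path.open(encoding="utf-8") as f:  # files are UTF-8 encoded
            for record in json.load(f):  # each file is one JSON list
                yield record


if __name__ == "__main__":
    # Hypothetical directory name following the doaj_[type]_data_[date] pattern.
    total = sum(1 for _ in iter_records("doaj_article_data_2024-01-26"))
    print(f"{total} records")
```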