DOAJ Data Dumps

You can access full data dumps of the entire DOAJ public metadata below. Data dumps are generated weekly. The files provide complete sets of articles and journals as JSON, in the same form in which they would be retrieved via the API.

Each file is a tar.gz. The data dumps are structured as follows:

When you unzip/untar the file, you will have a single directory of the form doaj_[type]_data_[date generated].

Inside that directory, you will find a list of files with names of the form [type]_batch_[number].json. For example, journal_batch_3.json or article_batch_27.json.
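
The layout described above can be unpacked with a short script. This is a minimal sketch rather than an official tool: the function name `extract_dump` and the destination directory are assumptions for illustration, and it only relies on the directory and file name patterns given above.

```python
import re
import tarfile
from pathlib import Path


def extract_dump(archive_path, dest_dir):
    """Extract a dump archive and return its batch files in numeric order."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    # File names have the form [type]_batch_[number].json.
    pattern = re.compile(r"^[a-z]+_batch_(\d+)\.json$")
    batches = []
    for path in dest.glob("doaj_*_data_*/*.json"):
        match = pattern.match(path.name)
        if match:
            batches.append((int(match.group(1)), path))
    # Sort on the batch number so that batch 10 comes after batch 2.
    return [path for _, path in sorted(batches)]
```

Sorting on the parsed number rather than on the file name avoids the lexical ordering in which journal_batch_10.json would come before journal_batch_2.json.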

Each file contains up to 100,000 records and is UTF-8 encoded. All files should contain the same number of records, apart from the last one, which may have fewer.

Each file is structured as a JSON list:

    [
        { ... first record ... },
        { ... second record ... },
        { ... third record ...},
        ... etc ...
    ]
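
Because each file is a single JSON list, it can be read with any standard JSON parser. The following is a minimal sketch, assuming you already have the paths of the batch files; the name `iter_records` is hypothetical.

```python
import json


def iter_records(batch_paths):
    """Yield every record from the given batch files, one file at a time."""
    for path in batch_paths:
        # The files are UTF-8 encoded, so state the encoding explicitly.
        with open(path, encoding="utf-8") as handle:
            records = json.load(handle)  # each file is a single JSON list
        if not isinstance(records, list):
            raise ValueError(f"{path} does not contain a JSON list")
        yield from records
```

This loads one whole batch file into memory at a time, so peak memory use is bounded by the largest file rather than by the full dump.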
    

Records are not explicitly ordered, and the order is not guaranteed to remain consistent across data dumps produced on different days.
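
Since position carries no meaning, comparing two dumps is best done by matching records on an identifier rather than by index. The sketch below assumes each record has a unique top-level "id" field; that field name is an assumption here, so check it against the API record format and pass a different `key` if needed. The function names are hypothetical.

```python
def index_by_id(records, key="id"):
    """Map each record's identifier to the record itself."""
    # Assumption: every record has a unique value under `key`.
    return {record[key]: record for record in records}


def diff_dumps(old_records, new_records, key="id"):
    """Return the identifiers that were added, removed, or changed."""
    old = index_by_id(old_records, key)
    new = index_by_id(new_records, key)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```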

Individual records use the same data structures as the API.