File Formats Tutorial

DoCTER requires input data in csv (comma-separated values) files, one of the most widely used non-proprietary formats for tabular data. DoCTER also generates its output files in csv format. Spreadsheet data can easily be saved as csv files, and csv files can in turn be viewed, modified and saved in standard spreadsheet software.

DoCTER assumes the first row of the input file is reserved for column headings. When interacting with DoCTER, however, users must specify columns by number rather than by heading, counting from 1. For example, if the second column contains the text of the documents, users must specify the number 2.
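The heading-row and column-number convention described above can be illustrated with a short Python sketch. It is not part of DoCTER: the function name read_column and the sample headings and rows are hypothetical, chosen only to show how a 1-based column number maps onto a csv file whose first row holds headings.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE = (
    "id,abstract,year\n"
    "1,First document text,2019\n"
    "2,Second document text,2020\n"
)


def read_column(csv_file, column_number):
    """Return the values of a 1-based column, skipping the heading row."""
    reader = csv.reader(csv_file)
    headings = next(reader)  # the first row is reserved for column headings
    if not 1 <= column_number <= len(headings):
        raise ValueError(f"column number must be between 1 and {len(headings)}")
    index = column_number - 1  # convert the 1-based number to a 0-based index
    # Skip blank or short rows rather than failing on them.
    return [row[index] for row in reader if len(row) > index]


# Column 2 holds the document text in this hypothetical file.
texts = read_column(io.StringIO(SAMPLE), 2)
print(texts)
```

To read a file on disk instead of the in-memory sample, open it with open(path, newline="", encoding="utf-8") and pass the file object to the function.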

A typical DoCTER input file can be visualized as a table with multiple columns, with the column headings in the first row.

Additional input data format requirements are described in the tutorials for each DoCTER function and can also be inferred from the dummy input data files available for each function.

Many databases of academic literature (e.g. PubMed) can output document search results (e.g. titles and abstracts retrieved by keyword searches) in csv format, which can be run directly through DoCTER.
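Before running a downloaded export through DoCTER, it can help to see which column number corresponds to which heading. The following Python sketch lists them; it is not part of DoCTER, the function name numbered_headings is hypothetical, and the sample headings are invented rather than taken from any real database export.

```python
import csv
import io

# Hypothetical export; real database exports may use different headings.
EXPORT_SAMPLE = (
    "PMID,Title,Abstract\n"
    "0000001,An example title,An example abstract\n"
)


def numbered_headings(csv_file):
    """Pair each column heading with its 1-based column number."""
    reader = csv.reader(csv_file)
    headings = next(reader, [])  # an empty file yields no headings
    return list(enumerate(headings, start=1))


for number, heading in numbered_headings(io.StringIO(EXPORT_SAMPLE)):
    print(f"{number}: {heading}")
```

In this hypothetical export the abstracts sit in column 3, so 3 is the number a user would specify for the document text.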