Classifies a set of documents as relevant or non-relevant to a topic. Assigns discrete (1-6) priority scores to each document.
Best used for document classification and prioritization when relatively few training resources are available. This function helps to:
Requires a small set of seed documents annotated as relevant to the topic of interest (as few as 25-50 relevant documents). If the seed documents are randomly chosen from the larger pool of unclassified documents, predictions will be unbiased.
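Choosing seed documents at random can be sketched as a uniform draw from the unclassified pool, done before any annotation. The Python below is a minimal illustration only: the function `pick_seed_candidates`, its parameters, and the `doc-N` identifiers are hypothetical and not part of the tool. Because only some of the sampled documents will turn out to be relevant, the draw usually has to be larger than the number of relevant seeds needed.

```python
import random


def pick_seed_candidates(doc_ids, n_candidates, rng_seed=0):
    """Return a uniform random sample of document IDs to annotate by hand.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    doc_ids = list(doc_ids)
    if n_candidates > len(doc_ids):
        raise ValueError("cannot sample more documents than the pool holds")
    rng = random.Random(rng_seed)  # fixed seed so the draw is reproducible
    return rng.sample(doc_ids, n_candidates)  # sampling without replacement


# Made-up pool of 1000 unclassified document IDs.
pool = [f"doc-{i}" for i in range(1000)]
candidates = pick_seed_candidates(pool, n_candidates=200)
```

Sampling without replacement from the whole pool, rather than picking documents by hand, is what keeps the annotated set representative of the pool.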
Your browser will display a model performance table that presents various performance statistics; the most important is typically the predicted recall of the ensemble. In addition, the output CSV available for download will contain all the original columns in the input CSV, plus two additional output columns in ensemble mode:
Documents with a 0 score may be discarded with the expectation that the remaining documents will achieve the desired recall threshold.
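Discarding the 0-score documents from the downloaded CSV could look like the sketch below. It assumes the score sits in a column called `score`; that name, the helper `drop_zero_score`, and the sample rows are all hypothetical, so substitute the actual column name from your output file.

```python
import csv
import io


def drop_zero_score(rows, score_column):
    """Keep only the rows whose value in `score_column` is not 0.

    Hypothetical helper; `score_column` must match the real output column name.
    """
    return [row for row in rows if float(row[score_column]) != 0]


# Made-up stand-in for the downloaded output file.
sample_csv = io.StringIO(
    "id,title,score\n"
    "1,First paper,3\n"
    "2,Second paper,0\n"
    "3,Third paper,1\n"
)
rows = list(csv.DictReader(sample_csv))
kept = drop_zero_score(rows, score_column="score")
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`.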
Overview of Input and Output file formats.
Clusters a set of documents into a user-specified number of bins. For each bin, identifies the defining topics/keywords.
Classifies a set of documents as relevant or non-relevant to a topic. Assesses the probability of a document being relevant to the topic of interest.