Classifies a set of documents as relevant or non-relevant to a topic. Assesses the probability of a document being relevant to the topic of interest.
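The source does not say which machine learning algorithm produces the relevance probability. As a rough illustration only, the sketch below swaps in a small multinomial naive Bayes classifier with add-one smoothing; the function names and the toy documents are hypothetical and not part of the tool.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model (illustrative stand-in, not the tool's method).

    docs: list of strings; labels: list of 0/1 where 1 means relevant.
    Assumes both classes appear at least once in the training data.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def relevance_probability(model, text):
    """Return P(relevant | text) under the fitted model."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # skip words never seen during training
                smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        log_scores[label] = score
    # Convert the two log scores into a probability without underflow.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores[1] / (exp_scores[0] + exp_scores[1])


# Hypothetical toy training set: two relevant and two non-relevant documents.
docs = [
    "arsenic exposure in drinking water",
    "arsenic toxicity in rats",
    "football match results",
    "stock market results today",
]
labels = [1, 1, 0, 0]
model = train_naive_bayes(docs, labels)
p_rel = relevance_probability(model, "arsenic in water")
p_non = relevance_probability(model, "market results")
print(round(p_rel, 3), round(p_non, 3))
```

Documents can then be ranked by this probability, so that the most likely relevant ones are reviewed first.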
Best used for document classification and prioritization when moderate-to-high training resources are available. This function helps
You must provide a training dataset (a set of documents annotated as relevant or not relevant to the topic of interest) to train the machine learning algorithm. At least 100 training documents are recommended, of which at least 25 should be relevant; more training data will generally produce better results. If the training documents are randomly sampled from the larger pool of unclassified documents, predictions will be unbiased.
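Before uploading, it can help to confirm that a training file meets the recommended minimums. The sketch below is one way to do that with the Python standard library; the column name `label` and the value `1` for relevant documents are assumptions, since the expected file layout is not described here.

```python
import csv
import io

MIN_TOTAL = 100     # recommended minimum number of training documents
MIN_RELEVANT = 25   # recommended minimum number of relevant documents


def check_training_set(csv_text, label_column="label", relevant_value="1"):
    """Return (n_total, n_relevant, ok) for an annotated training CSV.

    label_column and relevant_value are hypothetical; adjust them to your file.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n_total = len(rows)
    n_relevant = sum(
        1 for row in rows if row[label_column].strip() == relevant_value
    )
    ok = n_total >= MIN_TOTAL and n_relevant >= MIN_RELEVANT
    return n_total, n_relevant, ok


# Synthetic example: 120 annotated documents, the first 30 marked relevant.
lines = ["id,text,label"]
for i in range(120):
    lines.append(f"{i},document {i},{1 if i < 30 else 0}")
sample_csv = "\n".join(lines)

n_total, n_relevant, ok = check_training_set(sample_csv)
print(n_total, n_relevant, ok)  # prints: 120 30 True
```

To check a file on disk, read its contents into a string first and pass that string as `csv_text`.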
Your browser will display a model performance table that presents various statistical performance metrics. In addition, the output CSV available for download will contain all the original columns of the input CSV, plus two additional output columns:
Overview of Input and Output file formats.
Clusters a set of documents into a user-specified number of bins. For each bin, identifies the defining topics/keywords.
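The clustering and keyword-selection methods are not specified here. As a hedged sketch of the second step only, the example below assumes each document already has a bin assignment (for instance from a k-means run) and ranks each bin's terms by a simple TF-IDF-style score computed across bins. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def top_keywords_per_bin(docs, bins, n_keywords=2):
    """Return {bin_id: [keywords]} ranked by a TF-IDF-style score.

    docs: list of strings; bins: list of bin ids, one per document.
    A term shared by every bin scores zero, so it ranks last.
    """
    term_counts = defaultdict(Counter)
    for text, bin_id in zip(docs, bins):
        term_counts[bin_id].update(text.lower().split())

    # Number of bins in which each term appears at least once.
    bin_frequency = Counter()
    for counts in term_counts.values():
        bin_frequency.update(counts.keys())

    n_bins = len(term_counts)
    keywords = {}
    for bin_id, counts in term_counts.items():
        def score(term):
            return counts[term] * math.log(n_bins / bin_frequency[term])

        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(counts, key=lambda term: (-score(term), term))
        keywords[bin_id] = ranked[:n_keywords]
    return keywords


# Hypothetical documents already assigned to two bins.
docs = [
    "lead exposure in children",
    "lead exposure in adults",
    "mercury levels in fish",
    "mercury in fish tissue",
]
bins = [0, 0, 1, 1]
result = top_keywords_per_bin(docs, bins)
print(result)  # prints: {0: ['exposure', 'lead'], 1: ['fish', 'mercury']}
```

The common word "in" appears in both bins and therefore scores zero, which is why it never shows up as a defining keyword in this example.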
Classifies a set of documents as relevant or non-relevant to a topic. Assigns a discrete priority score (1-6) to each document.