From b255338295587246292dc978e7d4d5687ee01fb4 Mon Sep 17 00:00:00 2001
From: Samuel Fadel
Date: Fri, 19 Aug 2016 14:20:57 -0300
Subject: Scripts and other files for building all datasets.

---
 datasets/newsgroups/README | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 datasets/newsgroups/README

(limited to 'datasets/newsgroups/README')

diff --git a/datasets/newsgroups/README b/datasets/newsgroups/README
new file mode 100644
index 0000000..78e92ee
--- /dev/null
+++ b/datasets/newsgroups/README
@@ -0,0 +1,7 @@
+For running the script newsgroups_extract.py we used N = 500 and topics are
+'comp.graphics', 'misc.forsale', and 'sci.med'.
+
+The randomly generated ids in our case are in the file 'newsgroups-500-3.ids'.
+
+Before running the script, be sure to generate the stopwords file using
+'stop.sh'.
-- 
cgit v1.2.3