author Samuel Fadel <samuelfadel@gmail.com> 2016-08-19 14:20:57 -0300
committer Samuel Fadel <samuelfadel@gmail.com> 2016-08-19 14:20:57 -0300
commit b255338295587246292dc978e7d4d5687ee01fb4
tree 1581b76a03f4929c5132dcb3c6920fa761f8261c /datasets/newsgroups/README
parent fbf8d82cdd3720c4bbf2a94035b6779e56d73448
Scripts and other files for building all datasets.
Diffstat (limited to 'datasets/newsgroups/README')
-rw-r--r-- datasets/newsgroups/README 7
1 files changed, 7 insertions, 0 deletions
diff --git a/datasets/newsgroups/README b/datasets/newsgroups/README
new file mode 100644
index 0000000..78e92ee
--- /dev/null
+++ b/datasets/newsgroups/README
@@ -0,0 +1,7 @@
+For running the script newsgroups_extract.py we used N = 500 and topics are
+'comp.graphics', 'misc.forsale', and 'sci.med'.
+
+The randomly generated ids in our case are in the file 'newsgroups-500-3.ids'.
+
+Before running the script, be sure to generate the stopwords file using
+'stop.sh'.
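
The README above mentions randomly generated ids stored in an ids file. As a rough illustration only, drawing N distinct ids reproducibly and saving them one per line might look like the sketch below. This is not the contents of newsgroups_extract.py, which this commit does not show; the function names, the seed parameter, and the one-id-per-line file layout are all assumptions made for the example.

```python
import random


def sample_ids(candidate_ids, n, seed=None):
    """Draw n distinct ids uniformly at random (hypothetical helper).

    A fixed seed makes the draw reproducible; the result is sorted so
    that the saved file has a stable order.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(candidate_ids), n))


def write_ids(ids, path):
    """Write one id per line (assumed layout, not the real .ids format)."""
    with open(path, "w") as handle:
        for doc_id in ids:
            handle.write(f"{doc_id}\n")


def read_ids(path):
    """Read ids back as strings, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Storing the sampled ids in a file, as the README does with 'newsgroups-500-3.ids', lets the same subset be reused later without depending on the random number generator's state.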