We ran the script newsgroups_extract.py with N = 500 and the topics
'comp.graphics', 'misc.forsale', and 'sci.med'.

The randomly generated ids from our run are in the file 'newsgroups-500-3.ids'.
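
For reference, drawing and storing a set of random ids can be sketched as
follows. This is only an illustration and not the code in
newsgroups_extract.py: the helper names (sample_ids, write_ids, read_ids), the
fixed seed, and the one-id-per-line file layout are all assumptions. Whether N
counts documents per topic or in total is not stated here, so the sketch simply
draws n ids from whatever candidate list it is given.

```python
import random


def sample_ids(candidate_ids, n, seed=0):
    """Return n distinct ids drawn uniformly at random, in sorted order."""
    # A fixed seed (an assumption) makes the draw repeatable between runs.
    rng = random.Random(seed)
    return sorted(rng.sample(list(candidate_ids), n))


def write_ids(ids, path):
    """Write one id per line (assumed layout, not necessarily the real one)."""
    with open(path, "w") as f:
        for doc_id in ids:
            f.write(f"{doc_id}\n")


def read_ids(path):
    """Read back ids written by write_ids, skipping blank lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```

Keeping the drawn ids in a file, as done with 'newsgroups-500-3.ids', lets the
same subset be reused later without depending on the random number generator.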

Before running the script, be sure to generate the stopwords file using
'stop.sh'.