Exercise 1: Hints & FAQ
1. Indexing
Q: I get an OutOfMemoryError while indexing .GOV on Windows. What should I do?
You can fix this easily: in bin/terrier.bat, remove the -Xmx512M option from the end of line 73. This option limits the Java heap to 512 MB, which is too small for this collection.
Then delete the partial index in var/index and retry indexing.
As per the current exercise specification, you should also set the invertedfile.lexiconscanner=pointers property in the etc/terrier.properties file (on Windows, Mac and Linux alike); this makes indexing more robust.
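On Mac or Linux, the property change can also be scripted. The sketch below defines a hypothetical helper, set_property (it is not part of Terrier), which removes any existing line for the given key and appends key=value. It assumes the property is written as key=value with no spaces around the = sign; if your file uses another form, check it by eye.

```shell
# set_property FILE KEY VALUE
# Hypothetical helper (not part of Terrier): ensure FILE contains exactly one
# KEY=VALUE line. Assumes properties are written as key=value without spaces.
set_property() {
  file=$1
  key=$2
  value=$3
  tmp="$file.tmp"
  if [ -f "$file" ]; then
    # Keep every line that does not start with "KEY=". index() is a literal
    # match, so the dots in Terrier property names are not regex wildcards.
    awk -v prefix="$key=" 'index($0, prefix) != 1' "$file" > "$tmp" || return 1
  else
    # No existing file: start from an empty one.
    : > "$tmp" || return 1
  fi
  # Append the desired setting, then replace the original file.
  printf '%s=%s\n' "$key" "$value" >> "$tmp" && mv "$tmp" "$file"
}
```

For example, from the Terrier directory run set_property etc/terrier.properties invertedfile.lexiconscanner pointers, then delete the partial index in var/index and retry indexing as before.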
Q: Can I do the coursework at home or on my laptop?
We provide a working environment in the lab; however, you may conduct these experiments at home or on your laptop. You can still access the University filestore from Eduroam or, from home, when connected to the University VPN.
In particular, you can connect to \\file-alpha.campus.gla.ac.uk\csdata directly from your laptop using your GUID credentials when connected to Eduroam or the VPN. On Windows, use Map Network Drive; on Mac, in Finder choose Go > Connect to Server and enter smb://file-alpha.campus.gla.ac.uk/csdata. You only need to be connected to copy the collection or to index it.
Please remember that you must delete everything related to the collection once the exercise is complete (expected to be on 13th March 2020).
Q: I get lots of warnings about empty documents when indexing .GOV.
This is expected. The collection consists of many files, each containing hundreds of Web documents, and some of those documents contain no terms. Hence, each file typically produces a few warnings about empty documents.

2. Retrieval
Q: My retrieval performance is very high!
You are not using the correct qrels file. Check the value of the trec.qrels property.
Q: I am spending a lot of time editing the terrier.properties file!
You can specify properties on the command line, e.g.
           bin/terrier batchevaluate -Dtrec.qrels=/path/to/qrels
Q: How can I see which terms are added when running query expansion?
The expanded query is logged by Terrier: look for “NEWQUERY” in the console output.
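If the console output is long, you can filter it. The sketch below is a hypothetical helper (not part of Terrier) that keeps only the log lines containing NEWQUERY; it assumes nothing else about the log format.

```shell
# show_expanded_queries [LOGFILE...]
# Hypothetical helper (not part of Terrier): print only the log lines that
# mention NEWQUERY, reading the given files or, if none are given, stdin.
show_expanded_queries() {
  grep -F 'NEWQUERY' "$@"
}
```

For example, append 2>&1 | tee retrieval.log | show_expanded_queries to your usual bin/terrier retrieval command. The 2>&1 captures the log whether it is written to standard output or standard error, and tee keeps a full copy in retrieval.log (a file name chosen for illustration).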

3. Coding
Q: How do I compile a new weighting model (or other class)?
This is detailed in the README.md of the GitHub sample code project; see https://github.com/cmacdonald/IRcourseHM
NB: There is NO NEED to recompile Terrier.
You should create two new WeightingModel classes: one for SimpleTF_IDF and one for VSM.
Q: Can I use Eclipse or IntelliJ?
Yes. This is detailed in the README.md of the GitHub sample code project. Note that the lab machines do not have IntelliJ installed.
Note also that you should run your experiments from the command line using the shell scripts, such as bin/terrier.
Q: I cannot use Maven from the command line.
The lab machines don’t have Maven installed. You can install it if you wish to use it.

4. Others, including Linux
Q: Why do I get “permission denied” when I access files in the Resources directory, TopicsQrels/training, or similar?
This is deliberate. You don’t need these files for Exercise 1.
Q: I cannot see the /users/level4 directory from Linux. Do I have permission?
Assuming you completed the agreement on time, yes. Our School Linux machines mount directories on demand: if you run “cd /users/level4/software/IR”, Linux will automatically mount the directory. Until you do, it might not appear in the output of “ls”.