Homework 5:

Adding Spell Checking and AutoComplete to Your Search Engine

Objectives

o Gain experience using a third-party spelling correction program

o Develop efficient methods for accomplishing autocomplete

In the previous document (AutocompleteInSolr.pdf) you saw how to enhance the Solr program

with spelling correction and an autocomplete (suggest) function. In this exercise you are asked

to use an external spelling correction program in conjunction with Solr and to enhance the

autocomplete functionality of Solr. For spelling correction, you may use an existing third-party

program adapted to your downloaded files. In the case of autocomplete you will need to enhance

your client program that communicates with Solr to deliver autocomplete suggestions to the web

interface you created in an earlier homework.

Description of the Exercise

Spelling Correction: in the class lecture you saw a complete spelling correction program
developed by Peter Norvig. The program was written in Python. For this exercise you are

welcome to use whatever third-party spelling program you wish, or you may even write your

own. Since most of you wrote your previous homework client using PHP, you may want to adopt

a version of Norvig’s spelling program written in PHP and run it on your server. You can download

the PHP version of Norvig’s spelling corrector from here:

http://www.phpclasses.org/package/4859-PHP-Suggest-corrected-spelling-text-in-pure-PHP.html#download

(You will have to register at the site before you can download the software; registration is free.)

If you prefer to use Norvig’s program in a different language, a wide variety of implementations

can be found at the bottom of this page, http://norvig.com/spell-correct.html

You should make sure to enhance your spelling correction program with a set of terms that are
specific to the news website that you are responsible for. You should make sure that common
terms such as climate, election, etc., and the terms used in the queries of homework #4 are
handled. Norvig’s spell correction program uses a text file (“big.txt”) to get the set of words
against which it calculates edit distance. You should therefore create your own “big.txt” for
your assigned news website. You can use any parser (our suggestion: Apache Tika). Instructions
on using Apache Tika for this purpose can be found here: https://tika.apache.org/1.5/gettingstarted.html
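
To make the role of “big.txt” concrete, here is a minimal sketch in the style of Norvig’s corrector, written in Python. It is an illustration under stated assumptions, not the PHP package linked above: the names `words`, `edits1`, `known`, and `correct` are chosen for this example, and `SAMPLE_TEXT` is a hypothetical stand-in for the contents of your own big.txt.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(candidates, counts):
    """Keep only the candidates that occur in the word counts."""
    return {w for w in candidates if w in counts}


def correct(word, counts):
    """Pick the most frequent known word within two edits, else the input."""
    word = word.lower()
    candidates = (
        known([word], counts)
        or known(edits1(word), counts)
        or known((e2 for e1 in edits1(word) for e2 in edits1(e1)), counts)
        or {word}
    )
    return max(candidates, key=lambda w: counts[w])


# Hypothetical stand-in for the text you would read from your own big.txt.
SAMPLE_TEXT = ("climate change and the election dominate the news "
               "climate policy election results")
COUNTS = Counter(words(SAMPLE_TEXT))

print(correct("climte", COUNTS))
print(correct("elction", COUNTS))
```

With a real corpus you would build `COUNTS` from the text of big.txt instead of `SAMPLE_TEXT`; the more site-specific text the file contains, the more of your query terms the corrector will recognize.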

Autocomplete: for the autocomplete portion of the exercise, you will have to modify your
client program, so it accepts single character insertions to the text box and returns a list of

completions/suggestions.

There are several ways to implement the autocomplete functionality while using Solr. One

possible way is to use the FuzzyLookupFactory

(https://solr.apache.org/guide/7_7/suggester.html) feature of Solr/Lucene. The

FuzzyLookupFactory creates suggestions for misspelled words in fields. It assumes that what

you’re sending as the suggest.query parameter is the beginning of the suggestion. It will match

terms in your index starting with the provided characters. So, if the query is “ca” it will return
the words starting with “ca”, e.g. “california” and “carolina”. Autocomplete suggestions should
already appear after the first and second characters are entered.
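
As a rough illustration of what the client has to do, the sketch below builds a suggester request and pulls the terms out of a response. It is written in Python and rests on assumptions: the handler path `/suggest`, the dictionary name `suggest`, and the core URL are placeholders for whatever you configured, and `SAMPLE_RESPONSE` is made-up data in the shape the suggester returns. Note that the Solr reference guide spells the request parameter `suggest.q`.

```python
import json
from urllib.parse import urlencode


def build_suggest_url(base_url, prefix, dictionary="suggest", count=5):
    """Build a suggester request URL for the characters typed so far."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,  # assumed dictionary name
        "suggest.q": prefix,
        "suggest.count": count,
        "wt": "json",
    }
    # "/suggest" is an assumed handler path; match it to your solrconfig.xml.
    return f"{base_url}/suggest?{urlencode(params)}"


def extract_terms(response, dictionary, prefix):
    """Return the suggested terms for one prefix from a parsed response."""
    entry = response.get("suggest", {}).get(dictionary, {}).get(prefix, {})
    return [s["term"] for s in entry.get("suggestions", [])]


# Made-up response data, for illustration only.
SAMPLE_RESPONSE = json.loads("""
{"suggest": {"suggest": {"ca": {"numFound": 2, "suggestions": [
    {"term": "california", "weight": 10, "payload": ""},
    {"term": "carolina", "weight": 7, "payload": ""}]}}}}
""")

# Hypothetical core URL; replace with your own Solr installation.
print(build_suggest_url("http://localhost:8983/solr/myexample", "ca"))
print(extract_terms(SAMPLE_RESPONSE, "suggest", "ca"))
```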

For this to work you need to enable the suggest component as described in the tutorial but add

some options.

Note: with respect to specific issues about how spelling corrections are displayed or how
autocomplete corrections are displayed you should imitate the way Google handles both. For

example, while typing in the search box, the top suggestions should automatically appear and be

updated as the user keeps typing. The spellcheck suggestion should appear at the top of the

retrieved results. If the word typed is correct no suggestion should appear at the top.
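
The rule in the note above (show a correction only when the typed query is actually misspelled) can be sketched as follows. This is a Python illustration under assumptions: `spelling_suggestion` is a name invented here, and `correct_word` stands for whichever single-word corrector you adopted; the stub used below is hypothetical.

```python
def spelling_suggestion(query, correct_word):
    """Return the corrected query, or None when no word needed correcting."""
    original = query.split()
    corrected = [correct_word(w) for w in original]
    if [w.lower() for w in original] == [w.lower() for w in corrected]:
        return None  # nothing to show at the top of the results
    return " ".join(corrected)


# Hypothetical stub standing in for a real spelling corrector.
FIXES = {"climte": "climate"}


def stub_corrector(word):
    return FIXES.get(word.lower(), word)


print(spelling_suggestion("climte change", stub_corrector))
print(spelling_suggestion("climate change", stub_corrector))
```

When the function returns None the page simply shows the results; otherwise the returned string is what would appear in a "did you mean" style line above them.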

Submission Instructions

You need to place the YouTube URL in your CSCI572/HW5 folder. Uploading a .txt file containing
the link to your HW5 YouTube video to your CSCI572/HW5 Google Drive folder is acceptable.
Please refer to GuidelinesVideoRecordingHW5 for more information on how to create the
YouTube video.

Suggested config change for making ‘AND’ the default instead of ‘OR’ for multi-word queries

Solr’s default boolean model uses OR instead of AND. So, if your query is “Elon Musk”, the
results will match all pages that contain either “Elon” OR “Musk”, not just pages that match the
entire query “Elon Musk”. To solve this problem, set the standard Query Parser parameters as
follows:

In solrconfig.xml add this line:

AND

within this tag:

and inside the default tag within the requestHandler tag:

Remember to reload after editing.
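
The default operator can also be supplied per request through the standard query parser’s `q.op` parameter, which is a convenient way to check the AND behavior from your client. The Python sketch below only builds the request URL; the function name `build_select_url` and the core URL are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, query, default_operator="AND"):
    """Build a /select URL that asks for AND between the query words."""
    params = {"q": query, "q.op": default_operator, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL; replace with your own Solr installation.
url = build_select_url("http://localhost:8983/solr/myexample", "Elon Musk")
print(url)
print(parse_qs(urlsplit(url).query))
```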

Q1. Can we use the default spell checker for HW5?
A. No, you are not supposed to use the default spell checker for HW5.

Q2. How should multi-word queries be handled?
A. Handle queries with two words as follows.

Handle each word as if you were typing it into suggest on the Solr UI.
For example:
If you type ‘new’, you may get ‘news’.
If you type ‘new ‘, you should not get ‘news’.
One way to replicate this behavior:
User types ‘n’ => query for ‘n’ and display suggestions
User types ‘ne’ => query for ‘ne’ and display suggestions
User types ‘new’ => query for ‘new’ and display suggestions
User types ‘new ‘ => query for ‘new ‘ and display suggestions (suggestions should be the same as
the ones from the previous step)
User types ‘new y’ => query for ‘y’, append each suggestion to ‘new’, and display the resulting
suggestions
User types ‘new yo’ => query for ‘yo’, append each suggestion to ‘new’, and display the resulting
suggestions.

Another way to explain it:

If you type “new “, you may get “new”, “new book”, “new york” or “new years”.

If you type “new”, you may get “news”, “newspaper”.

After you type the first word “new”, keep this word as the first word in your list, run the
query for the second word only, and append each result after “new”.

If you type “new new”, you may get “new news”, “new newspaper” (because if you type “new”, you
get “news”, “newspaper”).

When the space after “new” is typed, the autocomplete behavior should take it into
consideration as well. Hence you have to handle “new “ and not just “new”.
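
The step-by-step list in this answer can be sketched in Python as below. This is one possible reading, not a required implementation: `complete` is a name invented here, `suggest` stands for whatever function queries Solr for single-word completions, and the vocabulary in the stub is hypothetical. For a trailing space the sketch reuses the suggestions of the previous step, as the list describes.

```python
def complete(text, suggest):
    """Return completions for the whole text, querying only the last word.

    suggest(prefix) is expected to return a list of single-word completions.
    """
    if not text.strip():
        return []
    head, sep, last = text.rpartition(" ")
    if not sep:
        return suggest(text)  # a single word with no space yet
    head = head.strip()
    if last == "":
        # Trailing space: same suggestions as the step before it was typed.
        return complete(head, suggest)
    if not head:
        return suggest(last)  # only leading spaces before the word
    # Keep the earlier words and append each completion, separated by a space.
    return [f"{head} {s}" for s in suggest(last)]


# Hypothetical stub standing in for a Solr suggest call.
VOCAB = ["news", "newspaper", "york", "years", "yoga"]


def stub_suggest(prefix):
    return [w for w in VOCAB if w.startswith(prefix.lower())]


for typed in ["n", "new", "new ", "new y", "new yo"]:
    print(repr(typed), complete(typed, stub_suggest))
```

Joining with an explicit space also keeps multi-word suggestions separated rather than concatenated.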

Q3. Can we use Solr’s built-in autocomplete features?

Q4. What should the spell correction and autocomplete look like when working?
A. Imitate Google’s autocomplete and spell correction; your result should look like that.

Q5. When using the PHP corrector and loading big.txt, the error log says the allowed memory size
is exhausted.
A. Add the following code at the start of your PHP
file. This should solve it. If it still doesn’t, change the code to ” could solve it.

should also be the folder that the php is trying to write to.
Or this issue may be solved by putting the file in the same folder as your PHP script.

Q27. How to create our own big.txt?

A. First, parse all the HTML files that were provided to you in the last assignment, extract
their text content, and store all the words found in a file called big.txt. This file will be
the input to your spell corrector program. As per the document, you can use Apache Tika for
parsing purposes.
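
As a rough sketch of this step, the Python below extracts the visible text from HTML and concatenates it into big.txt. It deliberately swaps in Python’s standard-library `html.parser` in place of Apache Tika, so treat it as an illustration of the idea rather than the suggested tool; `TextExtractor`, `html_to_text`, `build_big_txt`, and the directory name are all assumptions made for this example.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of one HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_big_txt(html_dir, out_path="big.txt"):
    """Write the text of every .html file under html_dir into out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(html_dir).rglob("*.html")):
            html = path.read_text(encoding="utf-8", errors="ignore")
            out.write(html_to_text(html) + "\n")


SAMPLE_HTML = ("<html><head><style>p{}</style><script>var x=1;</script></head>"
               "<body><h1>Climate</h1><p>Election news</p></body></html>")
print(html_to_text(SAMPLE_HTML))
```

Calling `build_big_txt("path/to/your/html/files")` (a hypothetical path) would then produce the file that the spell corrector reads.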

Q28. Solr automatically concatenates multi-word terms in the results, like “newyork”, “newyorktimes”.

Is that acceptable?

A. No, in the video you give us, the words in the results should be separated.

Q29. I am unable to call the suggest query with the Solr PHP client. The Solr call I make issues

the default “select” request instead of “suggest.”

A. Try following this website -> https://skipperkongen.dk/2011/01/11/solr-with-jsonp-with-

Q30. Error 404 while searching

A. If you get a 404 response from your server, you most likely need to adjust the path to match

your Solr installation. Check your path.

Q31. Request was blocked due to MIME type (“text/plain”) mismatch (X-Content-Type-Options:

A. Try changing `text/plain` to `application/javascript` as it’s the correct response type for

jsonp. -> https://stackoverflow.com/a/39228881
