240434 Graph

Đang Thực Hiện Đã đăng vào Jul 31, 2008 Thanh toán khi bàn giao
Đang Thực Hiện Thanh toán khi bàn giao

Summary

* Better Web Search

* Graph Representation

* Degrees of Separation

* Page Rank

* Web Search

* Implementation and Compiling

* Submission

* Points

Better Web Search

Denitsa, and Shakeel, Pai-Hsi, and Lei are planning to start a new web-search engine to rival Google. We need your help to implement and test our ideas on a graph built from a portion of the Internet. The graph is directed with an edge from page A to page B if there is at least one link from page A to page B (If there is more than one link between two pages, there is still only one edge in the web graph). For each page, we have also collected a title and a set of key phrases.

Your task is to implement four parts: Graph Building, Degrees of Separation, Page Rank, and Web Search. You will build a menu-driven program to support these four options. Below is an example run demonstrating what your program should do on the following graph:

Below is an example run. User input is in red.

> java WebSearch [url removed, login to view]

Welcome to WebSearch!

Choose an option or type 'quit' to quit:

1. Print Graph

2. Degrees of Separation

3. Page Rank

4. Web Search

> 1

Printing Graph...

0 (computer, internet, device, ipod, ipod nano): 2

1 (education, news, internet): 0,3

2 (computer, internet): 0,5

3 (university, education, computer, internet, news, power): 2,4,5,0

4 (university, education): 5

5 (power, computer, internet): 2

6 (news, internet, university): 3,4,5

7 (power, news): 1,4

Choose an option or type 'quit' to quit:

1. Print Graph

2. Degrees of Separation

3. Page Rank

4. Web Search

> 2

Title of origin page > New York Times

Title of destination page > Microsoft

Degrees of separation between pages "New York Times" and "Microsoft": 2

Choose an option or type 'quit' to quit:

1. Print Graph

2. Degrees of Separation

3. Page Rank

4. Web Search

> 3

Apple (.94)

New York Times (.25)

Intel (.97)

Wikipedia (.68)

Rutgers (.64)

Microsoft (.77)

Targum (.13)

Star-Ledger (.13)

Choose an option or type 'quit' to quit:

1. Print Graph

2. Degrees of Separation

3. Page Rank

4. Web Search

> 4

Enter a key phrase to search: > news

Results:

Wikipedia (0.68)

New York Times (0.25)

Targum (0.13)

Star-Ledger (0.13)

Choose an option or type 'quit' to quit:

1. Print Graph

2. Degrees of Separation

3. Page Rank

4. Web Search

> quit

Part 1: Graph Building

You must build your own graph representation using the provided ReadFile class to read and parse the input graph file. The number of pages on the Internet is very large and most pages link to only a few other pages (i.e. the web graph is sparse). Be sure to choose an appropriate data structure for the graph. After passing the filename of a graph to the constructor, the ReadFile class provides the following methods:

// returns the number of pages (vertices) in the graph.

int getNumberOfWebpages();

// returns the name (title) of the page with id "pageNo"

String getPageTitle(int pageNo);

// returns the number of out-links of the page with id "pageNo"

int getNumberOfOutLinks(int pageNo);

// returns the list of pages that the page with id "pageNo" links to

ArrayList<Integer> getOutLinks(int pageNo);

// returns the number of key phrases of the page with id "pageNo"

int getNumberOfKeyPhrases(int pageNo);

// returns the list of key phrases for the page with id "pageNo"

ArrayList<String> getKeyPhrases(int pageNo);

We have adopted the following standard for representing the web graph:

* Each web page (vertex) has an int identifier. Identifiers range from 0 to numPages-1.

* Each page has a Title, List of Key Phrases, and List of Pages that it links to

Note: The ReadFile class returns keyphrases containing all characters between commas (including spaces). You can remove leading and trailing spaces by using the "trim()" method of the String class. This method returns a new copy of the String without leading and trailing spaces.

Input Graph Format

The first line in the input file is the number of pages (has to be an int > 0). Following this is the information for every webpage, 4 lines per webpage:

1. unique page id (has to be an int in ascending order starting from 0)

2. page title (string) may include spaces

3. outlinks (a semicolon after each unique int neighbor id)

4. key phrases (a comma after each key phrase) may include spaces

Notes:

1. If the input file has fewer lines than it is supposed to, an exception is thrown.

2. If the input file has more lines than it is supposed to, the extra lines are ignored.

3. If a line is formatted incorrectly, an exception is thrown.

Here is the contents of the [url removed, login to view] file corresponding to our sample graph:

8

0

Apple

2;

computer, internet, device, ipod, ipod nano

1

New York Times

0;3;

education, news, internet,

2

Intel

0;5;

computer, internet,

3

Wikipedia

2;5;4;0;

university, education, computer, internet, news, power,

4

Rutgers

5;

university, education,

5

Microsoft

2;

power, computer, internet,

6

Targum

3;4;5;

news, internet, university,

7

Star-Ledger

1;4;

power, news,

Part 2: Degrees of Separation

Your program should be able to answer queries of the form: What is the minimum number of clicks to get from page X to page Y? One possible solution to this problem is to use breadth-first search from page X and stop when page Y is removed from the queue.

Special considerations:

* If X=Y (same page), then degrees of separation is 0.

* If there isn't a path from page X to page Y, then degrees of separation is -1.

Example: What is the minimum number of clicks to get from "New York Times" to "Microsoft?"

Answer: By following the red edges through Wikipedia, one can get from New York Times to Microsoft using 2 clicks.

Note: You may assume that for Degrees of Separation the user enters valid page titles (both page titles exist in the graph).

Part 3: Page Rank

We have devised an algorithm for calculating the popularity, or page rank, of a webpage. You must implement this algorithm to calculate the page rank of every webpage in a given graph. The details of our page ranking algorithm follow:

1. Let rank(x,t) be the rank of page x at iteration t.

2. Initialize the rank of every page to 1.0 (i.e. rank(x,0)=1.0 for every page x)

3. For t=1,2,3:

rank(x,t) = 0.5*[ rank(x,t-1) ] + 0.5*[ average( rank(y,t-1) of y's that have an edge to x ) ]

If x has no neighbors, rank(x,t) = 0.5*[ rank(x,t-1) ] + 0.5*[ 0 ] = 0.5*rank(x,t-1)

4. A higher rank means that a page is more "popular."

5. All rankings should be between 0 and 1, inclusive.

Example:

Web Page Rank

t=0 t=1 t=2 t=3

Apple (0) 1.00 0.5[1.00]+0.5[(1+1+1)/3] = 1.00 0.5[1.00]+0.5[(1+1+1)/3] = 1.00 0.5[1.00]+0.5[(0.5+1+0.86)/3] = 0.94

New York Times (1) 1.00 0.5[1.00]+0.5[1/1] = 1.00 0.5[1.00]+0.5[0.5/1] = 0.75 0.5[0.75]+0.5[[url removed, login to view]] = 0.50

Intel (2) 1.00 0.5[1.00]+0.5[(1+1+1)/3] = 1.00 0.5[1.00]+0.5[(1+1+1)/3] = 1.00 0.5[1.00]+0.5[(1+0.86+0.94)/3] = 0.97

Wikipedia (3) 1.00 0.5[1.00]+0.5[(1+1)/2] = 1.00 0.5[1.00]+0.5[(1+0.5)/2] = 0.86 0.5[0.86]+0.5[(0.75+0.25)/2] = 0.68

Rutgers (4) 1.00 0.5[1.00]+0.5[(1+1+1)/3] = 1.00 0.5[1.00]+0.5[(1+0.5+0.5)/3] = 0.83 0.5[0.83]+0.5[(0.86+0.25+0.25)/3] = 0.64

Microsoft (5) 1.00 0.5[1.00]+0.5[(1+1+1+1)/4] = 1.00 0.5[1.00]+0.5[(1+1+1+0.5)/4] = 0.94 0.5[0.94]+0.5[(1+0.86+0.83+0.25)/4] = 0.84

Targum (6) 1.00 0.5[1.00]+0.5[0] = 0.50 0.5[0.50]+0.5[0] = 0.25 0.5[0.25]+0.5[0] = 0.13

Star-Ledger (7) 1.00 0.5[1.00]+0.5[0] = 0.50 0.5[0.50]+0.5[0] = 0.25 0.5[0.25]+0.5[0] = 0.13

Note: You may use decimals without worrying how many decimal places are displayed.

Part 4: Web Search

Your final task is to implement a procedure to return pages matching a given key phrase, sorted by decreasing page rank. If multiple pages have the same rank then they can be returned in arbitrary order.

Example: Consider the key phrase, "news." The pages that match this key phrase, and their associated page ranks are:

Page Title Page Rank

Wikipedia 0.68

New York Times 0.25

Targum 0.13

Star-Ledger 0.13

Note: You may not assume that the search term appears in the data.

Implementation and Compiling

Before you start working on your program, make sure your CVS repository has been set up properly, and that you have pointed Eclipse to it. (You should have done all this for the first and second assignments, following Step 1, Step 2, and Step 3 of the section Instructions for writing and submitting your program.)

Now do the following:

1. In Eclipse, create a new Java project, name it WebGraph.

Note: Make sure you create a Java project (select Java project from the choices Eclipse provides for types of project) and name it exactly as asked. There will be a penalty applied to your assignment if you do not name your project exactly as asked.

Note: Make sure you use the project folder for both source and class files. (Do not create separate folders for source and class files.) If you don't know how, ask your TA. If you do not use the same folder for source and class files, there will be a penalty applied to your assignment.

Note: Make sure you are using Java version 1.5 for the compiler as well as for the JRE. If you don't know how, ask your TA. If you do not use version 1.5, there will be a penalty applied to your assignment.

2. Download ReadFile.jar. This jar file contains the class ReadFile (see section Graph) and its dependent classes.

* Right click on project name.

* Click on Properties.

* In the dialog that opens up, click on Java Build Path in the left frame.

* In the right frame, click on the Libraries tab.

* Click on the Add External Jars... button on the right.

* Browse to the saved jar file and click OK.

* You will then see this jar file appear in the JARs and class folders list on the build path text area. You will also see it under Referenced Libraries in your project in the Java package explorer perspective.

3. Create a new Java file called [url removed, login to view] and complete your program in it.

4. Share your project with your 112 CVS repository (/ug/users/projects/cs112/s2008/). If you don't know how, ask your TA. If you do not share the project with your CVS repository correctly, there will be nothing for us to grade, and you will get no credit. There will be nothing we can do later about this, so be vigilant.

5. Input graphs notes:

Place all input graph files in the main project folder. To supply input graph files as arguments to your program:

* Right-click on [url removed, login to view], choose Run As -> Run...

* Look under Java Application in the left frame and make sure that WebSearch shows up there. If it does not, double-click on Java Application.

* Click on WebSearch to display its configurations.

* From the tabs on the right, choose the (x)=Arguments tab.

* Type the name of the input file you want to use in the box Program Arguments.

* Click on Apply, then Run.

For an example of how your program should run, see the example run.

* You will implement a program called WebSearch that takes the filename of a graph as the first parameter. The progam should be menu-driven with the following menus and submenus:

1. Print Graph

2. Degrees of Separation

o Enter the title of origin page.

o Enter the title of destination page.

3. Page Rank

4. Web Search

o Enter a key phrase to search.

5. Quit

*

To prompt the user and read commands, you must first setup a BufferedReader wrapped around [url removed, login to view] (do this only once).

BufferedReader br = new BufferedReader(new InputStreamReader([url removed, login to view]));

Then, for instance, to read a key phrase, prompt and read like this:

[url removed, login to view]("Enter a key phrase to search: ");

String keyPhrase = [url removed, login to view]();

To use BufferedReader, you must add the following line to the top of your file:

import java.io.*;

BufferedReader's constructor and readLine() methods can throw an IOException, which is a "checked" exception. Likewise, the constructor of ReadFile can throw a BadInputException. You must either use try/catch to handle these exceptions, or put a "throws" clause onto the declaration of your method. The following example illustrates the use of "throws":

public static void main(String[] args) throws IOException, BadInputException

{ ...

-END

Java Odd Jobs

ID dự án: #1986682

Về dự án

Dự án từ xa Jul 11, 2012 đang mở