
Task: Tiny recommender system
1. Objectives
To get familiar with the tech stacks:
C++, Golang

Hadoop, Map-reduce
Remote Procedure Call: bRPC
Micro-service architecture

There are two sub-tasks in this task:
1. Data process: process one day of tracking logs containing all buyer behavior.
2. Online service: build services to query the data processed in the previous sub-task.

2. Terminology
(1) Tracking log
The tracking logs are stored in HDFS under this path: /resource/task/tracking_log/2021-12-22.

drwxr-xr-x - flink supergroup 0 2021-09-23 01:31 /operation_data/ads_tracking/2021-09-22
drwxr-xr-x - flink supergroup 0 2021-09-24 01:30 /operation_data/ads_tracking/2021-09-23
drwxr-xr-x - flink supergroup 0 2021-09-25 01:30 /operation_data/ads_tracking/2021-09-24

Each line in the tracking log is a user operation record. It is stored in JSON format.
Sample log
{"userid":16216623,"sessionid":"\"xTRaTpi5Vmju4kOFMkfTcdA978xTAIhfZvnZ2nzLWBa7nYwA3vu7mEZsWU+OX8fnmQ9mOTI1yj+wtJtDsLuAq3mOn43Y+AAXMopzGQFkWi8=\"","deviceid":"0709D0355EA142F0AA54EF1849BE4A29","client_ip":"183.80.224.65,10.84.246.141","platform":2,"operation":2,"items":[{ "itemid":6790317855,"shopid":48284291,"discount":0,"free_shipping":false,"is_prefered":false,"location":7,"query":{
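As a rough sketch of how a map task might decode one log line in Go, the struct below covers only the fields visible in the truncated sample above. The names `Record`, `Item` and `ParseRecord` are made up for illustration, the Go field types are guesses from the sample values, and everything beyond the truncation point (the `query` object and any timestamp field) is deliberately left out.

```go
package main

import "encoding/json"

// Item holds the per-item fields visible in the sample log.
// Field types are assumptions inferred from the sample values.
type Item struct {
	ItemID       int64   `json:"itemid"`
	ShopID       int64   `json:"shopid"`
	Discount     float64 `json:"discount"`
	FreeShipping bool    `json:"free_shipping"`
	IsPrefered   bool    `json:"is_prefered"` // spelling follows the log
	Location     int     `json:"location"`
}

// Record holds the top-level fields visible in the sample log.
type Record struct {
	UserID    int64  `json:"userid"`
	SessionID string `json:"sessionid"`
	DeviceID  string `json:"deviceid"`
	ClientIP  string `json:"client_ip"`
	Platform  int    `json:"platform"`
	Operation int    `json:"operation"`
	Items     []Item `json:"items"`
}

// ParseRecord decodes one JSON log line. Fields that are not declared
// above are silently ignored by encoding/json.
func ParseRecord(line []byte) (Record, error) {
	var r Record
	err := json.Unmarshal(line, &r)
	return r, err
}
```

A mapper would typically call `ParseRecord` once per input line and skip lines that fail to decode.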

(2) Items degree
Two items A and B are said to be associated with a degree of K if K users each clicked both of them within the same day. The order of the clicks does not matter, but the interval between the two clicks must be less than a threshold T. In this entry task, T = 3600 s.

For example:
Given the following click events [user and item columns not recoverable from the source]:
click timestamp
1633000100
1633000200
1633000300
1633050000
1633000200
1633000300
1633005100

The degrees of the two items [item and degree columns not recoverable from the source]:
Description
Clicked by user 1.
Clicked by users 1 and 2, but for both users the interval between the two clicks is more than 3600 s.
Clicked by users 1 and 2.
Clicked by user 1, but the interval between the two clicks is more than 3600 s.
Clicked by user 1.
Clicked by users 1 and 2, but for both users the interval between the two clicks is more than 3600 s.
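The degree definition can be sketched in Go as follows. This is one possible reading, not a reference implementation: it assumes a user counts toward the degree of a pair (A, B) when that user has at least one click on A and one click on B less than T seconds apart, and that the input already contains a single day of clicks. The types `Click` and `Pair`, the function `Degrees`, and the sample values in the usage note are all hypothetical.

```go
package main

import "sort"

// Click is one click event. TS is a Unix timestamp in seconds
// (the name of the timestamp field in the real log is not shown in the sample).
type Click struct {
	UserID int64
	ItemID int64
	TS     int64
}

// Pair is an unordered item pair, stored with A < B.
type Pair struct{ A, B int64 }

// Degrees returns, for every item pair, the number of users who clicked
// both items with an interval strictly below threshold seconds.
func Degrees(clicks []Click, threshold int64) map[Pair]int {
	// Group clicks by user.
	byUser := make(map[int64][]Click)
	for _, c := range clicks {
		byUser[c.UserID] = append(byUser[c.UserID], c)
	}
	degrees := make(map[Pair]int)
	for _, cs := range byUser {
		// Sort by time so the inner loop can stop at the first click
		// that is too far away.
		sort.Slice(cs, func(i, j int) bool { return cs[i].TS < cs[j].TS })
		seen := make(map[Pair]bool) // each user counts once per pair
		for i := range cs {
			for j := i + 1; j < len(cs) && cs[j].TS-cs[i].TS < threshold; j++ {
				a, b := cs[i].ItemID, cs[j].ItemID
				if a == b {
					continue
				}
				if a > b {
					a, b = b, a
				}
				seen[Pair{a, b}] = true
			}
		}
		for p := range seen {
			degrees[p]++
		}
	}
	return degrees
}
```

For instance, if users 1 and 3 each click items 10 and 20 within 100 s, while user 2 clicks them 4900 s apart, `Degrees(clicks, 3600)` gives the pair (10, 20) a degree of 2. In a map/reduce job the per-user grouping would naturally be done by using the user ID as the shuffle key.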

3. Requirements
Part 1: Data process
Task: Analyze the tracking log of one full day (path given above) using map/reduce, and store the output in storage.

Input: tracking log

Output: all items and, for each item, its associated items with their degrees (you can design the output format yourself)

Notes: please read and understand the “Part 2” requirements (and the data they need) carefully before you design this part, as the data generated in “Part 1” will be used in “Part 2”.

Programming language: Golang

Performance: the Hadoop (map/reduce) jobs must complete within 4 hours.

Part 2: Online services
Design and build an online system containing 3 services: 1 recommend service (mixer) and 2 recall services (recall_item, recall_user).
1. mixer
What it is:
This is an HTTP server and the gateway of the recommender system. The app or website calls this service to get recommended products.

How it works:
It calls the 2 recall services, gets a list of items from each, and mixes (merges) the 2 lists.
Mix operation: concatenate the items from recall_item (high precedence) and recall_user (low precedence), removing from the recall_user list any item already returned by recall_item.
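The mix operation can be illustrated with a short sketch. The services themselves are required to be written in C++; the Go below only shows the merge logic. The function name `Mix`, the use of bare item IDs as list elements, and the `limit` parameter (set to 50 in line with the API description) are assumptions. Note that this version also drops duplicates inside the recall_item list, which goes slightly beyond what the text asks for.

```go
package main

// Mix concatenates the two recall lists, keeping recallItem first,
// skipping items that were already emitted, and stopping at limit.
func Mix(recallItem, recallUser []int64, limit int) []int64 {
	seen := make(map[int64]bool, limit)
	out := make([]int64, 0, limit)
	for _, list := range [][]int64{recallItem, recallUser} {
		for _, id := range list {
			if len(out) == limit {
				return out
			}
			if seen[id] {
				continue // duplicate of an item with higher precedence
			}
			seen[id] = true
			out = append(out, id)
		}
	}
	return out
}
```

With `recallItem = [1, 2, 3]` and `recallUser = [3, 4, 1, 5]`, `Mix(recallItem, recallUser, 50)` yields `[1, 2, 3, 4, 5]`.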

It provides 1 HTTP GET API: /api/v1/recommend
4 parameters: itemid, shopid, userid and min_degree. It returns at most 50 items.
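As an illustration of the request shape only (the mixer itself is to be written in C++), the following Go sketch parses the query string of `/api/v1/recommend`. It assumes that all parameters are required and are decimal integers, which the text does not state; `RecommendQuery` and `ParseRecommendQuery` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// RecommendQuery holds the parameters of GET /api/v1/recommend.
type RecommendQuery struct {
	ItemID, ShopID, UserID int64
	MinDegree              int
}

// ParseRecommendQuery parses a raw query string such as
// "itemid=1&shopid=2&userid=3&min_degree=4".
// Assumption: every parameter is mandatory and integer-valued.
func ParseRecommendQuery(rawQuery string) (RecommendQuery, error) {
	var q RecommendQuery
	vals, err := url.ParseQuery(rawQuery)
	if err != nil {
		return q, err
	}
	// get reads one named parameter as a base-10 integer.
	get := func(name string) (int64, error) {
		s := vals.Get(name)
		if s == "" {
			return 0, fmt.Errorf("missing parameter %q", name)
		}
		n, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parameter %q: %w", name, err)
		}
		return n, nil
	}
	if q.ItemID, err = get("itemid"); err != nil {
		return q, err
	}
	if q.ShopID, err = get("shopid"); err != nil {
		return q, err
	}
	if q.UserID, err = get("userid"); err != nil {
		return q, err
	}
	minDegree, err := get("min_degree")
	if err != nil {
		return q, err
	}
	q.MinDegree = int(minDegree)
	return q, nil
}
```

A request would then look like `/api/v1/recommend?itemid=6790317855&shopid=48284291&userid=16216623&min_degree=2`, reusing the IDs from the sample log.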

2. recall_item, recall_user
What they are:
They are RPC servers (built with bRPC) that query the data generated in Part 1 from storage (you should survey the options and pick a suitable one) and return the results to mixer.

How they work:
recall_item: get the items associated with the input (itemid, shopid) whose degree is at least min_degree. The length of the returned item list should not exceed 50. If more than 50 items meet the criteria, return the top 50 items, starting from the one with the highest degree. Whenever there is a tie, e.g., two items with the same degree, break it arbitrarily (returning any of the tied items is fine).
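The selection rule for recall_item can be sketched as a filter, a sort and a truncation. The service is required to be in C++; this Go version only illustrates the logic. It assumes that the associated items of the input item have already been loaded from storage, and that "minimum degree" means `degree >= min_degree`; `Assoc` and `TopByDegree` are hypothetical names.

```go
package main

import "sort"

// Assoc is one associated item together with its degree.
type Assoc struct {
	ItemID int64
	ShopID int64
	Degree int
}

// TopByDegree keeps the entries with Degree >= minDegree and returns at
// most limit of them, highest degree first. Ties are left in whatever
// order the sort produces, which the requirements allow.
func TopByDegree(assocs []Assoc, minDegree, limit int) []Assoc {
	kept := make([]Assoc, 0, len(assocs))
	for _, a := range assocs {
		if a.Degree >= minDegree {
			kept = append(kept, a)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Degree > kept[j].Degree })
	if len(kept) > limit {
		kept = kept[:limit]
	}
	return kept
}
```

If Part 1 stores each item's associations already sorted by descending degree, the sort can be skipped at query time and the service only needs to scan until the degree drops below min_degree or 50 items are collected.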

recall_user: get the items associated with the top 20 clicked items of the given userid (if click counts are equal, order by “last click time”). The same length limit applies to the returned item list.

· top 20 clicked items: if items A and B have the same click count, the one with the more recent “last click time” wins
· for each top clicked item A, find its associated items asso_items; merge all asso_items and filter by min_degree; if more than 50 remain, pick the ones with the highest degrees
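The two steps above can be sketched as follows, again in Go for illustration although the service is to be written in C++. Several details are assumptions: per-user click statistics and per-item associations are taken as already loaded in memory, an item reachable through several top clicked items keeps its highest degree (the text does not say how to combine them), and ties on degree are broken by item ID only to make the output deterministic. All type and function names are hypothetical.

```go
package main

import "sort"

// ClickStat summarises one user's clicks on one item.
type ClickStat struct {
	ItemID    int64
	Count     int
	LastClick int64 // Unix seconds
}

// Assoc is one associated item together with its degree.
type Assoc struct {
	ItemID int64
	Degree int
}

// TopClicked returns up to n items ordered by click count, then by the
// more recent last click time. The input slice is not modified.
func TopClicked(stats []ClickStat, n int) []ClickStat {
	out := append([]ClickStat(nil), stats...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].LastClick > out[j].LastClick
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// RecallUser merges the associations of the user's top 20 clicked items,
// filters them by minDegree and returns at most limit items.
func RecallUser(stats []ClickStat, assoc map[int64][]Assoc, minDegree, limit int) []Assoc {
	best := make(map[int64]int) // item ID -> highest degree seen (assumption)
	for _, s := range TopClicked(stats, 20) {
		for _, a := range assoc[s.ItemID] {
			if a.Degree < minDegree {
				continue
			}
			if cur, ok := best[a.ItemID]; !ok || a.Degree > cur {
				best[a.ItemID] = a.Degree
			}
		}
	}
	out := make([]Assoc, 0, len(best))
	for id, d := range best {
		out = append(out, Assoc{ItemID: id, Degree: d})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Degree != out[j].Degree {
			return out[i].Degree > out[j].Degree
		}
		return out[i].ItemID < out[j].ItemID
	})
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

Whether the user's own top clicked items should be excluded from the result is not specified, so this sketch leaves them in.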

· recall_item: 1 API, 3 parameters: itemid, shopid and min_degree
· recall_user: 1 API, 2 parameters: userid and min_degree

Programming language: C++

Performance:
mixer: able to handle 500 queries per second.
recall_item: able to handle 500 queries per second.
recall_user: able to handle 300 queries per second.
These services should support being scaled up to multiple instances.
