Please use the provided file.

Requirements:
1. Put the code for each of the four tasks in its own project (or directory).
2. Use Python 3.
3. Comment the code to improve readability.
4. Use Spark.

Many computational systems provide a logging service so that system administrators can keep tabs on a system’s activity and diagnose problems based on knowledge gleaned from the logs. Continuous, automated processing of log entries can speed up both the identification of problems and the implementation of corrective measures. Web servers are examples of such systems. A web server log contains information about HTTP-based accesses to web pages hosted by the server, including the Internet Protocol (IP) address of the host accessing the page, the time of each access, the number of bytes sent by the server, and the specific page addresses that have been served.
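The exact layout of the provided log file is not given here. As a rough sketch, assuming each line follows the Common Log Format used by many web servers, the four fields mentioned above could be extracted as shown below. The names `LOG_PATTERN`, `LogEntry`, and `parse_line` are hypothetical, and the pattern would need adjusting if the real file differs.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# ASSUMPTION: lines look like the Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogEntry(NamedTuple):
    """One parsed access: who, when, which page, how many bytes."""
    ip: str
    timestamp: datetime
    page: str
    bytes_sent: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    # The request field is "METHOD PATH PROTOCOL"; keep only the path.
    parts = match.group("request").split()
    page = parts[1] if len(parts) >= 2 else ""
    # A size of "-" means no body was sent; count it as zero bytes.
    size = match.group("size")
    bytes_sent = 0 if size == "-" else int(size)
    return LogEntry(match.group("ip"), timestamp, page, bytes_sent)
```

Returning `None` for malformed lines lets the caller filter them out instead of aborting the whole run on a single bad entry.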

1. Write an application that reads in the logs produced by a web server. For each unique IP, compute the total number of bytes served to that IP address.
2. Extend the application to return the top-K IPs that were served the most bytes. K is not predefined; it is a configurable input.
3. Extend the application to compute the total number of bytes served to each unique IP per 1-hour tumbling window. That is, compute the byte count per IP address per one-hour window (e.g., 00:00:00 to 00:59:59, 01:00:00 to 01:59:59, etc.). The windows may also start at the first timestamp in the data rather than at 00:00:00.
4. Modify the application to compute the same statistics for a subnet, i.e., aggregate across IPs based on a specified prefix. For instance, given that an IPv4 address is 4 bytes, aggregate all data for IPs that share the same first 3 bytes (MSB to LSB). As an example, aggregate bytes for all IP addresses that start with 123.100.099.*, i.e., the last three digits can be anything.
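The logic behind the four tasks can be sketched in plain Python. This is not a Spark solution, only a small reference that could be used to check a Spark job's output on a sample of the data. It assumes the log has already been parsed into `(ip, timestamp, bytes_sent)` tuples, and all function names are hypothetical. In a PySpark RDD job, the same keyed sums would typically be expressed with `reduceByKey`.

```python
import heapq
from collections import defaultdict
from datetime import datetime, timedelta

# ASSUMPTION: records are pre-parsed (ip, timestamp, bytes_sent) tuples.


def bytes_per_ip(records):
    """Task 1: total bytes served to each unique IP."""
    totals = defaultdict(int)
    for ip, _, size in records:
        totals[ip] += size
    return dict(totals)


def top_k(totals, k):
    """Task 2: the k keys with the largest totals, largest first."""
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


def window_start(timestamp, origin=None):
    """Start of the 1-hour tumbling window that contains `timestamp`.

    With no origin, windows are aligned to the clock hour. With an origin
    (e.g. the first timestamp in the data), windows are counted from it.
    """
    if origin is None:
        return timestamp.replace(minute=0, second=0, microsecond=0)
    whole_hours = (timestamp - origin) // timedelta(hours=1)
    return origin + timedelta(hours=whole_hours)


def bytes_per_ip_per_hour(records, origin=None):
    """Task 3: total bytes per (ip, window start) pair."""
    totals = defaultdict(int)
    for ip, timestamp, size in records:
        totals[(ip, window_start(timestamp, origin))] += size
    return dict(totals)


def subnet_of(ip, prefix_octets=3):
    """First `prefix_octets` octets of an IPv4 address, so 099 equals 99."""
    octets = ip.split(".")[:prefix_octets]
    return ".".join(str(int(octet)) for octet in octets)


def by_subnet(records, prefix_octets=3):
    """Task 4: re-key records by subnet so the functions above can be reused."""
    for ip, timestamp, size in records:
        yield subnet_of(ip, prefix_octets), timestamp, size
```

Because `by_subnet` only rewrites the key, calling `bytes_per_ip(by_subnet(records))` or `bytes_per_ip_per_hour(by_subnet(records))` yields the subnet-level versions of the same statistics without duplicating the aggregation code.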