Week6_MongoDB_Aggregations

Week 6 Class Exercise :: MongoDB Aggregations

Initialize MongoDB client and database

from pymongo import MongoClient
client = MongoClient("localhost", 27017)  # or MongoClient("localhost:27017")
db = client.apan5400

Create collection 'cities' with some data

# Populations are given in millions
db.cities.insert_many([
    {"name": "Seoul", "country": "South Korea", "continent": "Asia", "population": 25.674},
    {"name": "Mumbai", "country": "India", "continent": "Asia", "population": 19.980},
    {"name": "Lagos", "country": "Nigeria", "continent": "Africa", "population": 13.463},
    {"name": "Beijing", "country": "China", "continent": "Asia", "population": 19.618},
    {"name": "Shanghai", "country": "China", "continent": "Asia", "population": 25.582},
    {"name": "Osaka", "country": "Japan", "continent": "Asia", "population": 19.281},
    {"name": "Cairo", "country": "Egypt", "continent": "Africa", "population": 20.076},
    {"name": "Tokyo", "country": "Japan", "continent": "Asia", "population": 37.400},
    {"name": "Karachi", "country": "Pakistan", "continent": "Asia", "population": 15.400},
    {"name": "Dhaka", "country": "Bangladesh", "continent": "Asia", "population": 19.578},
    {"name": "Rio de Janeiro", "country": "Brazil", "continent": "South America", "population": 13.293},
    {"name": "São Paulo", "country": "Brazil", "continent": "South America", "population": 21.650},
    {"name": "Mexico City", "country": "Mexico", "continent": "North America", "population": 21.581},
    {"name": "Delhi", "country": "India", "continent": "Asia", "population": 28.514},
    {"name": "Buenos Aires", "country": "Argentina", "continent": "South America", "population": 14.967},
    {"name": "Kolkata", "country": "India", "continent": "Asia", "population": 14.681},
    {"name": " ", "country": "United States", "continent": "North America", "population": 18.819},
    {"name": "Manila", "country": "Philippines", "continent": "Asia", "population": 13.482},
    {"name": "Chongqing", "country": "China", "continent": "Asia", "population": 14.838},
    {"name": "Istanbul", "country": "Turkey", "continent": "Europe", "population": 14.751}
])

Using $match aggregation stage

# Keep only the documents whose continent is exactly "North America"
pipeline = [
    {"$match": {"continent": "North America"}}
]

list(db.cities.aggregate(pipeline))

[{'_id': ObjectId('621481d2bd0a467f42aea96a'),
  'name': 'Mexico City',
  'country': 'Mexico',
  'continent': 'North America',
  'population': 21.581},
 {'_id': ObjectId('621481d2bd0a467f42aea96e'),
  'name': ' ',
  'country': 'United States',
  'continent': 'North America',
  'population': 18.819}]

# $in matches documents whose continent equals any value in the list
pipeline = [
    {"$match": {"continent": {"$in": ["North America", "Asia"]}}}
]

list(db.cities.aggregate(pipeline))

[{'_id': ObjectId('621481d2bd0a467f42aea95e'),
  'name': 'Seoul',
  'country': 'South Korea',
  'continent': 'Asia',
  'population': 25.674},
 {'_id': ObjectId('621481d2bd0a467f42aea95f'),
  'name': 'Mumbai',
  'country': 'India',
  'continent': 'Asia',
  'population': 19.98},
 {'_id': ObjectId('621481d2bd0a467f42aea961'),
  'name': 'Beijing',
  'country': 'China',
  'continent': 'Asia',
  'population': 19.618},
 {'_id': ObjectId('621481d2bd0a467f42aea962'),
  'name': 'Shanghai',
  'country': 'China',
  'continent': 'Asia',
  'population': 25.582},
 {'_id': ObjectId('621481d2bd0a467f42aea963'),
  'name': 'Osaka',
  'country': 'Japan',
  'continent': 'Asia',
  'population': 19.281},
 {'_id': ObjectId('621481d2bd0a467f42aea965'),
  'name': 'Tokyo',
  'country': 'Japan',
  'continent': 'Asia',
  'population': 37.4},
 {'_id': ObjectId('621481d2bd0a467f42aea966'),
  'name': 'Karachi',
  'country': 'Pakistan',
  'continent': 'Asia',
  'population': 15.4},
 {'_id': ObjectId('621481d2bd0a467f42aea967'),
  'name': 'Dhaka',
  'country': 'Bangladesh',
  'continent': 'Asia',
  'population': 19.578},
 {'_id': ObjectId('621481d2bd0a467f42aea96a'),
  'name': 'Mexico City',
  'country': 'Mexico',
  'continent': 'North America',
  'population': 21.581},
 {'_id': ObjectId('621481d2bd0a467f42aea96b'),
  'name': 'Delhi',
  'country': 'India',
  'continent': 'Asia',
  'population': 28.514},
 {'_id': ObjectId('621481d2bd0a467f42aea96d'),
  'name': 'Kolkata',
  'country': 'India',
  'continent': 'Asia',
  'population': 14.681},
 {'_id': ObjectId('621481d2bd0a467f42aea96e'),
  'name': ' ',
  'country': 'United States',
  'continent': 'North America',
  'population': 18.819},
 {'_id': ObjectId('621481d2bd0a467f42aea96f'),
  'name': 'Manila',
  'country': 'Philippines',
  'continent': 'Asia',
  'population': 13.482},
 {'_id': ObjectId('621481d2bd0a467f42aea970'),
  'name': 'Chongqing',
  'country': 'China',
  'continent': 'Asia',
  'population': 14.838}]
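
To make the filtering rule concrete, here is a minimal plain-Python sketch of what $match with $in does for a scalar field. The helper name match_in and the trimmed sample list are assumptions made for illustration; this is not how MongoDB implements the stage, and it ignores array-valued fields.

```python
def match_in(docs, field, allowed):
    """Keep documents whose `field` equals one of the `allowed` values.

    Rough stand-in for {"$match": {field: {"$in": allowed}}} on scalar fields.
    """
    return [doc for doc in docs if doc.get(field) in allowed]


# Trimmed-down sample of the cities collection (illustrative only)
sample_cities = [
    {"name": "Seoul", "continent": "Asia"},
    {"name": "Lagos", "continent": "Africa"},
    {"name": "Mexico City", "continent": "North America"},
    {"name": "Istanbul", "continent": "Europe"},
]

matched = match_in(sample_cities, "continent", ["North America", "Asia"])
print([doc["name"] for doc in matched])  # ['Seoul', 'Mexico City']
```

As in the MongoDB output above, matching documents keep their original order; $match only filters.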

Using $sort aggregation stage

# -1 sorts in descending order: largest population first
pipeline = [
    {"$sort": {"population": -1}}
]

list(db.cities.aggregate(pipeline))

# Stages run in order: filter first, then sort the matches in ascending order (1)
pipeline = [
    {"$match": {"continent": "North America"}},
    {"$sort": {"population": 1}}
]

list(db.cities.aggregate(pipeline))
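
The match-then-sort pattern above can be wrapped in a small helper so that the continent and sort direction become parameters. This is only a sketch: build_sorted_pipeline is a hypothetical name, not part of PyMongo. Placing $match first means the $sort stage only has to order the documents that survive the filter.

```python
def build_sorted_pipeline(continent, direction=1):
    """Return a pipeline that filters by continent, then sorts by population.

    direction: 1 for ascending, -1 for descending (the values $sort expects).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be 1 (ascending) or -1 (descending)")
    return [
        {"$match": {"continent": continent}},
        {"$sort": {"population": direction}},
    ]


asia_desc = build_sorted_pipeline("Asia", direction=-1)
print(asia_desc)
# With the connection from the first cell:
# list(db.cities.aggregate(asia_desc))
```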

Using $group aggregation stage

# Grouping on "$continent" yields one output document per distinct continent
pipeline = [
    {"$group": {"_id": "$continent"}}
]

list(db.cities.aggregate(pipeline))

[{'_id': 'Asia'},
 {'_id': 'Europe'},
 {'_id': 'North America'},
 {'_id': 'Africa'},
 {'_id': 'South America'}]

# A compound _id groups by each distinct (continent, country) pair
list(db.cities.aggregate([
    {"$group": {
        "_id": {
            "continent": "$continent",
            "country": "$country"
        }
    }}
]))

[{'_id': {'continent': 'Asia', 'country': 'Japan'}},
 {'_id': {'continent': 'North America', 'country': 'United States'}},
 {'_id': {'continent': 'Europe', 'country': 'Turkey'}},
 {'_id': {'continent': 'Asia', 'country': 'South Korea'}},
 {'_id': {'continent': 'Asia', 'country': 'Bangladesh'}},
 {'_id': {'continent': 'South America', 'country': 'Argentina'}},
 {'_id': {'continent': 'Asia', 'country': 'Philippines'}},
 {'_id': {'continent': 'Asia', 'country': 'China'}},
 {'_id': {'continent': 'Asia', 'country': 'India'}},
 {'_id': {'continent': 'South America', 'country': 'Brazil'}},
 {'_id': {'continent': 'Africa', 'country': 'Egypt'}},
 {'_id': {'continent': 'Asia', 'country': 'Pakistan'}},
 {'_id': {'continent': 'North America', 'country': 'Mexico'}},
 {'_id': {'continent': 'Africa', 'country': 'Nigeria'}}]

The next $group stage adds three computed fields to each continent-country group:
– highest_population: the maximum population value in the group. The $max accumulator operator computes the maximum of population across all documents in the group.
– first_city: the name of the first city in the group. The $first accumulator operator takes the value of name from the first document in the group. Because the documents are not sorted before grouping, this is not necessarily the city with the highest population; it is simply the first city MongoDB encounters in each group.
– cities_in_top_20: the number of cities in the collection for each continent-country pair. The $sum accumulator operator adds 1 for each document in the group, so it counts documents rather than summing a field of the source documents.

list(db.cities.aggregate([
    {"$group": {
        "_id": {
            "continent": "$continent",
            "country": "$country"
        },
        "highest_population": {"$max": "$population"},
        "first_city": {"$first": "$name"},
        "cities_in_top_20": {"$sum": 1}
    }}
]))

[{'_id': {'continent': 'Asia', 'country': 'Japan'},
  'highest_population': 37.4,
  'first_city': 'Osaka',
  'cities_in_top_20': 2},
 {'_id': {'continent': 'North America', 'country': 'United States'},
  'highest_population': 18.819,
  'first_city': ' ',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Europe', 'country': 'Turkey'},
  'highest_population': 14.751,
  'first_city': 'Istanbul',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Asia', 'country': 'South Korea'},
  'highest_population': 25.674,
  'first_city': 'Seoul',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Asia', 'country': 'Bangladesh'},
  'highest_population': 19.578,
  'first_city': 'Dhaka',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'South America', 'country': 'Argentina'},
  'highest_population': 14.967,
  'first_city': 'Buenos Aires',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Asia', 'country': 'Philippines'},
  'highest_population': 13.482,
  'first_city': 'Manila',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Asia', 'country': 'China'},
  'highest_population': 25.582,
  'first_city': 'Beijing',
  'cities_in_top_20': 3},
 {'_id': {'continent': 'Asia', 'country': 'India'},
  'highest_population': 28.514,
  'first_city': 'Mumbai',
  'cities_in_top_20': 3},
 {'_id': {'continent': 'South America', 'country': 'Brazil'},
  'highest_population': 21.65,
  'first_city': 'Rio de Janeiro',
  'cities_in_top_20': 2},
 {'_id': {'continent': 'Africa', 'country': 'Egypt'},
  'highest_population': 20.076,
  'first_city': 'Cairo',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Asia', 'country': 'Pakistan'},
  'highest_population': 15.4,
  'first_city': 'Karachi',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'North America', 'country': 'Mexico'},
  'highest_population': 21.581,
  'first_city': 'Mexico City',
  'cities_in_top_20': 1},
 {'_id': {'continent': 'Africa', 'country': 'Nigeria'},
  'highest_population': 13.463,
  'first_city': 'Lagos',
  'cities_in_top_20': 1}]
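
In the output above, first_city for Japan is Osaka even though Tokyo has the larger population, because $first depends on the order in which documents reach the group. The sketch below mimics the three accumulators in plain Python to show the effect, and then gives a pipeline that sorts by population before grouping. The helper group_cities and the three-document sample are assumptions for illustration, not MongoDB's implementation.

```python
def group_cities(docs):
    """Plain-Python stand-in for the $group stage with $max, $first and $sum."""
    groups = {}
    for doc in docs:
        key = (doc["continent"], doc["country"])
        if key not in groups:
            groups[key] = {
                "_id": {"continent": doc["continent"], "country": doc["country"]},
                "highest_population": doc["population"],
                "first_city": doc["name"],  # $first: taken from the first document seen
                "cities_in_top_20": 0,
            }
        group = groups[key]
        group["highest_population"] = max(group["highest_population"], doc["population"])  # $max
        group["cities_in_top_20"] += 1  # $sum: 1
    return list(groups.values())


sample = [
    {"name": "Osaka", "country": "Japan", "continent": "Asia", "population": 19.281},
    {"name": "Tokyo", "country": "Japan", "continent": "Asia", "population": 37.400},
    {"name": "Cairo", "country": "Egypt", "continent": "Africa", "population": 20.076},
]

# Insertion order: Osaka is seen first, so it becomes first_city for Japan
unsorted_result = group_cities(sample)

# Sorting by population (descending) first makes first_city the largest city
by_population = sorted(sample, key=lambda d: d["population"], reverse=True)
sorted_result = group_cities(by_population)

# The same idea as an aggregation pipeline: $sort before $group
sorted_group_pipeline = [
    {"$sort": {"population": -1}},
    {"$group": {
        "_id": {"continent": "$continent", "country": "$country"},
        "highest_population": {"$max": "$population"},
        "first_city": {"$first": "$name"},
        "cities_in_top_20": {"$sum": 1},
    }},
]
# With the connection from the first cell:
# list(db.cities.aggregate(sorted_group_pipeline))
```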

Import the dataset of Webhose news on Netflix

import json

# The file holds one JSON document per line, so parse it line by line
newsfeeds = []
with open("webhose_netflix.json") as f:
    for line in f:
        newsfeeds.append(json.loads(line))
print(len(newsfeeds))

#newsfeeds[10]

collection = db.webhose_netflix
collection.insert_many(newsfeeds)

total_docs = collection.count_documents({})
total_docs

# Find documents whose title contains "Netflix"
query = {
    "title": {
        "$regex": "Netflix",
        "$options": "i"  # case-insensitive
    }
}
results = collection.find(query)
print(collection.count_documents(query))

# Match documents whose published value contains the date 2020-06-03
pipeline = [
    {"$match": {"published": {"$regex": "2020-06-03"}}}
]

#list(collection.aggregate(pipeline))
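
The regular expression above matches the date anywhere in the string. If published is stored as a string that begins with the date (an assumption about this dataset, for example an ISO 8601 timestamp), anchoring the pattern at the start is stricter. A minimal sketch, with published_on_stage as a hypothetical helper name:

```python
import re


def published_on_stage(date_prefix):
    """Build a $match stage for documents whose `published` string starts with date_prefix.

    Assumes `published` is a string, not a BSON date.
    """
    pattern = "^" + re.escape(date_prefix)  # anchor at the start, escape regex metacharacters
    return {"$match": {"published": {"$regex": pattern}}}


stage = published_on_stage("2020-06-03")
print(stage)
# With the collection defined above:
# list(collection.aggregate([stage]))
```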

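from bson.son import SON

# Count documents per title; most frequent titles first, ties broken by title (descending)
pipeline = [
    {"$unwind": "$title"},  # a non-array title is treated as a single-element array
    {"$group": {"_id": "$title", "count": {"$sum": 1}}},
    {"$sort": SON([("count", -1), ("_id", -1)])}  # SON preserves the order of the sort keys
]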

#list(collection.aggregate(pipeline))
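
For comparison, the same count-and-sort logic can be sketched in plain Python with collections.Counter. The helper name count_titles and the sample titles are made up for illustration and are not taken from the Webhose dataset.

```python
from collections import Counter


def count_titles(docs):
    """Count documents per title, ordered by count and then title, both descending."""
    counts = Counter(doc["title"] for doc in docs)
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [{"_id": title, "count": n} for title, n in ordered]


# Hypothetical documents standing in for the news feed
sample_docs = [
    {"title": "Netflix adds new series"},
    {"title": "Streaming news roundup"},
    {"title": "Netflix adds new series"},
    {"title": "Weekly tech digest"},
]

title_counts = count_titles(sample_docs)
for row in title_counts:
    print(row)
```

The repeated title comes first with a count of 2, and the two titles with a count of 1 follow in descending title order.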
