1. Using the data schema in table 1, write a query that would bucket the users based on the following criteria:
A user cannot be in more than 1 bucket
Date: Jan 1, 2016 to Jan 7, 2016
The buckets are prioritized below:
1. Priority 1: only users who made contact on 2016-01-01
2. Priority 2: exchanged more than 50 messages
3. Priority 3: interacted with more than 30 other users
Assumptions:
· A user cannot be in more than 1 bucket.
// "User" is not clearly defined here, so I treat it as covering both hosts and guests.
· Date: Jan 1, 2016 to Jan 7, 2016
// (1) It is not confirmed whether the boundaries are inclusive (>= Jan 1, 2016 or > Jan 1, 2016; likewise for Jan 7, 2016).
I treat the range as >= '2016-01-01' and <= '2016-01-07'.
· The buckets are prioritized below:
1. Priority 1: only users who made contact on 2016-01-01
// There are 2 possible readings:
(1) Any user who made contact on 2016-01-01 is in scope, regardless of activity on other days.
(2) Only users whose contact happened exclusively on 2016-01-01, with no chats on any other day, are in scope.
I pick (1) as my assumption.
2. Priority 2: exchanged more than 50 messages.
Table 1's num_messages should count all messages exchanged between a host and a guest on the same day, but there are still 2 readings of the 50-message threshold:
(1) The user exchanged more than 50 messages in total from Jan 1, 2016 to Jan 7, 2016.
(2) The user exchanged more than 50 messages in at least one single chat between Jan 1, 2016 and Jan 7, 2016.
I pick (1) as my assumption.
3. Priority 3: interacted with more than 30 other users
// This part is tricky, since there is no definition of the tables or business rules.
For a normal business, the system should not allow id_host and id_guest to hold the same id in the same row, and neither id can be null.
Even so, "interacted with more than 30 other users" can be understood in 2 ways:
(1) The user (host or guest) has more than 30 chat rows in the table from Jan 1, 2016 to Jan 7, 2016.
Under this reading, a guest who chats with the same host on 3 different days counts as 3 interactions.
(2) The user (host or guest) has chatted with more than 30 DIFFERENT hosts or guests from Jan 1, 2016 to Jan 7, 2016.
Under this reading, a guest who chats with the same host on 3 different days counts as only 1 interaction.
Both could be correct, so I wrote 2 scripts, but I prefer (2) as the better business interpretation. The snippet below shows how the two counts differ.
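To see the difference between the two readings, a minimal sketch (assuming the temp_test.table1 schema defined in Test Data.sql at the end of this post; the partner alias is mine):
# reading (1) counts chat rows per user; reading (2) counts distinct partners
select user,
count(*) as interactions_case1,
count(distinct partner) as interactions_case2
from
(select id_host as user,id_guest as partner from temp_test.table1
union all
select id_guest as user,id_host as partner from temp_test.table1) a
group by user;
For user 401 in the test data, who chats with guest 332 on three separate rows, interactions_case1 counts those rows as 3 while interactions_case2 counts that partner only once.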
Test data: see Test Data.sql at the end of this post.
Whole Script:
Priority 3 using case (1)
SET group_concat_max_len=100000000;
select user,
case when length(group_concat(date))>length(replace(group_concat(date),'2016-01-01','')) then 'Priority 1: only users who made contact on 2016-01-01'
when sum(num_messages)>50 then 'Priority 2: exchanged more than 50 messages'
when count(user)>30 then 'Priority 3: interacted with more than 30 other users' end as Priority
from
(select id_host as user,date,num_messages from temp_test.table1 t1
union all
select id_guest as user,date,num_messages from temp_test.table1 t2) a
where date>='2016-01-01' and date<='2016-01-07'
group by user
having Priority is not null
order by Priority asc;
#Logic:
(1) length(group_concat(date))>length(replace(group_concat(date),'2016-01-01','')): if this is true, the date field contains 2016-01-01, because removing that value shortens the concatenated string.
(2) sum(num_messages)>50: if this is true, the user exchanged more than 50 messages in total.
(3) count(user)>30: if this is true, the user has more than 30 chat rows, i.e. interacted with more than 30 other users under reading (1).
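As a side note, the length/replace trick can be swapped for a plain aggregate test; a minimal sketch of an equivalent Priority 1 condition (my alternative, not part of the original script):
# 1 when at least one of the user's rows falls on 2016-01-01, else 0
sum(date='2016-01-01')>0
In MySQL the comparison date='2016-01-01' evaluates to 0 or 1, so the sum counts the user's rows on that date; this form also avoids the group_concat_max_len setting.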
Priority 3 using case (2)
SET group_concat_max_len=100000000;
select user,
case when find_in_set('2016-01-01',group_concat(date)) then 'Priority 1: only users who made contact on 2016-01-01'
when sum(num_messages)>50 then 'Priority 2: exchanged more than 50 messages'
when count(user)>30 then 'Priority 3: interacted with more than 30 other users' end as Priority
from
( (select id_host as user,group_concat(date) as date,sum(num_messages) as num_messages from temp_test.table1 where date>='2016-01-01' and date<='2016-01-07' group by id_host,id_guest)
union all
(select id_guest as user,group_concat(date) as date,sum(num_messages) as num_messages from temp_test.table1 where date>='2016-01-01' and date<='2016-01-07' group by id_host,id_guest) ) a
group by user
having Priority is not null
order by Priority asc;
#Logic:
(1) find_in_set('2016-01-01',group_concat(date)): if this is true, the date field contains 2016-01-01.
(2) sum(num_messages)>50: if this is true, the user exchanged more than 50 messages in total.
(3) count(user)>30: because the inner query groups by (id_host,id_guest), each row of a represents one distinct partner, so this counts users who interacted with more than 30 different users.
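For reference, the same bucketing can be written without group_concat at all by carrying the partner id through the union; a minimal sketch under the same assumptions (the partner alias is mine):
select user,
case when sum(date='2016-01-01')>0 then 'Priority 1: only users who made contact on 2016-01-01'
when sum(num_messages)>50 then 'Priority 2: exchanged more than 50 messages'
when count(distinct partner)>30 then 'Priority 3: interacted with more than 30 other users' end as Priority
from
(select id_host as user,id_guest as partner,date,num_messages from temp_test.table1
union all
select id_guest as user,id_host as partner,date,num_messages from temp_test.table1) a
where date>='2016-01-01' and date<='2016-01-07'
group by user
having Priority is not null
order by Priority asc;
Here count(distinct partner) implements reading (2) of Priority 3 directly, and no group_concat_max_len setting is needed.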
2. Using table 2, how would you go about testing the assumption that there can only be one valid tax_area_id per listing?
Assumption:
(1) This is a MySQL database.
(2) We can check the table definitions.
(3) "Listing" means id_listing in table2.
Solutions:
Method 1:
Step 1: show create table table2;
Step 2: if id_listing by itself is the primary key (or carries a unique index), no data check is needed; the database already restricts each listing to one row and hence one tax_area_id. Note that a composite primary key on (id_listing, tax_area_id) is not enough, because it still allows several different tax_area_ids for the same listing.
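For example, the rule is already guaranteed if the show create table output has the following shape; a hypothetical sketch (column types are assumed, not taken from the real table 2):
CREATE TABLE table2 (
id_listing bigint(14) NOT NULL,
tax_area_id bigint(14),
created_at datetime,
deleted_at datetime,
PRIMARY KEY (id_listing)
);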
Method 2:
If there is no such unique key, test the data directly (taking "valid" to mean the row is not deleted, i.e. deleted_at is null):
select id_listing, count(distinct tax_area_id) as cnt from table2 where deleted_at is null group by id_listing having cnt>1;
If this returns 0 rows, then there is only one valid tax_area_id per listing.
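If the check passes and the rule should hold going forward, it can also be enforced in the schema; a minimal sketch (the index name ux_listing is my own):
alter table table2 add unique index ux_listing (id_listing);
Note this only works if the table keeps a single row per listing; since question 3 says there are duplicate entries per listing, such an index would have to live on a deduplicated or "current" table instead.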
3. Using table 2, assuming that there are duplicate entries per listing, write a query that only returns the latest valid information entered per listing.
Assumption:
(1) The deleted_at field in table2 is either null or a real date value.
(2) created_at is always earlier than deleted_at.
#Script
select * from (select * from table2 where deleted_at is null or deleted_at='' order by created_at desc) a group by id_listing;
#Logic:
1. Filter on the deleted_at field to make sure the record is not deleted.
2. Order the remaining records by created_at descending, then group by id_listing so that only the first (newest) row per id_listing is fetched, i.e. the latest valid information. A deterministic alternative follows below.
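Note that relying on group by to return the first row of an ordered subquery is not guaranteed in MySQL, and it is rejected outright when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7). A deterministic sketch of the same logic, assuming created_at is unique per listing:
select t.*
from table2 t
join (select id_listing, max(created_at) as max_created
from table2
where deleted_at is null
group by id_listing) m
on t.id_listing=m.id_listing and t.created_at=m.max_created
where t.deleted_at is null;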
Test Data.sql
create table temp_test.table1
(id bigint(14) NOT NULL AUTO_INCREMENT,
date date,
id_host bigint(14),
id_guest bigint(14),
num_messages bigint(14),
primary key (id),
key index_id_host(id_host),
key index_id_guest(id_guest));
insert into temp_test.table1 values
(1,'2016-01-01',101,102,1),
(2,'2016-01-01',102,103,10),
(3,'2016-01-02',101,104,4),
(5,'2016-01-02',103,104,50),
(9,'2016-01-02',101,104,50),
(10,'2016-01-02',108,100,50),
(11,'2016-01-03',101,101,50),
(12,'2016-01-04',101,102,50),
(13,'2016-01-07',101,103,50),
(14,'2016-01-06',101,104,50),
(15,'2016-01-03',101,105,50),
(16,'2016-01-06',101,106,50),
(17,'2016-01-07',101,107,50),
(18,'2016-01-02',101,108,50),
(19,'2016-01-02',101,109,50),
(20,'2016-01-02',101,200,50),
(21,'2016-01-02',101,201,50),
(22,'2016-01-02',101,202,50),
(23,'2016-01-02',101,203,50),
(24,'2016-01-02',101,204,50),
(25,'2016-01-02',101,205,50),
(26,'2016-01-02',101,206,50),
(27,'2016-01-02',101,208,50),
(28,'2016-01-02',101,209,50),
(29,'2016-01-02',101,300,50),
(30,'2016-01-02',301,301,1),
(31,'2016-01-02',301,302,1),
(32,'2016-01-02',301,303,1),
(33,'2016-01-02',301,304,1),
(34,'2016-01-02',301,305,1),
(35,'2016-01-02',301,306,1),
(36,'2016-01-02',301,307,1),
(37,'2016-01-02',301,308,1),
(38,'2016-01-02',301,309,1),
(39,'2016-01-02',301,310,1),
(40,'2016-01-02',301,311,1),
(41,'2016-01-02',301,312,1),
(42,'2016-01-02',301,313,1),
(43,'2016-01-02',301,314,1),
(44,'2016-01-02',301,315,1),
(45,'2016-01-02',301,316,1),
(46,'2016-01-02',301,317,1),
(47,'2016-01-02',301,318,1),
(48,'2016-01-02',301,319,1),
(49,'2016-01-02',301,320,1),
(50,'2016-01-02',301,321,1),
(51,'2016-01-02',301,322,1),
(52,'2016-01-02',301,323,1),
(53,'2016-01-02',301,324,1),
(54,'2016-01-02',301,325,1),
(55,'2016-01-02',301,326,1),
(56,'2016-01-02',301,327,1),
(57,'2016-01-02',301,328,1),
(58,'2016-01-02',301,329,1),
(59,'2016-01-02',301,330,1),
(60,'2016-01-02',301,331,1),
(61,'2016-01-02',401,332,1),
(62,'2016-01-02',401,332,1),
(63,'2016-01-03',401,332,1),
(64,'2016-01-04',401,333,1),
(65,'2016-01-02',401,334,1),
(66,'2016-01-02',401,335,1),
(67,'2016-01-02',401,336,1),
(68,'2016-01-02',401,337,1),
(69,'2016-01-02',401,338,1),
(70,'2016-01-02',401,339,1),
(71,'2016-01-02',401,340,1),
(72,'2016-01-02',401,341,1),
(73,'2016-01-02',401,342,1),
(74,'2016-01-02',401,343,1),
(75,'2016-01-02',401,344,1),
(76,'2016-01-02',401,345,1),
(77,'2016-01-02',401,346,1),
(78,'2016-01-02',401,347,1),
(79,'2016-01-02',401,348,1),
(80,'2016-01-02',401,349,1),
(81,'2016-01-02',401,350,1),
(82,'2016-01-02',401,351,1),
(83,'2016-01-02',401,352,1),
(84,'2016-01-02',401,353,1),
(85,'2016-01-02',401,354,1),
(86,'2016-01-02',401,355,1),
(87,'2016-01-02',401,356,1),
(88,'2016-01-02',401,357,1),
(89,'2016-01-02',401,358,1),
(90,'2016-01-02',401,359,1),
(91,'2016-01-02',401,400,1),
(92,'2016-01-02',401,900,1);