2: What are the most checked out items per year?
Books query:
Code: Select all
select ye as 'Year', MAX(num) as '# of couts', title, subject.subject, deweyClass, itemtype, subject.bibNumber
from (SELECT year(cout) as ye, count(*) as num, title, itemtype, inraw.bibNumber, deweyClass
FROM spl_2016.inraw
where title != '' and cout != '1970-01-01 00:00:00' and (itemtype = 'acbk' or
itemtype = 'arbk' or
itemtype = 'bcbk' or
itemtype = 'drbk' or
itemtype = 'jcbk' or
itemtype = 'jrbk')
group by title, year(cout)
order by num DESC, year(cout) DESC) as sth
join subject where sth.bibNumber = subject.bibNumber
group by ye
order by ye DESC;
Code: Select all
select ye as 'Year', MAX(num) as '# of couts', title, subject.subject, deweyClass, itemtype, subject.bibNumber
from (SELECT year(cout) as ye, count(*) as num, title, itemtype, inraw.bibNumber, deweyClass
FROM spl_2016.inraw
where title != '' and cout != '1970-01-01 00:00:00'
group by title, year(cout)
order by num DESC, year(cout) DESC) as sth
join subject where sth.bibNumber = subject.bibNumber
group by ye
order by ye DESC;
Results and processing time: 31 rows were returned for both queries and book search took 283.25 secs (4.7 mins) whereas item search was longer with 469.641 secs (7.8 mins). I know the queries are not in the most optimal shape at the moment. Using inraw table and applying a join on that is costly but gathering this information requires lots of computation and it deserves the time it took.
Analysis: When the item list is examined, it's easy to see most of them are digital media products (CDs, DVDs etc.). One of the Interesting items is 2015, where the book "Gone Girl" surpasses all digital media and become number one checked out item. Another one is 2016 in which something called "SPL HotSpot connecting Seattle" is at the top place with an itemtype of "aceqnh", which doesn't appear in itemTypes list.
Question 3: What are the newest items in the library?
Code: Select all
SELECT title, datediff(NOW(), min(cout)) as days, itemtype, (cout) as regDate
FROM inraw
WHERE cout > '1970-01-01 00:00:00'
GROUP BY itemNumber
HAVING days < 30
LIMIT 500
Results and processing time: 500 items gathered and it took 117.266 seconds. The long query time is expected and it is due to the HAVING statement, which forces the query to run over all instances in the database. If HAVING is deleted, it runs in fractions of a second. However, without that statement, I couldn't retrieve small numbers in the set.
Analysis: I would not expect any specific item in this query since newest items could be from every domain, background or item type. But when I quickly look at the item types and titles, I find that most of the entries are fictional books.