** ========================================================
** MAT259 Visualizing Data | Winter 2011 | George Legrady
** =======================================================
** Project 1 : Checkout Durations by Dewey Category
** ================================================
** Nichole Stockman
** ================
**
**
**
** ============================================================================
** THE DATASET
** ============================================================================
** XML dataset stored in a mySQL database containing information on checked-out
** items (books, CDs, movies, etc.) from the Seattle Public Library. The data
** has been continuously recorded since August 2005 and includes information
** from earlier dates.
**
**
**
** ============================================================================
** PROJECT DESCRIPTION AND THE DATA IT USES
** ============================================================================
** This project visualizes checkout duration data for all books checked out and
** returned between 2005 and 2010. That is, it seeks to show the average
** lengths of time (in days) for which books of different categories are
** checked out. It shows the breakdown according to the year, month, and/or day
** on which they were checked out.
**
**
**
** ============================================================================
** FUNCTIONALITIES OF THE APPLICATION
** ============================================================================
**
** There are two types of interaction with this application:
**
**    1) Mouse Clicking
**    2) Mouse-Over Highlighting
**
**
**
** 1. MOUSE CLICKING
** ==================
** CLICKABLE ELEMENT:
**
**    The sentence "Switch to (Sub)Category View" in the top right corner
**    of the display.
**
** FUNCTION:
**
**    Switches between showing averages for the main Dewey categories and
**    those of the ten subcategories for each main category.
**
** --------
**
** CLICKABLE ELEMENT:
**
**    The year, month, and day labels at the bottom of the display.
**
** FUNCTION:
**
**    Clicking on the label immediately below the graph zooms in to that
**    period of time by switching to month view or day view.
**
**    Clicking on a lower label stays in the current view mode, but changes
**    the year or month for which data is reflected.
**
**    Example: At start, data is displayed for each year. By clicking on
**             2006, the display shows monthly duration averages for 2006.
**             At this point, by clicking 2007, we will see monthly duration
**             averages for 2007 - so the year and data have changed, but we
**             are still in month view mode.
**             However, by clicking on a month label, we will see all the
**             averages for books checked out on each DAY of that month. At
**             this point, we can change the year or month to look at, but we
**             will continue seeing daily averages displayed.
**
**    The "All Years" clickable element in the lower left corner of the
**    display will show the application's starting view again.
**
** --------
**
**
**
** 2. MOUSE-OVER HIGHLIGHTING
** ============================
** (Active Only in Subcategory View)
**
** As the mouse is moved over a particular bar, it and the corresponding bars
** in each column will be highlighted with a white outline.
**
** The name of the subcategory represented by those bars will also appear in
** parentheses below the title of the main category to which it belongs. (The
** titles of the main categories are always shown as the y-axis labels).
**
**
**
**
** ============================================================================
** PARSING THE DATA
** ============================================================================
** Three pde files were run which parsed the data.
** They each consist of a mySQL query that calculates the average number of
** days for which items were checked out.
** The averages are calculated for 100
** different Dewey classifications (those starting with 00... to those
** beginning with 99...).
**
** Three text files were generated in order to work with the yearly, monthly
** and daily averages separately rather than requiring the program to
** calculate each.
**
** yearDurations.txt has three columns:
**    year: --------- the year (2005 to 2010 at the time of this project)
**    dewey class: -- the first two digits of each Dewey category (00 to 99)
**    duration: ----- average number of days for which all books in the given
**                    category were checked out in the given year.
**
** monthDurations.txt has the same columns as yearDurations, plus a column
** to specify the month.
**
** dayDurations.txt has the same columns as monthDurations, plus a
** column to specify the day.
**
**
**
**
** ============================================================================
** ADDITIONAL OBSERVATIONS AND NOTES
** ============================================================================
** There is a noticeable trend from 2005 to 2010 of decreasing checkout
** durations. Possibly, the library changed its policies regarding book
** returns during that period. For example, they may have changed the
** length of time people are allowed to check out books for, or they may have
** begun charging higher late fees, etc. I have not yet researched these
** possibilities or alternate explanations.
**
** As expected, many holidays show zero days duration because no books were
** checked out on those days. However, the first week of September 2009 shows
** no checkouts as well. After some googling, I discovered that due to
** city-wide budget cuts, the Seattle Public Library was closed for the first
** week of September in both 2009 and 2010.
**
** It is also worth noting that the checkout durations for books checked out
** in early 2005 were so long, on average, that I decided to take any durations
** longer than 6 months and map them to exactly 6 months. This is why you will
** see some bars that span the whole width of the column (especially in
** month view mode). For a related reason, it would perhaps be good to
** disregard the data from after July 2010 until now (January 2011): books
** checked out in that period may not all have been returned yet, so their
** averages could still change as new data comes in up until July 2011.
**
** ============================================================================
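**
** The row layout of yearDurations.txt and the 6-month cap described above can
** be sketched roughly as follows. This is only an illustration in plain Java,
** not the project's actual code: the class and method names are hypothetical,
** the whitespace column separator is an assumption (the real delimiter is not
** documented here), and "6 months" is approximated as 180 days.

```java
// Illustrative sketch only; all names here are hypothetical, not the project's code.
class DurationRows {
    // Assumption: "6 months" is approximated as 180 days.
    static final float MAX_DAYS = 180.0f;

    // One row of a yearDurations.txt-style file.
    static final class Row {
        final int year;          // year in which the books were checked out
        final String deweyClass; // two-digit prefix, kept as text so "00" keeps its zero
        final float days;        // average checkout duration in days, after capping

        Row(int year, String deweyClass, float days) {
            this.year = year;
            this.deweyClass = deweyClass;
            this.days = days;
        }
    }

    // Map any duration longer than the cap to exactly the cap.
    static float capDuration(float days) {
        return Math.min(days, MAX_DAYS);
    }

    // Assumption: the three columns are separated by whitespace.
    static Row parseYearRow(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        int year = Integer.parseInt(cols[0]);
        float days = Float.parseFloat(cols[2]);
        return new Row(year, cols[1], capDuration(days));
    }

    public static void main(String[] args) {
        // Made-up sample rows, not values from the real dataset.
        String[] sample = { "2005 00 212.4", "2010 64 17.5" };
        for (String line : sample) {
            Row r = parseYearRow(line);
            System.out.println(r.year + " " + r.deweyClass + " " + r.days);
        }
    }
}
```

** Rows for monthDurations.txt and dayDurations.txt could be handled the same
** way by reading one or two extra columns before the duration.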