2022f MAT265 - Projects in Optical/Computational Processes


2022f		MAT 265 Open Projects in Optical/Motion-Computational Processes
Instructor		George Legrady
		Elings Hall, lab 2611, CNSI Building 2nd floor - Tues-Thurs 3:30-5:50pm

Course Format		MAT265 is a studio production course with two directions:

		1) Directed Research
		For advanced MAT students focusing on a topic intersecting computation and visualization. The goal is to provide opportunities for creative experimentation, potentially culminating in a project, a proposal, research development, or dissertation work. The workload consists of regular individual meetings and occasional group presentations on Tuesdays to share the evolution of each student's work with the class.

		Research format:
		a) Identify a topic of interest at the start of the quarter
		b) Define the schedule to realize the project by the end of the 10-week course
		c) Document the 10-week work in a report to be posted at this website

		2) Data Analysis of a Growing Database
		For students with some experience in MySQL and data analysis, to work with a collection of over 104 million data entries collected by the hour since September 2005. The goal is to acquire skills in analyzing the collection and to discover patterns, anomalies, content distribution, and overall performance during the 17-year period in which the data has been collected. Students will gain experience in working with a real-world data collection that continues to grow. They will benefit both from this experience and from identifying perspectives on the data's performance that may lend themselves to research and publication. The data is to be analyzed through MySQL. The workload includes weekly meetings on Thursdays and becoming proficient in MySQL, with the aim of producing insightful data queries. Projects are to be posted at the Student Forum.

		Skills to be gained in the course include:
		. Experience in MySQL programming
		. Application of algorithms for cluster analysis, association rules, etc.
		. Identification of patterns in data performance (frequency patterns)
		. Identification of anomalies within classification systems
		. Exploration of outliers and their causes


Data Analysis Schedule		The following section of this schedule is for the MySQL data analysis work.

09.22.22		Review of various MySQL queries
09.27.22		Database exploration through MySQL
10.04.22		Database exploration through MySQL
10.11.22		Database exploration through MySQL
10.18.22		Research Topic Development
10.25.22		Research Topic Development
11.01.22		Mid-Term Class Presentation
11.08.22		Focused Research
11.15.22		Focused Research
11.22.22		Thanksgiving
11.29.22		Focused Research
12.06.22		Final Presentation & Project Reports due



The Seattle Library Database		The database consists of a rich and unique content-based multivariate collection recording the checkouts and returns of items from the main Seattle Public Library by its patrons. It has been recording entries since September 2005, capturing each evening the day’s checkout activity. As of September 14, 2022, there are 104,548,543 entries. Each data entry is time-stamped by the minute. Items checked out by patrons span a broad range of types, from books to CDs, DVDs, etc. There are currently 75 itemtype labels.

		The data is multivariate. Each recorded entry has the following:

Ordinal		(In a numeric sequence)
		ID: Each database entry has a unique ID number
		ItemNumber: Assigned when the object enters the system
		Dewey Classification (Dewey is numeric)

Interval Scale		(Time-Stamp)
		Check-out/check-in by minute, hour, day, month, year

Categorical		(Not necessarily numerically orderable)
		BibNumber: Each title has a specific number; all copies of a title share the same number
		Barcode: Each item has a unique number on its RFID sticker
		CallNumber: Used to locate items on shelves; ordinal if Dewey, otherwise categorical
		Collection Code: What the item is and where it is located

Semantic		(Text-based)
		Title: Each item has a title
		ItemType: books, CDs, DVDs, music sheets, etc.
		Subjects: Keywords (arbitrary labeling)
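The fields listed above can be pictured as a single table. The following is a minimal sketch only: the table name `checkouts`, the column names, and the three sample rows are invented for illustration and are not the course database's actual schema or data. Python's built-in sqlite3 module stands in for MySQL so that the example runs without a server; the two SELECT statements use only standard SQL that MySQL also accepts.

```python
import sqlite3

# Hypothetical schema: column names loosely follow the fields described in
# the text. The real database's table and column names may differ.
SCHEMA = """
CREATE TABLE checkouts (
    id              INTEGER PRIMARY KEY,  -- unique per database entry
    item_number     INTEGER,              -- assigned when object enters system
    bib_number      INTEGER,              -- shared by all copies of a title
    barcode         TEXT,                 -- unique per physical item
    call_number     TEXT,
    dewey           REAL,                 -- NULL when the item has no Dewey class
    collection_code TEXT,
    item_type       TEXT,
    title           TEXT,
    subjects        TEXT,
    checkout_time   TEXT,                 -- time-stamped by the minute
    checkin_time    TEXT
)
"""

# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    (1, 101, 9001, "BC-101", "510 EXA", 510.0, "demo-nonfiction", "book",
     "Example Title A", "mathematics", "2022-09-01 10:15", "2022-09-15 16:40"),
    (2, 102, 9001, "BC-102", "510 EXA", 510.0, "demo-nonfiction", "book",
     "Example Title A", "mathematics", "2022-09-01 18:05", "2022-09-08 11:20"),
    (3, 201, 9002, "BC-201", "DVD EXA", None, "demo-media", "dvd",
     "Example Title B", "film", "2022-09-02 18:30", "2022-09-05 09:00"),
]


def build_demo_db():
    """Create an in-memory database holding the invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO checkouts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def checkouts_per_item_type(conn):
    """Count entries per item type, most frequent first."""
    return conn.execute(
        "SELECT item_type, COUNT(*) AS n FROM checkouts "
        "GROUP BY item_type ORDER BY n DESC, item_type"
    ).fetchall()


def copies_per_title(conn):
    """Count distinct physical copies (barcodes) per title (bib_number)."""
    return conn.execute(
        "SELECT bib_number, COUNT(DISTINCT barcode) FROM checkouts "
        "GROUP BY bib_number ORDER BY bib_number"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(checkouts_per_item_type(db))
    print(copies_per_title(db))
```

The second query illustrates the distinction between BibNumber and Barcode: grouping by the title-level number and counting distinct barcodes gives the number of physical copies that circulated for each title.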


Topics & Approaches Circulation Media Patterns Life of an Object Classification Anomalies		The database provides a granular history of the interactions between patrons and a database (the library), recorded by the minute over a 17-year period starting in the second decade of the internet. Potential questions to explore include:

		. What items circulate? For how long? At what volume?
		. Tracking the performance of specific titles, topics, categories, etc.
		. Correlating the performance of two or more titles
		. How does checkout performance correlate with external world events?
		. What media circulate? Comparison of different media with the same title (movie, book, CD, etc.)
		. What patterns emerge in terms of what circulates at what hour of the day?
		. What are the temporal patterns throughout the day, days of the week, months, years?
		. Is there a correlation between checkout and return times and topics?
		. What are the co-occurrence patterns found through frequency-pattern algorithm searches?
		. Prediction analysis: If certain things circulate over certain periods, what are the chances of …
		. Are there correlations between topics and items that disappear?
		. What is the short-term and long-term performance of titles, topics, media, etc.?
		. What is an object’s life expectancy in relation to the subject’s performance based on their ID?
		. Sequential history: when something is returned, what items are then checked out?
		. What classification systems are in the database?
		. What are the labeling, title-length, and other classification conventions and anomalies?
		. What are the outliers in the database?
		. What are the anomalies in checkouts (e.g., items that circulate only once)?
		. What are the errors in the system, in classification, etc.?
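Two of the questions above, what circulates at what hour of the day and which items circulate only once, can be sketched as simple aggregate queries. This is an illustration under stated assumptions: the table `checkouts`, its columns, and the five sample rows are hypothetical, and Python's built-in sqlite3 module stands in for MySQL. The hour extraction uses SQLite's `strftime`; in MySQL the equivalent would be `HOUR(checkout_time)`.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with invented rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checkouts ("
        "id INTEGER PRIMARY KEY, barcode TEXT, item_type TEXT, "
        "checkout_time TEXT)"
    )
    rows = [
        (1, "BC-101", "book", "2022-09-01 10:15:00"),
        (2, "BC-101", "book", "2022-09-20 10:45:00"),
        (3, "BC-201", "dvd", "2022-09-02 18:30:00"),
        (4, "BC-301", "cd", "2022-09-03 18:05:00"),
        (5, "BC-101", "book", "2022-10-04 18:50:00"),
    ]
    conn.executemany("INSERT INTO checkouts VALUES (?, ?, ?, ?)", rows)
    return conn


def checkouts_by_hour(conn):
    """Count checkouts per hour of the day (0-23), ordered by hour."""
    # SQLite syntax; in MySQL, HOUR(checkout_time) would replace strftime.
    return conn.execute(
        "SELECT CAST(strftime('%H', checkout_time) AS INTEGER) AS hour, "
        "COUNT(*) FROM checkouts GROUP BY hour ORDER BY hour"
    ).fetchall()


def single_circulation_items(conn):
    """List barcodes of items that appear in exactly one checkout entry."""
    return conn.execute(
        "SELECT barcode FROM checkouts "
        "GROUP BY barcode HAVING COUNT(*) = 1 ORDER BY barcode"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(checkouts_by_hour(db))
    print(single_circulation_items(db))
```

On a table the size of the one described here, grouping on a computed hour scans every row, so it may be worth restricting the query to a date range or item type first while exploring.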