2022f


MAT 265 Open Projects in Optical/Motion-Computational Processes

Instructor: George Legrady

Elings Hall, lab 2611, CNSI Building 2nd floor - Tues-Thurs 3:30-5:50pm


Course Format

MAT265 is a studio production course with two directions:

1) Directed Research
For advanced MAT students focusing on a topic at the intersection of computation and visualization. The goal is to provide opportunities for creative experimentation, potentially culminating in a project, a proposal, research development, or dissertation work.

The workload consists of regular individual meetings and occasional group presentations on Tuesdays to share with the class the evolution of each student's work. Research format:
a) Identify a topic of interest at the start of the quarter
b) Define the schedule to realize the project by the end of the 10-week course
c) Document the 10-week work in a report to be posted on this website


2) Data Analysis of a Growing Database
For students with some experience in MySQL and data analysis, working with a collection of over 104 million data entries collected by the hour since September 2005. The goal is to acquire skills in analyzing the collection and to discover patterns, anomalies, content distribution, and overall performance over the 17-year period in which the data has been collected.

Students will acquire experience in working with a real-world data collection that continues to grow. They will benefit both from this experience and from identifying performance perspectives of the data, which may be suitable for research and publication. The data is to be analyzed through MySQL.

The workload includes weekly meetings on Thursdays, becoming proficient in MySQL queries, and producing insightful data queries. Projects are to be posted at the Student Forum.

Skills to be gained in the course include:
  • Experience in MySQL programming

  • Application of algorithms for cluster analysis, association rules, etc.

  • Identification of patterns in data performance (frequency pattern)

  • Identifying anomalies within classification systems

  • Exploration of outliers and causes
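
As a rough illustration of the association-rule idea named in the skills list, the following Python sketch counts how often pairs of keywords occur together and derives support and confidence from those counts. The baskets and keywords are made up, and the helper names `support` and `confidence` are hypothetical; the course itself works in MySQL, so this only shows the arithmetic behind the technique.

```python
from collections import Counter
from itertools import combinations

# Made-up "baskets": sets of subject keywords checked out together.
baskets = [
    {"cooking", "gardening"},
    {"cooking", "gardening", "travel"},
    {"cooking", "travel"},
    {"gardening"},
]

item_counts = Counter()   # how many baskets contain each keyword
pair_counts = Counter()   # how many baskets contain each keyword pair
for basket in baskets:
    item_counts.update(basket)
    # Sort so each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of all baskets that contain both keywords of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]


print(support(("cooking", "gardening")))
print(confidence("travel", "cooking"))
```

Support measures how common a pair is overall, while confidence measures how reliably one keyword accompanies another; the two can differ sharply for rare keywords.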
 

Data Analysis Schedule

The following section of this schedule is for the MySQL data analysis work

09.22.22   Review of various MySQL queries
09.27.22   Database exploration through MySQL
10.04.22   Database exploration through MySQL
10.11.22   Database exploration through MySQL
10.18.22   Research Topic Development
10.25.22   Research Topic Development
11.01.22   Mid-Term Class Presentation
11.08.22   Focused Research
11.15.22   Focused Research
11.22.22   Thanksgiving
11.29.22   Focused Research
12.06.22   Final Presentation & Project Reports due


 
The Seattle Library Database

The database is a rich and unique content-based multivariate collection of the checkouts and returns of items from the main Seattle Public Library by its patrons. It has been recording entries since September 2005, capturing each evening the day's checkout activity. As of September 14, 2022, there are 104,548,543 entries, each time-stamped by the minute. Items checked out by patrons cover a broad range of types, from books to CDs to DVDs, and there are currently 75 itemtype labels. Each recorded entry has the following fields:

Ordinal (in a numeric sequence)
ID: Each database entry has a unique ID number
ItemNumber: Assigned when the object enters the system
Dewey Classification (Dewey is numeric)

Interval Scale (time-stamp)
Check-out/check-in by minute, hour, day, month, year

Categorical (not necessarily numerically orderable)
BibNumber: Each title has a specific number; all copies of a title share the same number
Barcode: Each item has a unique number on its RFID sticker
CallNumber: Used to locate items on shelves; ordinal if Dewey, otherwise categorical
Collection Code: What the item is and where it is located

Semantic (text-based)
Title: Each item has a title
ItemType: books, CDs, DVDs, music sheets, etc.
Subjects: Keywords (arbitrary labeling)
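
To make the field list concrete, here is a minimal Python sketch of how one entry might be modeled in memory. The class name `CheckoutEntry`, the attribute names, the types, and the sample values are all assumptions made for illustration; they are not the actual column names or types of the course database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class CheckoutEntry:
    """One recorded entry. Names and types are hypothetical, not the real schema."""
    entry_id: int                    # unique ID number of the database entry
    item_number: int                 # assigned when the object enters the system
    dewey: Optional[float]           # Dewey class; None for non-Dewey items
    checked_out: datetime            # time-stamped by the minute
    checked_in: Optional[datetime]   # None while the item is still out
    bib_number: int                  # shared by all copies of a title
    barcode: str                     # unique per physical item
    call_number: str                 # ordinal if Dewey, otherwise categorical
    collection_code: str             # what the item is and where it is located
    title: str
    item_type: str
    subjects: Tuple[str, ...] = ()   # keywords


def loan_minutes(entry: CheckoutEntry) -> Optional[int]:
    """Length of the loan in whole minutes, or None if not yet returned."""
    if entry.checked_in is None:
        return None
    return int((entry.checked_in - entry.checked_out).total_seconds() // 60)


# Made-up sample entry, for illustration only.
example = CheckoutEntry(
    entry_id=1, item_number=1001, dewey=519.5,
    checked_out=datetime(2022, 9, 14, 15, 30),
    checked_in=datetime(2022, 9, 21, 10, 0),
    bib_number=555, barcode="example-barcode", call_number="519.5 EXA",
    collection_code="example-code", title="Example Title", item_type="book",
    subjects=("Statistics",),
)
print(loan_minutes(example))
```

Keeping check-out and check-in as full timestamps lets interval questions (loan length, hour of day, day of week) be derived later without storing them separately.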


 
Topics & Approaches

The database provides a granular history of the interactions between patrons and a database (the library), recorded by the minute over a 17-year period starting in the second decade of the internet. Potential questions to explore are grouped below, in order, under six approaches: Circulation, Media, Patterns, Life of an Object, Classification, and Anomalies.

  • What items circulate? For how long? At what volume?
  • Tracking the performance of specific titles, topics, categories, etc.
  • Correlating the performance of two or more titles
  • How does checkout performance correlate with external world events?
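
A circulation question such as "what circulates, for how long, at what volume" can be sketched as a grouped count. The example below is only a sketch: it uses Python's built-in `sqlite3` module with an in-memory table so that it runs without a server, whereas the course works in MySQL. The table name `checkouts`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkouts (title TEXT, item_type TEXT, checked_out TEXT)"
)
sample_rows = [  # made-up rows, for illustration only
    ("Title A", "book", "2021-03-01 10:15"),
    ("Title A", "book", "2021-03-02 16:40"),
    ("Title A", "dvd", "2022-01-05 12:00"),
    ("Title B", "book", "2022-06-11 09:30"),
]
conn.executemany("INSERT INTO checkouts VALUES (?, ?, ?)", sample_rows)

# Volume per title, plus the first and last time each title was checked out.
# ISO-formatted timestamps sort correctly as text, so MIN/MAX work here.
volume_by_title = conn.execute(
    """
    SELECT title,
           COUNT(*)         AS checkouts,
           MIN(checked_out) AS first_seen,
           MAX(checked_out) AS last_seen
    FROM checkouts
    GROUP BY title
    ORDER BY checkouts DESC, title
    """
).fetchall()

for title, n, first_seen, last_seen in volume_by_title:
    print(f"{title}: {n} checkouts, {first_seen} to {last_seen}")
```

The `SELECT` statement uses only standard aggregate functions, so the same query text should carry over to MySQL with little or no change once the real table and column names are substituted.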

  • What media circulate? Comparison of different media with the same titles (movie, book, CD, etc.)

  • What patterns emerge in terms of what circulates at what hour of the day?
  • What are the temporal patterns throughout the day, days of the week, months, years?
  • Is there a correlation between checkout and return times and topics?
  • What are the co-occurrence patterns found through frequency-pattern algorithm searches?
  • Prediction analysis: If certain things circulate over certain periods, what are the chances of …
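
The temporal questions above come down to bucketing time-stamps by hour or weekday and counting. A minimal Python sketch, assuming the time-stamps have already been exported as text in the format shown (the sample values are made up):

```python
from collections import Counter
from datetime import datetime

# Made-up checkout time-stamps, for illustration only.
timestamps = [
    "2022-09-12 10:05",
    "2022-09-12 10:47",
    "2022-09-13 17:30",
    "2022-09-17 10:15",
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]

# Count checkouts per hour of the day (0-23) and per weekday name.
by_hour = Counter(dt.hour for dt in parsed)
by_weekday = Counter(WEEKDAYS[dt.weekday()] for dt in parsed)

for hour in sorted(by_hour):
    print(f"{hour:02d}:00  {by_hour[hour]}")
for day in WEEKDAYS:
    if by_weekday[day]:
        print(f"{day}: {by_weekday[day]}")
```

With the full collection, this kind of grouping would more likely be done inside the database, with only the aggregated counts exported for plotting.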

  • Are there correlations between topics and items that disappear?
  • What is the short-term and long-term performance of titles, topics, media, etc.?
  • What is an object’s life expectancy in relation to the subject’s performance based on their ID?
  • Sequential history: when something is returned, what items are then checked out?
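
One way to approach the "life of an object" questions is to group checkouts by physical item and compare the first and last dates on which it circulated. The Python sketch below assumes checkouts are available as (barcode, date) pairs; the barcodes and dates are made up, and the dictionary keys are hypothetical names chosen for this example.

```python
from collections import defaultdict
from datetime import date

# Made-up (barcode, checkout date) pairs, for illustration only.
checkouts = [
    ("item-001", date(2006, 1, 10)),
    ("item-001", date(2009, 6, 2)),
    ("item-001", date(2015, 1, 10)),
    ("item-002", date(2020, 3, 1)),
]

# Collect every checkout date seen for each physical item.
dates_by_item = defaultdict(list)
for barcode, day in checkouts:
    dates_by_item[barcode].append(day)

# Summarize each item's circulation life.
lifespans = {
    barcode: {
        "first": min(days),
        "last": max(days),
        "checkouts": len(days),
        "span_days": (max(days) - min(days)).days,
    }
    for barcode, days in dates_by_item.items()
}

for barcode, info in sorted(lifespans.items()):
    print(barcode, info["checkouts"], "checkouts over", info["span_days"], "days")
```

An item whose last checkout lies far in the past is a candidate for "disappeared", though the span alone cannot distinguish a lost item from one that is simply no longer in demand.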

  • What classification systems are used in the database?
  • What are the labeling, title length, and other classification conventions and anomalies?

  • What are the outliers in the database?
  • What are the anomalies in checkouts (items that circulate only once)?
  • What are the errors in the system, in classification, etc.?
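
Two of the anomaly checks above, items that circulate only once and entries with missing labels, can be sketched as simple queries. As a stand-in for the course's MySQL setup, this sketch uses Python's built-in `sqlite3` module with an in-memory table; the table name `checkouts`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the course's MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkouts (barcode TEXT, title TEXT, checked_out TEXT)"
)
conn.executemany(
    "INSERT INTO checkouts VALUES (?, ?, ?)",
    [  # made-up rows, for illustration only
        ("item-001", "Title A", "2010-05-01 11:00"),
        ("item-001", "Title A", "2012-07-19 14:20"),
        ("item-002", "Title B", "2011-02-03 09:45"),
        ("item-003", "", "2013-08-08 16:10"),  # empty title: possible error
    ],
)

# Items whose barcode appears in exactly one checkout.
circulated_once = conn.execute(
    """
    SELECT barcode, title
    FROM checkouts
    GROUP BY barcode, title
    HAVING COUNT(*) = 1
    ORDER BY barcode
    """
).fetchall()

# Entries with a missing or blank title, one kind of labeling error.
missing_title = conn.execute(
    """
    SELECT barcode
    FROM checkouts
    WHERE title IS NULL OR TRIM(title) = ''
    ORDER BY barcode
    """
).fetchall()

print(circulated_once)
print(missing_title)
```

Whether a single checkout is a true anomaly depends on how long the item has been in the collection, so a result like this is a starting list for inspection rather than a verdict.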