Other undergraduate research projects under supervision.
Some projects I'm involved in (in progress):
Data attribute-driven object selection technique for pen-based interfaces
Gaze-aware system for video comprehension powered by LLMs
Language learning support for low-resource languages using machine translation
This project presents LangEye, a mobile web application for in-situ vocabulary learning and practice in a foreign language. Learners take pictures of real objects from their daily lives, save them as memories, and review those memories when ready.
LangEye applies current machine translation, computer vision, large language model, and image generation technologies to contextual vocabulary learning. It draws on a theoretical framework for augmented reality and presents a practical application of those concepts on a ubiquitous and accessible platform: smartphones.
Card-it is a web application for learning Italian verb morphology, in other words, Italian verb conjugations. Unlike other flashcard applications (e.g., Anki), Card-it offers (1) semi-automatic creation of cards using a Finite-State Morphological (FSM) analyzer, which reduces repetitive labour and human error when inputting the morphological data, and (2) the possibility of classroom integration, with student analytics supporting students, teachers, and autonomous learners of Italian as a second language.
See more: https://vialab.ca/research/card-it
M. Shimabukuro, J. Zipf, S. Yama, and C. Collins. 2023. “Evaluating Classroom Potential for Card-it: Digital Flashcards for Studying and Learning Italian Morphology,” in Proc. of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 130–136, Toronto, Canada. ACL.
S. Yama, “Card-IT Versus: A Competitive Multiplayer Game for Testing Italian Verb Morphology,” Bachelor's Thesis, 2022.
This project presents a visualization technique for cross-linguistic error analysis in large learner corpora. H-Matrix combines a matrix, which is commonly used by linguists to investigate cross-linguistic patterns, with a tree diagram to aggregate and interactively re-weight the importance of matrix rows to create custom investigative views. Our technique can help experts to perform data operations, such as feature aggregation, filtering, ordering and language comparison interactively without having to reprocess the data. H-Matrix dynamically links the high-level multi-language overview to the extracted textual examples, and a reading view where linguists can see the detected features in context, confirm and generate hypotheses.
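As a rough illustration of the aggregate-and-re-weight idea described above (not the H-Matrix implementation), the sketch below assumes a matrix with one row per linguistic feature and one column per language, and a tree that groups features. The feature names, the nested-tuple tree format, and the `aggregate` function are all hypothetical.

```python
# Hypothetical data: each row holds feature counts for two languages.
matrix = {
    "article_omission": [4, 2],
    "article_misuse": [2, 6],
}

# Hypothetical tree format: a leaf is a feature name,
# an inner node is a (name, children) tuple.
tree = ("articles", ["article_omission", "article_misuse"])


def aggregate(node, matrix, weights):
    """Return the weighted column-wise sum of all leaf rows under a node.

    Leaves missing from `weights` default to a weight of 1.0.
    """
    if isinstance(node, str):  # leaf: scale its matrix row by its weight
        w = weights.get(node, 1.0)
        return [w * value for value in matrix[node]]
    _name, children = node  # inner node: sum the children's rows per column
    rows = [aggregate(child, matrix, weights) for child in children]
    return [sum(column) for column in zip(*rows)]


# Equal weights, then the same view with one feature down-weighted.
equal_view = aggregate(tree, matrix, {})
reweighted_view = aggregate(tree, matrix, {"article_misuse": 0.5})
```

Changing a weight only changes the aggregation step, so the underlying matrix does not need to be rebuilt to obtain a new view.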
See more: https://vialab.ca/research/h-matrix
M. Shimabukuro, J. Zipf, M. El-Assady, and C. Collins, “H-Matrix: Hierarchical Matrix for Visual Analysis of Cross-Linguistic Features in Large Learner Corpora,” in Proceedings of the IEEE Conference on Information Visualization (short papers), 2019.
A known problem in information visualization labelling arises when the text is too long to fit in the label space. A common workaround is to set a very small font size, but the text can then become difficult to read. Wrapping sentences, dropping letters, and truncating text are also used. However, there is no research on how these techniques affect the legibility and readability of the visualization. In other words, we don’t know whether applying these techniques is the best way to tackle this issue. This thesis describes the design and implementation of a crowdsourced study that uses a recommendation system to narrow down abbreviations created by participants, allowing us to efficiently collect and test the data in the same session. The study design also aims to investigate the effect of semantic context on the abbreviations that participants create and on their ability to decode them. Finally, based on the analysis of the study data, we present a new technique that automatically makes words as short as they need to be while maintaining text legibility and readability.
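To make two of the techniques named above concrete, here is a minimal sketch of text truncation and letter dropping. It is not the technique proposed in the thesis; the function names and the rule of dropping vowels from right to left are assumptions chosen for illustration.

```python
VOWELS = set("aeiouAEIOU")


def truncate(word: str, max_len: int) -> str:
    """Keep only the first max_len characters of the word."""
    return word[:max_len]


def drop_vowels(word: str, max_len: int) -> str:
    """Drop vowels from right to left until the word fits.

    The first letter is never dropped. If the word has too few vowels,
    the result may still be longer than max_len.
    """
    chars = list(word)
    i = len(chars) - 1
    while len(chars) > max_len and i > 0:
        if chars[i] in VOWELS:
            del chars[i]
        i -= 1
    return "".join(chars)


def abbreviate(word: str, max_len: int) -> str:
    """Drop vowels first, then truncate if the word is still too long."""
    return truncate(drop_vowels(word, max_len), max_len)


label = "visualization"
truncated = truncate(label, 8)       # "visualiz"
vowel_dropped = drop_vowels(label, 8)  # "visulztn"
```

Both results fit the same eight-character budget, but they keep different parts of the word, which is the kind of difference a legibility study needs to measure.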
See more: https://vialab.ca/research/abbreviating-text-labels-on-demand
M. Shimabukuro, “An Adaptive Crowdsourced Investigation of Word Abbreviation Techniques for Text Visualizations,” Master's Thesis, 2017.
M. Shimabukuro and C. Collins, “Abbreviating Text Labels on Demand,” Proc. of IEEE Conf. on Information Visualization (InfoVis), 2017.
Many visualizations, including word clouds, cartographic labels, and word trees, encode data within the sizes of fonts. While font size can be an intuitive dimension for the viewer, using it as an encoding can introduce factors that may bias the perception of the underlying values. Viewers might conflate the size of a word’s font with a word’s length, the number of letters it contains, or with the larger or smaller heights of particular characters (‘o’ vs. ‘p’ vs. ‘b’). We present a collection of empirical studies showing that such factors, which are irrelevant to the encoded values, can indeed influence comparative judgements of font size, though less than conventional wisdom might suggest. We highlight the largest potential biases and describe a strategy to mitigate them.
See more: https://vialab.ca/research/perceptual-biases-in-font-size-as-a-data-encoding
E. Alexander, C. Chang, M. Shimabukuro, S. Franconeri, C. Collins, and M. Gleicher, “Perceptual Biases in Font Size as a Data Encoding,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, iss. 8, pp. 2397-2410, 2017.
E. Alexander, C. Chang, M. Shimabukuro, S. Franconeri, C. Collins, and M. Gleicher, “The Biasing Effect of Word Length in Font Size Encodings,” Proc. IEEE Information Visualization (InfoVis), Posters, 2016.