Data Gathering

Gathering data is a vital step in the data science workflow and has a strong influence on the outcome of any data science project. This phase shapes the accuracy of the analysis, defines the study's scope, supports decision-making, improves the accuracy of predictions, and helps reveal trends and patterns in the data. Conclusions drawn from subpar data can be misleading or biased, whereas high-quality data leads to more reliable and effective models. Careful and thorough attention during data collection is therefore essential to preserve the data's integrity and the validity of any findings derived from it. 

Considering the scope of this project, multiple datasets from various sources will be leveraged. 

Source 

Audio Data of One Piece Game Characters: 

"The Sound Resource" is an invaluable online repository for anyone seeking high-quality sound effects from video games. For this project, this platform was leveraged to source sound clips for three characters from the "One Piece" game. This website stands out for its comprehensive and diverse collection, focusing exclusively on game sound effects. It's regularly updated with sounds from a vast array of games, ranging from contemporary hits to classic titles. The site's meticulous categorization and easy-to-navigate interface make it an ideal source for finding specific audio clips, which are essential in bringing an authentic auditory dimension to projects that involve game character sounds. The downloaded sound clips are in the '.wav' file format. 

Dataset Link 🔗
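
Because the downloaded clips are plain '.wav' files, they can be read straight into Python for later inspection and feature extraction. The snippet below is a minimal sketch of that step; the folder name and the use of the librosa library are assumptions about the local setup, not part of the source.

```python
# Minimal sketch: load the downloaded '.wav' character clips for inspection.
# The directory name and librosa-based loading are assumptions, not the project's exact code.
from pathlib import Path

import librosa

CLIP_DIR = Path("data/one_piece_audio")  # hypothetical local folder for the downloaded clips

clips = {}
for wav_path in sorted(CLIP_DIR.glob("*.wav")):
    # sr=None keeps each clip's original sampling rate instead of resampling to 22050 Hz
    signal, sample_rate = librosa.load(wav_path, sr=None)
    clips[wav_path.stem] = (signal, sample_rate)
    print(f"{wav_path.name}: {len(signal) / sample_rate:.2f}s at {sample_rate} Hz")
```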

Source 

Ganyu: Genshin Impact Anime Faces: 

This dataset presents a unique collection of 850 high-resolution images of Ganyu, a character from the popular game Genshin Impact. It was created specifically to explore the effectiveness of GANs in generating anime-style faces. The images were compiled by scraping Danbooru, cropped with lbpcascade_animeface, and further processed to align and prune them for optimal quality. The difPy and groupImg tools were used for duplicate removal and organization, respectively. After a thorough manual selection pass, the images were upscaled and standardized to a resolution of 512x512 pixels, making this dataset a valuable resource for research and development in anime-style character generation using advanced GANs.

Dataset Link 🔗
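
To give a concrete sense of the preprocessing described above, the sketch below shows how anime faces can be detected with lbpcascade_animeface and standardized to 512x512 pixels using OpenCV. The file paths and detector parameters are illustrative assumptions, not the dataset authors' exact settings.

```python
# Illustrative sketch: crop anime faces with lbpcascade_animeface and resize to 512x512.
# Paths and detection parameters are assumptions for demonstration purposes.
from pathlib import Path

import cv2

CASCADE_FILE = "lbpcascade_animeface.xml"  # cascade file from the lbpcascade_animeface project
RAW_DIR = Path("data/ganyu_raw")           # hypothetical folder of scraped images
OUT_DIR = Path("data/ganyu_512")
OUT_DIR.mkdir(parents=True, exist_ok=True)

detector = cv2.CascadeClassifier(CASCADE_FILE)

for img_path in sorted(RAW_DIR.glob("*.jpg")):
    image = cv2.imread(str(img_path))
    if image is None:  # skip unreadable files
        continue
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(64, 64))
    for i, (x, y, w, h) in enumerate(faces):
        crop = image[y:y + h, x:x + w]
        crop = cv2.resize(crop, (512, 512), interpolation=cv2.INTER_CUBIC)
        cv2.imwrite(str(OUT_DIR / f"{img_path.stem}_{i}.png"), crop)
```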

Source 

Grid Audio-Visual Speech Corpus Data: 

The Grid Audio-Visual Speech Corpus, available on Zenodo, is a large multi-talker audiovisual sentence corpus designed to support studies that combine computational and behavioral approaches to speech perception. It contains high-quality audio and video recordings of 34 talkers (18 male, 16 female), each speaking 1000 sentences, for a total of 34,000 sentences. The breadth of speakers and sentences makes it a valuable resource for research in speech perception and related fields. 

Dataset Link 🔗
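
As a starting point for working with the corpus, the sketch below iterates over the per-talker audio files once the archive has been extracted locally. The directory layout (one folder per talker, e.g. s1, s2, ...) and the use of the soundfile library are assumptions about the local setup, not a description of the corpus distribution itself.

```python
# Minimal sketch: enumerate GRID corpus audio per talker and inspect one sentence each.
# The folder layout and soundfile-based loading are assumptions about the local setup.
from pathlib import Path

import soundfile as sf

GRID_ROOT = Path("data/grid_corpus")  # hypothetical extraction location

for talker_dir in sorted(GRID_ROOT.glob("s*")):
    wav_files = sorted(talker_dir.glob("*.wav"))
    print(f"{talker_dir.name}: {len(wav_files)} sentence recordings")
    for wav_path in wav_files[:1]:  # peek at the first sentence for each talker
        audio, sample_rate = sf.read(wav_path)
        print(f"  {wav_path.name}: {audio.shape[0] / sample_rate:.2f}s at {sample_rate} Hz")
```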