Collection of Jupyter Notebooks

A set of Jupyter notebooks covering distinct data science topics I've been researching.

Vector search optimization using NVIDIA Nsight Systems

This notebook focuses on optimizing vector search by comparing implementations built on Faiss, cuVS, and CuPy. It uses NVIDIA Nsight Systems to profile each implementation and identify where GPU-accelerated nearest-neighbor search can be made faster and more scalable.
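To show the kind of computation these libraries accelerate, here is a minimal NumPy sketch of exact (brute-force) k-nearest-neighbor search under squared L2 distance. It is not taken from the notebook, and `knn_search` is a hypothetical helper. CuPy offers NumPy-compatible counterparts for the array functions used here, so a similar version could run on the GPU. Faiss and cuVS instead provide their own optimized brute-force and approximate indexes.

```python
import numpy as np


def knn_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-nearest-neighbor search by squared L2 distance (brute force).

    Hypothetical helper for illustration; returns (distances, indices),
    each of shape (n_queries, k), sorted from nearest to farthest.
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 for all pairs via one matrix multiply
    q_sq = np.sum(queries ** 2, axis=1, keepdims=True)   # (n_queries, 1)
    x_sq = np.sum(database ** 2, axis=1)                  # (n_database,)
    dists = q_sq - 2.0 * queries @ database.T + x_sq      # (n_queries, n_database)
    dists = np.maximum(dists, 0.0)  # clamp tiny negatives caused by rounding

    # argpartition selects the k smallest per row without a full sort
    idx = np.argpartition(dists, kth=k - 1, axis=1)[:, :k]
    part = np.take_along_axis(dists, idx, axis=1)
    # sort only those k candidates
    order = np.argsort(part, axis=1)
    return np.take_along_axis(part, order, axis=1), np.take_along_axis(idx, order, axis=1)


# Usage: queries are slightly perturbed copies of the first five database vectors
rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)
queries = db[:5] + 0.01 * rng.standard_normal((5, 64)).astype(np.float32)
distances, indices = knn_search(db, queries, k=3)
print(indices[:, 0])  # each query's nearest neighbor should be the row it was derived from
```

Expanding the squared distance into norms and a dot product turns the all-pairs computation into one matrix multiplication, which is the part GPUs execute most efficiently. `argpartition` then avoids fully sorting every row of distances.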

CUDA Kernel Execution Timeline for Vector Search

Timeline visualization from NVIDIA Nsight Systems showing the execution of CUDA kernels and memory operations during the vector search process.

View Notebook

Detecting near duplicates using Jaccard similarity and MinHashing

Explores methods for detecting near-duplicate items in a dataset using Jaccard similarity and MinHash, which approximates Jaccard similarity from compact signatures instead of comparing full sets.
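The idea can be illustrated with a small, self-contained MinHash sketch in pure Python. This is not the notebook's code, and all function names are hypothetical. Documents are turned into sets of character 3-shingles, and each of `num_perm` random linear hash functions h(x) = (a*x + b) mod p stands in for a random permutation. The probability that two sets share the same minimum hash equals their Jaccard similarity, so the fraction of matching signature positions estimates it.

```python
import random
import zlib

MERSENNE_PRIME = (1 << 61) - 1  # large prime modulus for the linear hash family


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-shingles of a lowercased, whitespace-normalized string."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a | b else 1.0


def make_hash_params(num_perm: int, seed: int = 42) -> list[tuple[int, int]]:
    """Random (a, b) pairs, each defining h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
            for _ in range(num_perm)]


def minhash_signature(items: set[str], params: list[tuple[int, int]]) -> list[int]:
    """One minimum hash value per (a, b) pair."""
    # CRC32 gives a stable integer per shingle (Python's hash() is salted per process)
    base = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % MERSENNE_PRIME for x in base) for a, b in params]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of signature positions where the minimum hashes agree."""
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)


# Usage: two near-duplicates and one unrelated sentence
params = make_hash_params(num_perm=256)
doc_a = "The quick brown fox jumps over the lazy dog"
doc_b = "The quick brown fox jumped over the lazy dog"
doc_c = "An entirely different sentence about data science"
sa, sb, sc = (shingles(d) for d in (doc_a, doc_b, doc_c))
sig_a, sig_b, sig_c = (minhash_signature(s, params) for s in (sa, sb, sc))
print("a vs b:", round(jaccard(sa, sb), 2), round(estimated_jaccard(sig_a, sig_b), 2))
print("a vs c:", round(jaccard(sa, sc), 2), round(estimated_jaccard(sig_a, sig_c), 2))
```

Increasing `num_perm` lowers the variance of the estimate (its standard error is roughly sqrt(J(1 - J) / num_perm) for true similarity J) at the cost of longer signatures.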

View Notebook

Topic Modeling of Austrian Reddit Posts Using BERTopic

Demonstrates the use of BERTopic for topic modeling of Austria-related Reddit posts from the period around the 2024 European Parliament elections. It covers data preprocessing, topic extraction, and visualization of topic trends over time. The analysis uncovers key themes in the dataset by clustering document embeddings without supervision and describing each cluster with keywords weighted by class-based TF-IDF (c-TF-IDF).

Topics Over Time During the Austrian Election Cycle

This graph visualizes the frequency of selected discussion topics on Austrian Reddit over time, highlighting how attention to these topics shifts with the election cycle.

View Notebook

Data preprocessing for a Bayesian model of bike rentals

Covers the essential steps for preparing a bike rental dataset for Bayesian network modeling: inspecting data distributions, checking for outliers and multicollinearity, imputing missing values, categorizing continuous variables, and calculating Weight of Evidence (WoE) and Information Value (IV) scores.
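The last step can be illustrated with a minimal sketch of WoE and IV for a single categorical feature against a binary target. The example data, the `woe_iv` helper, and the binary "high demand" target are hypothetical; the notebook may define its target and bins differently. Conventions also differ on whether events or non-events go in the numerator of the WoE ratio. Swapping them flips the sign of WoE but leaves IV unchanged.

```python
import math
from collections import Counter


def woe_iv(categories, target, smoothing=0.5):
    """WoE per category and total IV for a binary target (1 = event, 0 = non-event).

    WoE_i = ln(share of events in bin i / share of non-events in bin i)
    IV    = sum_i (share of events in bin i - share of non-events in bin i) * WoE_i
    The additive `smoothing` count avoids log(0) for bins lacking events or non-events.
    """
    events = Counter(c for c, y in zip(categories, target) if y == 1)
    non_events = Counter(c for c, y in zip(categories, target) if y == 0)
    bins = sorted(set(categories))
    total_events = sum(events.values()) + smoothing * len(bins)
    total_non = sum(non_events.values()) + smoothing * len(bins)

    woe, iv = {}, 0.0
    for b in bins:
        p_event = (events[b] + smoothing) / total_events
        p_non = (non_events[b] + smoothing) / total_non
        woe[b] = math.log(p_event / p_non)
        iv += (p_event - p_non) * woe[b]
    return woe, iv


# Hypothetical usage: weather category vs. a binary "high demand" flag
weather = ["clear"] * 6 + ["mist"] * 4 + ["rain"] * 4
high_demand = [1, 1, 1, 1, 1, 0,  1, 1, 0, 0,  0, 0, 0, 1]
woe, iv = woe_iv(weather, high_demand)
for category, value in woe.items():
    print(f"{category:>5}: WoE = {value:+.3f}")
print(f"IV = {iv:.3f}")
```

Positive WoE marks categories where high demand is over-represented relative to the baseline, and IV summarizes how strongly the feature separates the two target classes as a whole.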

View Notebook