This work addresses the challenge of semantic segmentation for COVID-19 CT lung lesions using three different models: U-Net, TransUNet, and Swin-Unet. These models were selected for their representation of pure CNNs, a combination of CNNs and Transformers, and pure Transformer architectures, respectively. The results show that all three models achieved an Intersection over Union (IoU) greater than 70% and a Dice coefficient exceeding 80%. Among them, TransUNet delivered the best performance, but with over 105M parameters. In contrast, U-Net, a simpler architecture, achieved similar results with significantly fewer parameters (30M), demonstrating that CNN-based architectures remain competitive with Transformers for semantic segmentation tasks.
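For reference, a minimal NumPy sketch of the two reported metrics, assuming binary masks of the same shape (the `eps` smoothing term is illustrative):

```python
import numpy as np

def iou_and_dice(pred, target, eps=1e-7):
    """Compute IoU and Dice coefficient for binary segmentation masks.

    pred, target: boolean or {0, 1} arrays of the same shape.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = intersection / (union + eps)
    dice = 2 * intersection / (pred.sum() + target.sum() + eps)
    return iou, dice
```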
Task-specific knowledge distillation of BERT
This project focused on task-specific knowledge distillation of BERT into smaller models, such as BiLSTMs, 1D-CNNs, and a small BERT (with fewer layers), for the classification of COVID-19-related tweets. The BERT-base model was first fine-tuned on the dataset and then distilled into the smaller models using soft labels, with either the KL divergence between the predicted distributions or an MSE loss on the logits. The best-performing model retained 94% of BERT's performance while reducing the number of parameters by 97%. Worked with PyTorch and Hugging Face.
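A minimal PyTorch sketch of the two distillation objectives described above; the temperature `T` and the `use_mse` switch are illustrative, not the exact training setup:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0, use_mse=False):
    """Soft-label distillation: either KL divergence between
    temperature-softened distributions or MSE directly on the logits."""
    if use_mse:
        return F.mse_loss(student_logits, teacher_logits)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```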
GMVAE for clustering
In this project, I implemented a Gaussian Mixture Variational Autoencoder, representing the categorical latent variable with the Gumbel-Softmax distribution, which avoids explicit marginalization over the categories and the high variance of score-function gradient estimators. Experiments showed a clustering accuracy of around 80% with multilayer perceptrons. Worked with PyTorch and TensorFlow.
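A minimal sketch of the Gumbel-Softmax relaxation in PyTorch (the temperature `tau` and the numerical-stability `eps` are illustrative); PyTorch's built-in `F.gumbel_softmax` implements the same idea:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, eps=1e-20):
    """Draw a differentiable sample from a categorical distribution:
    add Gumbel noise to the logits, then apply a tempered softmax."""
    uniform = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(uniform + eps) + eps)
    return F.softmax((logits + gumbel_noise) / tau, dim=-1)
```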
CS231n: Convolutional Neural Networks for Visual Recognition
Implemented the assignments of CS231n, a course offered by Stanford University that covers a range of machine learning topics, including image classifiers (kNN, SVM, Softmax), CNNs, RNNs, LSTMs, and GANs. Worked with Python and TensorFlow.
This work presents a study of transfer learning applied to trademark image retrieval. Initially, selective search is used to obtain region proposals, which are then passed through CNN architectures (AlexNet, GoogLeNet, and ResNet) pretrained on the ImageNet dataset. Feature representations are enhanced with aggregation methods, such as average pooling, max pooling, and R-MAC, applied over intermediate layers. A re-ranking step based on a graph-based, query-specific fusion algorithm further improves the retrieval results. Experiments demonstrate that using intermediate layers yields better retrieval performance, with an increase of approximately 15% in mean average precision (mAP) over the baseline, which relied on transfer learning from the final layers. Worked with Python and Caffe.
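As an illustration of the simpler aggregation methods, a NumPy sketch that pools a convolutional feature map into global descriptors (R-MAC additionally max-pools over multi-scale regions and is omitted here):

```python
import numpy as np

def aggregate_features(fmap):
    """Aggregate a conv feature map of shape (C, H, W) into global
    descriptors via average and max pooling over spatial positions."""
    avg_desc = fmap.mean(axis=(1, 2))
    max_desc = fmap.max(axis=(1, 2))
    # L2-normalize so descriptors can be compared with cosine similarity
    avg_desc /= np.linalg.norm(avg_desc) + 1e-12
    max_desc /= np.linalg.norm(max_desc) + 1e-12
    return avg_desc, max_desc
```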
Project based on a competition offered by Kaggle. I built models based on feature selection, PCA, and an ensemble of classifiers combining Random Forests and Gradient Boosting, improving the baseline accuracy by ~5%. Worked with Python.
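A minimal scikit-learn sketch of the pipeline's shape (component counts and hyperparameters are illustrative, not the ones tuned for the competition):

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.pipeline import make_pipeline

# PCA for dimensionality reduction, then a soft-voting ensemble
model = make_pipeline(
    PCA(n_components=50),
    VotingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=300)),
                    ("gb", GradientBoostingClassifier())],
        voting="soft",
    ),
)
# model.fit(X_train, y_train); model.predict(X_test)
```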
University project based on a competition offered by Kaggle. Feature engineering was applied to the initial data; since most features were categorical or textual, TF-IDF was applied to the text data and one-hot encoding to the categorical data. The classifiers employed were Logistic Regression and Neural Networks. Visualizations and plots were used to better understand the data. *Rankings shown in the report are outdated. Worked with Python and R.
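A minimal scikit-learn sketch of such a preprocessing pipeline (the column names are placeholders, not the competition's actual fields):

```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# TF-IDF for the text column, one-hot encoding for categorical columns
preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "description"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category", "region"]),
])
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
# model.fit(train_df, labels) on a pandas DataFrame; model.predict(test_df)
```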
In this work, I developed parallel versions of a Gaussian filter to reduce image noise and a Sobel filter to detect edges. Two versions were implemented: a naive approach using only global memory, and an improved one based on shared memory. Experiments showed a speedup of ~55x over the serial version. Image resolutions of 720p, 4K, 8K, and 16622×4740 were considered in the experiments. Worked with C++ and CUDA.
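For reference, a serial NumPy/SciPy version of the same pipeline (the project itself was written in C++/CUDA; `sigma` is illustrative):

```python
import numpy as np
from scipy import ndimage

def denoise_and_edges(image, sigma=1.4):
    """Serial reference: Gaussian smoothing to reduce noise, then the
    Sobel gradient magnitude to detect edges."""
    smoothed = ndimage.gaussian_filter(image.astype(np.float64), sigma=sigma)
    gx = ndimage.sobel(smoothed, axis=1)  # horizontal gradient
    gy = ndimage.sobel(smoothed, axis=0)  # vertical gradient
    return np.hypot(gx, gy)               # edge magnitude
```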
Jhosimar Arias, Darwin Saire, Juan Hernández, Ricardo Nishihara, Marcos Piaia.
The project consisted of three phases: detection, segmentation, and character separation. I implemented the HOG descriptor to extract features from plate candidates in the detection phase, and character separation based on pixel projections. Worked with C.
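A minimal NumPy sketch of projection-based character separation (the original was written in C; this assumes a binarized plate with foreground pixels equal to 1, and `min_width` is an illustrative noise filter):

```python
import numpy as np

def separate_characters(binary_plate, min_width=2):
    """Split a binarized plate into character regions: columns whose
    vertical projection is zero mark the gaps between characters."""
    projection = binary_plate.sum(axis=0)   # foreground count per column
    nonempty = np.flatnonzero(projection > 0)
    if nonempty.size == 0:
        return []
    # Group consecutive non-empty columns into character segments
    breaks = np.where(np.diff(nonempty) > 1)[0]
    segments = np.split(nonempty, breaks + 1)
    return [(seg[0], seg[-1]) for seg in segments if seg.size >= min_width]
```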