Hire me!
Summary
I have nearly 8 years of experience in the AI field, including 6 years specializing in natural language processing (NLP) for the Vietnamese language. I am adept at solving both low-level and high-level NLP problems using traditional machine learning techniques as well as deep learning approaches. I have developed and deployed numerous ultra-compact models suitable for edge integration, as well as large-scale models, such as large language models, that require extensive distributed infrastructure.
Additionally, I have experience working on projects involving computer vision and signal processing. My cross-disciplinary project background enables me to apply knowledge across various domains, particularly in multi-modal applications.
I am also capable of organizing and teaching beginner-level classes on machine learning and natural language processing.
Education
B.S. in Information Systems, Hanoi University of Science and Technology (HUST)
Major: Artificial Intelligence, Data Science, Natural Language Processing
Work experience
- May 2021 - Aug 2025: Vingroup Big Data Institute
  - Natural Language Processing Specialist
  - Lead of the Language Representation Team
  - Lead of the Physical AI Team (AI for robotics)
  - Main responsibilities:
    - Working on the Vietnamese virtual assistant project for VinFast cars
    - Developing the VinBase chatbot platform
    - Building large language models and large multi-modal models
- Sep 2018 - Jun 2021: Sun Asterisk Inc.
  - AI Research Engineer
  - Main responsibilities:
    - Creating AI features for internal products
    - Building proofs of concept
    - Researching and developing projects/applications for the company’s customers
    - Working in a research team and writing conference papers
- Jan 2020 - Jun 2020: TechMaster Vietnam Ltd
  - Trainer/Teacher
  - Main responsibilities:
    - Teaching Natural Language Processing with deep learning
    - Creating hands-on labs
Skills
- Strengths: Artificial Intelligence, Math, Data Structures & Algorithms, Logical Thinking
- Data storage: MySQL, SQL Server, MongoDB, Elasticsearch
- Programming language: Python
- Libraries and frameworks: TensorFlow, Keras, PyTorch, Hugging Face Transformers, scikit-learn, NLTK, Gensim, spaCy, Pandas, Flask, Rasa, etc.
- Version control: Git
- Operating systems: Windows, Linux
- Background: Natural language processing, Machine learning, Deep learning, Data mining, Recommendation system
- LLM skills: Pretraining from scratch, Continuous training, Adapting, SFT, Deployment,…
Projects
- Multi-modal LLM - Vision-Language-Action model for general robot control (low-level control on the Unitree G1 and in Google robot simulation)
- Vietnamese Speech2Speech Model for Human-Robot Interaction (speech and command fusion)
- Vietnamese large language model - ViGPT, VBD-Buffalo, LLaMA3-Frog,…
- Large multi-modal model - VLM, VVLM (V-LLaVA, V-LLASM,…)
- Multi-modal retrieval (image-text), Visual document retrieval
- Vietnamese word segmentation, Part-of-speech tagging, Named entity recognition, Dependency parsing.
- Text classification: Sentiment analysis, Hate speech detection, Fake news detection, Identification of informative tweets.
- Sequence Tagging: Toxic spans detection
- Relation extraction, Entity-Relation extraction, Event Extraction
- Automatic summarization: Abstractive summarization of scientific articles
- Accent (diacritic) restoration for the Vietnamese language
- Machine Reading Comprehension
- Banking chatbot, 3D Virtual Assistant, Japanese chatbot.
- OCR (ID card identification, OCR Engine, Receipt Recognition)
- Speech synthesis (text-to-speech)
- Face recognition, Face searching
- Speech emotion recognition
- Audio sentiment analysis, Multimodal sentiment analysis (image-text).
- Similarity-based product search engine
- Audio steganography, Copyright protection for DNNs, etc.
Publications
Q. P. Huu, T. H. Dinh, N. N. Tran, T. P. Van and T. T. Minh, "Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm," 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), 2019, pp. 423-427, doi: 10.1109/GCCE46687.2019.9015498.
Q. H. Pham, V. Anh Nguyen, L. B. Doan, N. N. Tran and T. M. Thanh, "From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection," 2020 12th International Conference on Knowledge and Systems Engineering (KSE), 2020, pp. 37-42, doi: 10.1109/KSE50997.2020.9287406.
L. Doan Bao, V. A. Nguyen and Q. Pham Huu, "SunBear at WNUT-2020 Task 2: Improving BERT-Based Noisy Text Classification with Knowledge of the Data domain," Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Online, Association for Computational Linguistics, Nov. 2020, pp. 485-490, doi: 10.18653/v1/2020.wnut-1.73, url: https://aclanthology.org/2020.wnut-1.73.
V. A. Nguyen, T. M. Nguyen, H. Quang Dao and Q. Huu Pham, "S-NLP at SemEval-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging," Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Online, Association for Computational Linguistics, Aug. 2021, pp. 888-897, doi: 10.18653/v1/2021.semeval-1.120, url: https://aclanthology.org/2021.semeval-1.120.
K. N. T. Thanh and K. N. Van, "ReINTEL Challenge 2020: Exploiting Transfer Learning Models for Reliable Intelligence Identification on Vietnamese Social Network Sites," Proceedings of the 7th International Workshop on Vietnamese Language and Speech Processing, Hanoi, Vietnam, Association for Computational Linguistics, Dec. 2020, pp. 45-48, url: https://aclanthology.org/2020.vlsp-1.9.
T. M. Nguyen, Q. H. Pham, L. B. Doan, H. V. Trinh, V.-A. Nguyen and V.-H. Phan, "Contrastive Learning for Natural Language-Based Vehicle Retrieval," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2021, pp. 4245-4252.
Activities
January 01, 2015
Training at HUSTer, Hanoi, Vietnam
June 06, 2019
Talk at Vietnam Mobile Day, Hanoi, Vietnam
October 06, 2019
Talk at VTV24, rubikAI and Nexus Frontier Tech, Hanoi, Vietnam
December 04, 2019
Talk at Hanoi University of Science and Technology, Hanoi, Vietnam
December 07, 2019
Talk at Rubik AI, Hanoi, Vietnam
January 01, 2020
Training at Techmaster Vietnam, Hanoi, Vietnam
December 13, 2022
Training at Ministry of National Defense, Hanoi, Vietnam
December 22, 2024
Talk at NTI-VietAI, Online
Honors and Awards