I compiled fastp v0.20.1 for a WebAssembly application, based on the patch file from Biowasm. It worked well when I processed only one pair of fastq files (a single run). However, after I converted my web app to process multiple files in a loop, memory usage grew by about 200 MB after each cycle, so it could only process about 10 paired-end fastq files before exceeding the 2 GB memory limit.
At first I thought it was a WebAssembly problem, because it did not happen when running on my Linux desktop. I searched and found this article but did not understand it. So I used a crude workaround: terminate the web worker after processing each file, using this JavaScript file, fastp-multiplex-v3.js. It works but is a little slower. I knew this was not the proper way to solve the problem, and I really wanted to find out why the memory increased after each cycle. Later it occurred to me that this could be a memory leak in fastp itself, although I thought this should not be the case, because the authors would surely have checked their software for memory leaks. I checked anyway with Valgrind, just in case. And there were leaks!

The major leak is in the function process() in the files “peprocessor.cpp” and “seprocessor.cpp”: initPackRepository() is called at the beginning of the function, but destroyPackRepository() is never called at the end, which causes a 240 MB memory leak! Now I will NOT blindly trust any developer, and I will always check for myself. For all my fixes to fastp v0.20.1, please see my GitHub page fastp-0.20.1-JZ.
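The bug boils down to an init call with no matching cleanup. Below is a minimal C++ sketch of that pattern, assuming a simplified processor whose initPackRepository() allocates one large buffer per run. The class ProcessorSketch, the byte counter g_repoBytesLive and the buffer sizes are hypothetical; the function names only mirror fastp's, and the bodies are not fastp's real code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the leak; bodies are illustrative, not fastp's code.
// Bytes currently held by the sketch's "pack repository".
static std::size_t g_repoBytesLive = 0;

class ProcessorSketch {
public:
    explicit ProcessorSketch(std::size_t repoBytes) : mRepoBytes(repoBytes) {}

    // Buggy shape: the repository is created but never destroyed, so each
    // call overwrites mRepo and the previous buffer becomes unreachable.
    void processLeaky() {
        initPackRepository();
        // ... read, filter and write reads here ...
    }

    // Fixed shape: every initPackRepository() is paired with
    // destroyPackRepository() before returning.
    void processFixed() {
        initPackRepository();
        // ... read, filter and write reads here ...
        destroyPackRepository();
    }

private:
    void initPackRepository() {
        mRepo = new char[mRepoBytes];  // allocated with new[] ...
        g_repoBytesLive += mRepoBytes;
    }

    void destroyPackRepository() {
        delete[] mRepo;                // ... so it is released with delete[]
        mRepo = nullptr;
        g_repoBytesLive -= mRepoBytes;
    }

    std::size_t mRepoBytes;
    char* mRepo = nullptr;
};
```

Calling processFixed() three times on one object leaves g_repoBytesLive at 0, while three calls to processLeaky() leave three buffers' worth allocated. In a one-shot command-line run the operating system reclaims everything when the process exits, which is plausibly why the leak went unnoticed on the desktop; a WebAssembly module that stays loaded in one web worker keeps accumulating it, which is also why terminating the worker after each file worked around the problem.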
Here are the commands I used for the memory check:
- For single-end fastq files:
valgrind --tool=memcheck --leak-check=yes /path/to/fastp -i 1_R1_001.fastq.gz -o test1.fq.gz
- For paired-end fastq files:
valgrind --tool=memcheck --leak-check=yes /path/to/fastp -i 1_R1_001.fastq.gz -I 1_R2_001.fastq.gz -o test1.fq.gz -O test2.fq.gz
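Valgrind tells you where a leak comes from. For a quick in-process check of the per-cycle growth described above, one option is to count live heap bytes around a loop. This is a minimal C++ sketch, assuming the code under test allocates through operator new / new[] (not malloc) and that nothing else allocates during the measurement; g_liveHeapBytes and bytesLeakedPerIteration are hypothetical names, not part of fastp or Valgrind.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Bytes currently allocated through the replaced global operator new.
static std::atomic<std::size_t> g_liveHeapBytes{0};

// Replacement allocation functions: each block is prefixed with one
// max_align_t slot (keeps the user pointer aligned) that stores the size.
// The default operator new[] / delete[] forward to these, so arrays count too.
void* operator new(std::size_t n) {
    auto* base = static_cast<std::max_align_t*>(std::malloc(sizeof(std::max_align_t) + n));
    if (base == nullptr) throw std::bad_alloc();
    *reinterpret_cast<std::size_t*>(base) = n;
    g_liveHeapBytes += n;
    return base + 1;
}

void operator delete(void* ptr) noexcept {
    if (ptr == nullptr) return;
    auto* base = static_cast<std::max_align_t*>(ptr) - 1;
    g_liveHeapBytes -= *reinterpret_cast<std::size_t*>(base);
    std::free(base);
}

// Sized delete simply forwards to the unsized replacement above.
void operator delete(void* ptr, std::size_t) noexcept { operator delete(ptr); }

// Hypothetical helper: runs `step` `iterations` times and returns the average
// number of bytes left allocated per call. A steady non-zero result is the
// in-process signature of a per-cycle leak.
template <typename Step>
std::size_t bytesLeakedPerIteration(Step step, int iterations) {
    if (iterations <= 0) return 0;
    const std::size_t before = g_liveHeapBytes.load();
    for (int i = 0; i < iterations; ++i) step();
    const std::size_t after = g_liveHeapBytes.load();
    return after > before ? (after - before) / static_cast<std::size_t>(iterations) : 0;
}
```

This only answers whether memory keeps growing per call; it cannot say which line allocated it, so it complements the Valgrind commands rather than replacing them.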
I found Valgrind really useful because it can point out the exact file and line a bug comes from. The Valgrind Quick Start Guide is a must-read. Several important notes are listed below:
- Modify the Makefile to make sure it has -g, so the report includes exact line numbers, and -O0 (or -O1) for more accurate error messages;
- It’s worth fixing errors in the order they are reported, as later errors can be caused by earlier errors;
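- In each error block, the “by” lines under the first “at” line are the call stack; the first “by” usually points to the code that caused the error, and later “by”s are just the callers further up the chain that led to it;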
- Message “Mismatched free() / delete / delete []”: you probably need to use delete [] rather than delete;
- You can start at the end of the report, from “HEAP SUMMARY” onwards, to find the most serious memory leak, such as “80,000,000 bytes in 1 blocks are possibly lost in loss record 71 of 72” (the 2nd “by” is the cause in this case).
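Both the missing destroyPackRepository() and the “Mismatched free() / delete / delete []” message come down to manual new/delete bookkeeping. One common C++ way to rule out both is RAII ownership. Here is a minimal sketch, assuming the repository can be modeled as a plain byte buffer; PackRepositorySketch and processOnce are hypothetical names, and this is not how fastp 0.20.1 (or the fixes in fastp-0.20.1-JZ) is written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Owning the buffer with std::unique_ptr<char[]> means the array form
// delete[] is chosen automatically, and the release happens on every exit path.
class PackRepositorySketch {
public:
    explicit PackRepositorySketch(std::size_t bytes)
        : mBuffer(std::make_unique<char[]>(bytes)), mBytes(bytes) {}

    char* data() { return mBuffer.get(); }
    std::size_t size() const { return mBytes; }

private:
    std::unique_ptr<char[]> mBuffer;  // deleter calls delete[], never plain delete
    std::size_t mBytes;
};

// Hypothetical processing step: the repository lives only for this call and
// is freed when `repo` goes out of scope, even if an exception is thrown.
// Filling and counting bytes stands in for real read processing.
std::size_t processOnce(std::size_t bytes) {
    PackRepositorySketch repo(bytes);
    for (std::size_t i = 0; i < repo.size(); ++i) repo.data()[i] = 'A';
    std::size_t count = 0;
    for (std::size_t i = 0; i < repo.size(); ++i) {
        if (repo.data()[i] == 'A') ++count;
    }
    return count;
}
```

Because the destructor runs whenever processOnce() returns, calling it in a loop does not accumulate memory, which is the property a multi-file web-app loop needs.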