I found a performance problem in a larger project that uses Minipar as its dependency parser. Because our project is written in Java and Minipar only provides a static library for Linux, every sentence is processed by the pdemo program that comes with Minipar. For each sentence, pdemo has to load, process the sentence, and exit, so a lot of time is wasted just loading pdemo. In addition, our Java program communicates with pdemo by reading and writing its console input and output streams, which may not be very efficient. So I decided to write a wrapper library for Minipar in C++ that can then be called from Java through JNI.
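For reference, the one-process-per-sentence pattern described above looks roughly like the following. This is only a minimal sketch, not our project's actual code: the class name `OneShotRunner` and the method `runOnce` are made up for illustration, and the command to run is passed in by the caller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: starts a fresh child process for every input,
// feeds the input to its stdin, and returns everything it prints.
class OneShotRunner {

    static String runOnce(List<String> command, String input) {
        try {
            // A new process is created on every call; this is the startup
            // cost that gets paid once per sentence.
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();

            // Write the sentence and close stdin so the child sees end of input.
            try (OutputStream stdin = child.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            }

            // Read the child's output until it closes its stdout, then reap it.
            // (Fine for short inputs; large inputs would need a reader thread.)
            byte[] output = child.getInputStream().readAllBytes();
            child.waitFor();
            return new String(output, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }
}
```

A call such as `OneShotRunner.runOnce(List.of("./pdemo", "-p", "../data/"), sentence + "\n")` would launch pdemo once per sentence, so the parser is loaded again on every call. That repeated loading is exactly the overhead a JNI wrapper is meant to avoid.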
The first step is to get the pdemo program compiled and linked against the Minipar static library; after that I can write my own wrapper code. I decided to do this on our Linux box:
Fedora Core 6
Kernel 18.104.22.168-72.fc6 on an i686.
gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)
A lot of linking problems came up during the process. First, the Minipar static library was compiled with an older gcc version (3.2.2); you can see this by opening libminipar.a in a text editor. I tried many options with the current gcc, but I was still not able to link pdemo.o against the Minipar library. The problem, I think, is linking code compiled with different versions of gcc.
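Instead of eyeballing the archive in a text editor, the compiler version can also be pulled out programmatically. The sketch below assumes that the object files carry gcc's usual identification string, which starts with "GCC: (" and is stored as plain ASCII. The class name `GccVersionScanner` and its methods are my own invention for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: scans raw bytes for gcc identification strings.
class GccVersionScanner {

    // Assumption: gcc writes an ident string that begins with this prefix.
    private static final byte[] MARKER = "GCC: (".getBytes(StandardCharsets.US_ASCII);

    // Returns each distinct printable-ASCII run that starts with the marker,
    // in order of first appearance.
    static List<String> findCompilerStrings(byte[] data) {
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            if (!matchesAt(data, i)) {
                continue;
            }
            int end = i;
            // Extend to the end of the printable run (stops at NUL, newline, etc.).
            while (end < data.length && data[end] >= 0x20 && data[end] < 0x7f) {
                end++;
            }
            found.add(new String(data, i, end - i, StandardCharsets.US_ASCII));
            i = end - 1; // resume scanning right after this string
        }
        return new ArrayList<>(found);
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int k = 0; k < MARKER.length; k++) {
            if (data[offset + k] != MARKER[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GccVersionScanner <library-or-object-file>");
            return;
        }
        for (String s : findCompilerStrings(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(s);
        }
    }
}
```

Running it against libminipar.a and against a freshly compiled pdemo.o should make a compiler mismatch visible at a glance, provided both files actually contain such strings.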
So I downloaded gcc 3.2.3 and tried to compile and install it. It did not make it through the build process; more errors came up. After a little googling, it seems the problem is compiling an older gcc with a newer one. A suggested solution is to first install a gcc version between the two as a bootstrap. Following this method, I tried gcc 3.3.6, which also failed to build. Finally, I successfully built gcc 3.4.0 and was then able to build gcc 3.2.3 using gcc 3.4.0.
Then I tried to statically link pdemo against the Minipar library. Some errors involving glibc and libz popped up. It looks like those two libraries were compiled with the newer gcc on the system, so they had problems linking with a library compiled with the older gcc.
At 5 AM I started building glibc and libz, which is a very long process. I went to bed, and by the time I got up the build had finished. With all the libraries compiled with gcc 3.2.3, I finally linked pdemo against the Minipar library. Cheers!
Because I was linking statically, the generated pdemo executable was about 20 MB. So, naturally, I removed the “-static” option. This time it linked successfully, even without my own glibc and libz builds. In conclusion, if you are going to link a program against the Minipar library, use the right version of gcc to compile your program and link without the “-static” option.
libretto 31% ./pdemo -p ../data/
> This is a test sentence.
(
E0 (() fin C * )
1 (This ~ N 2 s (gov be))
2 (is be VBE E0 i (gov fin))
E2 (() this N 5 subj (gov sentence) (antecedent 1))
3 (a ~ Det 5 det (gov sentence))
4 (test ~ A 5 mod (gov sentence))
5 (sentence ~ N 2 pred (gov be))
6 (. ~ U * punc)
)