More on Minipar – Java interface
As I mentioned in my last post, I was trying to access the Minipar library from Java. Our original approach, which runs the pdemo program to parse each sentence, has a performance problem. It took more than an hour to parse 28 research articles (about 265 ms per sentence). Most of that time is probably spent on creating the pdemo process and loading the data files, both of which happen again before every sentence is parsed. So I wrote a Java proxy class that calls a C++ proxy class, which in turn calls the Minipar library. The initialization code is called only once, at the beginning, and there is no need to create a process. The Java code calls the C++ library through the Java Native Interface (JNI). The illustration below shows the basic call chain for parsing a sentence.
Main Java program -> MiniparProxy.java -> MiniparProxy.cpp -> Minipar library
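To make the call chain above more concrete, here is a minimal sketch of what the Java side of such a JNI proxy could look like. It is not the actual project code: the class name `MiniparProxySketch`, the method names `nativeInit` and `nativeParse`, the library name `miniparproxy`, and the `dataDir` parameter are all assumptions made for illustration. The point it shows is the one from the text: the native library and data files are loaded once, and every later call goes straight into native code without starting a process.

```java
/**
 * Hypothetical sketch of a JNI-backed proxy for a native parser.
 * All names here (class, methods, library) are illustrative assumptions.
 */
class MiniparProxySketch {
    // True once the native library and data files have been loaded.
    private static volatile boolean initialized = false;

    // Implemented on the C++ side (for example in a file like MiniparProxy.cpp).
    // The signatures are assumed, not taken from the real project.
    private static native void nativeInit(String dataDir);
    private static native String nativeParse(String sentence);

    /** Loads the native library and the parser data exactly once per JVM. */
    static synchronized void init(String dataDir) {
        if (!initialized) {
            System.loadLibrary("miniparproxy"); // assumed library name
            nativeInit(dataDir);                // expensive: done only once
            initialized = true;
        }
    }

    /** Parses one sentence; no process is created and no data is reloaded. */
    static String parse(String sentence) {
        if (!initialized) {
            throw new IllegalStateException("call init() before parse()");
        }
        return nativeParse(sentence);
    }
}
```

A main program would call `MiniparProxySketch.init(...)` once at startup and then `parse(...)` for each sentence. The matching C++ function declarations can be generated from the class with `javac -h`.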
The improvement in performance is significant. The table below compares the two methods on the same 28 research articles (17558 sentences, 341980 word tokens). The new method is about 20 times faster than the original one.
| | New Minipar2.java | Original Minipar.java |
|---|---|---|
| Total time (min) | 3.90 | 77.77 |
| Time per document (s) | 8.37 | 166.66 |
| Time per sentence (ms) | 13.34 | 265.77 |
| Time per token (ms) | 0.68 | 13.65 |
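The per-document, per-sentence and per-token figures follow from the total times and the corpus counts given above. The small sketch below redoes that arithmetic; the class and method names are made up for this example. Because it starts from the rounded totals (3.90 and 77.77 minutes), its results can differ from the table in the last digit or two.

```java
/** Recomputes the derived timing figures from the rounded totals (illustrative only). */
class TimingCheck {
    // Corpus counts taken from the post.
    static final int DOCS = 28;
    static final int SENTENCES = 17558;
    static final int TOKENS = 341980;

    static double perDocumentSeconds(double totalMinutes) {
        return totalMinutes * 60.0 / DOCS;
    }

    static double perSentenceMillis(double totalMinutes) {
        return totalMinutes * 60_000.0 / SENTENCES;
    }

    static double perTokenMillis(double totalMinutes) {
        return totalMinutes * 60_000.0 / TOKENS;
    }

    static double speedup(double originalMinutes, double newMinutes) {
        return originalMinutes / newMinutes;
    }

    public static void main(String[] args) {
        double newMin = 3.90;       // JNI-based method
        double originalMin = 77.77; // pdemo-based method
        System.out.printf("per document (s): %.2f vs %.2f%n",
                perDocumentSeconds(newMin), perDocumentSeconds(originalMin));
        System.out.printf("per sentence (ms): %.2f vs %.2f%n",
                perSentenceMillis(newMin), perSentenceMillis(originalMin));
        System.out.printf("per token (ms): %.2f vs %.2f%n",
                perTokenMillis(newMin), perTokenMillis(originalMin));
        System.out.printf("speedup: %.1fx%n", speedup(originalMin, newMin));
    }
}
```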
Since this is an ongoing project, I’ll not publish the source code here. Please contact me if you need more details on accessing Minipar in Java.
June 16, 2009 at 5:28 am
Hi there,
I am currently looking at the Stanford Parser, but it is very slow. I would like to test your work using the Java proxy and Minipar. Is there a way in which I can obtain the proxy stuff you have written?
Best regards,
Jethro
July 31, 2009 at 10:45 am
Hi:
I am currently working on a project with Minipar, with the intention of comparing it with the Stanford Parser. The project also includes a clustering method called k-means, also in Java.
I would love to see how to implement the Java code that works with Minipar.
My email contact is garciatjm@gmail.com
Greetings.
Jesus.
August 10, 2009 at 8:00 am
Hi:
I was looking at how your code works with Java, but there is something I do not understand: how can I use print_triples from Java? I see that it is in MiniparProxy.cpp, but it does not seem to be used, or maybe I have not noticed where; with pdemo, by contrast, you pass parameters (-h, -t, -p). I hope you can help if you have used this function. I am trying to output the relationships this way rather than in the tree form that is normally produced.
Jesus.
August 11, 2009 at 6:49 am
hi:
In your code, MiniparProxy.java has a function called print_triples, but it is not used. I tried to use it, but I get errors when compiling.
Jesus.