Okay, so I have a simple interface, built with the Django framework, that takes natural-language input from a user and stores it in a table.
Additionally, I have a pipeline that I built in Java using the cTAKES library to do named entity recognition, i.e. it takes the text submitted by the user and annotates it with relevant UMLS tags.
What I want to do is take the user's input and, once it's submitted, pass it to my Java/cTAKES pipeline, then feed the annotated output back into the database.
I am pretty new to the web development side of this and can't find anything on integrating external scripts like this. If someone could point me to a useful resource, or just in the general right direction, that would be extremely helpful.
========================= UPDATE:
Okay, so I have figured out that subprocess is the module I want to use here, and I have tried implementing some simple code based on the documentation, but I am getting this error:
Exception Type: OSError
Exception Value: [Errno 2] No such file or directory
Exception Location: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py in _execute_child, line 1335.
A brief overview of what I'm trying to do:
This is the code I have in views. It is meant to take the text input from the model form, save it to the DB, and then pass that input to my script, which produces an XML file that is stored in another column in the DB. I'm very new to Django, so I'm sorry if this is a simple fix, but I couldn't find any helpful documentation on using subprocess from Django.
```python
def queries_create(request):
    if not request.user.is_authenticated():
        return render(request, 'login_error.html')
    form = QueryForm(request.POST or None)
    if form.is_valid():
        instance = form.save(commit=False)
        instance.save()
        p=subprocess.Popen([request.POST['post'], './path/to/run_pipeline.sh'])
        p.save()
    context = {
        "title":"Create",
        "form": form,
    }
    return render(request, "query_form.html", context)
```
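For context on the first error: `subprocess.Popen` treats the first element of its argument list as the program to run, so the call above asks the OS to execute the user's text as a command, with the script path passed as that nonexistent program's first argument. The snippet below is a small reproduction under that reading; the sample text is made up and stands in for `request.POST['post']`. It runs on Python 2.7 and 3.

```python
import errno
import subprocess

# Hypothetical form input standing in for request.POST['post'].
user_text = "patient reports chest pain"

error_code = None
try:
    # The FIRST list element is the program to execute, so this tries to run a
    # program literally named "patient reports chest pain".
    subprocess.Popen([user_text, "./path/to/run_pipeline.sh"])
except OSError as exc:  # Python 3 raises FileNotFoundError, a subclass of OSError
    error_code = exc.errno

print(error_code == errno.ENOENT)  # True: [Errno 2] No such file or directory
```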
Model code snippet:
```python
class Query(models.Model):
    problem/intervention = models.TextField()
    updated = models.DateTimeField(auto_now=True, auto_now_add=False)
    timestamp = models.DateTimeField(auto_now=False, auto_now_add=True)
```
UPDATE 2: Okay, after changing the subprocess code as shown below, the code no longer breaks.
```python
def queries_create(request):
    if not request.user.is_authenticated():
        return render(request, 'login_error.html')
    form = QueryForm(request.POST or None)
    if form.is_valid():
        instance = form.save(commit=False)
        instance.save()
        p = subprocess.Popen(['path/to/run_pipeline.sh'], stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
        (stdoutdata, stderrdata) = p.communicate()
        instance.processed_data = stdoutdata
        instance.save()
    context = {
        "title":"Create",
        "form": form,
    }
    return render(request, "query_form.html", context)
```
However, I am now getting a "Could not find or load main class pipeline.CtakesPipeline" error, which I don't understand, since the script runs fine from the shell in this working directory. This is the script I am trying to call with subprocess:
```bash
#!/bin/bash
INPUT=$1
OUTPUT=$2
CTAKES_HOME="full/path/to/CtakesClinicalPipeline/apache-ctakes-3.2.2"
UMLS_USER="####"
UMLS_PASS="####"
CLINICAL_PIPELINE_JAR="full/path/to/CtakesClinicalPipeline/target/CtakesClinicalPipeline-0.0.1-SNAPSHOT.jar"
[[ $CTAKES_HOME == "" ]] && CTAKES_HOME=/usr/local/apache-ctakes-3.2.2
CTAKES_JARS=""
for jar in $(find ${CTAKES_HOME}/lib -iname "*.jar" -type f)
do
    CTAKES_JARS+=$jar
    CTAKES_JARS+=":"
done
current_dir=$PWD
cd $CTAKES_HOME
java -Dctakes.umlsuser=${UMLS_USER} -Dctakes.umlspw=${UMLS_PASS} \
    -cp ${CTAKES_HOME}/desc/:${CTAKES_HOME}/resources/:${CTAKES_JARS%?}:${current_dir}/${CLINICAL_PIPELINE_JAR} \
    -Dlog4j.configuration=file:${CTAKES_HOME}/config/log4j.xml -Xms512M -Xmx3g \
    pipeline.CtakesPipeline $INPUT $OUTPUT
cd $current_dir
```
I'm not sure how to go about fixing this error so any help is appreciated.
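One plausible cause, offered as a guess rather than a diagnosis: the script saves `current_dir=$PWD` and builds the classpath entry `${current_dir}/${CLINICAL_PIPELINE_JAR}` from it. A child process inherits its parent's working directory, so when Django launches the script, that directory is wherever the server process was started, not the directory where the script works from your shell. If `CLINICAL_PIPELINE_JAR` is relative to the directory you normally run it from, the jar entry then points at a nonexistent file and Java cannot find `pipeline.CtakesPipeline`. Popen's `cwd` argument (or an absolute jar path in the script) sidesteps this. In the sketch below, a temporary directory stands in for that directory and a Python one-liner stands in for the script, reporting where it started (Python 3).

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the directory the script works from in your shell (assumption).
pipeline_dir = tempfile.mkdtemp()

# Stand-in child that just reports its working directory; with the real
# pipeline this would be something like ['/full/path/to/run_pipeline.sh', ...].
child = [sys.executable, "-c", "import os; print(os.getcwd())"]

# cwd= makes the child start in pipeline_dir instead of inheriting
# the parent (Django) process's working directory.
p = subprocess.Popen(child, cwd=pipeline_dir, stdout=subprocess.PIPE)
out, _ = p.communicate()
child_cwd = out.decode().strip()

# realpath() because some systems expose temp dirs through symlinks.
print(os.path.realpath(child_cwd) == os.path.realpath(pipeline_dir))  # True
```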
I am not at all familiar with cTAKES, so I apologize if this is an ignorant question: are you already running this Java service on an existing machine and looking to pipe data to it from your web app, or are you looking to deploy both the web app and the Java app? – souldeux, Mar 21, 2016 at 15:00
I'm looking to deploy the pipeline as part of the web app. I want to use the Java program internally. – jdv12, Mar 21, 2016 at 16:14
1 Answer
If I understand you correctly, you want to pipe the value of request.POST['post'] to the program run_pipeline.sh and store the output in a field of your instance.
You are calling subprocess.Popen incorrectly. It should be:
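```python
p = subprocess.Popen(['/path/to/run_pipeline.sh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```
Then pass in the input and read the output:
```python
# communicate() writes its argument to the script's stdin, then waits for it to exit.
(stdoutdata, stderrdata) = p.communicate(request.POST['post'].encode('utf-8'))
```
Then save the data, e.g. in a field of your instance:
```python
instance.processed_data = stdoutdata.decode('utf-8')
instance.save()
```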
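Putting those steps together, a minimal helper might look like the following. It is a sketch written for Python 3 (the traceback in the question is from Python 2.7, where `communicate()` has no `timeout` parameter); `run_pipeline` is a hypothetical name, and small Python one-liners stand in for run_pipeline.sh so the example runs anywhere. It also pipes stderr and checks the exit code, so a failure such as the "Could not find or load main class" message is surfaced instead of silently saved as empty output.

```python
import subprocess
import sys


def run_pipeline(cmd, text, timeout=None):
    """Send `text` to `cmd` on stdin and return its stdout as a str.

    Raises RuntimeError (including the child's stderr) on a non-zero exit.
    """
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = p.communicate(text.encode("utf-8"), timeout=timeout)
    except subprocess.TimeoutExpired:
        # communicate() does not kill the child on timeout; do it explicitly.
        p.kill()
        p.communicate()
        raise
    if p.returncode != 0:
        raise RuntimeError("pipeline exited with %d: %s"
                           % (p.returncode, err.decode("utf-8", "replace")))
    return out.decode("utf-8")


# Stand-in for run_pipeline.sh: upper-cases whatever arrives on stdin.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = run_pipeline(upper, "chest pain")
print(result)  # CHEST PAIN

# A stand-in that exits with status 3 shows how failures surface.
error_message = None
try:
    run_pipeline([sys.executable, "-c", "import sys; sys.exit(3)"], "x")
except RuntimeError as exc:
    error_message = str(exc)
```

In the view, this would take the place of the Popen/communicate lines, e.g. `instance.processed_data = run_pipeline(['/path/to/run_pipeline.sh'], request.POST['post'])`.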
I suggest you first make sure to get the call to the subprocess working in a Python shell and then integrate it in your Django app.
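One thing worth checking in that shell session: run_pipeline.sh as posted never reads stdin. It hands `$INPUT` and `$OUTPUT`, its first two command-line arguments, to `pipeline.CtakesPipeline`. If those are file paths (an assumption; they might be directories, depending on how `CtakesPipeline` is written), a variant that passes the text through temporary files could look like this Python 3 sketch. `run_pipeline_on_files` and the file names are hypothetical, and a one-liner that copies argument 1 to argument 2 stands in for the real script.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_pipeline_on_files(cmd, text):
    """Write text to a temp file, run `cmd INPUT OUTPUT`, return OUTPUT's contents.

    Assumes (hypothetically) that INPUT is read as a text file and that
    OUTPUT is written as a single file.
    """
    workdir = tempfile.mkdtemp()
    try:
        in_path = os.path.join(workdir, "input.txt")
        out_path = os.path.join(workdir, "output.xml")
        with open(in_path, "w", encoding="utf-8") as f:
            f.write(text)
        # check_call raises CalledProcessError if the script exits non-zero.
        subprocess.check_call(cmd + [in_path, out_path])
        with open(out_path, encoding="utf-8") as f:
            return f.read()
    finally:
        shutil.rmtree(workdir)  # clean up the temp files either way


# Stand-in for run_pipeline.sh: copies its first argument to its second.
fake = [sys.executable, "-c",
        "import sys, shutil; shutil.copyfile(sys.argv[1], sys.argv[2])"]
copied = run_pipeline_on_files(fake, "chest pain")
print(copied)  # chest pain
```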
Please note that starting a (potentially long-running) subprocess inside a request is bad practice and can cause a lot of problems: the request blocks until the pipeline finishes and can run into web-server timeouts. The best practice is to delegate long-running tasks to a job queue. For Django, Celery is probably the most commonly used. There is a bit of setup involved, though.
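To illustrate the idea of delegating work, here is a deliberately minimal in-process queue built from the Python 3 standard library. It is not Celery and not a production substitute: jobs live in memory and are lost if the process restarts, which is exactly what a broker-backed queue like Celery avoids. With Celery you would instead define a task and call its `.delay(...)` method from the view. `slow_pipeline`, the query ids, and the `results` dict are hypothetical stand-ins for the pipeline call and the database column.

```python
import queue
import threading

jobs = queue.Queue()
results = {}  # stand-in for the DB column, keyed by a hypothetical query id


def slow_pipeline(text):
    # Placeholder for the (slow) call to run_pipeline.sh.
    return text.upper()


def worker():
    # Runs forever in the background, taking one job at a time.
    while True:
        query_id, text = jobs.get()
        try:
            results[query_id] = slow_pipeline(text)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

# What a view would do: enqueue the work and return immediately.
jobs.put((1, "chest pain"))
jobs.put((2, "shortness of breath"))

jobs.join()  # only for this demo; a view would NOT wait here
print(results)  # {1: 'CHEST PAIN', 2: 'SHORTNESS OF BREATH'}
```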