I am trying to convert this C code I have into a python script so it's readily accessible by more people, but I am having problems understanding this one snippet.

int i, t;
for (i = 0; i < N; i++) {
 t = (int)(T*drand48());
 z[i] = t;
 Nwt[w[i]][t]++;
 Ndt[d[i]][t]++;
 Nt[t]++;
}

N is a value (sum of one column from an array. Elemental corrected me).

T is just a numerical value.

z, w, and d are memory allocations created from the N array. They were created with this method.

w = ivec(N);
d = ivec(N);
z = ivec(N);
int *ivec(int n) //
{
 int *x = (int*)calloc(n,sizeof(int));
 assert(x);
 return x;
}

Nwt & Ndt are both arrays too, with each element being a memory allocation? (Not sure.) At least, each of them was created using the following method, passing in two different ints.

Nwt = dmat(W,T);
Ndt = dmat(D,T);
double **dmat(int nr, int nc) //
{
 int N = nr*nc;
 double *tmp = (double*) calloc(N,sizeof(double));
 double **x = (double**)calloc(nr,sizeof(double*));
 int r;
 assert(tmp);
 assert(x);
 for (r = 0; r < nr; r++) x[r] = tmp + nc*r;
 return x;
}

So looking at the first loop I posted, what are the following lines doing? I would like to accomplish the same thing in python, but since no memory allocation is needed, not sure what those three lines do, or how I would duplicate it in python.

Nwt[w[i]][t]++;
Ndt[d[i]][t]++;
Nt[t]++;

This is what I have so far:

for i in range(self.N):
    t = self.T * random.random()
    self.z[i] = t
    # ** INCORRECT BELOW **
    # self.Nwt[self.N[i]] = t + 1
    # self.Ndt[i] = t + 1
    # self.Nt[t + 1] += 1
Joe
asked Sep 15, 2010 at 14:45
1
  • Your edit changed your original to 'N is a value (sum of one column from an array)', but I actually meant 'N is a value (number of elements in a column from an array, i.e. the size of the array)'. Commented Sep 15, 2010 at 16:11

4 Answers

2

A suggestion for the Python part of things is to use numpy arrays to represent the matrices (and possibly the arrays too). But to be honest, you should not be concerned with that right now. That C-code looks ugly. Apart from that, different languages use different approaches to achieve the same thing. That is what makes such conversions hard. Try to get an understanding of the algorithm it implements (supposing that is what it does) and write that down in a language-agnostic way. Then think how you would implement that in Python.

Glorfindel
answered Sep 15, 2010 at 14:48

1

In your translation, the first thing I would worry about is choosing sensible variable names, particularly for those arrays. Regardless, much of that translates directly.

Nwt and Ndt are 2D arrays; Nt is a one-dimensional array. The loop visits each of the N elements of z and generates a random integer t in the range 0 to T-1 for each one. It then increments column t in Nwt (row w[i]), in Ndt (row d[i]), and in Nt. The random value itself is stashed in z[i].

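# Literal translation
import random

# In place of w = ivec(N);
w = [0] * N
d = [0] * N
z = [0] * N
# In place of Nwt = dmat(W,T); the comprehension gives each row its own list
Nwt = [[0.0] * T for _ in range(W)]
Ndt = [[0.0] * T for _ in range(D)]
Nt = [0.0] * T  # assumed: the allocation of Nt is not shown in the question

for i in range(N):
    t = random.randrange(T)  # like (int)(T*drand48()): an integer in 0..T-1
    z[i] = t
    Nwt[w[i]][t] += 1
    Ndt[d[i]][t] += 1
    Nt[t] += 1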

EDIT: corrected w/d/z initialization from "n" to "N"

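Note that this is still incomplete: w and d are only zero-filled here, and the loop only works if every w[i] is a valid row index into Nwt (0 <= w[i] < W) and every d[i] is a valid row index into Ndt (0 <= d[i] < D), so tread carefully.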
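
One way to gain confidence in a translation like this is to wrap it in a function and check a few invariants. The following is only a sketch: the function name `random_init`, the `seed` parameter, and the tiny `w` and `d` lists are made up for illustration, and `Nt` is assumed to be a length-T vector because its allocation is not shown in the question.

```python
import random

def random_init(w, d, W, D, T, seed=0):
    """Assign a random column t to each position and tally the counts."""
    rng = random.Random(seed)  # seeded only so that runs are repeatable
    N = len(w)
    z = [0] * N
    Nwt = [[0.0] * T for _ in range(W)]
    Ndt = [[0.0] * T for _ in range(D)]
    Nt = [0.0] * T  # assumption: a vector of length T
    for i in range(N):
        t = rng.randrange(T)  # integer in 0..T-1
        z[i] = t
        Nwt[w[i]][t] += 1
        Ndt[d[i]][t] += 1
        Nt[t] += 1
    return z, Nwt, Ndt, Nt

# Made-up inputs: 4 positions, 3 possible w values, 2 possible d values
w = [0, 1, 1, 2]
d = [0, 0, 1, 1]
z, Nwt, Ndt, Nt = random_init(w, d, W=3, D=2, T=4)

# Each position adds exactly 1 to each table, so every total equals N
print(sum(Nt) == len(w))                       # True
print(sum(sum(row) for row in Nwt) == len(w))  # True
print(sum(sum(row) for row in Ndt) == len(w))  # True
```

If any of these totals differs from N, an increment is landing in the wrong place or being skipped.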

answered Sep 15, 2010 at 15:03

5 Comments

thank you for this. Since they were 2D arrays, and the ++ was on the end, it's hard for me to understand where it was incrementing. I thought I would throw out the w & d and only use i, but I still need them? I guess I don't understand how the ivec method above translates the w and d into python.
w and d provide indexes into Nwt and Ndt, which are very large tables. Note that we're missing the code that fills w and d with sane values, so at the moment we're only incrementing the first rows of Nwt and Ndt. (None of the code you've shown will ever give w[i] or d[i] a value other than 0.)
The script starts with 3 columns from a txt file: documentID, wordID, and wordCount (how many times that word occurred in that document). The variable N is the sum of the wordCount column. Your translation is helping me a lot though! I really appreciate the help!
I am still not 100% sure how it works. I have the values of D and W; they are both ints. Why are you multiplying by 0.0 in your example?
dmat produces a "Double MATrix", an array of size W, each element of which is an array of size T. The slightly odd syntax [0.0]*3 expands to the list [0.0,0.0,0.0] in python. If you enclose that in another expansion, you can do things like [[1]*2]*3, which will equal [[1,1],[1,1],[1,1]] (Try it in the python interpreter) Basically I'm using that list expansion trick to replace the ivec and dmat calls with simple python expressions that do the same thing.
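
One caveat about nesting the expansion, shown here as a small sketch (the names `shared` and `independent` are made up for illustration): `[[0.0] * T] * W` repeats a reference to the same inner list W times, so incrementing one cell shows up in every row. A list comprehension builds a separate list for each row.

```python
T, W = 3, 2  # tiny sizes chosen only for illustration

# The outer * copies the reference: both rows are the same list object.
shared = [[0.0] * T] * W
shared[0][1] += 1

# The comprehension evaluates [0.0] * T once per row, giving distinct lists.
independent = [[0.0] * T for _ in range(W)]
independent[0][1] += 1

print(shared)       # [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print(independent)  # [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

For count tables such as Nwt and Ndt, the comprehension form is the one that behaves like the zeroed matrix returned by dmat.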

Nwt and Ndt are 2-dimensional arrays. These lines:

Nwt[w[i]][t]++;
Ndt[d[i]][t]++;

Increment by 1 the value at one location in each of the arrays. If you think of the addressing as array[row][column], then the row is chosen based on the value at index i of a separate one-dimensional array (w and d, respectively), and the column t seems to be some random index.

You don't show what dmat function is doing, so hard to break that one down.

(Can't help you on the Python side, hopefully this helps clarify the C)

answered Sep 15, 2010 at 14:52

4 Comments

The dmat function is in the 3rd code snippet I posted in my original code. It's an allocation routine as well. I think that's what's throwing me off. I don't know C, and haven't had to do memory allocation. Thanks for your input, I am going to keep working through it and hopefully someone will be able to help with the python side.
Oops, missed that. Yeah, it is just allocating 2D floating-point arrays with nr (number of rows) and nc (number of columns) dimensions. But note that it's doing something tricky where the actual array (x) is a "jump table" of pointers into another array (tmp), which is itself allocated and initialized to all zeroes. (Looks like you should just sit down with a pencil and paper and get a holistic sense of what the data structure is really doing before starting on the Python. I don't envy you: this is crappy, uncommented C code.)
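
To make the "jump table" idea concrete, here is a rough Python sketch of the same layout. The names `dmat_flat` and `increment` are hypothetical, and plain nested lists are normally all you need in Python; this only mirrors what the C pointers are doing.

```python
def dmat_flat(nr, nc):
    """Return one zeroed buffer plus the offset at which each row begins."""
    flat = [0.0] * (nr * nc)                 # plays the role of tmp
    row_start = [nc * r for r in range(nr)]  # plays the role of x[r] = tmp + nc*r
    return flat, row_start

def increment(flat, row_start, r, c):
    # Equivalent of x[r][c]++ in the C code
    flat[row_start[r] + c] += 1

flat, row_start = dmat_flat(2, 3)
increment(flat, row_start, 1, 2)

print(row_start)  # [0, 3]
print(flat)       # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Element (r, c) lives at position nc*r + c of the single buffer, which is why the C code can index x[r][c] even though only one block of doubles was allocated.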
Also, when you say "increment by 1 the value at one of the locations in each of the arrays", do you mean in each dimension (w, d, and both t's), or in each array, one per line? Does it increment only the t?
(Each line.) The statement Nwt[w[i]][t]++; finds a single location in the array Nwt and adds one to it. The next line finds a single location in Ndt and adds one to it. The second index used in both arrays is the same (t).

Okay you seem to have a few ideas wrong. N is the size of the array.

dmat returns a matrix-like thing represented by nr rows, where each row is an 'array' of nc doubles.

ivec returns an 'array' of n integer elements.

So w[] and d[] hold row indexes into the matrices of doubles.

The loop that you are having trouble with increments certain elements of the matrices. One index appears to be pre-stored in the w and d arrays, and the other is generated randomly, I suspect. Without knowing the intent of the code, it is a bit difficult to understand the semantics.

Specifically, it might help to know that Nwt[x][y]++ means increment (add 1 to) the matrix element at row x, column y.
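
As a small illustration in Python, which has no ++ operator, the same increment can be written with += 1. The table size and the values in `w` and `t` below are made up for the example.

```python
# Hypothetical 2-row, 3-column table standing in for Nwt
Nwt = [[0.0] * 3 for _ in range(2)]
w = [1, 0, 1]  # made-up row indexes, one per position
t = 2          # made-up column index

i = 0
Nwt[w[i]][t] += 1  # same effect as Nwt[w[i]][t]++ in C

print(Nwt)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Only the single cell at row w[i], column t changes; everything else in the table stays at zero.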

I must also mention that this C code is ugly: no useful naming, no comments, and fearless use of C's nastiest syntax. It is really difficult to follow.

answered Sep 15, 2010 at 14:57

1 Comment

This script scans documents looking for relationships between words. The original document it starts with has 3 columns: documentID, wordID, and wordCount (how many times that word occurred in that document). The variable N is the sum of the wordCount column.
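
Given that description, one plausible way for w and d to be filled is with one entry per individual word occurrence. This is an assumption, since the code that fills them is not shown in the question; the sample `rows` below are made up.

```python
# Made-up sample rows: (documentID, wordID, wordCount)
rows = [(0, 2, 3), (0, 5, 1), (1, 2, 2)]

w = []  # word id for each individual word occurrence
d = []  # document id for each individual word occurrence
for doc_id, word_id, count in rows:
    # ASSUMPTION: a word seen `count` times contributes `count` entries
    w.extend([word_id] * count)
    d.extend([doc_id] * count)

N = len(w)  # equals the sum of the wordCount column

print(N)  # 6
print(w)  # [2, 2, 2, 5, 2, 2]
print(d)  # [0, 0, 0, 0, 1, 1]
```

Under this reading, w[i] and d[i] say which word and which document position i belongs to, which would explain why they are used as row indexes into Nwt and Ndt.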
