\$\begingroup\$

I wrote this code that takes vessel coordinates and checks them against the coordinates of every port. If the distance between a vessel and a port is below a threshold (3 km in this example), the port and the distance to it are added to the vessel data frame.

Example dataframes:

import pandas as pd

# Ports with their coordinates, plus a (latitude, longitude) tuple column
df_ports = pd.DataFrame({'Ports': ['Port_A', 'Port_B', 'Port_C'], 'latitude': [1, 5, 3], 'longitude': [1, 10, 5]})
latitude_and_longitude_columns = df_ports.iloc[:, [1, 2]]
df_ports['coordinates_tuple'] = latitude_and_longitude_columns.apply(tuple, axis=1)

# Vessel positions; -1 marks "no port within range"
df_vessel = pd.DataFrame({'latitude': [1, 5, 7], 'longitude': [1, 10, 20]})
latitude_and_longitude_columns = df_vessel.iloc[:, [0, 1]]
df_vessel['coordinates_tuple'] = latitude_and_longitude_columns.apply(tuple, axis=1)
df_vessel['distance_to_port_in_meters'] = -1
df_vessel['port'] = -1

The function that checks the distance:

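from geopy import distance

def get_distance_between_two_points(tuple1, tuple2):
    # Distance in metres between two (latitude, longitude) tuples;
    # returns -1 when the points are 3 km or more apart.
    radius = 3000
    dist = distance.distance(tuple1, tuple2).m
    if dist < radius:
        return dist
    return -1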

The logic:

for _, port_row in df_ports.iterrows():
    port = port_row['Ports']
    result = df_vessel.apply(lambda vessel_row: get_distance_between_two_points(vessel_row['coordinates_tuple'], port_row['coordinates_tuple']), axis=1)
    for index, value in result[result != -1].items():
        df_vessel.loc[index, 'distance_to_port_in_meters'] = int(value)
        df_vessel.loc[index, 'port'] = port

Result:

   latitude  longitude coordinates_tuple  distance_to_port_in_meters    port
0         1          1            (1, 1)                           0  Port_A
1         5         10           (5, 10)                           0  Port_B
2         7         20           (7, 20)                          -1      -1

The result is correct and it works with my bigger data sample.

How can I improve this?

My real data has 4800 vessel coordinates and 137 ports, and the code takes just over 2 minutes to run. How can I make it cleaner and, if possible, faster?

asked May 18, 2021 at 11:58
\$\endgroup\$
  • \$\begingroup\$ "In my real code, I use a different one" - so show your real code? \$\endgroup\$ Commented May 18, 2021 at 13:51
  • \$\begingroup\$ Based only on latitude and longitude, great-circle distance is a somewhat involved calculation. Are you calculating great circles? \$\endgroup\$ Commented May 18, 2021 at 13:53
  • \$\begingroup\$ I have edited the function. I have been reading that the last thing to do in pandas is to loop over a data frame. Is it possible to avoid the for loop in my code? \$\endgroup\$ Commented May 19, 2021 at 6:52

1 Answer

\$\begingroup\$

You need to drop your use of geopy. So far as I can see, it does not support vectorised inverse geodesics. Instead, use cartopy, which supports them natively.

The following example code uses random data instead of real data, but at the same size as your real data, and completes in about one second. The arrays distances and close are both matrices whose rows correspond to vessels and whose columns correspond to ports. close is a boolean matrix that is true for every vessel-port pair less than 3 km apart.

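import numpy as np
from cartopy.geodesic import Geodesic
from numpy.random import default_rng

N_VESSELS = 4_800
N_PORTS = 137

# Random stand-in data; cartopy expects each row as (longitude, latitude) in degrees
rand = default_rng(seed=0)
vessels = rand.random((N_VESSELS, 2)) + (1, -51)
ports = rand.random((N_PORTS, 2)) + (1, -51)

# Build every vessel-port pair: each vessel repeated once per port, ports tiled once per vessel
source = vessels.repeat(N_PORTS, axis=0)
dest = np.tile(ports, (N_VESSELS, 1))

# Column 0 of the result is the distance in metres; the other columns are azimuths.
# np.asarray works whether inverse() returns an ndarray or a buffer-backed memoryview.
geo = Geodesic()
distances = np.asarray(geo.inverse(source, dest))[:, 0].reshape((N_VESSELS, N_PORTS))
close = distances < 3_000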
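A vessels-by-ports distance matrix like the one above still has to be written back to the vessel data frame. The following is a minimal sketch of one way to do that, using the example data frames from the question. Two things in it are assumptions rather than part of the answer: the helper haversine_matrix is a hypothetical stand-in that uses a spherical (haversine) approximation so the sketch runs with only NumPy and pandas, and each vessel is assigned its nearest port, whereas the original loop keeps the last port found within range. The spherical result can differ from the ellipsoidal geodesic by a fraction of a percent, so for production use the cartopy distances would be substituted.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)
MAX_DISTANCE_M = 3_000      # threshold from the question


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Great-circle distances in metres, shape (len(lat1), len(lat2))."""
    # Convert to radians and add axes so the two point sets broadcast pairwise.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, np.newaxis]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, np.newaxis]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[np.newaxis, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[np.newaxis, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against rounding pushing the value just outside [0, 1].
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Example data from the question
df_ports = pd.DataFrame({'Ports': ['Port_A', 'Port_B', 'Port_C'],
                         'latitude': [1, 5, 3], 'longitude': [1, 10, 5]})
df_vessel = pd.DataFrame({'latitude': [1, 5, 7], 'longitude': [1, 10, 20]})

# Rows are vessels, columns are ports
distances = haversine_matrix(df_vessel['latitude'], df_vessel['longitude'],
                             df_ports['latitude'], df_ports['longitude'])

# Index and distance of the nearest port for every vessel
nearest = distances.argmin(axis=1)
nearest_dist = distances[np.arange(len(df_vessel)), nearest]
within = nearest_dist < MAX_DISTANCE_M

# Keep the question's convention: -1 means "no port within range"
port_names = df_ports['Ports'].to_numpy()
df_vessel['distance_to_port_in_meters'] = np.where(within, nearest_dist.astype(int), -1)
df_vessel['port'] = [port_names[j] if ok else -1 for j, ok in zip(nearest, within)]
```

The only per-row Python work left is the list comprehension that picks port names, which is a single pass over the vessels rather than one pass per port.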
answered May 19, 2021 at 19:46
\$\endgroup\$