"""@brief Initializes the DroneRobot environment and sets default parameters.

This constructor initializes the simulation environment, sets up the observation and action spaces,
and defines key parameters such as PID gains, motor power, and episode configuration.
It also seeds the random number generator for reproducibility and sets the simulation timestep.

- Initializes debug mode, the random seed, and the task status.
- Configures the observation and action spaces using Gym's `Box`.
- Initializes the PID gains to random values and the motor power to zero.
- Defines simulation parameters such as the timestep and the maximum steps per episode.
- Maintains error tracking for the PID controller.

@note Assumes the superclass `RobotSupervisorEnv` is correctly implemented.
'X_P': random.uniform(0, 100),
'X_I': random.uniform(0, 100),
'X_D': random.uniform(0, 100),

'Y_P': random.uniform(0, 100),
'Y_I': random.uniform(0, 100),
'Y_D': random.uniform(0, 100),

'Z_P': random.uniform(0, 100),
'Z_I': random.uniform(0, 100),
'Z_D': random.uniform(0, 100),
self.observation_space = Box(low=np.array([-np.inf] * 12),
                             high=np.array([np.inf] * 12),
                        high=np.array([100] * 9),
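As a sanity check, the shapes of these bound arrays can be built and verified compactly with NumPy; a small sketch of the same 12-dimensional observation bounds and 9-dimensional action bounds (one action per PID gain) that would be passed to `Box`:

```python
import numpy as np

# 12 unbounded observation dimensions; 9 actions bounded to [0, 100]
# (one per PID gain: P, I, D for each of the x, y, z axes).
obs_low, obs_high = np.full(12, -np.inf), np.full(12, np.inf)
act_low, act_high = np.zeros(9), np.full(9, 100.0)

print(obs_low.shape, act_high.max())  # (12,) 100.0
```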
@brief Updates the drone's motor power using the PID controller and its current location.

This method computes the error between the drone's current location and the target location on
each axis (x, y, z) and adjusts the motor power accordingly via PID control. It also applies yaw
control to align the drone with the target direction; for testing, the yaw gain can be set to
zero to disable yaw control.

@param current_location A list containing the drone's current position (x, y, z) and yaw angle.

- Computes the proportional (P), integral (I), and derivative (D) terms for each axis.
- Normalizes yaw-angle errors to the range [-π, π].
- Updates motor power values to maintain stability and reach the target location.
- Clips motor power values so they remain within the allowed range.

@note Assumes `current_location` includes yaw as its fourth element.
@see self.pid_gains, self.motor_power
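The P, I, and D terms described above follow the standard discrete-time PID form. A minimal single-axis sketch (the function and argument names here are illustrative, not the class's actual attributes):

```python
def pid_step(error, prev_error, integral, kp, ki, kd, dt):
    """One discrete PID update; returns (output, new_integral)."""
    integral += error * dt                  # accumulate the integral term
    derivative = (error - prev_error) / dt  # finite-difference derivative term
    output = kp * error + ki * integral + kd * derivative
    return output, integral

out, integ = pid_step(error=2.0, prev_error=1.0, integral=0.0,
                      kp=1.0, ki=0.5, kd=0.1, dt=0.1)
# P = 2.0, I = 0.5 * 0.2 = 0.1, D = 0.1 * 10.0 = 1.0  ->  out = 3.1
```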
desired_yaw = math.atan2(delta_y, delta_x)

current_yaw = current_location[3]  # yaw is supplied with the location

error_yaw = desired_yaw - current_yaw
error_yaw = (error_yaw + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
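The modulo identity on the yaw error is worth calling out: it maps any angle difference into [-π, π), so the controller always turns the short way around. Standalone:

```python
import math

def wrap_angle(err):
    """Map an angle error into [-pi, pi) using the same modulo identity."""
    return (err + math.pi) % (2 * math.pi) - math.pi

# Turning from -170 deg to +170 deg is a 20 deg move, not a 340 deg one:
short = wrap_angle(math.radians(170) - math.radians(-170))
print(math.degrees(short))  # approximately -20.0
```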
for axis in ['x', 'y', 'z']:
    P = self.pid_gains[f'{axis.upper()}_P'] * error[axis]
    # (the I and D terms are computed analogously from the accumulated and
    # differenced errors, using the corresponding _I and _D gains)

    PID_output = P + I + D

    if axis == 'x':
        # x error tilts the drone forward/backward
        motor_power['m1'] -= PID_output
        motor_power['m2'] += PID_output
        motor_power['m3'] += PID_output
        motor_power['m4'] -= PID_output
    elif axis == 'y':
        # y error tilts the drone left/right
        motor_power['m1'] -= PID_output
        motor_power['m2'] -= PID_output
        motor_power['m3'] += PID_output
        motor_power['m4'] += PID_output
    else:
        # z error changes collective thrust on all four motors
        motor_power['m1'] += PID_output
        motor_power['m2'] += PID_output
        motor_power['m3'] += PID_output
        motor_power['m4'] += PID_output

# yaw correction: opposite spin pairs
motor_power['m1'] += yaw_P
motor_power['m2'] -= yaw_P
motor_power['m3'] += yaw_P
motor_power['m4'] -= yaw_P
motor_power = {key: np.clip(value, 0, self.max_velocity)
               for key, value in motor_power.items()}
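The four per-axis sign patterns above amount to a standard quadrotor mixing matrix. Written explicitly (an illustrative sketch, assuming the motor ordering m1..m4 used in the code):

```python
import numpy as np

# Rows: motors m1..m4. Columns: x, y, z PID outputs, with the same
# signs as the three per-axis blocks above.
MIX = np.array([
    [-1, -1, +1],   # m1: -x, -y, +z
    [+1, -1, +1],   # m2: +x, -y, +z
    [+1, +1, +1],   # m3: +x, +y, +z
    [-1, +1, +1],   # m4: -x, +y, +z
])

def mix(pid_xyz, max_velocity=100.0):
    """Combine per-axis PID outputs into clipped motor powers."""
    return np.clip(MIX @ np.asarray(pid_xyz), 0, max_velocity)

print(mix([1.0, 2.0, 10.0]))  # each motor = (+/-1) + (+/-2) + 10
```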
for axis in ['x', 'y', 'z']:
@brief Converts an axis-angle representation to Euler angles in degrees.

This method takes a 4D vector representing an axis-angle rotation (x, y, z, angle)
and converts it to the corresponding Euler angles (roll, pitch, yaw) in degrees.

@param axis_angle A list or tuple containing the axis components (x, y, z) and the rotation angle.
@return A tuple (roll, pitch, yaw) of Euler angles in degrees.

- Normalizes the axis vector if its magnitude is non-zero.
- Converts the axis-angle to a quaternion, then the quaternion to Euler angles.
- Clamps the asin argument to [-1, 1] so the pitch computation never leaves its domain.

@note The input angle is expected in radians.
x, y, z, angle = axis_angle

axis_length = math.sqrt(x*x + y*y + z*z)

w = math.cos(angle / 2)
x = x * math.sin(angle / 2)
y = y * math.sin(angle / 2)
z = z * math.sin(angle / 2)
t0 = +2.0 * (w * x + y * z)
t1 = +1.0 - 2.0 * (x * x + ysqr)
roll = math.degrees(math.atan2(t0, t1))

t2 = +2.0 * (w * y - z * x)
t2 = +1.0 if t2 > +1.0 else t2
t2 = -1.0 if t2 < -1.0 else t2
pitch = math.degrees(math.asin(t2))

t3 = +2.0 * (w * z + x * y)
t4 = +1.0 - 2.0 * (ysqr + z * z)
yaw = math.degrees(math.atan2(t3, t4))

return roll, pitch, yaw
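Put together, the quaternion route above can be checked end to end. This standalone version fills in the axis normalization and `ysqr` steps that the method performs between the fragments shown (a sketch consistent with the formulas above):

```python
import math

def axis_angle_to_euler(x, y, z, angle):
    """Axis-angle (angle in radians) -> (roll, pitch, yaw) in degrees."""
    n = math.sqrt(x*x + y*y + z*z)
    if n > 0:
        x, y, z = x / n, y / n, z / n   # normalize the rotation axis
    w = math.cos(angle / 2)
    x, y, z = (c * math.sin(angle / 2) for c in (x, y, z))
    ysqr = y * y
    roll = math.degrees(math.atan2(2*(w*x + y*z), 1 - 2*(x*x + ysqr)))
    t2 = max(-1.0, min(1.0, 2*(w*y - z*x)))   # clamp before asin
    pitch = math.degrees(math.asin(t2))
    yaw = math.degrees(math.atan2(2*(w*z + x*y), 1 - 2*(ysqr + z*z)))
    return roll, pitch, yaw

# A 90 degree rotation about the z axis is pure yaw:
print(axis_angle_to_euler(0, 0, 1, math.pi / 2))  # roll ~ 0, pitch ~ 0, yaw ~ 90
```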
@brief Computes the reward based on the drone's current state and actions.

This method evaluates the drone's performance by computing a composite reward that combines
distance to the target, stability, efficiency, and task completion.

@param action (Optional) The action taken by the drone; not directly used in the current implementation.
@return A float representing the computed reward for the current timestep.

- **Distance to Target:** A quadratic penalty based on the squared distance to the target location.
- **Stability Reward:** Penalizes abrupt changes in velocity.
- **Efficiency Reward:** A small time penalty that encourages faster task completion.
- **Stay Penalty:** Penalizes the drone for lingering near the origin after a set number of steps.
- **Direction Reward:** Rewards movement in the direction of the target.
- **Goal Bonus:** A large reward for reaching the target location.
- **Time Penalty:** A small per-timestep penalty that encourages efficiency.

@note Assumes observations include position and velocity as described in `get_observations`.
distance_to_target = np.linalg.norm(observations[:3] - self.target_location)
reward_distance = -distance_to_target**2

velocity = observations[3:6]

reward_stability = -0.1 * velocity_change
start_location = np.array([0, 0, 0])
distance_from_start = np.linalg.norm(observations[:3] - start_location)
reward_stay_penalty = -50 if distance_from_start < 0.1 and self.cur_step_count > 100 else 0
reward_direction = max(0, np.dot(direction, velocity))

goal_bonus = 50 if distance_to_target < 0.1 else 0
+ 0.1 * reward_stability
+ reward_stay_penalty
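The weighted sum these terms feed into can be sketched end to end. The thresholds and the -50/50 magnitudes follow the fragments above; the time-penalty value and the exact term ordering are illustrative assumptions, not the class's actual coefficients:

```python
import numpy as np

def compute_reward(pos, vel, prev_vel, target, step_count, time_penalty=-0.01):
    """Illustrative composite reward mirroring the terms described above."""
    pos, vel, prev_vel, target = map(np.asarray, (pos, vel, prev_vel, target))
    dist = np.linalg.norm(pos - target)
    reward_distance = -dist**2                                # quadratic distance penalty
    reward_stability = -0.1 * np.linalg.norm(vel - prev_vel)  # penalize abrupt velocity change
    direction = (target - pos) / (dist + 1e-8)                # unit vector toward the target
    reward_direction = max(0.0, float(np.dot(direction, vel)))
    reward_stay_penalty = -50 if np.linalg.norm(pos) < 0.1 and step_count > 100 else 0
    goal_bonus = 50 if dist < 0.1 else 0
    return (reward_distance + 0.1 * reward_stability + reward_direction
            + reward_stay_penalty + goal_bonus + time_penalty)

# Hovering exactly on the target collects the goal bonus minus the time penalty:
print(compute_reward([1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0], 10))  # -> 49.99
```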