
```
ValueError                                Traceback (most recent call last)
Input In [36], in <cell line: 2>()
      1 from sklearn.model_selection import train_test_split
----> 2 x_train,x_test,y_train,y_test = train_test_split(C_data,C_label,test_size=0.2,random_state=256)

File C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py:2420, in train_test_split(test_size, train_size, random_state, shuffle, stratify, *arrays)
   2417 arrays = indexable(*arrays)
   2419 n_samples = _num_samples(arrays[0])
-> 2420 n_train, n_test = _validate_shuffle_split(
   2421     n_samples, test_size, train_size, default_test_size=0.25
   2422 )
   2424 if shuffle is False:
   2425     if stratify is not None:

File C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py:2098, in _validate_shuffle_split(n_samples, test_size, train_size, default_test_size)
   2095 n_train, n_test = int(n_train), int(n_test)
   2097 if n_train == 0:
-> 2098     raise ValueError(
   2099         "With n_samples={}, test_size={} and train_size={}, the "
   2100         "resulting train set will be empty. Adjust any of the "
   2101         "aforementioned parameters.".format(n_samples, test_size, train_size)
   2102     )
   2104 return n_train, n_test

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```

For the `ValueError: n_samples=0` raised by `train_test_split`, here is a step-by-step solution:

---

### 1. **Check the data-loading step**
- **Symptom**: the error explicitly reports `n_samples=0`, i.e. the input dataset is empty.
- **How to check**:
  1. Confirm the data file path is correct (on Windows in particular, use double backslashes `\\` or forward slashes `/`)[^3].
  2. Verify that the data file (e.g. a CSV) is non-empty and correctly formatted, so that a failed read does not silently yield an empty dataset[^2].
```python
import pandas as pd
dataset = pd.read_csv("your_data.csv")
print("Number of rows:", len(dataset))  # should be a positive integer
```

---

### 2. **Check the preprocessing step**
- **Common cause**: data cleaning may accidentally drop every sample.
- **Fix**:
  - Check whether a filter condition is too strict (e.g. dropping every row that has any missing value).
  - Add an explicit validity check:
```python
if len(dataset) == 0:
    raise ValueError("The dataset is empty; check the data-loading or preprocessing steps")
```

---

### 3. **Adjust the `train_test_split` parameters**
- **Parameter rule**: `test_size` and `train_size` must not sum to more than 1, and each side of the split must end up with at least one sample.
- **Failure example**: with only 1 sample and `test_size=0.2`, the test set gets `ceil(0.2 × 1) = 1` sample and the training set is left with 0, which triggers the same error.
- **Fix**:
```python
# Set train_size explicitly instead of relying on the automatic complement
train, test = train_test_split(dataset, test_size=0.2, train_size=0.8)
# Or use a smaller test_size
train, test = train_test_split(dataset, test_size=0.1)
```

---

### 4. **Verify the data format**
- **Special cases**: the input may be in a non-standard format (e.g. JSON annotations that do not match the images[^3]).
- **Suggested checks**:
  - Make sure the input is a NumPy array or a pandas DataFrame.
  - Check that the feature and label dimensions match:
```python
print("Feature shape:", X.shape)  # should be (n_samples, n_features)
print("Label shape:", y.shape)    # should be (n_samples,)
```

---

### 5. **Reproduce and debug the error (code example)**
```python
from sklearn.model_selection import train_test_split
import numpy as np

# Simulate the failure (as if the data had not loaded)
X = np.array([])  # an empty dataset
try:
    X_train, X_test = train_test_split(X, test_size=0.2)
except ValueError as e:
    print("Cause of the error:", e)  # prints the detailed message
```

---

### Summary
1. **Confirm the dataset is non-empty** → 2. **Check the file path and format** → 3. **Adjust the split parameters** → 4. **Verify the preprocessing logic**

---

Related questions

import tensorflow as tf import pandas as pd import numpy as np from sklearn.model_selection import train_test_split from sklearn.preprocessing import MinMaxScaler import os # 1. 数据预处理 (保持不变) def load_and_preprocess_data(csv_path): """加载并预处理双色球历史数据""" df = pd.read_csv(csv_path) red_balls = df.iloc[:, 0:6].values.astype(np.float32) blue_balls = df.iloc[:, 6:].values.reshape(-1, 1).astype(np.float32) red_scaler = MinMaxScaler(feature_range=(0, 1)).fit(red_balls) blue_scaler = MinMaxScaler(feature_range=(0, 1)).fit(blue_balls) red_normalized = red_scaler.transform(red_balls) blue_normalized = blue_scaler.transform(blue_balls) combined = np.hstack((red_normalized, blue_normalized)) X_train, _ = train_test_split(combined, test_size=0.2, random_state=42) return X_train, red_scaler, blue_scaler # 2. 重构GAN模型 class DualColorGAN(tf.keras.Model): def __init__(self, latent_dim=100): super(DualColorGAN, self).__init__() self.latent_dim = latent_dim # 生成器 self.generator = tf.keras.Sequential([ tf.keras.layers.Dense(256, activation='relu', input_dim=latent_dim), tf.keras.layers.BatchNormalization(), tf.keras.layers.Dense(512, activation='relu'), tf.keras.layers.BatchNormalization(), tf.keras.layers.Dense(1024, activation='relu'), tf.keras.layers.BatchNormalization(), tf.keras.layers.Dense(7, activation='sigmoid') ]) # 判别器 self.discriminator = tf.keras.Sequential([ tf.keras.layers.Dense(512, activation='relu', input_dim=7), tf.keras.layers.Dense(256, activation='relu'), tf.keras.layers.Dense(128, activation='relu'), tf.keras.layers.Dense(1, activation='sigmoid') ]) # 分离优化器 (关键修复) self.g_optimizer = tf.keras.optimizers.Adam(0.0002, beta_1=0.5) self.d_optimizer = tf.keras.optimizers.Adam(0.0002, beta_1=0.5) self.loss_fn = tf.keras.losses.BinaryCrossentropy() def compile(self, **kwargs): super().compile(**kwargs) # 重构训练步骤 def train_step(self, real_data): batch_size = tf.shape(real_data)[0] # 训练判别器 with tf.GradientTape() as d_tape: noise = tf.random.normal([batch_size, self.latent_dim]) generated_data = self.generator(noise, training=False) real_output = self.discriminator(real_data, training=True) fake_output = self.discriminator(generated_data, training=True) # 损失计算 real_loss = self.loss_fn(tf.ones_like(real_output), real_output) fake_loss = self.loss_fn(tf.zeros_like(fake_output), fake_output) d_loss = (real_loss + fake_loss) / 2 # 仅更新判别器变量 d_grads = d_tape.gradient(d_loss, self.discriminator.trainable_variables) self.d_optimizer.apply_gradients(zip(d_grads, self.discriminator.trainable_variables)) # 训练生成器 with tf.GradientTape() as g_tape: noise = tf.random.normal([batch_size, self.latent_dim]) generated_data = self.generator(noise, training=True) fake_output = self.discriminator(generated_data, training=True) g_loss = self.loss_fn(tf.ones_like(fake_output), fake_output) # 仅更新生成器变量 g_grads = g_tape.gradient(g_loss, self.generator.trainable_variables) self.g_optimizer.apply_gradients(zip(g_grads, self.generator.trainable_variables)) return {"d_loss": d_loss, "g_loss": g_loss} # 3. 
训练与预测 def train_and_predict(csv_path, epochs=5000, batch_size=32): X_train, red_scaler, blue_scaler = load_and_preprocess_data(csv_path) dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000).batch(batch_size) gan = DualColorGAN(latent_dim=100) gan.compile() # 训练前构建模型 (避免变量延迟创建) dummy_input = tf.random.normal([1, 100]) _ = gan.generator(dummy_input) _ = gan.discriminator(dummy_input) for epoch in range(epochs): for batch in dataset: metrics = gan.train_step(batch) if epoch % 500 == 0: print(f"Epoch {epoch}, D Loss: {metrics['d_loss']:.4f}, G Loss: {metrics['g_loss']:.4f}") # 生成预测号码 (保持不变) def generate_numbers(n=5): noise = tf.random.normal(shape=(n, 100)) generated = gan.generator(noise).numpy() red_generated = generated[:, :6] blue_generated = generated[:, 6:] red_denorm = red_scaler.inverse_transform(red_generated) blue_denorm = blue_scaler.inverse_transform(blue_generated) red_denorm = np.clip(red_denorm, 1, 33).astype(int) blue_denorm = np.clip(blue_denorm, 1, 16).astype(int) results = [] for i in range(n): unique_red = np.unique(red_denorm[i]) while len(unique_red) < 6: new_red = np.random.randint(1, 34, 6 - len(unique_red)) unique_red = np.unique(np.concatenate([unique_red, new_red])) sorted_red = np.sort(unique_red[:6]) results.append({ "红球": sorted_red.tolist(), "蓝球": int(blue_denorm[i][0]) }) return results return generate_numbers() # 4. 使用示例 if __name__ == "__main__": csv_path = "D:/worker/lottery_results7.csv" predictions = train_and_predict(csv_path, epochs=300) print("\n双色球预测号码:") for i, pred in enumerate(predictions, 1): print(f"预测组 {i}: 红球: {pred['红球']}, 蓝球: {pred['蓝球']}") C:\Users\power\AppData\Roaming\Python\Python39\site-packages\keras\src\layers\core\dense.py:87: UserWarning: Do not pass an input_shape/input_dim argument to a layer. When using Sequential models, prefer using an Input(shape) object as the first layer in the model instead. super().__init__(activity_regularizer=activity_regularizer, **kwargs) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[1], line 145 143 if __name__ == "__main__": 144 csv_path = "D:/worker/lottery_results7.csv" --> 145 predictions = train_and_predict(csv_path, epochs=300) 147 print("\n双色球预测号码:") 148 for i, pred in enumerate(predictions, 1): Cell In[1], line 103, in train_and_predict(csv_path, epochs, batch_size) 101 dummy_input = tf.random.normal([1, 100]) 102 _ = gan.generator(dummy_input) --> 103 _ = gan.discriminator(dummy_input) 105 for epoch in range(epochs): 106 for batch in dataset: File ~\AppData\Roaming\Python\Python39\site-packages\keras\src\utils\traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs) 119 filtered_tb = _process_traceback_frames(e.__traceback__) 120 # To get the full stack trace, call: 121 # keras.config.disable_traceback_filtering() --> 122 raise e.with_traceback(filtered_tb) from None 123 finally: 124 del filtered_tb File ~\AppData\Roaming\Python\Python39\site-packages\keras\src\layers\input_spec.py:227, in assert_input_compatibility(input_spec, inputs, layer_name) 222 for axis, value in spec.axes.items(): 223 if value is not None and shape[axis] not in { 224 value, 225 None, 226 }: --> 227 raise ValueError( 228 f'Input {input_index} of layer "{layer_name}" is ' 229 f"incompatible with the layer: expected axis {axis} " 230 f"of input shape to have value {value}, " 231 "but received input with " 232 f"shape {shape}" 233 ) 234 # Check shape. 
235 if spec.shape is not None: ValueError: Exception encountered when calling Sequential.call(). Input 0 of layer "dense_4" is incompatible with the layer: expected axis -1 of input shape to have value 7, but received input with shape (1, 100) Arguments received by Sequential.call(): • inputs=tf.Tensor(shape=(1, 100), dtype=float32) • training=None • mask=None

Modify and improve the code according to the requirements of the ValueError, and produce the complete corrected code.
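One likely fix for the shape error above, sketched under the assumption that the `DualColorGAN` layout stays as written: the discriminator consumes 7-dimensional samples (6 red balls + 1 blue ball), so it must be built with a 7-feature dummy tensor (most simply, the generator's output), not with the 100-dimensional latent vector.

```python
import tensorflow as tf

gan = DualColorGAN(latent_dim=100)   # class defined in the snippet above
gan.compile()

# Build the sub-models with tensors of the shapes they actually consume.
dummy_latent = tf.random.normal([1, gan.latent_dim])  # generator input: (1, 100)
dummy_sample = gan.generator(dummy_latent)            # generator output: (1, 7)
_ = gan.discriminator(dummy_sample)                   # discriminator expects (_, 7)
```

Declaring `tf.keras.layers.Input(shape=(latent_dim,))` and `Input(shape=(7,))` as the first layer of each `Sequential` (instead of passing `input_dim=`) would both silence the Keras `UserWarning` and make such mismatches fail earlier.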


import tensorflow as tf import pandas as pd import numpy as np from sklearn.model_selection import train_test_split from sklearn.preprocessing import MinMaxScaler import os # 1. 数据准备与预处理 def load_and_preprocess_data(csv_path): """加载并预处理双色球历史数据""" df = pd.read_csv(csv_path) # 假设CSV列格式: 期号,红球1,红球2,红球3,红球4,红球5,红球6,蓝球 red_balls = df.iloc[:, 0:6].values.astype(np.float32) blue_balls = df.iloc[:, 6:].values.reshape(-1, 1).astype(np.float32) print(red_balls) print(blue_balls) # 归一化处理 (0-1范围) red_scaler = MinMaxScaler(feature_range=(0, 1)).fit(red_balls) blue_scaler = MinMaxScaler(feature_range=(0, 1)).fit(blue_balls) red_normalized = red_scaler.transform(red_balls) blue_normalized = blue_scaler.transform(blue_balls) # 组合特征 combined = np.hstack((red_normalized, blue_normalized)) # 划分训练集 X_train, _ = train_test_split(combined, test_size=0.2, random_state=42) return X_train, red_scaler, blue_scaler # 2. 构建生成对抗网络(GAN) class DualColorGAN(tf.keras.Model): def __init__(self, latent_dim=100): super(DualColorGAN, self).__init__() self.latent_dim = latent_dim # 生成器 self.generator = tf.keras.Sequential([ tf.keras.layers.Dense(256, activation='relu', input_dim=latent_dim), tf.keras.layers.BatchNormalization(), tf.keras.layers.Dense(512, activation='relu'), tf.keras.layers.BatchNormalization(), tf.keras.layers.Dense(1024, activation='relu'), tf.keras.layers.BatchNormalization(), tf.keras.layers.Dense(7, activation='sigmoid') # 输出7个数字 ]) # 判别器 self.discriminator = tf.keras.Sequential([ tf.keras.layers.Dense(512, activation='relu', input_dim=7), tf.keras.layers.Dense(256, activation='relu'), tf.keras.layers.Dense(128, activation='relu'), tf.keras.layers.Dense(1, activation='sigmoid') ]) self.optimizer = tf.keras.optimizers.Adam(0.0002, 0.5) self.loss_fn = tf.keras.losses.BinaryCrossentropy() def compile(self, **kwargs): super().compile(**kwargs) self.discriminator.compile( loss=self.loss_fn, optimizer=self.optimizer, metrics=['accuracy'] ) def train_step(self, real_data): batch_size = tf.shape(real_data)[0] # 训练判别器 noise = tf.random.normal(shape=(batch_size, self.latent_dim)) generated_data = self.generator(noise, training=False) combined_data = tf.concat([real_data, generated_data], axis=0) labels = tf.concat([ tf.ones((batch_size, 1)), # 真实数据标签 tf.zeros((batch_size, 1)) # 生成数据标签 ], axis=0) # 添加少量噪声到标签(提高稳定性) labels += 0.05 * tf.random.uniform(tf.shape(labels)) with tf.GradientTape() as tape: predictions = self.discriminator(combined_data) d_loss = self.loss_fn(labels, predictions) grads = tape.gradient(d_loss, self.discriminator.trainable_variables) self.optimizer.apply_gradients(zip(grads, self.discriminator.trainable_variables)) # 训练生成器 noise = tf.random.normal(shape=(batch_size, self.latent_dim)) misleading_labels = tf.ones((batch_size, 1)) # 试图欺骗判别器 with tf.GradientTape() as tape: generated_data = self.generator(noise, training=True) predictions = self.discriminator(generated_data) g_loss = self.loss_fn(misleading_labels, predictions) grads = tape.gradient(g_loss, self.generator.trainable_variables) self.optimizer.apply_gradients(zip(grads, self.generator.trainable_variables)) return {"d_loss": d_loss, "g_loss": g_loss} # 3. 
训练与预测 def train_and_predict(csv_path, epochs=5000, batch_size=32): # 加载数据 X_train, red_scaler, blue_scaler = load_and_preprocess_data(csv_path) dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000).batch(batch_size) # 初始化模型 gan = DualColorGAN(latent_dim=100) gan.compile() # 训练模型 for epoch in range(epochs): for batch in dataset: metrics = gan.train_step(batch) if epoch % 500 == 0: print(f"Epoch {epoch}, D Loss: {metrics['d_loss']:.4f}, G Loss: {metrics['g_loss']:.4f}") # 生成预测号码 def generate_numbers(n=5): noise = tf.random.normal(shape=(n, 100)) generated = gan.generator(noise).numpy() # 分离红球和蓝球 red_generated = generated[:, :6] blue_generated = generated[:, 6:] # 反归一化 red_denorm = red_scaler.inverse_transform(red_generated) blue_denorm = blue_scaler.inverse_transform(blue_generated) # 后处理:确保号码在有效范围内 red_denorm = np.clip(red_denorm, 1, 33).astype(int) blue_denorm = np.clip(blue_denorm, 1, 16).astype(int) # 排序红球并去重 results = [] for i in range(n): unique_red = np.unique(red_denorm[i]) # 如果不足6个唯一红球,补充随机数 while len(unique_red) < 6: new_red = np.random.randint(1, 34, 6 - len(unique_red)) unique_red = np.unique(np.concatenate([unique_red, new_red])) sorted_red = np.sort(unique_red[:6]) # 取前6个并排序 results.append({ "红球": sorted_red.tolist(), "蓝球": int(blue_denorm[i][0]) }) return results return generate_numbers() # 4. 使用示例 if __name__ == "__main__": # 替换为您的CSV文件路径 csv_path = "D:/worker/lottery_results7.csv" # 生成5组预测号码 predictions = train_and_predict(csv_path, epochs=300) print("\n双色球预测号码:") for i, pred in enumerate(predictions, 1): print(f"预测组 {i}: 红球: {pred['红球']}, 蓝球: {pred['蓝球']}") C:\Users\power\AppData\Roaming\Python\Python39\site-packages\keras\src\layers\core\dense.py:87: UserWarning: Do not pass an input_shape/input_dim argument to a layer. When using Sequential models, prefer using an Input(shape) object as the first layer in the model instead. super().__init__(activity_regularizer=activity_regularizer, **kwargs) ---------------------------------------------------------------------------ValueError Traceback (most recent call last) Cell In[3], line 162 159 csv_path = "D:/worker/lottery_results7.csv" 161 # 生成5组预测号码--> 162 predictions = train_and_predict(csv_path, epochs=300) 164 print("\n双色球预测号码:") 165 for i, pred in enumerate(predictions, 1): Cell In[3], line 116, in train_and_predict(csv_path, epochs, batch_size) 114 for epoch in range(epochs): 115 for batch in dataset:--> 116 metrics = gan.train_step(batch) 118 if epoch % 500 == 0: 119 print(f"Epoch {epoch}, D Loss: {metrics['d_loss']:.4f}, G Loss: {metrics['g_loss']:.4f}") Cell In[3], line 99, in DualColorGAN.train_step(self, real_data) 97 g_loss = self.loss_fn(misleading_labels, predictions) 98 grads = tape.gradient(g_loss, self.generator.trainable_variables)---> 99 self.optimizer.apply_gradients(zip(grads, self.generator.trainable_variables)) 101 return {"d_loss": d_loss, "g_loss": g_loss} File ~\AppData\Roaming\Python\Python39\site-packages\keras\src\optimizers\base_optimizer.py:383, in BaseOptimizer.apply_gradients(self, grads_and_vars) 381 def apply_gradients(self, grads_and_vars): 382 grads, trainable_variables = zip(*grads_and_vars)--> 383 self.apply(grads, trainable_variables) 384 # Return iterations for compat with tf.keras. 
385 return self._iterations File ~\AppData\Roaming\Python\Python39\site-packages\keras\src\optimizers\base_optimizer.py:424, in BaseOptimizer.apply(self, grads, trainable_variables) 422 self.build(trainable_variables) 423 self.built = True--> 424 self._check_variables_are_known(trainable_variables) 426 with backend.name_scope(self.name, caller=self): 427 # Overwrite targeted variables directly with their gradients if 428 # their overwrite_with_gradient is set. 429 grads, trainable_variables = ( 430 self._overwrite_variables_directly_with_gradients( 431 grads, trainable_variables 432 ) 433 ) File ~\AppData\Roaming\Python\Python39\site-packages\keras\src\optimizers\base_optimizer.py:329, in BaseOptimizer._check_variables_are_known(self, variables) 327 for v in variables: 328 if self._var_key(v) not in self._trainable_variables_indices:--> 329 raise ValueError( 330 f"Unknown variable: {v}. This optimizer can only " 331 "be called for the variables it was originally built with. " 332 "When working with a new set of variables, you should " 333 "recreate a new optimizer instance." 334 ) ValueError: Unknown variable: <Variable path=sequential_2/dense_8/kernel, shape=(100, 256), dtype=float32, value=[[ 0.03668569 0.08865 -0.11710601 ... -0.07610065 0.00637069 -0.11252417] [-0.09408598 -0.12256975 -0.12315848 ... 0.01152834 0.07556388 -0.0131057 ] [ 0.1293782 -0.05283339 0.05111471 ... -0.03385867 -0.04834706 0.0365559 ] ... [-0.00592124 -0.12550488 -0.0388018 ... 0.1263026 -0.04316472 0.04232238] [ 0.088101 -0.02959325 -0.00558581 ... 0.00974776 -0.03782529 0.0871228 ] [-0.09214704 0.07390936 0.12219191 ... 0.11338826 0.03027995 -0.11891758]]>. This optimizer can only be called for the variables it was originally built with. When working with a new set of variables, you should recreate a new optimizer instance.
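The second traceback comes from sharing a single `Adam` instance between the two sub-models: in the Keras 3 stack shown above, an optimizer is bound to the first set of variables it is applied to and refuses a different set. A minimal, runnable sketch of that behaviour and the usual fix (keep the `g_optimizer` / `d_optimizer` split from the earlier version of the class):

```python
import tensorflow as tf

# Minimal reproduction of the "Unknown variable" error: once an optimizer has
# been built for one set of variables, Keras 3 refuses to apply it to another.
gen_var = tf.Variable(1.0)
disc_var = tf.Variable(2.0)

shared_opt = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
shared_opt.apply_gradients([(tf.constant(0.1), disc_var)])   # builds for {disc_var}
# shared_opt.apply_gradients([(tf.constant(0.1), gen_var)])  # would raise "Unknown variable"

# Fix inside DualColorGAN: one optimizer per sub-model, used consistently in
# train_step, e.g.
#   self.d_optimizer.apply_gradients(zip(d_grads, self.discriminator.trainable_variables))
#   self.g_optimizer.apply_gradients(zip(g_grads, self.generator.trainable_variables))
g_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
d_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
d_optimizer.apply_gradients([(tf.constant(0.1), disc_var)])  # each optimizer owns one set
g_optimizer.apply_gradients([(tf.constant(0.1), gen_var)])
```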

--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[16], line 7 4 from sklearn.model_selection import GridSearchCV, train_test_split 5 from sklearn.metrics import r2_score ----> 7 X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=1, stratify=y) 8 # 使用网格搜索拟合 PLS 回归模型 9 pls = GridSearchCV(Pipeline([('pls', PLSRegression())]), 10 {'pls__n_components': [10, 12, 14, 16, 18, 20]}, 11 cv=3, n_jobs=-1, verbose=0) File E:\anaconda\envs\python312\Lib\site-packages\sklearn\utils\_param_validation.py:213, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs) 207 try: 208 with config_context( 209 skip_parameter_validation=( 210 prefer_skip_nested_validation or global_skip_validation 211 ) 212 ): --> 213 return func(*args, **kwargs) 214 except InvalidParameterError as e: 215 # When the function is just a wrapper around an estimator, we allow 216 # the function to delegate validation to the estimator, but we replace 217 # the name of the estimator by the name of the function in the error 218 # message to avoid confusion. 219 msg = re.sub( 220 r"parameter of \w+ must be", 221 f"parameter of {func.__qualname__} must be", 222 str(e), 223 ) File E:\anaconda\envs\python312\Lib\site-packages\sklearn\model_selection\_split.py:2782, in train_test_split(test_size, train_size, random_state, shuffle, stratify, *arrays) 2779 if n_arrays == 0: 2780 raise ValueError("At least one array required as input") -> 2782 arrays = indexable(*arrays) 2784 n_samples = _num_samples(arrays[0]) 2785 n_train, n_test = _validate_shuffle_split( 2786 n_samples, test_size, train_size, default_test_size=0.25 2787 ) File E:\anaconda\envs\python312\Lib\site-packages\sklearn\utils\validation.py:514, in indexable(*iterables) 484 """Make arrays indexable for cross-validation. 485 486 Checks consistent length, passes through None, and ensures that everything (...) 510 [[1, 2, 3], array([2, 3, 4]), None, <...Sparse...dtype 'int64'...shape (3, 1)>] 511 """ 513 result = [_make_indexable(X) for X in iterables] --> 514 check_consistent_length(*result) 515 return result File E:\anaconda\envs\python312\Lib\site-packages\sklearn\utils\validation.py:457, in check_consistent_length(*arrays) 455 uniques = np.unique(lengths) 456 if len(uniques) > 1: --> 457 raise ValueError( 458 "Found input variables with inconsistent numbers of samples: %r" 459 % [int(l) for l in lengths] 460 ) ValueError: Found input variables with inconsistent numbers of samples: [1, 132]
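The lengths `[1, 132]` mean the first argument reaches `train_test_split` as a single sample while `y` has 132, so `x` is most likely a one-row matrix (or a list wrapping one array) rather than a `(132, n_features)` matrix. A minimal sketch of how one might verify and correct the orientation, assuming `x` and `y` from the snippet above; `stratify=y` is also dropped here, since stratification is only meaningful for classification labels, not a PLS regression target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.asarray(x)
y = np.asarray(y).ravel()
print(x.shape, y.shape)                    # expect (132, n_features) and (132,)

# If x came out as one row, e.g. (1, 132), put samples back on the first axis.
if x.shape[0] != len(y) and x.size % len(y) == 0:
    x = x.reshape(len(y), -1)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=1
)
```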

ValueError Traceback (most recent call last) Cell In[3], line 82 76 param_grid = { 77 'lasso__alpha': [0.01, 0.1, 1.0, 10.0], 78 'lasso__max_iter': [100, 1000, 5000, 10000] 79 } 81 # 执行超参数优化 ---> 82 best_model, best_params = optimize_hyperparameters( 83 pipeline, param_grid, X_train, y_train 84 ) 85 print(f"最佳参数配置: {best_params}") 87 # 测试最优模型的表现 Cell In[3], line 54, in optimize_hyperparameters(pipeline, param_grid, X_train, y_train) 46 """使用GridSearchCV执行超参数优化""" 47 grid_search = GridSearchCV( 48 estimator=pipeline, 49 param_grid=param_grid, (...) 52 n_jobs=-1 53 ) ---> 54 grid_search.fit(X_train, y_train) 55 return grid_search.best_estimator_, grid_search.best_params_ File D:\HONOR\Downloads\D\Lib\site-packages\sklearn\model_selection\_search.py:874, in BaseSearchCV.fit(self, X, y, groups, **fit_params) 868 results = self._format_results( 869 all_candidate_params, n_splits, all_out, all_more_results 870 ) 872 return results --> 874 self._run_search(evaluate_candidates) 876 # multimetric is determined here because in the case of a callable 877 # self.scoring the return type is only known after calling 878 first_test_score = all_out[0]["test_scores"] File D:\HONOR\Downloads\D\Lib\site-packages\sklearn\model_selection\_search.py:1388, in GridSearchCV._run_search(self, evaluate_candidates) 1386 def _run_search(self, evaluate_candidates): 1387 """Search all candidates in param_grid""" -> 1388 evaluate_candidates(ParameterGrid(self.param_grid)) File D:\HONOR\Downloads\D\Lib\site-packages\sklearn\model_selection\_search.py:851, in BaseSearchCV.fit.<locals>.evaluate_candidates(candidate_params, cv, more_results) 844 elif len(out) != n_candidates * n_splits: 845 raise ValueError( 846 "cv.split and cv.get_n_splits returned " 847 "inconsistent results. Expected {} " 848 "splits, got {}".format(n_splits, len(out) // n_candidates) 849 ) --> 851 _warn_or_raise_about_fit_failures(out, self.error_score) 853 # For callable self.scoring, the return type is only know after 854 # calling. If the return type is a dictionary, the error scores 855 # can now be inserted with the correct key. The type checking 856 # of out will be done in _insert_error_scores. 857 if callable(self.scoring): File D:\HONOR\Downloads\D\Lib\site-packages\sklearn\model_selection\_validation.py:367, in _warn_or_raise_about_fit_failures(results, error_score) 360 if num_failed_fits == num_fits: 361 all_fits_failed_message = ( 362 f"\nAll the {num_fits} fits failed.\n" 363 "It is very likely that your model is misconfigured.\n" 364 "You can try to debug the error by setting error_score='raise'.\n\n" 365 f"Below are more details about the failures:\n{fit_errors_summary}" 366 ) --> 367 raise ValueError(all_fits_failed_message) 369 else: 370 some_fits_failed_message = ( 371 f"\n{num_failed_fits} fits failed out of a total of {num_fits}.\n" 372 "The score on these train-test partitions for these parameters" (...) 376 f"Below are more details about the failures:\n{fit_errors_summary}" 377 ) ValueError: All the 80 fits failed. It is very likely that your model is misconfigured. You can try to debug the error by setting error_score='raise'. 
Below are more details about the failures: -------------------------------------------------------------------------------- 80 fits failed with the following error: Traceback (most recent call last): File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\pipeline.py", line 401, in fit Xt = self._fit(X, y, **fit_params_steps) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\pipeline.py", line 359, in _fit X, fitted_transformer = fit_transform_one_cached( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\joblib\memory.py", line 349, in __call__ return self.func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\pipeline.py", line 893, in _fit_transform_one res = transformer.fit_transform(X, y, **fit_params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\utils\_set_output.py", line 140, in wrapped data_to_wrap = f(self, X, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\base.py", line 881, in fit_transform return self.fit(X, y, **fit_params).transform(X) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\preprocessing\_data.py", line 824, in fit return self.partial_fit(X, y, sample_weight) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\preprocessing\_data.py", line 861, in partial_fit X = self._validate_data( ^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\base.py", line 565, in _validate_data X = check_array(X, input_name="X", **check_params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\utils\validation.py", line 879, in check_array array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\HONOR\Downloads\D\Lib\site-packages\sklearn\utils\_array_api.py", line 185, in _asarray_with_order array = numpy.asarray(array, order=order, dtype=dtype) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ValueError: could not convert string to float: '000027.SZ'
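All 80 fits fail for the same underlying reason: a non-numeric identifier column (a stock code such as `'000027.SZ'`) reaches the scaler/Lasso inside the pipeline. A minimal sketch of one way to keep only numeric features before fitting; the names `df`, `'ts_code'` and `'target'` are assumptions to adapt to the actual frame, and `error_score='raise'` (as the message itself suggests) surfaces the first underlying exception directly:

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split

# Drop identifier columns, then keep only numeric dtypes before building X.
X = df.drop(columns=["ts_code"], errors="ignore")   # 'ts_code' is a hypothetical column name
X = X.select_dtypes(include="number")               # numeric features only
y = df["target"]                                     # hypothetical target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

grid_search = GridSearchCV(
    estimator=pipeline,          # pipeline and param_grid from the snippet above
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
    error_score="raise",         # re-raise the first fit failure for easier debugging
)
grid_search.fit(X_train, y_train)
```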

import pandas as pd import numpy as np from sklearn.impute import KNNImputer from sklearn.ensemble import RandomForestRegressor from sklearn.experimental import enable_iterative_imputer from sklearn.impute import IterativeImputer from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score import matplotlib.pyplot as plt import seaborn as sns # 假设df_cleaned是当前的数据框 print("=== 开始逐个处理缺失字段 (保留Pa字段) ===") # 创建备份用于验证填充效果 df_backup = df_cleaned.copy() # 1. 处理 W1 字段 print("\n1. 处理 W1 字段 (缺失率: 36.25%)") print("处理决策: 直接删除") print("理由: 缺失率过高(36.25%),且为文本描述型数据,难以有效填充。W1与土壤湿度关系较弱,删除不会显著影响模型性能。") if 'W1' in df_cleaned.columns: df_cleaned.drop(columns=['W1'], inplace=True) print("已删除 W1 字段") # 2. 处理 W2 字段 print("\n2. 处理 W2 字段 (缺失率: 36.25%)") print("处理决策: 直接删除") print("理由: 与W1类似,缺失率过高,且为文本描述型数据。W2与土壤湿度关系较弱,删除不会显著影响模型性能。") if 'W2' in df_cleaned.columns: df_cleaned.drop(columns=['W2'], inplace=True) print("已删除 W2 字段") # 3. 处理 H 字段 print("\n3. 处理 H 字段 (缺失率: 22.70%)") print("H 描述: 最低云层底部的高度(米)") print("处理决策: 直接删除") print("理由: 缺失率较高(22.70%),且数据几乎全是'2500或更高,或无云',信息量极低。H与土壤湿度关系较弱,删除不会显著影响模型性能。") if 'H' in df_cleaned.columns: df_cleaned.drop(columns=['H'], inplace=True) print("已删除 H 字段") # 4. 处理 WW 字段 print("\n4. 处理 WW 字段 (缺失率: 14.92%)") print("WW 描述: 气象站公布的当前天气情况") print("处理决策: 直接删除") print("理由: 缺失率较高(14.92%),且为复杂的文本描述型数据,难以有效量化。天气信息可以从其他气象要素(如RRR、U、T等)间接反映,删除不会显著影响模型性能。") if 'WW' in df_cleaned.columns: df_cleaned.drop(columns=['WW'], inplace=True) print("已删除 WW 字段") # 5. 处理 ff3 字段 print("\n5. 处理 ff3 字段 (缺失率: 14.77%)") print("ff3 描述: 两次观测之间地面高度10-12米的最大阵风(米每秒)") print("处理决策: 直接删除") print("理由: 缺失率较高(14.77%),且与Ff(平均风速)信息重复。ff3与土壤湿度关系较弱,删除不会显著影响模型性能。") if 'ff3' in df_cleaned.columns: df_cleaned.drop(columns=['ff3'], inplace=True) print("已删除 ff3 字段") # 6. 处理 tR 字段 print("\n6. 处理 tR 字段 (缺失率: 0.91%)") print("tR 描述: 达到规定降水量的时间") print("处理决策: 直接删除") print("理由: 此字段只有两个值(12.0和6.0),信息量有限,且与土壤湿度关系较弱。删除不会显著影响模型性能。") if 'tR' in df_cleaned.columns: df_cleaned.drop(columns=['tR'], inplace=True) print("已删除 tR 字段") # 7. 处理 Pa 字段 (保留但处理缺失值) print("\n7. 处理 Pa 字段 (缺失率: 23.02%)") print("Pa 描述: 气压趋势,即观测前三个小时内大气压的变化(毫米汞柱)") print("处理决策: 保留并使用高级方法填充缺失值") print("理由: Pa字段包含在预测数据集中,必须保留。使用基于其他气压字段(Po和P)的预测模型来填充缺失值。") # 记录Pa字段的缺失位置,用于后续验证 pa_missing_mask = None if 'Pa' in df_cleaned.columns and df_cleaned['Pa'].isnull().sum() > 0: pa_missing_mask = df_cleaned['Pa'].isnull() # 使用Po和P来预测Pa # 首先准备训练数据(非缺失的Pa值) train_data = df_cleaned[df_cleaned['Pa'].notna()] test_data = df_cleaned[df_cleaned['Pa'].isna()] if len(train_data) > 0 and len(test_data) > 0: # 使用随机森林回归预测Pa from sklearn.ensemble import RandomForestRegressor # 准备特征和目标变量 features = ['Po', 'P'] X_train = train_data[features] y_train = train_data['Pa'] X_test = test_data[features] # 训练模型 model = RandomForestRegressor(n_estimators=100, random_state=42) model.fit(X_train, y_train) # 预测缺失的Pa值 pa_predictions = model.predict(X_test) # 将预测值填充回数据集 df_cleaned.loc[df_cleaned['Pa'].isna(), 'Pa'] = pa_predictions print("已使用随机森林模型填充Pa缺失值") # 记录模型性能(使用交叉验证) from sklearn.model_selection import cross_val_score cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='r2') print(f"Pa填充模型交叉验证R²分数: {cv_scores.mean():.4f} (±{cv_scores.std()*2:.4f})") else: # 如果无法使用模型预测,使用简单插值 df_cleaned['Pa'] = df_cleaned['Pa'].interpolate(method='time') df_cleaned['Pa'] = df_cleaned['Pa'].fillna(method='ffill').fillna(method='bfill') print("使用时间序列插值填充Pa缺失值") # 8. 处理 Tx 字段 print("\n8. 
处理 Tx 字段 (缺失率: 7.15%)") print("Tx 描述: 过去一段时间(不超过12小时)的最高气温(℃)") print("处理决策: 使用与当前温度T的相关性进行填充") print("理由: Tx与当前温度T有强相关性,通常Tx高于T。可以使用T加上Tx与T的平均差值来填充缺失的Tx值。") # 记录Tx字段的缺失位置,用于后续验证 tx_missing_mask = None if 'Tx' in df_cleaned.columns and df_cleaned['Tx'].isnull().sum() > 0: tx_missing_mask = df_cleaned['Tx'].isnull() # 计算Tx与T的平均差值 valid_mask = df_cleaned['Tx'].notna() & df_cleaned['T'].notna() if valid_mask.sum() > 0: avg_diff = (df_cleaned.loc[valid_mask, 'Tx'] - df_cleaned.loc[valid_mask, 'T']).mean() print(f"Tx与T的平均差值为: {avg_diff:.2f}℃") # 使用T + 平均差值填充缺失的Tx tx_missing = df_cleaned['Tx'].isna() df_cleaned.loc[tx_missing, 'Tx'] = df_cleaned.loc[tx_missing, 'T'] + avg_diff # 对于仍然缺失的值,使用时间序列插值 df_cleaned['Tx'] = df_cleaned['Tx'].interpolate(method='time') print("已使用T + 平均差值填充Tx缺失值") else: # 如果没有有效数据计算平均差值,使用简单插值 df_cleaned['Tx'] = df_cleaned['Tx'].interpolate(method='time') print("使用时间序列插值填充Tx缺失值") # 9. 处理 Tn 字段 print("\n9. 处理 Tn 字段 (缺失率: 5.98%)") print("Tn 描述: 过去一段时间(不超过12小时)的最低气温(℃)") print("处理决策: 使用与当前温度T的相关性进行填充") print("理由: Tn与当前温度T有强相关性,通常Tn低于T。可以使用T减去Tn与T的平均差值来填充缺失的Tn值。") # 记录Tn字段的缺失位置,用于后续验证 tn_missing_mask = None if 'Tn' in df_cleaned.columns and df_cleaned['Tn'].isnull().sum() > 0: tn_missing_mask = df_cleaned['Tn'].isnull() # 计算Tn与T的平均差值 valid_mask = df_cleaned['Tn'].notna() & df_cleaned['T'].notna() if valid_mask.sum() > 0: avg_diff = (df_cleaned.loc[valid_mask, 'T'] - df_cleaned.loc[valid_mask, 'Tn']).mean() print(f"Tn与T的平均差值为: {avg_diff:.2f}℃") # 使用T - 平均差值填充缺失的Tn tn_missing = df_cleaned['Tn'].isna() df_cleaned.loc[tn_missing, 'Tn'] = df_cleaned.loc[tn_missing, 'T'] - avg_diff # 对于仍然缺失的值,使用时间序列插值 df_cleaned['Tn'] = df_cleaned['Tn'].interpolate(method='time') print("已使用T - 平均差值填充Tn缺失值") else: # 如果没有有效数据计算平均差值,使用简单插值 df_cleaned['Tn'] = df_cleaned['Tn'].interpolate(method='time') print("使用时间序列插值填充Tn缺失值") # 10. 处理 RRR 字段 print("\n10. 处理 RRR 字段 (缺失率: 0.91%)") print("RRR 描述: 降水量(毫米)") print("处理决策: 填充为0") print("理由: 降水量缺失通常表示无降水事件,填充为0是合理的。降水事件相对稀少,且对土壤湿度有直接影响,需要保留此字段。") # 记录RRR字段的缺失位置,用于后续验证 rrr_missing_mask = None if 'RRR' in df_cleaned.columns and df_cleaned['RRR'].isnull().sum() > 0: rrr_missing_mask = df_cleaned['RRR'].isnull() df_cleaned['RRR'] = df_cleaned['RRR'].fillna(0) print("已将RRR缺失值填充为0") # 11. 处理 P 字段 print("\n11. 处理 P 字段 (缺失率: 0.15%)") print("P 描述: 平均海平面的大气压(毫米汞柱)") print("处理决策: 使用时间序列插值填充") print("理由: 缺失率极低(0.15%),且气压变化较为平缓,适合使用时间序列插值。P与土壤湿度有一定关系(影响蒸发),需要保留此字段。") # 记录P字段的缺失位置,用于后续验证 p_missing_mask = None if 'P' in df_cleaned.columns and df_cleaned['P'].isnull().sum() > 0: p_missing_mask = df_cleaned['P'].isnull() df_cleaned['P'] = df_cleaned['P'].interpolate(method='time') df_cleaned['P'] = df_cleaned['P'].fillna(method='ffill').fillna(method='bfill') print("已使用时间序列插值填充P缺失值") # 12. 处理 U 字段 print("\n12. 处理 U 字段 (缺失率: 0.02%)") print("U 描述: 地面高度2米处的相对湿度(%)") print("处理决策: 使用时间序列插值填充") print("理由: 缺失率极低(0.02%),且湿度变化较为连续,适合使用时间序列插值。U与土壤湿度有强相关性,是重要字段。") # 记录U字段的缺失位置,用于后续验证 u_missing_mask = None if 'U' in df_cleaned.columns and df_cleaned['U'].isnull().sum() > 0: u_missing_mask = df_cleaned['U'].isnull() df_cleaned['U'] = df_cleaned['U'].interpolate(method='time') df_cleaned['U'] = df_cleaned['U'].fillna(method='ffill').fillna(method='bfill') print("已使用时间序列插值填充U缺失值") # 13. 处理 VV 字段 print("\n13. 
处理 VV 字段 (缺失率: 0.02%)") print("VV 描述: 水平能见度(千米)") print("处理决策: 使用时间序列插值填充") print("理由: 缺失率极低(0.02%),且能见度变化较为连续,适合使用时间序列插值。VV与天气状况相关,间接影响土壤湿度。") # 记录VV字段的缺失位置,用于后续验证 vv_missing_mask = None if 'VV' in df_cleaned.columns and df_cleaned['VV'].isnull().sum() > 0: vv_missing_mask = df_cleaned['VV'].isnull() df_cleaned['VV'] = df_cleaned['VV'].interpolate(method='time') df_cleaned['VV'] = df_cleaned['VV'].fillna(method='ffill').fillna(method='bfill') print("已使用时间序列插值填充VV缺失值") # 14. 处理 Td 字段 print("\n14. 处理 Td 字段 (缺失率: 0.02%)") print("Td 描述: 地面高度2米处的露点温度(℃)") print("处理决策: 使用时间序列插值填充") print("理由: 缺失率极低(0.02%),且露点温度变化较为连续,适合使用时间序列插值。Td与空气饱和程度相关,影响蒸发,与土壤湿度有关。") # 记录Td字段的缺失位置,用于后续验证 td_missing_mask = None if 'Td' in df_cleaned.columns and df_cleaned['Td'].isnull().sum() > 0: td_missing_mask = df_cleaned['Td'].isnull() df_cleaned['Td'] = df_cleaned['Td'].interpolate(method='time') df_cleaned['Td'] = df_cleaned['Td'].fillna(method='ffill').fillna(method='bfill') print("已使用时间序列插值填充Td缺失值") # 15. 处理 T 字段 print("\n15. 处理 T 字段 (缺失率: 0.02%)") print("T 描述: 地面上2米处的大气气温(℃)") print("处理决策: 使用时间序列插值填充") print("理由: 缺失率极低(0.02%),且温度变化较为连续,适合使用时间序列插值。T与土壤湿度有强相关性(影响蒸发),是关键字段。") # 记录T字段的缺失位置,用于后续验证 t_missing_mask = None if 'T' in df_cleaned.columns and df_cleaned['T'].isnull().sum() > 0: t_missing_mask = df_cleaned['T'].isnull() df_cleaned['T'] = df_cleaned['T'].interpolate(method='time') df_cleaned['T'] = df_cleaned['T'].fillna(method='ffill').fillna(method='bfill') print("已使用时间序列插值填充T缺失值") # 16. 跳过 DD 字段处理 print("\n16. 跳过 DD 字段处理") print("DD 描述: 观测前10分钟内地面高度10-12米处风向") print("处理决策: 暂时跳过处理") print("理由: 无缺失值,但作为类别型数据,暂时保留原始格式,后续再决定如何处理。") # 17. 检查 Po 字段 print("\n17. 检查 Po 字段 (缺失率: 0.00%)") print("Po 描述: 气象站水平的大气压(毫米汞柱)") print("处理决策: 无需处理") print("理由: 无缺失值,且与土壤湿度有一定关系(影响蒸发),是重要字段。") # Po字段无缺失,无需处理 # 最终检查 print("\n=== 最终检查 ===") final_missing = df_cleaned.isnull().sum() final_missing = final_missing[final_missing > 0] if len(final_missing) > 0: print(f"以下字段仍有缺失值: {list(final_missing.index)}") print("使用迭代式填充处理剩余缺失值...") # 分离数值型特征 numeric_features = df_cleaned.select_dtypes(include=[np.number]).columns.tolist() if numeric_features: # 创建迭代式填充器 imputer = IterativeImputer( estimator=RandomForestRegressor(n_estimators=50, random_state=42), max_iter=20, random_state=42 ) # 应用填充 df_numeric_imputed = imputer.fit_transform(df_cleaned[numeric_features]) df_cleaned[numeric_features] = df_numeric_imputed print("已使用迭代式填充处理剩余缺失值") else: print("所有缺失值已成功处理!") print(f"\n最终数据形状: {df_cleaned.shape}") print("最终字段:", list(df_cleaned.columns)) # 验证填充效果 print("\n=== 填充效果验证 ===") def evaluate_imputation(original_df, imputed_df, missing_mask, field_name): """评估填充效果""" if missing_mask is None or not missing_mask.any(): print(f"{field_name}: 无缺失值,无需评估") return # 获取原始值和填充值 original_values = original_df.loc[missing_mask, field_name] imputed_values = imputed_df.loc[missing_mask, field_name] # 计算评估指标 mse = mean_squared_error(original_values, imputed_values) mae = mean_absolute_error(original_values, imputed_values) rmse = np.sqrt(mse) r2 = r2_score(original_values, imputed_values) print(f"\n{field_name} 填充效果评估:") print(f" 填充数量: {len(original_values)}") print(f" MSE: {mse:.4f}") print(f" MAE: {mae:.4f}") print(f" RMSE: {rmse:.4f}") print(f" R²: {r2:.4f}") # 绘制散点图比较原始值和填充值 plt.figure(figsize=(8, 6)) plt.scatter(original_values, imputed_values, alpha=0.5) plt.plot([original_values.min(), original_values.max()], [original_values.min(), original_values.max()], 'r--', lw=2) plt.xlabel('原始值') plt.ylabel('填充值') plt.title(f'{field_name} 填充效果') 
plt.tight_layout() plt.savefig(f'{field_name}_imputation_evaluation.png') plt.close() # 绘制分布对比图 plt.figure(figsize=(10, 6)) plt.subplot(1, 2, 1) sns.histplot(original_values, kde=True, color='blue', label='原始值') plt.title('原始值分布') plt.subplot(1, 2, 2) sns.histplot(imputed_values, kde=True, color='red', label='填充值') plt.title('填充值分布') plt.tight_layout() plt.savefig(f'{field_name}_distribution_comparison.png') plt.close() # 评估各字段的填充效果 evaluate_imputation(df_backup, df_cleaned, pa_missing_mask, 'Pa') evaluate_imputation(df_backup, df_cleaned, tx_missing_mask, 'Tx') evaluate_imputation(df_backup, df_cleaned, tn_missing_mask, 'Tn') evaluate_imputation(df_backup, df_cleaned, rrr_missing_mask, 'RRR') evaluate_imputation(df_backup, df_cleaned, p_missing_mask, 'P') evaluate_imputation(df_backup, df_cleaned, u_missing_mask, 'U') evaluate_imputation(df_backup, df_cleaned, vv_missing_mask, 'VV') evaluate_imputation(df_backup, df_cleaned, td_missing_mask, 'Td') evaluate_imputation(df_backup, df_cleaned, t_missing_mask, 'T') # 保存处理后的数据 output_path = 'fully_processed_meteorological_data_with_pa.csv' df_cleaned.to_csv(output_path, index=False) print(f"\n数据已保存到: {output_path}")报错了--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[61], line 352 349 plt.close() 351 # 评估各字段的填充效果 --> 352 evaluate_imputation(df_backup, df_cleaned, pa_missing_mask, 'Pa') 353 evaluate_imputation(df_backup, df_cleaned, tx_missing_mask, 'Tx') 354 evaluate_imputation(df_backup, df_cleaned, tn_missing_mask, 'Tn') Cell In[61], line 313 310 imputed_values = imputed_df.loc[missing_mask, field_name] 312 # 计算评估指标 --> 313 mse = mean_squared_error(original_values, imputed_values) 314 mae = mean_absolute_error(original_values, imputed_values) 315 rmse = np.sqrt(mse) File d:\anaconda\Lib\site-packages\sklearn\utils\_param_validation.py:216, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs) 210 try: 211 with config_context( 212 skip_parameter_validation=( 213 prefer_skip_nested_validation or global_skip_validation 214 ) 215 ): --> 216 return func(*args, **kwargs) 217 except InvalidParameterError as e: 218 # When the function is just a wrapper around an estimator, we allow 219 # the function to delegate validation to the estimator, but we replace 220 # the name of the estimator by the name of the function in the error 221 # message to avoid confusion. 222 msg = re.sub( 223 r"parameter of \w+ must be", 224 f"parameter of {func.__qualname__} must be", 225 str(e), 226 ) File d:\anaconda\Lib\site-packages\sklearn\metrics\_regression.py:565, in mean_squared_error(y_true, y_pred, sample_weight, multioutput) 515 """Mean squared error regression loss. 516 517 Read more in the :ref:User Guide <mean_squared_error>. (...) 561 0.825... 562 """ 563 xp, _ = get_namespace(y_true, y_pred, sample_weight, multioutput) 564 _, y_true, y_pred, sample_weight, multioutput = ( --> 565 _check_reg_targets_with_floating_dtype( 566 y_true, y_pred, sample_weight, multioutput, xp=xp 567 ) 568 ) 569 check_consistent_length(y_true, y_pred, sample_weight) 570 output_errors = _average((y_true - y_pred) ** 2, axis=0, weights=sample_weight) File d:\anaconda\Lib\site-packages\sklearn\metrics\_regression.py:198, in _check_reg_targets_with_floating_dtype(y_true, y_pred, sample_weight, multioutput, xp) 148 """Ensures that y_true, y_pred, and sample_weight correspond to the same 149 regression task. 150 (...) 194 correct keyword. 
195 """ 196 dtype_name = _find_matching_floating_dtype(y_true, y_pred, sample_weight, xp=xp) --> 198 y_type, y_true, y_pred, multioutput = _check_reg_targets( 199 y_true, y_pred, multioutput, dtype=dtype_name, xp=xp 200 ) 202 # _check_reg_targets does not accept sample_weight as input. 203 # Convert sample_weight's data type separately to match dtype_name. 204 if sample_weight is not None: File d:\anaconda\Lib\site-packages\sklearn\metrics\_regression.py:105, in _check_reg_targets(y_true, y_pred, multioutput, dtype, xp) 102 xp, _ = get_namespace(y_true, y_pred, multioutput, xp=xp) 104 check_consistent_length(y_true, y_pred) --> 105 y_true = check_array(y_true, ensure_2d=False, dtype=dtype) 106 y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype) 108 if y_true.ndim == 1: File d:\anaconda\Lib\site-packages\sklearn\utils\validation.py:1107, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_all_finite, ensure_non_negative, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name) 1101 raise ValueError( 1102 "Found array with dim %d. %s expected <= 2." 1103 % (array.ndim, estimator_name) 1104 ) 1106 if ensure_all_finite: -> 1107 _assert_all_finite( 1108 array, 1109 input_name=input_name, 1110 estimator_name=estimator_name, 1111 allow_nan=ensure_all_finite == "allow-nan", 1112 ) 1114 if copy: 1115 if _is_numpy_namespace(xp): 1116 # only make a copy if array and array_orig may share memory File d:\anaconda\Lib\site-packages\sklearn\utils\validation.py:120, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name) 117 if first_pass_isfinite: 118 return --> 120 _assert_all_finite_element_wise( 121 X, 122 xp=xp, 123 allow_nan=allow_nan, 124 msg_dtype=msg_dtype, 125 estimator_name=estimator_name, 126 input_name=input_name, 127 ) File d:\anaconda\Lib\site-packages\sklearn\utils\validation.py:169, in _assert_all_finite_element_wise(X, xp, allow_nan, msg_dtype, estimator_name, input_name) 152 if estimator_name and input_name == "X" and has_nan_error: 153 # Improve the error message on how to handle missing values in 154 # scikit-learn. 155 msg_err += ( 156 f"\n{estimator_name} does not accept missing values" 157 " encoded as NaN natively. For supervised learning, you might want" (...) 167 "#estimators-that-handle-nan-values" 168 ) --> 169 raise ValueError(msg_err) ValueError: Input contains NaN.
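The final error is structural rather than a data bug: `original_df.loc[missing_mask, field_name]` selects exactly the rows that were NaN in the backup, so every `mean_squared_error` call compares imputed numbers against NaN "ground truth". Imputation accuracy can only be estimated by hiding values that are actually known, re-imputing them, and scoring against the held-out originals. A minimal sketch of that idea, assuming the caller supplies an `impute_series` callable that reruns the chosen imputation on a Series (a hypothetical helper, not part of the original code):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_by_masking(series: pd.Series, impute_series, frac: float = 0.1, seed: int = 42):
    """Hide a sample of *known* values, re-impute, and score against the truth."""
    known = series.dropna()
    held_out = known.sample(frac=frac, random_state=seed)

    masked = series.copy()
    masked.loc[held_out.index] = np.nan          # artificially create missing values
    filled = impute_series(masked)               # user-supplied imputation routine

    y_true = held_out
    y_pred = filled.loc[held_out.index]
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }

# Example with a simple interpolation-based imputer:
# scores = evaluate_by_masking(df_cleaned["P"], lambda s: s.interpolate().ffill().bfill())
# print(scores)
```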

编码: ascii, 置信度: 1.00 Training until validation scores don't improve for 20 rounds [10] training's auc: 0.999999 valid_1's auc: 0.999999 [20] training's auc: 0.999999 valid_1's auc: 0.999999 Early stopping, best iteration is: [1] training's auc: 0.999999 valid_1's auc: 0.999999 Validation AUC: 1.0000 --------------------------------------------------------------------------- InvalidIndexError Traceback (most recent call last) Cell In[16], line 188 186 samples = prepare_samples(all_see, all_click, all_play) 187 model, features, auc_score = train_model(samples) --> 188 result = predict_new_data(model, features, 'testA_did_show.csv') Cell In[16], line 164, in predict_new_data(model, feature_columns, test_file) 161 user_click_rate = pd.read_csv('user_click_rate.csv', encoding='gbk').set_index('did')['user_click_rate'] 162 video_popularity = pd.read_csv('video_popularity.csv', encoding='gbk').set_index('vid')['video_popularity'] --> 164 test_data['user_click_rate'] = test_data['did'].map(user_click_rate).fillna(0).astype(np.float32) 165 test_data['video_popularity'] = test_data['vid'].map(video_popularity).fillna(0).astype(np.int32) 167 test_data[feature_columns] = test_data[feature_columns].fillna(0) File ~\ANA\Lib\site-packages\pandas\core\series.py:4544, in Series.map(self, arg, na_action) 4464 def map( 4465 self, 4466 arg: Callable | Mapping | Series, 4467 na_action: Literal["ignore"] | None = None, 4468 ) -> Series: 4469 """ 4470 Map values of Series according to an input mapping or function. 4471 (...) 4542 dtype: object 4543 """ -> 4544 new_values = self._map_values(arg, na_action=na_action) 4545 return self._constructor(new_values, index=self.index, copy=False).__finalize__( 4546 self, method="map" 4547 ) File ~\ANA\Lib\site-packages\pandas\core\base.py:919, in IndexOpsMixin._map_values(self, mapper, na_action, convert) 916 arr = self._values 918 if isinstance(arr, ExtensionArray): --> 919 return arr.map(mapper, na_action=na_action) 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert) File ~\ANA\Lib\site-packages\pandas\core\arrays\categorical.py:1530, in Categorical.map(self, mapper, na_action) 1526 na_action = "ignore" 1528 assert callable(mapper) or is_dict_like(mapper) -> 1530 new_categories = self.categories.map(mapper) 1532 has_nans = np.any(self._codes == -1) 1534 na_val = np.nan File ~\ANA\Lib\site-packages\pandas\core\indexes\base.py:6419, in Index.map(self, mapper, na_action) 6383 """ 6384 Map values using an input mapping or function. 6385 (...) 
6415 Index(['A', 'B', 'C'], dtype='object') 6416 """ 6417 from pandas.core.indexes.multi import MultiIndex -> 6419 new_values = self._map_values(mapper, na_action=na_action) 6421 # we can return a MultiIndex 6422 if new_values.size and isinstance(new_values[0], tuple): File ~\ANA\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert) 918 if isinstance(arr, ExtensionArray): 919 return arr.map(mapper, na_action=na_action) --> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert) File ~\ANA\Lib\site-packages\pandas\core\algorithms.py:1803, in map_array(arr, mapper, na_action, convert) 1799 mapper = mapper[mapper.index.notna()] 1801 # Since values were input this means we came from either 1802 # a dict or a series and mapper should be an index -> 1803 indexer = mapper.index.get_indexer(arr) 1804 new_values = take_nd(mapper._values, indexer) 1806 return new_values File ~\ANA\Lib\site-packages\pandas\core\indexes\base.py:3875, in Index.get_indexer(self, target, method, limit, tolerance) 3872 self._check_indexing_method(method, limit, tolerance) 3874 if not self._index_as_unique: -> 3875 raise InvalidIndexError(self._requires_unique_msg) 3877 if len(target) == 0: 3878 return np.array([], dtype=np.intp) InvalidIndexError: Reindexing only valid with uniquely valued Index objects,请帮我定位并解决问题
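The question asks to locate and fix the `InvalidIndexError`. It means the mapper Series built from `user_click_rate.csv` has duplicate `did` values (and possibly `vid` in `video_popularity.csv`): `Series.map` needs a uniquely indexed mapper, and the categorical dtype of `test_data['did']` routes the lookup through `Index.get_indexer`, which enforces uniqueness. A minimal sketch of one likely fix: deduplicate before `set_index`, and map on plain strings (keeping the last record per key is an assumption; pick whatever aggregation fits the data):

```python
import numpy as np
import pandas as pd

user_click_rate = (
    pd.read_csv("user_click_rate.csv", encoding="gbk")
      .drop_duplicates(subset="did", keep="last")        # make the mapper index unique
      .set_index("did")["user_click_rate"]
)
video_popularity = (
    pd.read_csv("video_popularity.csv", encoding="gbk")
      .drop_duplicates(subset="vid", keep="last")
      .set_index("vid")["video_popularity"]
)

# astype(str) drops the Categorical dtype; make sure the CSV keys are the same type.
test_data["user_click_rate"] = (
    test_data["did"].astype(str).map(user_click_rate).fillna(0).astype(np.float32)
)
test_data["video_popularity"] = (
    test_data["vid"].astype(str).map(video_popularity).fillna(0).astype(np.int32)
)
```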

请用中文解释下述代码为什么可以保存EXCEL文件但是显示图片文件保存失败? import cv2 import numpy as np import pandas as pd from PIL import Image import matplotlib.pyplot as plt from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score import os import tkinter as tk from tkinter import simpledialog, filedialog from skimage.feature import hog from skimage import exposure import glob import warnings warnings.filterwarnings('ignore') class SEMCircleDetector: def __init__(self): self.points = [] self.circles = [] # 存储用户选择的圆形 (x, y, radius) self.scale = None # 像素/微米 self.current_image = None self.current_image_path = None self.drawing_circle = False self.center = None self.radius = 0 self.temp_circle = None self.original_image_bgr = None # 保存原始图像的BGR版本 def mouse_callback(self, event, x, y, flags, param): if event == cv2.EVENT_LBUTTONDOWN: if len(self.points) < 6: # 标定点采集阶段 self.points.append((x, y)) cv2.circle(self.current_image, (x, y), 5, (0, 255, 0), -1) if len(self.points) % 2 == 0: idx = len(self.points) // 2 cv2.putText(self.current_image, f'Group {idx}', (x+10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2) cv2.imshow("SEM Image", self.current_image) else: # 圆形选择阶段 self.drawing_circle = True self.center = (x, y) elif event == cv2.EVENT_MOUSEMOVE: if self.drawing_circle and self.center is not None: self.radius = int(np.sqrt((x - self.center[0])**2 + (y - self.center[1])**2)) img_copy = self.current_image.copy() cv2.circle(img_copy, self.center, self.radius, (0, 255, 0), 2) cv2.imshow("SEM Image", img_copy) elif event == cv2.EVENT_LBUTTONUP: if self.drawing_circle and self.center is not None: self.drawing_circle = False # 确保圆形是有效的 if self.radius > 5: self.circles.append((self.center[0], self.center[1], self.radius)) cv2.circle(self.current_image, self.center, self.radius, (255, 0, 0), 2) cv2.imshow("SEM Image", self.current_image) self.center = None self.radius = 0 def calculate_scale(self): if len(self.points) < 6: raise ValueError("需要6个点来计算标尺") distances = [] for i in range(0, 6, 2): x1, y1 = self.points[i] x2, y2 = self.points[i+1] distance = np.sqrt((x2 - x1)**2 + (y2 - y1)**2) distances.append(distance) print(f"第{i//2+1}组距离: {distance:.2f} 像素") # 平均距离(像素)对应1微米 avg_distance = np.mean(distances) self.scale = avg_distance # 像素/微米 print(f"标尺计算: 1微米 = {self.scale:.2f} 像素") return self.scale def extract_features(self, img, center, radius): x, y = center r = int(radius) # 确保不越界 x1, y1 = max(0, x - r), max(0, y - r) x2, y2 = min(img.shape[1], x + r), min(img.shape[0], y + r) if x1 >= x2 or y1 >= y2: return None roi = img[y1:y2, x1:x2] if roi.size == 0: return None # 调整大小以便提取特征 roi = cv2.resize(roi, (64, 64)) gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY) # 提取HOG特征 try: fd, hog_image = hog(gray, orientations=8, pixels_per_cell=(16, 16), cells_per_block=(1, 1), visualize=True) except: # 如果HOG特征提取失败,使用零向量替代 fd = np.zeros(128) # HOG特征向量的默认长度 # 提取其他特征 mean_val = np.mean(gray) std_val = np.std(gray) entropy_val = self.calculate_entropy(gray) # 组合所有特征 features = np.concatenate([fd, [mean_val, std_val, entropy_val]]) return features def calculate_entropy(self, image): # 计算图像熵 hist = np.histogram(image, bins=256, range=(0, 255))[0] hist = hist / hist.sum() entropy = -np.sum(hist * np.log2(hist + 1e-10)) return entropy def train_classifier(self, features, labels): X_train, X_test, y_train, y_test = train_test_split( features, labels, test_size=0.2, random_state=42) clf = RandomForestClassifier(n_estimators=100, random_state=42) clf.fit(X_train, 
y_train) y_pred = clf.predict(X_test) accuracy = accuracy_score(y_test, y_pred) print(f"Classifier accuracy: {accuracy:.2f}") return clf def process_image(self, image_path): self.current_image_path = image_path # 使用PIL读取图像并转换为OpenCV格式 pil_image = Image.open(image_path).convert('L') self.original_image = np.array(pil_image) self.original_image_bgr = cv2.cvtColor(self.original_image, cv2.COLOR_GRAY2BGR) self.current_image = self.original_image_bgr.copy() # 重置状态 self.points = [] self.circles = [] # 创建可调整大小的窗口 cv2.namedWindow("SEM Image", cv2.WINDOW_NORMAL) cv2.resizeWindow("SEM Image", 1000, 800) # 设置初始窗口大小 cv2.setMouseCallback("SEM Image", self.mouse_callback) print("请点击6个点来标定比例尺(每两个点之间的距离为1微米)") print("点击完成后,按任意键继续") while True: cv2.imshow("SEM Image", self.current_image) key = cv2.waitKey(1) & 0xFF if len(self.points) >= 6: break # 计算比例尺 self.calculate_scale() print(f"标尺计算完成: 1微米 = {self.scale:.2f} 像素") print("现在请选择至少50个圆形(点击圆心并拖动确定半径)") print("选择完成后,按 'q' 键退出") while True: cv2.imshow("SEM Image", self.current_image) key = cv2.waitKey(1) & 0xFF if key == ord('q'): break cv2.destroyAllWindows() # 保存标记后的图像 base_name = os.path.basename(image_path) name, ext = os.path.splitext(base_name) output_dir = os.path.join(os.path.dirname(image_path), "processed") os.makedirs(output_dir, exist_ok=True) # 保存用户框选的图片 - 使用PNG格式 user_marked_image = self.original_image_bgr.copy() for cx, cy, r in self.circles: cv2.circle(user_marked_image, (cx, cy), r, (255, 0, 0), 2) output_image_path = os.path.join(output_dir, f"{name}_user_marked.png") success = cv2.imwrite(output_image_path, user_marked_image) if success: print(f"用户框选的图像已保存: {output_image_path}") else: print(f"错误: 无法保存用户框选的图像到 {output_image_path}") print(f"请检查文件路径和权限") # 保存统计信息 self.save_statistics(image_path, output_dir) # 使用机器学习检测剩余圆形 self.detect_remaining_circles(output_dir) return self.circles def save_statistics(self, image_path, output_dir): if not self.circles: print("没有选择圆形") return # 转换为微米单位 radii_um = [r / self.scale for _, _, r in self.circles] # 创建DataFrame df = pd.DataFrame({ 'Center_X': [c[0] for c in self.circles], 'Center_Y': [c[1] for c in self.circles], 'Radius_pixels': [c[2] for c in self.circles], 'Radius_um': radii_um }) # 保存到CSV base_name = os.path.basename(image_path) name, ext = os.path.splitext(base_name) output_csv_path = os.path.join(output_dir, f"{name}_user_circles.csv") df.to_csv(output_csv_path, index=False) print(f"用户选择的圆形统计信息已保存: {output_csv_path}") # 显示基本统计信息 print(f"选择了 {len(radii_um)} 个圆形") print(f"平均半径: {np.mean(radii_um):.4f} 微米") print(f"半径标准差: {np.std(radii_um):.4f} 微米") print(f"最小半径: {np.min(radii_um):.4f} 微米") print(f"最大半径: {np.max(radii_um):.4f} 微米") def detect_remaining_circles(self, output_dir): print(f"已选择的圆形数量: {len(self.circles)}") if not self.circles or len(self.circles) < 10: print("选择的圆形数量不足,无法进行机器学习检测") return print("使用机器学习方法检测剩余圆形...") # 准备训练数据 features = [] labels = [] # 1表示圆形,0表示非圆形 # 正样本(已选择的圆形) valid_circles = 0 for cx, cy, r in self.circles: feat = self.extract_features(self.original_image_bgr, (cx, cy), r) if feat is not None: features.append(feat) labels.append(1) valid_circles += 1 print(f"有效正样本数量: {valid_circles}") # 负样本(随机非圆形区域) img_h, img_w = self.original_image_bgr.shape[:2] num_negative = valid_circles # 使用有效正样本数量 negative_count = 0 max_attempts = num_negative * 10 # 最大尝试次数 for attempt in range(max_attempts): if negative_count >= num_negative: break x = np.random.randint(50, img_w - 50) y = np.random.randint(50, img_h - 50) r = np.random.randint(10, 30) # 确保这个区域不包含任何已选择的圆形 is_valid = True 
for cx, cy, cr in self.circles: if np.sqrt((x - cx)**2 + (y - cy)**2) < cr + r: is_valid = False break if is_valid: feat = self.extract_features(self.original_image_bgr, (x, y), r) if feat is not None: features.append(feat) labels.append(0) negative_count += 1 print(f"负样本数量: {negative_count}") # 训练分类器 features = np.array(features) labels = np.array(labels) if len(np.unique(labels)) < 2: print("正负样本不平衡,无法训练分类器") return clf = self.train_classifier(features, labels) # 在全图像中检测圆形 detected_circles = [] step = 10 # 扫描步长 # 创建用于机器学习检测的图像副本 ml_image = self.original_image_bgr.copy() # 先绘制用户选择的圆形(蓝色) for cx, cy, r in self.circles: cv2.circle(ml_image, (cx, cy), r, (255, 0, 0), 2) # 再绘制机器学习检测的圆形(红色) for y in range(30, img_h - 30, step): for x in range(30, img_w - 30, step): # 检查这个位置是否已经有圆形 is_duplicate = False for cx, cy, cr in self.circles + detected_circles: if np.sqrt((x - cx)**2 + (y - cy)**2) < cr + 10: is_duplicate = True break if not is_duplicate: # 尝试不同的半径 for r in range(10, 50, 5): feat = self.extract_features(self.original_image_bgr, (x, y), r) if feat is not None: proba = clf.predict_proba([feat])[0] if proba[1] > 0.7: # 圆形概率阈值 detected_circles.append((x, y, r)) cv2.circle(ml_image, (x, y), r, (0, 0, 255), 2) break # 保存新检测的圆形 if detected_circles: # 保存机器学习检测结果的图像 - 使用PNG格式 base_name = os.path.basename(self.current_image_path) name, ext = os.path.splitext(base_name) output_image_path = os.path.join(output_dir, f"{name}_ml_detected.png") success = cv2.imwrite(output_image_path, ml_image) if success: print(f"机器学习检测结果图像已保存: {output_image_path}") else: print(f"错误: 无法保存机器学习检测结果图像到 {output_image_path}") print(f"请检查文件路径和权限") # 保存新的统计信息 ml_csv_path = os.path.join(output_dir, f"{name}_ml_circles.csv") radii_um = [r / self.scale for _, _, r in detected_circles] ml_df = pd.DataFrame({ 'Center_X': [c[0] for c in detected_circles], 'Center_Y': [c[1] for c in detected_circles], 'Radius_pixels': [c[2] for c in detected_circles], 'Radius_um': radii_um }) ml_df.to_csv(ml_csv_path, index=False) print(f"机器学习检测的圆形统计信息已保存: {ml_csv_path}") # 保存包含用户选择和机器学习检测的所有圆形 all_circles = self.circles + detected_circles all_csv_path = os.path.join(output_dir, f"{name}_all_circles.csv") all_radii_um = [r / self.scale for _, _, r in all_circles] all_df = pd.DataFrame({ 'Center_X': [c[0] for c in all_circles], 'Center_Y': [c[1] for c in all_circles], 'Radius_pixels': [c[2] for c in all_circles], 'Radius_um': all_radii_um, 'Source': ['User'] * len(self.circles) + ['ML'] * len(detected_circles) }) all_df.to_csv(all_csv_path, index=False) print(f"所有圆形统计信息已保存: {all_csv_path}") print(f"通过机器学习新检测到 {len(detected_circles)} 个圆形") # 打印一些机器学习检测的圆形半径信息用于调试 print("机器学习检测的部分圆形半径(微米):") for i, r_um in enumerate(radii_um[:10]): print(f"{i+1}. {r_um:.6f}") else: print("未通过机器学习检测到新的圆形") def get_image_files(folder_path): """获取文件夹中的所有图像文件""" extensions = ['*.png', '*.jpg', '*.jpeg', '*.tif', '*.tiff', '*.bmp', '*.PNG', '*.JPG', '*.JPEG', '*.TIF', '*.TIFF', '*.BMP'] image_files = [] for ext in extensions: pattern = os.path.join(folder_path, ext) image_files.extend(glob.glob(pattern)) return sorted(image_files) def main(): # 直接输入文件夹路径 folder_path = input("请输入包含SEM图像的文件夹路径: ").strip().strip('"') if not os.path.exists(folder_path): print("文件夹路径不存在,请检查后重试") return # 获取文件夹中的所有图像文件 image_paths = get_image_files(folder_path) if not image_paths: print("文件夹中没有找到图像文件") return print(f"找到 {len(image_paths)} 张图像文件:") for i, path in enumerate(image_paths): print(f"{i+1}. 
{os.path.basename(path)}") # 创建检测器并处理每张图像 detector = SEMCircleDetector() all_results = [] for i, path in enumerate(image_paths): print(f"\n处理图像 {i+1}/{len(image_paths)}: {os.path.basename(path)}") circles = detector.process_image(path) all_results.append((path, circles)) print("\n所有图像处理完成!") if __name__ == "__main__": main()

# 环境要求:Python 3.8+, scikit-learn, OpenCV, TensorFlow/Keras, numpy, pandas, matplotlib # 安装命令:pip install scikit-learn opencv-python tensorflow numpy pandas matplotlib scikit-image xgboost import os import numpy as np import cv2 import tensorflow as tf from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold from sklearn.preprocessing import StandardScaler, LabelEncoder from sklearn.ensemble import RandomForestClassifier from sklearn.svm import SVC from sklearn.metrics import classification_report, confusion_matrix, accuracy_score from sklearn.pipeline import make_pipeline from sklearn.decomposition import PCA import matplotlib.pyplot as plt import seaborn as sns from tensorflow.keras.applications import MobileNetV2, EfficientNetB0 # 改用更先进的轻量级模型 from tensorflow.keras.models import Model from tensorflow.keras.layers import GlobalAveragePooling2D, Dense from tensorflow.keras.optimizers import Adam from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau from skimage.feature import graycomatrix, graycoprops, hog import joblib import xgboost as xgb from imblearn.over_sampling import SMOTE import application2 # 设置中文字体支持 plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'WenQuanYi Micro Hei'] plt.rcParams['axes.unicode_minus'] = False # ====================== # 1. 数据准备与预处理(增强版) # ====================== def load_and_preprocess_data(data_dir, styles, img_size=(224, 224), max_per_class=1000, augment=True): """ 加载并预处理图像数据,包含数据增强 :param data_dir: 数据集目录 :param styles: 艺术风格列表 :param img_size: 目标图像尺寸 :param max_per_class: 每个风格最多加载的图像数 :param augment: 是否应用数据增强 :return: 图像数据和标签 """ images = [] labels = [] # 数据增强函数 def augment_image(img): # 随机旋转 angle = np.random.randint(-20, 20) M = cv2.getRotationMatrix2D((img.shape[1] // 2, img.shape[0] // 2), angle, 1.0) rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0])) # 随机裁剪 crop_size = int(min(img.shape[0], img.shape[1]) * 0.8) x = np.random.randint(0, img.shape[1] - crop_size) y = np.random.randint(0, img.shape[0] - crop_size) cropped = img[y:y + crop_size, x:x + crop_size] cropped = cv2.resize(cropped, img_size) # 随机翻转 if np.random.rand() > 0.5: flipped = cv2.flip(img, 1) else: flipped = img # 随机亮度调整 hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV) hsv = hsv.astype("float32") hsv[:, :, 2] = hsv[:, :, 2] * np.random.uniform(0.8, 1.2) hsv[:, :, 2] = np.clip(hsv[:, :, 2], 0, 255) brightness_adjusted = cv2.cvtColor(hsv.astype("uint8"), cv2.COLOR_HSV2RGB) return [rotated, cropped, flipped, brightness_adjusted] print("开始加载和预处理图像...") for style_idx, style in enumerate(styles): style_dir = os.path.join(data_dir, style) if not os.path.exists(style_dir): print(f"警告: {style_dir} 目录不存在,跳过") continue print(f"处理风格: {style}") count = 0 for img_name in os.listdir(style_dir): if max_per_class and count >= max_per_class: break img_path = os.path.join(style_dir, img_name) try: # 读取并调整大小 img = cv2.imread(img_path) if img is None: continue # 调整尺寸和颜色空间转换 img = cv2.resize(img, img_size) img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # 添加原始图像 images.append(img) labels.append(style_idx) count += 1 # 数据增强(引用[2]建议) if augment and count < max_per_class // 2: # 限制增强数量 augmented_images = augment_image(img) for aug_img in augmented_images: if count >= max_per_class: break images.append(aug_img) labels.append(style_idx) count += 1 except Exception as e: print(f"处理图像 {img_path} 时出错: {str(e)}") print(f"已加载 {count} 张 {style} 风格的图像") return np.array(images), np.array(labels) # 设置数据集路径和风格类别 DATA_DIR = application2.MY_DATA_DIR # 需替换为实际数据集路径 
ART_STYLES = application2.MY_ART_STYLES # 示例风格 # 加载数据(每类最多加载200张图像,应用数据增强) images, labels = load_and_preprocess_data(DATA_DIR, ART_STYLES, max_per_class=100, augment=True) if len(images) == 0: print("错误: 未加载到任何图像数据,请检查路径和文件") exit() print(f"成功加载 {len(images)} 张图像,{len(ART_STYLES)} 种风格") # 检查类别平衡 label_counts = {style: np.sum(labels == i) for i, style in enumerate(ART_STYLES)} print("类别分布:") for style, count in label_counts.items(): print(f"{style}: {count} 张") # 划分训练集和测试集 (80%训练, 20%测试) X_train, X_test, y_train, y_test = train_test_split( images, labels, test_size=0.2, random_state=42, stratify=labels ) # ====================== # 2. 特征工程(优化版) # ====================== def extract_traditional_features(images): """ 提取传统图像特征(优化版) :param images: 图像数据 :return: 特征矩阵 """ traditional_features = [] print("提取传统图像特征...") for i, img in enumerate(images): if i % 50 == 0: # 每处理50张图像输出一次进度 print(f"处理进度: {i + 1}/{len(images)}") features = [] try: # 1. 颜色直方图 (RGB + HSV) for j in range(3): # RGB hist = cv2.calcHist([img], [j], None, [64], [0, 256]) cv2.normalize(hist, hist) features.extend(hist.flatten()) # HSV颜色空间 hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV) for j in range(3): # HSV hist = cv2.calcHist([hsv], [j], None, [64], [0, 256]) cv2.normalize(hist, hist) features.extend(hist.flatten()) # 2. 纹理特征 (灰度共生矩阵 + HOG) gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) # GLCM纹理特征 glcm = graycomatrix(gray, distances=[1, 3], angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4], levels=256, symmetric=True, normed=True) for prop in ['contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation']: prop_val = graycoprops(glcm, prop) features.extend(prop_val.flatten()) # HOG特征(引用[2]建议) hog_features = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), block_norm='L2-Hys') features.extend(hog_features) except Exception as e: print(f"处理图像时出错: {str(e)},使用默认特征值") # 提供默认特征值 features.extend([0.0] * (64 * 6 + 20 + len(hog_features)) if 'hog_features' in locals() else 36) traditional_features.append(features) return np.array(traditional_features) def create_feature_extractor(model_name='MobileNetV2'): """ 创建深度特征提取器(使用更先进的轻量级模型) :param model_name: 模型名称 (MobileNetV2 或 EfficientNetB0) :return: 特征提取模型 """ input_shape = (224, 224, 3) if model_name == 'EfficientNetB0': base_model = EfficientNetB0( weights='imagenet', include_top=False, input_shape=input_shape ) else: # 默认使用MobileNetV2 base_model = MobileNetV2( weights='imagenet', include_top=False, input_shape=input_shape ) # 添加全局平均池化层 x = base_model.output x = GlobalAveragePooling2D()(x) # 创建特征提取模型 model = Model(inputs=base_model.input, outputs=x) # 冻结基础模型权重 for layer in base_model.layers: layer.trainable = False return model def extract_deep_features(images, model): """ 使用预训练模型提取深度特征 :param images: 图像数据 :param model: 特征提取模型 :return: 深度特征矩阵 """ print("提取深度特征...") # 预处理图像 if isinstance(model.layers[0].input_shape[0], tuple): # EfficientNet preprocessed_imgs = tf.keras.applications.efficientnet.preprocess_input( images.astype('float32')) else: # MobileNetV2 preprocessed_imgs = tf.keras.applications.mobilenet_v2.preprocess_input( images.astype('float32')) # 分批提取特征以节省内存 batch_size = 32 deep_features = [] for i in range(0, len(preprocessed_imgs), batch_size): batch = preprocessed_imgs[i:i + batch_size] batch_features = model.predict(batch, verbose=0) deep_features.append(batch_features) if i % (4 * batch_size) == 0: # 每处理4个批次输出一次进度 print(f"深度特征提取进度: {min(i + batch_size, len(preprocessed_imgs))}/{len(preprocessed_imgs)}") deep_features = np.vstack(deep_features) return 
deep_features # 创建特征提取器(引用[4]建议使用轻量模型) feature_extractor = create_feature_extractor('MobileNetV2') print("开始提取传统特征...") train_trad_features = extract_traditional_features(X_train) test_trad_features = extract_traditional_features(X_test) print("开始提取深度特征...") train_deep_features = extract_deep_features(X_train, feature_extractor) test_deep_features = extract_deep_features(X_test, feature_extractor) # 合并特征 print("合并传统特征和深度特征...") X_train_features = np.hstack([train_trad_features, train_deep_features]) X_test_features = np.hstack([test_trad_features, test_deep_features]) # 释放内存 del train_trad_features, test_trad_features, train_deep_features, test_deep_features # 特征降维(PCA) print("应用PCA降维...") pca = PCA(n_components=0.95, random_state=42) # 保留95%的方差 X_train_pca = pca.fit_transform(X_train_features) X_test_pca = pca.transform(X_test_features) print(f"降维后特征数量: {X_train_pca.shape[1]} (原始: {X_train_features.shape[1]})") # 处理类别不平衡(SMOTE) print("应用SMOTE处理类别不平衡...") smote = SMOTE(random_state=42) X_train_res, y_train_res = smote.fit_resample(X_train_pca, y_train) # 特征标准化 scaler = StandardScaler() X_train_scaled = scaler.fit_transform(X_train_res) X_test_scaled = scaler.transform(X_test_pca) # ====================== # 3. 模型构建与训练(优化版) # ====================== def train_xgboost(X_train, y_train): """ 训练XGBoost模型(引用[1]建议的先进算法) """ print("训练XGBoost模型...") # 标签编码 le = LabelEncoder() y_train_encoded = le.fit_transform(y_train) # 定义参数网格 param_grid = { 'n_estimators': [100, 200], 'max_depth': [3, 5, 7], 'learning_rate': [0.01, 0.1], 'subsample': [0.8, 1.0], 'colsample_bytree': [0.8, 1.0] } # 创建XGBoost分类器 xgb_model = xgb.XGBClassifier( objective='multi:softmax', random_state=42, n_jobs=-1 ) # 使用分层K折交叉验证 skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42) # 网格搜索 grid_search = GridSearchCV( estimator=xgb_model, param_grid=param_grid, scoring='accuracy', cv=skf, n_jobs=-1, verbose=1 ) grid_search.fit(X_train, y_train_encoded) print(f"最佳参数: {grid_search.best_params_}") print(f"最佳准确率: {grid_search.best_score_:.4f}") return grid_search.best_estimator_, le def train_cnn_model(X_train, y_train, X_val, y_val, num_classes): """ 训练端到端的CNN模型(可选) """ print("训练端到端CNN模型...") # 创建模型 base_model = MobileNetV2( weights='imagenet', include_top=False, input_shape=(224, 224, 3) ) # 添加自定义层 x = base_model.output x = GlobalAveragePooling2D()(x) x = Dense(512, activation='relu')(x) x = Dense(256, activation='relu')(x) predictions = Dense(num_classes, activation='softmax')(x) model = Model(inputs=base_model.input, outputs=predictions) # 冻结基础模型层 for layer in base_model.layers: layer.trainable = False # 编译模型 model.compile( optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'] ) # 回调函数(引用[3]建议的早停机制) callbacks = [ EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True), ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6) ] # 训练模型 history = model.fit( X_train, y_train, epochs=50, batch_size=32, validation_data=(X_val, y_val), callbacks=callbacks, verbose=1 ) # 绘制训练历史 plt.figure(figsize=(12, 5)) plt.subplot(1, 2, 1) plt.plot(history.history['accuracy'], label='训练准确率') plt.plot(history.history['val_accuracy'], label='验证准确率') plt.title('模型准确率') plt.ylabel('准确率') plt.xlabel('轮次') plt.legend() plt.subplot(1, 2, 2) plt.plot(history.history['loss'], label='训练损失') plt.plot(history.history['val_loss'], label='验证损失') plt.title('模型损失') plt.ylabel('损失') plt.xlabel('轮次') plt.legend() plt.savefig('cnn_training_history.png') plt.show() return model 
print("构建和训练分类模型...") # 训练XGBoost模型(主要模型) xgb_model, label_encoder = train_xgboost(X_train_scaled, y_train_res) # 训练端到端CNN模型(可选) # 注意:需要原始图像数据,且需要额外划分验证集 X_train_cnn, X_val_cnn, y_train_cnn, y_val_cnn = train_test_split( X_train, y_train, test_size=0.2, random_state=42, stratify=y_train ) cnn_model = train_cnn_model(X_train_cnn, y_train_cnn, X_val_cnn, y_val_cnn, len(ART_STYLES)) # ====================== # 4. 模型评估(增强版) # ====================== def evaluate_model(model, X_test, y_test, model_name, le=None): """ 评估模型性能并可视化结果 """ print(f"\n{model_name} 性能评估:") # 对于XGBoost需要标签编码 if le: y_test_encoded = le.transform(y_test) y_pred = model.predict(X_test) y_pred_labels = le.inverse_transform(y_pred) else: y_pred = model.predict(X_test) y_pred_labels = np.argmax(y_pred, axis=1) # 计算准确率 accuracy = accuracy_score(y_test, y_pred_labels) print(f"整体准确率: {accuracy:.4f}") # 打印分类报告 print(classification_report(y_test, y_pred_labels, target_names=ART_STYLES)) # 绘制混淆矩阵 cm = confusion_matrix(y_test, y_pred_labels) plt.figure(figsize=(10, 8)) sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=ART_STYLES, yticklabels=ART_STYLES) plt.title(f'{model_name} 混淆矩阵 (准确率: {accuracy:.2f})') plt.xlabel('预测标签') plt.ylabel('真实标签') plt.savefig(f'{model_name}_confusion_matrix.png') plt.show() return accuracy # 评估XGBoost模型 xgb_accuracy = evaluate_model(xgb_model, X_test_scaled, y_test, "XGBoost", le=label_encoder) # 评估CNN模型 cnn_accuracy = evaluate_model(cnn_model, X_test, y_test, "CNN模型") # ====================== # 5. 模型集成(可选) # ====================== def ensemble_predict(models, weights, X): """ 集成多个模型的预测结果 """ predictions = np.zeros((X.shape[0], len(ART_STYLES))) for model, weight in zip(models, weights): if isinstance(model, xgb.XGBClassifier): pred = model.predict_proba(X) else: pred = model.predict(X) predictions += weight * pred return np.argmax(predictions, axis=1) # 集成模型(权重根据准确率分配) models = [xgb_model, cnn_model] weights = [xgb_accuracy, cnn_accuracy] # 根据准确率分配权重 total_weight = sum(weights) normalized_weights = [w / total_weight for w in weights] # 准备CNN模型的测试数据(需要原始图像) X_test_cnn = np.array([cv2.resize(img, (224, 224)) for img in X_test]) X_test_cnn = tf.keras.applications.mobilenet_v2.preprocess_input(X_test_cnn.astype('float32')) # 集成预测 ensemble_pred = ensemble_predict(models, normalized_weights, [X_test_scaled, X_test_cnn]) # 评估集成模型 ensemble_accuracy = accuracy_score(y_test, ensemble_pred) print(f"\n集成模型性能评估:") print(f"整体准确率: {ensemble_accuracy:.4f}") print(classification_report(y_test, ensemble_pred, target_names=ART_STYLES)) # ====================== # 6. 
可视化预测结果(增强版) # ====================== def visualize_predictions(models, weights, X_test, y_test, num_samples=5): """ 可视化模型预测结果(支持多模型) """ plt.figure(figsize=(15, 10)) indices = np.random.choice(range(len(X_test)), num_samples) # 准备CNN输入 X_test_cnn = np.array([cv2.resize(img, (224, 224)) for img in X_test]) X_test_cnn = tf.keras.applications.mobilenet_v2.preprocess_input(X_test_cnn.astype('float32')) for i, idx in enumerate(indices): # 获取图像和真实标签 image = X_test[idx] true_label = ART_STYLES[y_test[idx]] # 获取特征 trad_features = extract_traditional_features([image]) deep_features = extract_deep_features(np.expand_dims(image, axis=0), feature_extractor) features = np.hstack([trad_features, deep_features]) features_pca = pca.transform(features) features_scaled = scaler.transform(features_pca) # 获取各模型预测 model_probs = [] model_preds = [] # XGBoost预测 xgb_prob = xgb_model.predict_proba(features_scaled)[0] model_probs.append(xgb_prob) model_preds.append(ART_STYLES[np.argmax(xgb_prob)]) # CNN预测 cnn_input = np.expand_dims(X_test_cnn[idx], axis=0) cnn_prob = cnn_model.predict(cnn_input, verbose=0)[0] model_probs.append(cnn_prob) model_preds.append(ART_STYLES[np.argmax(cnn_prob)]) # 集成预测 ensemble_prob = weights[0] * xgb_prob + weights[1] * cnn_prob ensemble_pred = ART_STYLES[np.argmax(ensemble_prob)] # 绘制图像和预测结果 plt.subplot(2, num_samples, i + 1) plt.imshow(image) plt.title(f"真实: {true_label}\n集成预测: {ensemble_pred}") plt.axis('off') # 绘制概率分布 plt.subplot(2, num_samples, i + num_samples + 1) width = 0.35 x = np.arange(len(ART_STYLES)) plt.bar(x - width / 2, model_probs[0], width, label='XGBoost') plt.bar(x + width / 2, model_probs[1], width, label='CNN') # 标记最高概率 max_idx = np.argmax(ensemble_prob) plt.annotate(f"{ensemble_prob[max_idx]:.2f}", xy=(max_idx, ensemble_prob[max_idx]), xytext=(max_idx, ensemble_prob[max_idx] + 0.05), arrowprops=dict(facecolor='black', shrink=0.05)) plt.xticks(x, ART_STYLES, rotation=45) plt.xlabel('艺术风格') plt.ylabel('概率') plt.ylim(0, 1) plt.legend() plt.tight_layout() plt.savefig('ensemble_prediction_visualization.png') plt.show() # 可视化集成模型的预测 visualize_predictions(models, normalized_weights, X_test, y_test, num_samples=5) # ====================== # 7. 保存模型 # ====================== # 设置保存目录 save_dir = "saved_models" os.makedirs(save_dir, exist_ok=True) # 保存最佳模型 xgb_model_path = os.path.join(save_dir, 'art_style_xgb_classifier.pkl') cnn_model_path = os.path.join(save_dir, 'art_style_cnn_classifier.h5') scaler_path = os.path.join(save_dir, 'feature_scaler.pkl') pca_path = os.path.join(save_dir, 'pca_transformer.pkl') label_encoder_path = os.path.join(save_dir, 'label_encoder.pkl') joblib.dump(xgb_model, xgb_model_path) cnn_model.save(cnn_model_path) joblib.dump(scaler, scaler_path) joblib.dump(pca, pca_path) joblib.dump(label_encoder, label_encoder_path) print(f"XGBoost模型已保存为 {xgb_model_path}") print(f"CNN模型已保存为 {cnn_model_path}") print(f"特征标准化器已保存为 {scaler_path}") print(f"PCA转换器已保存为 {pca_path}") print(f"标签编码器已保存为 {label_encoder_path}") print("程序执行完成!")报错Traceback (most recent call last): File "D:\python_program\python_work2\train5.py", line 522, in <module> ensemble_pred = ensemble_predict(models, normalized_weights, [X_test_scaled, X_test_cnn]) File "D:\python_program\python_work2\train5.py", line 499, in ensemble_predict predictions = np.zeros((X.shape[0], len(ART_STYLES))) AttributeError: 'list' object has no attribute 'shape'帮我解决并给出完整代码
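The traceback at the end already pinpoints the bug: `ensemble_predict` is written for a single feature matrix `X` and reads `X.shape`, but the call site passes a Python list `[X_test_scaled, X_test_cnn]` (one input per model), and a list has no `.shape` attribute. A minimal sketch of one way to fix it, assuming the rest of the script stays as posted, is to give each model its own input and accumulate the weighted probabilities without ever touching `.shape`:

```python
import numpy as np
import xgboost as xgb

def ensemble_predict(models, weights, inputs):
    """Weighted soft voting where each model gets its own input matrix.

    models:  e.g. [xgb_model, cnn_model]
    weights: matching list of floats, e.g. normalized_weights
    inputs:  matching list of inputs, e.g. [X_test_scaled, X_test_cnn]
    """
    predictions = None
    for model, weight, X in zip(models, weights, inputs):
        if isinstance(model, xgb.XGBClassifier):
            pred = model.predict_proba(X)        # shape (n_samples, n_classes)
        else:
            pred = model.predict(X, verbose=0)   # Keras softmax output, same shape
        predictions = weight * pred if predictions is None else predictions + weight * pred
    return np.argmax(predictions, axis=1)

# Call site can stay exactly as written:
# ensemble_pred = ensemble_predict(models, normalized_weights, [X_test_scaled, X_test_cnn])
```

The Keras model already outputs per-class softmax probabilities, so only the function signature and the `zip` over per-model inputs change; no other part of the evaluation section needs to move.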
