Analyzing Housing Market Data with Copilot's Help: Linear Regression and Neural Network


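Background: I first used a web crawler to scrape the government's housing transaction records, then preprocessed the data to remove dirty records.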

Next, I asked Copilot how to use linear regression and a neural network (NN).

I also added one more preprocessing step: one-hot encoding, which turns each categorical column into an array of 0s and 1s that is easier to analyze.

from sklearn.preprocessing import OneHotEncoder

# Assuming 'data' is your DataFrame containing the categorical columns
encoder = OneHotEncoder(sparse_output=False)
X_encoded = encoder.fit_transform(
    data[['Is there a management organization?', 'Parking space category']]
)
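To make "an array of 0s and 1s" concrete, here is a minimal NumPy-only sketch of what one-hot encoding produces. The category values are made up for illustration; the real column values come from the scraped data. It mirrors what the encoder does conceptually, not how scikit-learn implements it.

```python
import numpy as np

# Hypothetical values for a categorical column (not from the real dataset)
parking = np.array(["none", "tower", "flat", "none"])

# Map each distinct category to an integer code (categories come back sorted)
categories, codes = np.unique(parking, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for code i
one_hot = np.eye(len(categories), dtype=int)[codes]

print(categories)  # ['flat' 'none' 'tower']
print(one_hot)
```

Each row has exactly one 1, in the column that matches that row's category.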

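Next, the training target Y needs to be rescaled. Its original unit is 元 (the base currency unit), and because house prices basically start in the millions, the MSE would otherwise be enormous.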
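The post does not say which scale was chosen, so the sketch below assumes the prices are simply divided by one million. The prices and predictions are invented for illustration. It shows why the rescaling matters: MSE grows with the square of the unit.

```python
import numpy as np

# Hypothetical prices and predictions in 元 (not real data)
y_raw = np.array([8_500_000, 12_000_000, 23_400_000], dtype=float)
y_pred_raw = np.array([9_000_000, 11_000_000, 25_000_000], dtype=float)

UNIT = 1_000_000  # assumption: express prices in millions

y = y_raw / UNIT
y_pred = y_pred_raw / UNIT

mse_raw = np.mean((y_raw - y_pred_raw) ** 2)
mse_scaled = np.mean((y - y_pred) ** 2)

print(f"MSE in 元: {mse_raw:.3e}")
print(f"MSE in millions: {mse_scaled:.2f}")
```

Rescaling the target makes the MSE readable but does not make the model any better or worse: R2 is unchanged by a constant rescaling of Y.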

After that, the linear regression can be run.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

# Standardize the features (zero mean, unit variance)
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X_combined)
# y = scaler.fit_transform(y)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X_standardized, y, test_size=0.2, random_state=42)

# Fit the model and predict on the held-out rows
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the test set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R2): {r2:.2f}")
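The code above uses X_combined without showing how it was built. One plausible construction, offered only as an assumption, is to stack the numeric feature columns next to the one-hot columns. The names and values below (X_numeric and its contents) are hypothetical.

```python
import numpy as np

# Hypothetical numeric features, e.g. floor area and building age (3 rows)
X_numeric = np.array([[30.5, 12.0],
                      [45.2, 3.0],
                      [22.1, 30.0]])

# Hypothetical one-hot output for the same 3 rows
X_encoded = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 0.0]])

# Place the two blocks side by side: same rows, more columns
X_combined = np.hstack([X_numeric, X_encoded])
print(X_combined.shape)  # (3, 4)
```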

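The resulting R2 was 0.11, meaning the model explains only about 11% of the variance in price. The predictions are not very good, so I tried an NN next.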
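For intuition about what an R2 near zero means, here is a small sketch that computes R2 by hand, using the same definition as r2_score (one minus the ratio of residual to total sum of squares). The numbers are toy values, not the housing data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always guessing the mean
    return 1.0 - ss_res / ss_tot

y_true = [10.0, 20.0, 30.0, 40.0]
mean_only = [25.0] * 4  # a "model" that always predicts the mean

print(r_squared(y_true, mean_only))  # 0.0
print(r_squared(y_true, y_true))     # 1.0
```

An R2 of 0.11 therefore sits much closer to "always guess the average price" than to a perfect fit.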

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.optimizers import Adam

# Assuming you have X_combined (features) and y (target variable)
# Normalize features (standardization)
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X_combined)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_standardized, y, test_size=0.2, random_state=42)

# Build the NN model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(9, activation='relu'),
    tf.keras.layers.Dense(5, activation='relu'),
    tf.keras.layers.Dense(1)  # Linear output for regression
])
optimizer = Adam(learning_rate=0.0001)

# Compile the model
model.compile(optimizer=optimizer, loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model (returns the loss, i.e. the MSE, on the test set)
mse = model.evaluate(X_test, y_test)
print(f"Mean Squared Error (MSE): {mse:.2f}")

The final MSE likewise came out at around 200.
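An MSE is in squared units, so it is hard to read directly. Taking the square root gives the RMSE, which is in the same unit as the rescaled Y. The sketch below uses the approximate figure of 200 reported above. The post does not state the unit of Y, so none is assumed here.

```python
import math

mse = 200.0  # approximate value reported in the post

# RMSE is the typical size of a prediction error, in the same unit as y
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:.1f}")  # RMSE: 14.1
```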

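Here I used Copilot to provide the basic skeleton of the code, and paired it with online courses to learn how to choose a model and tune its parameters. The next step is to dig deeper into how to make the model's predictions more accurate.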
