Feature Selection in R


07-27-2017 09:08 PM

chucnv

chuc1803@gmail.com

Feature selection (also called attribute selection, and often mentioned together with feature extraction) is an important preprocessing step in data mining. It estimates how much each attribute influences the outcome so that the best subset of attributes can be chosen for building the model, which can improve model performance and shorten training time.

This post shows how to use the caret package in R for feature selection, specifically:

· How to remove redundant attributes from a dataset.

· How to rank the attributes in a dataset by importance.

· How to select attributes in a dataset using Recursive Feature Elimination (RFE).

The example dataset is Pima Indians Diabetes, which records diabetes test results for women of Pima Indian heritage (one of its attributes is the number of pregnancies). It is available in the mlbench package as PimaIndiansDiabetes.

1. How to remove redundant attributes from the dataset.

## Remove Redundant Features

# ensure the results are repeatable
set.seed(7)

# install the required packages once if they are missing
# install.packages(c("mlbench", "caret", "psych"))

# load the libraries
library(mlbench)
library(caret)
library(psych)

# load the data
data(PimaIndiansDiabetes)

# preview the first rows (fix() would open an interactive editor instead)
head(PimaIndiansDiabetes)

# calculate the correlation matrix of the 8 predictors (column 9 is the class)
correlationMatrix <- cor(PimaIndiansDiabetes[,1:8])

# summarize the correlation matrix
print(correlationMatrix)

# scatterplot matrix with pairwise correlations
pairs.panels(PimaIndiansDiabetes)

# find attributes that are highly correlated
# (0.75 is a common cutoff; a lower cutoff of 0.5 is used here)
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff=0.5)

# print indexes of highly correlated attributes
print(highlyCorrelated)

According to the output above, the 8th attribute (age) is flagged for removal because it is highly correlated with pregnant (correlation coefficient 0.54434123, which is above cutoff=0.5).
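To make the idea behind the correlation filter concrete, here is a minimal base-R sketch on synthetic data. The helper drop_correlated and the toy data frame are hypothetical names introduced for illustration. The rule used (repeatedly take the most correlated pair and drop the member with the larger mean absolute correlation) is a simplified approximation and should not be read as caret's exact implementation of findCorrelation.

```r
# Simplified sketch of a pairwise-correlation filter.
# drop_correlated is a hypothetical helper, not part of caret.
drop_correlated <- function(x, cutoff = 0.75) {
  cm <- abs(cor(x))
  diag(cm) <- 0                      # ignore self-correlation
  mean_abs <- colMeans(cm)           # average absolute correlation per column
  dropped <- character(0)
  repeat {
    keep <- setdiff(colnames(cm), dropped)
    sub <- cm[keep, keep, drop = FALSE]
    if (length(keep) < 2 || max(sub) <= cutoff) break
    # locate the most correlated remaining pair
    idx <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    pair <- keep[idx]
    # drop the member of the pair with the larger mean absolute correlation
    dropped <- c(dropped, pair[which.max(mean_abs[pair])])
  }
  dropped
}

set.seed(7)
n <- 200
a <- rnorm(n)
# b is almost a copy of a; c is independent noise
toy <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))

flagged <- drop_correlated(toy, cutoff = 0.75)
reduced <- toy[, setdiff(names(toy), flagged), drop = FALSE]

print(flagged)          # one of "a" or "b"
print(names(reduced))   # the remaining columns
```

With the objects from the post, the flagged columns could be dropped in a similar way, for example with PimaIndiansDiabetes[, -highlyCorrelated]. Note that this negative index drops every column when highlyCorrelated is empty, so it is worth checking length(highlyCorrelated) > 0 first.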

2. How to rank the attributes in the dataset by importance.

The importance of the attributes in a dataset can be estimated by building a model. Common ways to assess attribute importance include ROC curve analysis and LVQ (Learning Vector Quantization).

This example uses LVQ to assess the importance of the attributes.

## Rank Features By Importance

# ensure results are repeatable
set.seed(7)

# install e1071 once if it is missing
# install.packages("e1071")

# load the libraries
library(mlbench)
library(caret)
library(e1071)

# load the dataset
data(PimaIndiansDiabetes)

# prepare training scheme: 10-fold cross-validation, repeated 3 times
control <- trainControl(method="repeatedcv", number=10, repeats=3)

# train the LVQ model on scaled predictors
model <- train(diabetes~., data=PimaIndiansDiabetes, method="lvq", preProcess="scale", trControl=control)

# estimate variable importance
importance <- varImp(model, scale=FALSE)

# summarize importance
print(importance)

# plot importance
plot(importance)

The results show that the three most important attributes are glucose, mass and age, and that insulin is the least important.
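The ROC-based way of scoring attributes mentioned in this section can be sketched in base R. As far as I know, caret falls back to a per-attribute ROC analysis of this kind when a model has no importance measure of its own, but treat that as an assumption and check the caret documentation. The helper auc_one and the synthetic toy data are hypothetical and only illustrate the idea: each attribute is scored on its own by the area under its ROC curve, computed here from the rank-sum (Mann-Whitney) identity.

```r
# AUC of a single numeric attribute for a two-class outcome.
# auc_one is a hypothetical helper for illustration only.
auc_one <- function(x, y, positive) {
  r <- rank(x)                       # average ranks handle ties
  n_pos <- sum(y == positive)
  n_neg <- length(y) - n_pos
  auc <- (sum(r[y == positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  max(auc, 1 - auc)                  # direction-free: 0.5 = useless, 1 = perfect
}

set.seed(7)
n <- 300
y <- factor(rep(c("neg", "pos"), each = n / 2))

# three synthetic attributes with decreasing class separation
toy <- data.frame(
  strong = rnorm(n, mean = ifelse(y == "pos", 2, 0)),
  weak   = rnorm(n, mean = ifelse(y == "pos", 0.3, 0)),
  noise  = rnorm(n)
)

# score every attribute independently and rank them
importance <- sapply(toy, auc_one, y = y, positive = "pos")
print(sort(importance, decreasing = TRUE))
```

Because each attribute is scored in isolation, this kind of ranking ignores interactions between attributes, which is one reason to also try a wrapper method such as RFE.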

3. How to select attributes in the dataset using Recursive Feature Elimination.

Recursive Feature Elimination (RFE) is a popular feature selection method provided by the caret package in R.

The following example uses RFE to select attributes in the Pima Indians Diabetes dataset. A Random Forest is used on each iteration to evaluate the model. The algorithm is configured to evaluate every subset size from 1 to 8 attributes. In this example all 8 attributes are selected, although the plot shows that a subset of just 4 attributes gives comparable results.

## Feature Selection

# ensure the results are repeatable
set.seed(7)

# load the libraries (rfFuncs also needs the randomForest package installed)
library(mlbench)
library(caret)

# load the data
data(PimaIndiansDiabetes)

# define the control using a random forest selection function and 10-fold CV
control <- rfeControl(functions=rfFuncs, method="cv", number=10)

# run the RFE algorithm: predictors in columns 1-8, class label in column 9
results <- rfe(PimaIndiansDiabetes[,1:8], PimaIndiansDiabetes[,9], sizes=c(1:8), rfeControl=control)

# summarize the results
print(results)

# list the chosen features
predictors(results)

# plot accuracy against the number of attributes
plot(results, type=c("g", "o"))
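The elimination loop behind RFE can be illustrated with a short base-R sketch. This is a deliberately simplified stand-in and not what caret runs: logistic regression (glm) replaces the random forest, absolute z statistics replace the forest's importance scores, a single train/test split replaces 10-fold cross-validation, and the attributes are re-ranked after every elimination (caret's rfeControl has a rerank option for this, which I believe is off by default). The data and all variable names are synthetic and hypothetical.

```r
# Simplified RFE sketch on synthetic data (see the assumptions stated above).
set.seed(7)
n <- 400
x <- data.frame(f1 = rnorm(n), f2 = rnorm(n), f3 = rnorm(n), f4 = rnorm(n))
# only f1 and f2 actually drive the outcome
y <- rbinom(n, 1, plogis(2 * x$f1 - 1.5 * x$f2))
train <- sample(n, 300)              # row indices of the training split

features <- names(x)
history <- data.frame(size = integer(0), accuracy = numeric(0))
subsets <- list()

while (length(features) >= 1) {
  # fit on the current attribute subset
  dat <- cbind(x[train, features, drop = FALSE], y = y[train])
  fit <- glm(y ~ ., data = dat, family = binomial)

  # evaluate on the held-out rows
  prob <- predict(fit, newdata = x[-train, features, drop = FALSE],
                  type = "response")
  acc <- mean((prob > 0.5) == (y[-train] == 1))
  history <- rbind(history,
                   data.frame(size = length(features), accuracy = acc))
  subsets[[as.character(length(features))]] <- features

  if (length(features) == 1) break
  # rank by |z| and eliminate the weakest attribute
  z <- abs(coef(summary(fit))[features, "z value"])
  features <- features[-which.min(z)]
}

# pick the subset size with the best held-out accuracy
best_size <- history$size[which.max(history$accuracy)]
print(history)
print(subsets[[as.character(best_size)]])
```

The history table plays the role of the accuracy-versus-size plot in the post: when several sizes give similar accuracy, the smaller subset is usually the more attractive choice.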

Tags: Feature extraction, Feature Selection
