lesson 2 quiz.docx

School: Foundation University, Islamabad Campus
Course: D101
Subject: Computer Science
Date: May 6, 2024
Type: docx
Pages: 2
Uploaded by BarristerKnowledgeKingfisher46 on coursehero.com

1. The Data Exploration node in Model Studio enables you to do which of the following?
   a. Impute variables based on summary statistics.
   b. View the most important inputs or suspicious variables.
   c. See variables with a high percentage of nonmissing values.

2. To define variable metadata and assign rules to modify variables (for example, assigning a type of transformation), you can use either the Data tab or the Manage Variables node.
   a. True
   b. False

3. Which of the following statements is true about the Text Mining node?
   a. It processes audio and video data.
   b. It transforms a term-by-document frequency matrix using singular value decomposition (SVD) to create binary coefficients.
   c. It creates topics based on groups of terms that occur together in several documents. Each term-document pair is assigned a score for every topic.
   d. It does not allow terms and documents to belong to multiple topics.

4. After a pipeline is run, which of the following can you do using the Manage Variables node?
   a. Specify a different target variable.
   b. Modify the target variable attributes.
   c. Set up imputation and transformation rules.
   d. Perform imputation and transformations.

5. How do the transformations available in the Transformations node minimize bias in model predictions?
   a. by reducing the effect of extreme or unusual input values
   b. by replacing missing values and avoiding complete case analysis
   c. by converting unstructured data to structured data
   d. by reducing the total number of variables to reduce dimensionality

6. The Variable Selection node uses only supervised methods to select inputs.
   a. True
   b. False

7. Which of the following transformations creates bins for a numeric variable?
   a. inverse
   b. exponential
   c. standardize
   d. quantile

8. Which of the following statements is true about the validation data that the Variable Selection node creates from the training data?
   a. The Variable Selection node always creates these validation data.
   b. These validation data are used for variable selection during data preparation.
   c. These validation data are used for model assessment during the modeling process, instead of the original validation partition.

9. inputs during data preparation?
   a. A model that is based on a large number of inputs is very likely to be underfit to the training data.
   b. The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target.
   c. Modeling algorithms do not reduce the number of inputs.

10. Which of the following is a best practice for handling high-cardinality input variables?
   a. binning
   b. Winsorizing
   c. standardization
   d. text mining
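
Several of the questions above refer to binning a numeric variable by quantiles. As a rough illustration of the idea only, the following plain-Python sketch assigns values to equal-frequency bins. It is not Model Studio code and does not claim to reproduce how the Transformations node works; the function name `quantile_bins` and the sample `incomes` data are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins=4):
    """Assign each value a bin index 0..n_bins-1 using empirical quantile cut points."""
    # quantiles() returns the n_bins - 1 interior cut points;
    # method="inclusive" treats the data as the whole population.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # bisect_right counts how many cut points are <= v, which is the bin index.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical input with one extreme value (950).
incomes = [12, 15, 18, 21, 25, 30, 42, 950]
bins = quantile_bins(incomes, n_bins=4)
print(bins)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

In this sketch the extreme value 950 ends up in the same bin as 42, so after binning it no longer carries its original magnitude.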
