# Data Sampling

## Data Sampling

### Input Data

|                   |                                                                                                                                             |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| **\*Target Data** | Input data                                                                                                                                  |
| **\*Output Name** | Name of the output after data sampling                                                                                                      |
| **Remainder**     | <p>Name of the remainder output (optional)</p><ul><li>Remainder: rows that are not selected during sampling are saved under this name</li></ul> |

### Arguments

|                   |                                                                                                                                                                                                            |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
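| **Sampling Type** | <p>Either <strong>frac</strong> or <strong>n</strong>. If <strong>frac</strong> is selected, enter the fraction (0-1) of rows to select in <strong>Sampling Size</strong>.</p><p>If <strong>n</strong> is selected, enter the number of rows to select instead.</p> |
| **replace**       | If TRUE, rows are sampled with replacement, so the same row can be selected more than once. |
| **weight**        | Sampling weights. Rows with larger weights are more likely to be selected. |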
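
The arguments above can be illustrated with a minimal sketch. The task itself relies on the R package dplyr; the Python below is only a standard-library illustration of the semantics, not the task's implementation. The function name `sample_rows`, the `seed` parameter, and the rounding of `frac` to the nearest whole row are assumptions made for this example.

```python
import random


def sample_rows(rows, sampling_type, size, replace=False, weights=None, seed=None):
    """Return (sample, remainder) for a list of rows.

    Hypothetical helper: sampling_type is "frac" or "n", mirroring the
    Sampling Type argument; size plays the role of Sampling Size.
    """
    rng = random.Random(seed)

    if sampling_type == "frac":
        if not 0 <= size <= 1:
            raise ValueError("frac must be between 0 and 1")
        k = round(len(rows) * size)  # assumption: round to the nearest row
    elif sampling_type == "n":
        k = int(size)
    else:
        raise ValueError("sampling_type must be 'frac' or 'n'")

    if not replace and k > len(rows):
        raise ValueError("cannot select more rows than exist without replacement")

    indices = range(len(rows))
    if replace:
        # With replacement: the same row may be drawn more than once.
        chosen = rng.choices(indices, weights=weights, k=k)
    elif weights is None:
        # Without replacement, every row equally likely.
        chosen = rng.sample(indices, k)
    else:
        # Weighted, without replacement: draw one row at a time and
        # remove it (and its weight) from the pool.
        pool, pool_weights, chosen = list(indices), list(weights), []
        for _ in range(k):
            pick = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
            chosen.append(pool.pop(pick))
            pool_weights.pop(pick)

    picked = set(chosen)
    sample = [rows[i] for i in chosen]
    # Remainder: every row that was never selected.
    remainder = [rows[i] for i in indices if i not in picked]
    return sample, remainder


# Usage: select half of ten rows; the other half becomes the remainder.
sample, remainder = sample_rows(list(range(10)), "frac", 0.5, seed=42)
```

With `replace=False`, the sample and the remainder together contain every input row exactly once. With `replace=True`, the sample may contain duplicates, so the remainder can be larger than the input size minus the sample size.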

![\[Task Information of Data Sampling\]](https://3929524962-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MGaxNdSdRN_L_WNiX4I%2F-MLNgnJPWJiH2IcsMjO5%2F-MLNhNEBxj8aG0UI0Vun%2Fimage.png?alt=media\&token=28668414-ddba-4258-81e0-6a83e0a3b6e2)

![Example of data sampling](https://3929524962-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MGaxNdSdRN_L_WNiX4I%2F-MKxWqvYpt_qWJ6ArMnY%2F-MKxZh_56GIvTSmR2ubr%2Fimage.png?alt=media\&token=28240fa0-4593-4d00-8612-c93f55f5641e)

Because Sampling Size is set to 0.6, the sampled data frame has 3000 rows, while the original data frame has 5000 rows (5000 \* 0.6 = 3000).
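
The row counts in this example can be checked with a short calculation. This is a sketch only: the helper name `split_sizes` is hypothetical, and it assumes sampling without replacement and rounding to the nearest whole row, neither of which is stated on this page.

```python
def split_sizes(total_rows, frac):
    """Return (sample_size, remainder_size) for a fraction-based sample."""
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    sample_size = round(total_rows * frac)  # assumption: round to nearest row
    # Without replacement, every unselected row lands in the remainder.
    return sample_size, total_rows - sample_size


print(split_sizes(5000, 0.6))  # (3000, 2000)
```

Under these assumptions, a Remainder output for the example above would hold the other 2000 rows.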

## R Packages

Data Sampling: dplyr, <https://cran.r-project.org/web/packages/dplyr/index.html>
