This shows you the differences between two versions of the page.
copy-from-ftp [2016/03/02 20:01] dmtolpeko |
copy-from-ftp [2016/03/09 16:53] dmtolpeko |
||
---|---|---|---|
Line 2: | Line 2: | ||
The COPY FROM FTP statement allows you to copy files from an FTP server to the local file system or any Hadoop-compatible file system. Using this statement you can easily copy FTP subdirectories into HDFS i.e. | The COPY FROM FTP statement allows you to copy files from an FTP server to the local file system or any Hadoop-compatible file system. Using this statement you can easily copy FTP subdirectories into HDFS i.e. | ||
+ | |||
+ | The NEW option helps you build an ETL process and download only new files from FTP. | ||
**Syntax**: | **Syntax**: | ||
Line 20: | Line 22: | ||
* FILES option specifies a wildcard (Java regular expression) to choose which files to transfer. By default, all files from the specified directory are transferred. | * FILES option specifies a wildcard (Java regular expression) to choose which files to transfer. By default, all files from the specified directory are transferred. | ||
* The LOCAL keyword means that files are copied to the local file system. By default, files are copied to an HDFS-compatible file system. | * The LOCAL keyword means that files are copied to the local file system. By default, files are copied to an HDFS-compatible file system. | ||
- | * OVERWRITE means that existing files will be overwritten; this is the default. NEW means that only new files will be transferred, and existing files will be skipped. | + | * OVERWRITE means that existing files will be overwritten; this is the default. |
- | * The SUBDIR option specifies that files in subdirectories are also searched. By default, the command transfers files only from the directory specified by the DIR option. | + | * NEW means that only new files will be transferred, and existing files will be skipped. |
+ | * The SUBDIR option specifies that files in subdirectories are also transferred. The directory structure is recreated in the target. By default, the command transfers files only from the directory specified by the DIR option. | ||
* SESSIONS specifies the number of concurrent FTP sessions used to transfer the files. Each session transfers a whole file. By default, files are copied in a single session. | * SESSIONS specifies the number of concurrent FTP sessions used to transfer the files. Each session transfers a whole file. By default, files are copied in a single session. | ||
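
The way the FILES, OVERWRITE/NEW and SUBDIR options combine can be modelled with a short Python sketch. This is not the statement's implementation, only an illustration of the selection rules described in the list above. The function `select_files`, its parameters and the sample file names are hypothetical. It also assumes that the FILES expression must match the whole file name (as Java's `String.matches` would), which the page does not state explicitly.

```python
import re
from posixpath import basename


def select_files(remote_files, existing_targets, files=".*",
                 new_only=False, subdir=False):
    """Return the relative paths a transfer would copy (illustrative model only).

    remote_files     -- paths relative to the FTP directory given by DIR
    existing_targets -- relative paths already present in the target directory
    files            -- regular expression, standing in for the FILES option
    new_only         -- True models NEW, False models OVERWRITE (the default)
    subdir           -- True models the SUBDIR option
    """
    pattern = re.compile(files)
    selected = []
    for rel_path in remote_files:
        # Without SUBDIR, only files directly in the DIR directory are considered.
        if not subdir and "/" in rel_path:
            continue
        # Assumption: the expression must match the whole file name.
        if not pattern.fullmatch(basename(rel_path)):
            continue
        # NEW skips files that already exist in the target; OVERWRITE copies them again.
        if new_only and rel_path in existing_targets:
            continue
        # The relative path is kept, so the directory structure is recreated in the target.
        selected.append(rel_path)
    return selected


remote = ["a.csv", "b.csv", "notes.txt", "2016/c.csv"]
existing = {"a.csv"}

# Default behaviour: OVERWRITE, no SUBDIR.
print(select_files(remote, existing, files=r".*\.csv"))
# NEW together with SUBDIR.
print(select_files(remote, existing, files=r".*\.csv", new_only=True, subdir=True))
```

With the sample data, the first call selects `a.csv` and `b.csv`, and the second selects `b.csv` and `2016/c.csv`, because `a.csv` already exists in the target and is skipped under NEW.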