Importing csv file into table #1001

Closed
kiwimg opened this issue Jul 16, 2022 · 8 comments

Comments

kiwimg commented Jul 16, 2022

How do I import a CSV file into a table?

I'm on v0.3.2-patch10 and using com.clickhouse.jdbc.ClickHouseStatement. Is this correct?
clickHouseStatement.write() // Write API entrypoint
    .table(endPiont).format(ClickHouseFormat.CSV) // where to write data
    .data(file.getAbsolutePath(), ClickHouseCompression.ZIP) // specify input
    .send();

ru.yandex.clickhouse.* is going to be deprecated.

Before 0.3.2, I used:
import ru.yandex.clickhouse.ClickHouseStatement;
ClickHouseStatement sth = connection.createStatement();
sth
    .write() // Write API entrypoint
    .table("default.my_table") // where to write data
    .option("format_csv_delimiter", ";") // specific param
    .data(new File("/path/to/file.csv.gz"), ClickHouseFormat.CSV, ClickHouseCompression.gzip) // specify input
    .send();

kiwimg (Author) commented Jul 18, 2022

How can I import large CSV files quickly? What is a good way to implement this? Thanks.

zhicwu added this to the 0.3.2-patch11 milestone on Jul 18, 2022

zhicwu (Contributor) commented Jul 18, 2022

> How can I import large CSV files quickly? What is a good way to implement this? Thanks.

Sorry for the late reply. ClickHouseFile was added for this, but it's only half-baked. I think I can fix that quickly when I get home. You'll be able to use the code below starting from patch11:

// one-liner
ClickHouseClient.load(
    ClickHouseNode.of("http://localhost:8123/system"),
    "table_a", 
    // will parse file name later so that you don't have to specify compression and format
    ClickHouseFile.of("/Users/zhicwu/a.csv.gz", ClickHouseCompression.GZIP, 0, ClickHouseFormat.CSV)).get();

// JDBC
Statement stmt = conn.createStatement();
stmt.unwrap(ClickHouseRequest.class).write().data(ClickHouseFile.of(...)).table("mytable").executeAndWait();

kiwimg (Author) commented Jul 18, 2022

Very good, thanks!

Can the client compress files to gzip automatically? For example, could the plain CSV source files I import be compressed by the API during upload?
When will patch11 be released?
I'm hoping for fast imports with low memory consumption.

zhicwu (Contributor) commented Aug 2, 2022

@kiwimg, since patch11 has been released, you may refer to this for direct file loading, which should be similar to using curl. If you want the Java client to take care of compression when uploading a file, you just need to set the input stream and compression algorithm in ClickHouseRequest.

I'm still thinking about a more consistent way of doing this, so I changed the milestone to 0.3.3.
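
For readers who want to see what the curl-like route could look like without the ClickHouse Java client, here is a minimal sketch using only the JDK (Java 11+). It is not code from this thread and not the client's API: the class and method names are made up for illustration, and it assumes the ClickHouse HTTP interface accepts an INSERT ... FORMAT CSV statement in the query parameter together with a gzip-compressed request body marked by a Content-Encoding: gzip header. The file is gzipped through a stream, so memory use should stay flat regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of clickhouse-jdbc.
class CsvGzipUpload {

    /** Gzips src into dst by streaming, so the whole file is never held in memory. */
    static void gzip(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(dst))) {
            in.transferTo(out); // copies through a small internal buffer
        }
    }

    /** Builds (but does not send) a POST whose body is the gzipped CSV file. */
    static HttpRequest insertRequest(String baseUrl, String table, Path gzippedCsv)
            throws IOException {
        String query = "INSERT INTO " + table + " FORMAT CSV";
        // URLEncoder emits '+' for spaces; use %20 so the query is unambiguous in a URL.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        URI uri = URI.create(baseUrl + "/?query=" + encoded);
        return HttpRequest.newBuilder(uri)
                .header("Content-Encoding", "gzip") // assumption: server decompresses the body
                .POST(HttpRequest.BodyPublishers.ofFile(gzippedCsv)) // streamed from disk
                .build();
    }
}
```

Passing the built request to java.net.http.HttpClient#send, with a base URL such as http://localhost:8123 and a table such as default.my_table (both placeholders), would then stream the compressed file to the server.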

kiwimg (Author) commented Aug 2, 2022

> @kiwimg, since patch11 has been released, you may refer to this for direct file loading, which should be similar to using curl. If you want the Java client to take care of compression when uploading a file, you just need to set the input stream and compression algorithm in ClickHouseRequest.
>
> I'm still thinking about a more consistent way of doing this, so I changed the milestone to 0.3.3.

Thank you very much. You're really great.

chigend commented Dec 30, 2022

@zhicwu, what's the difference between the so-called "direct file loading" and the Java client's ClickHouseClient#load(ClickHouseNode server, String table, ClickHouseFormat format, ClickHouseCompression compression, String file) method?

zhicwu (Contributor) commented Jan 19, 2023

The API was enhanced to infer the format and compression from the file name, although this is limited to a few file extensions. For example, a file named a.csv.lz4 will be treated as an LZ4-compressed file in CSV format.

@chigend, sorry for the late reply. To help you understand, let's start with an example. Assume you want to load a.csv.lz4 into ClickHouse. It's ridiculous to decompress the file first and then load the CSV into ClickHouse, right? It's more direct to load the compressed file straight into the database, because ClickHouse simply supports that :)
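
The file-name inference described above can be pictured roughly like this. This is only a sketch of the idea, not the library's implementation: the class, the method, and the extension tables are hypothetical, and the real client may recognise a different set of extensions.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of inferring format and compression from a file name.
class FileNameInference {
    // Assumed extension tables; not the list the actual client uses.
    static final Map<String, String> COMPRESSION =
            Map.of("gz", "GZIP", "lz4", "LZ4", "zst", "ZSTD");
    static final Map<String, String> FORMAT =
            Map.of("csv", "CSV", "tsv", "TSV");

    /**
     * Returns {format, compression}. "NONE" means no known compression suffix;
     * "UNKNOWN" means no recognised format suffix.
     */
    static String[] infer(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        String compression = "NONE";
        int dot = name.lastIndexOf('.');
        if (dot >= 0 && COMPRESSION.containsKey(name.substring(dot + 1))) {
            compression = COMPRESSION.get(name.substring(dot + 1));
            name = name.substring(0, dot); // strip ".lz4" so the format suffix comes last
            dot = name.lastIndexOf('.');
        }
        String format = dot >= 0
                ? FORMAT.getOrDefault(name.substring(dot + 1), "UNKNOWN")
                : "UNKNOWN";
        return new String[] {format, compression};
    }
}
```

With these tables, a.csv.lz4 resolves to CSV with LZ4 compression, and a plain a.csv resolves to CSV with no compression, so the caller would not need to pass either value explicitly.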

zhicwu closed this as completed on Jan 19, 2023

chigend commented Jan 30, 2023

Thank you, @zhicwu.
