Commit 636cc8d

chore!: update readme file
1 parent 218f048 commit 636cc8d

1 file changed

+76
-19
lines changed

README.md

Lines changed: 76 additions & 19 deletions
@@ -1,20 +1,73 @@
1-
# DuckDB C/C++ extension template
2-
This is an **experimental** template for C/C++ based extensions that link with the **C Extension API** of DuckDB. Note that this
3-
is different from https://github.com/duckdb/extension-template, which links against the C++ API of DuckDB.
1+
# The DuckDB ReadStat Extension
42

5-
Features:
6-
- No DuckDB build required
7-
- CI/CD chain preconfigured
8-
- (Coming soon) Works with community extensions
3+
Use this extension to read SAS, Stata, and SPSS data sets in [DuckDB](<https://duckdb.org/>) with [ReadStat](<https://github.com/WizardMac/ReadStat?tab=readme-ov-file#readstat-read-and-write-data-sets-from-sas-stata-and-spss>).
94

10-
## Cloning
5+
## Installation & Loading
6+
7+
Installation is simple through the DuckDB Community Extensions repository. Just type
8+
9+
```SQL
10+
INSTALL read_stat FROM community;
11+
LOAD read_stat;
12+
```
13+
14+
in a DuckDB instance near you.
15+
16+
### The `read_stat` Function
17+
The extension adds a single DuckDB table function, `read_stat`, which you use as follows:
18+
19+
```SQL
20+
-- Read a SAS `.sas7bdat` file
21+
FROM read_stat('sas_data.sas7bdat');
22+
-- Read an SPSS `.sav` or `.zsav` file
23+
FROM read_stat('spss_data.sav');
24+
FROM read_stat('compressed_spss_data.zsav');
25+
-- Read a Stata `.dta` file
26+
FROM read_stat('stata_data.dta');
27+
```
28+
29+
If the file extension is not `.sas7bdat`, `.sav`, `.zsav`, or `.dta`,
30+
specify the file type explicitly with the `format` parameter of `read_stat`:
31+
32+
```SQL
33+
FROM read_stat('sas_data.other_extension', format = 'sas7bdat');
34+
-- SPSS `.sav` and `.zsav` can both be read through the format `'sav'`
35+
FROM read_stat(
36+
'spss_data_possibly_compressed.other_extension',
37+
format = 'sav'
38+
);
39+
FROM read_stat('stata_data.other_extension', format = 'dta');
40+
```
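
When queries are generated from a script, the extension-to-`format` mapping described above can be sketched as a small shell helper. This is only an illustration: the function name `infer_read_stat_format` is hypothetical and not part of the extension, and it assumes lowercase file extensions.

```shell
# Hypothetical helper (not part of the extension): print the `format`
# value that matches a file name, following the mapping described above.
infer_read_stat_format() {
    case "$1" in
        *.sas7bdat)     echo "sas7bdat" ;;  # SAS
        *.sav|*.zsav)   echo "sav" ;;       # SPSS, plain or compressed
        *.dta)          echo "dta" ;;       # Stata
        *)              return 1 ;;         # unknown: pass `format` explicitly
    esac
}
```

A caller could then fall back to an explicit `format` argument whenever the helper returns a non-zero status.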
41+
42+
To override the character encoding inferred from the file, pass an `iconv` encoding name as the `encoding` parameter (see <https://www.gnu.org/software/libiconv/>):
43+
44+
```SQL
45+
FROM read_stat('latin1_encoded.sas7bdat', encoding = 'iso-8859-1');
46+
```
47+
48+
If your files have the proper file extensions and you do not need to override their character encodings, a [replacement scan](<https://duckdb.org/docs/stable/guides/glossary.html#replacement-scan>) is also available:
49+
50+
```SQL
51+
-- Read a SAS `.sas7bdat` file
52+
FROM 'sas_data.sas7bdat';
53+
-- Read an SPSS `.sav` or `.zsav` file
54+
FROM 'spss_data.sav';
55+
FROM 'compressed_spss_data.zsav';
56+
-- Read a Stata `.dta` file
57+
FROM 'stata_data.dta';
58+
```
59+
60+
## Contributing
61+
62+
### Cloning
1163
Clone the repo with submodules:
1264

1365
```shell
1466
git clone --recurse-submodules <repo>
1567
```
1668

17-
## Dependencies
69+
### Dependencies
70+
1871
In principle, compiling this template only requires a C/C++ toolchain. However, this template relies on some additional
1972
tooling to make life a little easier and to be able to share CI/CD infrastructure with extension templates for other languages:
2073

@@ -24,13 +77,14 @@ tooling to make life a little easier and to be able to share CI/CD infrastructur
2477
- CMake
2578
- Git
2679
- (Optional) Ninja + ccache
80+
- vcpkg
2781

2882
Installing these dependencies will vary per platform:
2983
- For Linux, these come generally pre-installed or are available through the distro-specific package manager.
3084
- For macOS, [homebrew](https://formulae.brew.sh/).
3185
- For Windows, [chocolatey](https://community.chocolatey.org/).
3286

33-
## Building
87+
### Building
3488
After installing the dependencies, building is a two-step process. Firstly run:
3589
```shell
3690
make configure
@@ -48,15 +102,15 @@ to the `build/debug` directory.
48102

49103
To create optimized release binaries, simply run `make release` instead.
50104

51-
### Faster builds
105+
#### Faster builds
52106
We recommend installing Ninja and ccache for building, as this can significantly speed up builds during development. After installing, Ninja can be used
53107
by running:
54108
```shell
55109
make clean
56110
GEN=ninja make debug
57111
```
58112

59-
## Testing
113+
### Testing
60114
This extension uses the DuckDB Python client for testing. This should be automatically installed in the `make configure` step.
61115
The tests themselves are written in the SQLLogicTest format, just like most of DuckDB's tests. A sample test can be found in
62116
`test/sql/<extension_name>.test`. To run the tests using the *debug* build:
@@ -70,28 +124,31 @@ or for the *release* build:
70124
make test_release
71125
```
72126

73-
### Version switching
127+
#### Version switching
74128
Testing with different DuckDB versions is really simple:
75129

76130
First, run
77-
```
131+
```shell
78132
make clean_all
79133
```
80134
to ensure the previous `make configure` step is deleted.
81135

82-
Then, run
83-
```
136+
Then, run
137+
138+
```shell
84139
DUCKDB_TEST_VERSION=v1.1.2 make configure
85140
```
86141
to select a different DuckDB version to test with.
87142

88-
Finally, build and test with
89-
```
143+
Finally, build and test with
144+
145+
```shell
90146
make debug
91147
make test_debug
92148
```
93149

94-
### Using unstable Extension C API functionality
150+
#### Using unstable Extension C API functionality
151+
95152
The DuckDB Extension C API has a stable part and an unstable part. By default, this template only allows usage of the stable
96153
part of the API. To switch it to allow using the unstable part, take the following steps:
97154
