README.md
Lines changed: 76 additions & 19 deletions
Original file line number
Diff line number
Diff line change
@@ -1,20 +1,73 @@
1
-
# DuckDB C/C++ extension template
2
-
This is an **experimental** template for C/C++ based extensions that link with the **C Extension API** of DuckDB. Note that this
3
-
is different from https://github.com/duckdb/extension-template, which links against the C++ API of DuckDB.
1
+
# The DuckDB ReadStat Extension
4
2
5
-
Features:
6
-
- No DuckDB build required
7
-
- CI/CD chain preconfigured
8
-
- (Coming soon) Works with community extensions
3
+
Use this extension to read SAS, Stata, and SPSS data sets in [DuckDB](<https://duckdb.org/>) with [ReadStat](<https://github.com/WizardMac/ReadStat?tab=readme-ov-file#readstat-read-and-write-data-sets-from-sas-stata-and-spss>).
9
4
10
-
## Cloning
5
+
## Installation & Loading
6
+
7
+
Installation is simple through the DuckDB Community Extensions repository. Just run
8
+
9
+
```SQL
10
+
INSTALL read_stat FROM community;
11
+
LOAD read_stat;
12
+
```
13
+
14
+
in a DuckDB instance near you.
15
+
16
+
### The `read_stat` Function
17
+
The extension adds a single DuckDB table function, `read_stat`, which you use as follows:
18
+
19
+
```SQL
20
+
-- Read a SAS `.sas7bdat` file
21
+
FROM read_stat('sas_data.sas7bdat');
22
+
-- Read an SPSS `.sav` or `.zsav` file
23
+
FROM read_stat('spss_data.sav');
24
+
FROM read_stat('compressed_spss_data.zsav');
25
+
-- Read a Stata `.dta` file
26
+
FROM read_stat('stata_data.dta');
27
+
```
28
+
29
+
If the file extension is not `.sas7bdat`, `.sav`, `.zsav`, or `.dta`,
30
+
tell `read_stat` which file type to expect with the `format` parameter:
31
+
32
+
```SQL
33
+
FROM read_stat('sas_data.other_extension', format = 'sas7bdat');
34
+
-- SPSS `.sav` and `.zsav` can both be read through the format `'sav'`
35
+
FROM read_stat(
36
+
'spss_data_possibly_compressed.other_extension',
37
+
format = 'sav'
38
+
);
39
+
FROM read_stat('stata_data.other_extension', format = 'dta');
40
+
```
41
+
42
+
To override the character encoding inferred from the file, pass an `iconv` encoding name as the `encoding` parameter (see <https://www.gnu.org/software/libiconv/>):
43
+
44
+
```SQL
45
+
FROM read_stat('latin1_encoded.sas7bdat', encoding = 'iso-8859-1');
46
+
```
47
+
48
+
If your files have the proper file extensions and you do not need to override their character encodings, a [replacement scan](<https://duckdb.org/docs/stable/guides/glossary.html#replacement-scan>) is also available:
49
+
50
+
```SQL
51
+
-- Read a SAS `.sas7bdat` file
52
+
FROM 'sas_data.sas7bdat';
53
+
-- Read an SPSS `.sav` or `.zsav` file
54
+
FROM 'spss_data.sav';
55
+
FROM 'compressed_spss_data.zsav';
56
+
-- Read a Stata `.dta` file
57
+
FROM 'stata_data.dta';
58
+
```
59
+
60
+
## Contributing
61
+
62
+
### Cloning
11
63
Clone the repo with submodules
12
64
13
65
```shell
14
66
git clone --recurse-submodules <repo>
15
67
```
16
68
17
-
## Dependencies
69
+
### Dependencies
70
+
18
71
In principle, compiling this template only requires a C/C++ toolchain. However, this template relies on some additional
19
72
tooling to make life a little easier and to be able to share CI/CD infrastructure with extension templates for other languages:
20
73
@@ -24,13 +77,14 @@ tooling to make life a little easier and to be able to share CI/CD infrastructur
24
77
- CMake
25
78
- Git
26
79
- (Optional) Ninja + ccache
80
+
- vcpkg
27
81
28
82
Installing these dependencies will vary per platform:
29
83
- For Linux, these generally come pre-installed or are available through the distro-specific package manager.
30
84
- For macOS, [Homebrew](https://formulae.brew.sh/).
31
85
- For Windows, [chocolatey](https://community.chocolatey.org/).
32
86
33
-
## Building
87
+
### Building
34
88
After installing the dependencies, building is a two-step process. First, run:
35
89
```shell
36
90
make configure
@@ -48,15 +102,15 @@ to the `build/debug` directory.
48
102
49
103
To create optimized release binaries, simply run `make release` instead.
50
104
51
-
### Faster builds
105
+
#### Faster builds
52
106
We recommend installing Ninja and ccache, as they can speed up builds significantly during development. After installing, Ninja can be used
53
107
by running:
54
108
```shell
55
109
make clean
56
110
GEN=ninja make debug
57
111
```
58
112
59
-
## Testing
113
+
### Testing
60
114
This extension uses the DuckDB Python client for testing. This should be automatically installed in the `make configure` step.
61
115
The tests themselves are written in the SQLLogicTest format, just like most of DuckDB's tests. A sample test can be found in
62
116
`test/sql/<extension_name>.test`. To run the tests using the *debug* build:
@@ -70,28 +124,31 @@ or for the *release* build:
70
124
make test_release
71
125
```
72
126
73
-
### Version switching
127
+
#### Version switching
74
128
Testing with different DuckDB versions is really simple:
75
129
76
130
First, run
77
-
```
131
+
```shell
78
132
make clean_all
79
133
```
80
134
to ensure the output of the previous `make configure` step is deleted.
81
135
82
-
Then, run
83
-
```
136
+
Then, run
137
+
138
+
```shell
84
139
DUCKDB_TEST_VERSION=v1.1.2 make configure
85
140
```
86
141
to select a different DuckDB version to test with.
87
142
88
-
Finally, build and test with
89
-
```
143
+
Finally, build and test with
144
+
145
+
```shell
90
146
make debug
91
147
make test_debug
92
148
```
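
The version-switching steps above can be chained in a small shell helper. This is only a sketch and not part of the template: `run`, `test_with_version`, and the `DRY_RUN` variable are hypothetical names introduced here for illustration. Only the `make` targets and the `DUCKDB_TEST_VERSION` variable come from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the template): rebuild and test
# the extension against a given DuckDB version.

# Run a command, or only print it when DRY_RUN=1 (assumed convention).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: test_with_version v1.1.2
test_with_version() {
    version="${1:-}"
    if [ -z "$version" ]; then
        echo "usage: test_with_version <duckdb-version>" >&2
        return 1
    fi
    # Stop at the first failing step so tests never run on a stale build.
    run make clean_all &&
        run env DUCKDB_TEST_VERSION="$version" make configure &&
        run make debug &&
        run make test_debug
}
```

Setting `DRY_RUN=1` before calling `test_with_version v1.1.2` prints the four commands without executing them, which is a cheap way to check the sequence before a long rebuild.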
93
149
94
-
### Using unstable Extension C API functionality
150
+
#### Using unstable Extension C API functionality
151
+
95
152
The DuckDB Extension C API has a stable part and an unstable part. By default, this template only allows usage of the stable
96
153
part of the API. To switch it to allow using the unstable part, take the following steps: