Efficient tuple concatenation

Thanks Michael! This indeed works beautifully in Julia 0.6. I would have expected it to be equivalent to my solution (which I believe gets inlined automatically), but apparently there is some subtle difference, although I'm not sure where.

What is puzzling is that on current master (0.7.0-DEV) both solutions do perform the same… and both are slow. I guess this is a regression?

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  1.157 μs (17 allocations: 976 bytes)

julia> versioninfo()
Julia Version 0.7.0-DEV.1165
Commit 1a43098cf7 (2017-07-31 03:33 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

EDIT: for comparison, the same benchmark on v0.6:

julia> using BenchmarkTools

julia> @inline tuplejoin(x) = x
tuplejoin (generic function with 1 method)

julia> @inline tuplejoin(x, y) = (x..., y...)
tuplejoin (generic function with 2 methods)

julia> @inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)
tuplejoin (generic function with 3 methods)

julia> @btime tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2));
  3.720 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
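A minimal sketch for narrowing down where the slowdown comes from without BenchmarkTools. It assumes Julia 0.7 or later, where `@inferred` lives in the `Test` standard library (on 0.6 it is `Base.Test`) and `@allocated` is in Base. The wrapper name `joined6` is hypothetical, introduced here only so the benchmark arguments are literals inside a function rather than values typed at the REPL.

```julia
using Test  # provides @inferred (a standard library module in Julia 0.7+)

@inline tuplejoin(x) = x
@inline tuplejoin(x, y) = (x..., y...)
@inline tuplejoin(x, y, z...) = tuplejoin(tuplejoin(x, y), z...)

# Hypothetical wrapper: same six-tuple call as in the @btime benchmark above
joined6() = tuplejoin((1,2),(1,2),(1,2),(1,2),(1,2),(1,2))

result = joined6()        # first call triggers compilation
@inferred joined6()       # throws if the inferred return type differs from the actual one
println("allocated bytes: ", @allocated joined6())
println(result)
```

If `@inferred` throws, the nested splatting is not being inferred to a concrete tuple type, which would be consistent with the allocations seen on master; a nonzero `@allocated` count would point the same way.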