The Variational Multiscale (VMS) method has appeared as a promising new approach to the Large Eddy Simulation (LES) of turbulent flows. The key advantage of the VMS approach is that it allows different subgrid-scale (SGS) modeling assumptions to be made at different ranges of the resolved scales. Typically, in the VMS method, SGS modeling is confined to the smallest resolved scales, leaving the dynamically important large scales free from the direct influence of the SGS model. Prior implementations of the VMS approach have been restricted to either incompressible formulations, simple geometries and/or small time steps. We propose a space-time VMS method for the compressible Navier-Stokes equations, which aims to overcome the difficulties associated with prior VMS implementations. In particular, we aim to develop a method that is applicable to complex flow geometries with a minimum number of degrees of freedom, and that can march at time steps which are chosen to resolve the physical phenomena of interest rather than to satisfy stability constraints. The spatial discretization of the proposed computational approach corresponds to a high-order continuous Galerkin method, which due to its hierarchical nature provides a natural framework for 'a priori' scale separation, which is crucial for the VMS method. As the method is formulated in a space-time framework, it supports continuous as well as discontinuous discretizations in time. Time-discontinuous discretizations offer great flexibility for adaptation, but may be computationally expensive. Time-continuous discretizations, on the other hand, potentially offer a good compromise between accuracy and computational cost. We consider three different time discretizations, viz. a first-order time-continuous Galerkin method (TCG) in time, a second-order time-continuous Petrov-Galerkin method (TCPG) and a third-order discontinuous Galerkin method (TDG). We consider the efficacy of the spatial VMS discretization for the computation of fully-developed turbulent channel flow. We show that the present method leads to reduced resolution requirements compared to traditional LES approaches applying similar SGS models directly to all the resolved scales. The crucial parameter for obtaining reliable low-order statistics is found to be the large/small partition of the resolved scales. In particular, it is shown that when using simple eddy-viscosity models, the finite element basis functions capable of representing the basic dynamics of the near-wall coherent structures should be released from the direct influence of the SGS model. As space-time methods are necessarily implicit, a challenge is to ensure that the computations are carried out at reasonable cost. Therefore, we have conducted a detailed performance analysis to investigate the factors that influence the accuracy and computational cost of the proposed methods. For this purpose we consider again the turbulent channel flow. First, we examine the different time discretizations. It is demonstrated that the TCG method is not a competitive time discretization for the time steps of interest. The TCPG and TDG method, on the other hand, produce accurate and very similar results for relatively large time steps. However, the TDG method is considerably more computationally expensive as it uses twice the number of degrees of freedom compared to the TCPG method. Therefore, except for reasons of adaptation, the TCPG method is preferred here. Next, we compare the accuracy and cost of different spatial $hp$-resolutions for a similar total number of degrees of freedom. It is shown that the spatially higher-order methods lead to increased accuracy compared to a standard linear Galerkin method which cannot exploit the advantages of the present VMS formulation. However, higher-order methods are inherently expensive, as the computational work required within a time step scales quadratically with the number of finite element basis functions, while it scales only linear with the number of elements. Higher-order methods also have significantly denser system matrices resulting in rapidly increasing memory requirements with the order of the scheme. As the computational cost associated with higher-order methods is still relatively high, additional research areas are suggested for the goal of improving the method's cost efficiency.